JP2015118710A

JP2015118710A - Conversation support device, method, and program

Info

Publication number: JP2015118710A
Application number: JP2015003544A
Authority: JP
Inventors: Yumi Wakagi; Hisayoshi Nagae; Yasuaki Ariga; Kenji Iwata; Kazuo Sumita
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2015-01-09
Filing date: 2015-01-09
Publication date: 2015-06-25

Abstract

PROBLEM TO BE SOLVED: To support a conversation between speakers who talk in different languages or in the same language by presenting information that supplements their knowledge according to the conversation. SOLUTION: According to an embodiment, a conversation support device comprises an input part, a voice recognition part, a conversation history database, an estimation part, a determination part, a creation part, a selection part, and a presentation part. The input part receives speech from a conversation between speakers. The voice recognition part recognizes the input speech and converts it into corresponding text information. The conversation history database stores all or part of the text information as a conversation history. The estimation part estimates a speech act based on the text information. The determination part determines, based on the estimated speech act, whether to present supplementary information. The creation part creates candidates for the supplementary information when it is determined that the supplementary information is to be presented. The selection part uses the conversation history to select, from the candidates, the one to be presented. The presentation part presents the selected supplementary information.

Description

Embodiments described herein relate generally to a dialogue support apparatus, method, and program.

In recent years, speech-based machine translators have appeared, making it possible to talk with a partner whose language one does not know. However, even if the input information is translated correctly, the other speaker does not necessarily understand it in the same way as the original speaker who uttered it. For example, many names (named entities), such as place names or dish names, are well known in the original speaker's cultural sphere or country but not in the other speaker's. Even if such a name uttered by the original speaker is translated, the other speaker may be unable to understand the translation for lack of the relevant knowledge.

Moreover, even when a person who understands a name (for example, someone from a cultural sphere where the name is common) tries to explain it carefully, the same mental image may not be conveyed because the speakers' background knowledge differs greatly. For example, a diner in a restaurant who cannot tell what a dish is from its name may ask for an explanation. Because of differences in assumed knowledge, such as cooking methods that are common in one cultural sphere but not in another, the image may still not be conveyed correctly, and in the end the diner may not know what the dish is until actually eating it.

If the explainer could take the listener's cultural sphere into account, likening the item to a similar dish or cooking method in that cultural sphere or explaining how it differs, the listener could picture it more easily. However, the explainer often lacks knowledge of the listener's cultural sphere, and in such cases it is difficult to draw comparisons or explain the differences.

The same problem can arise when speakers with different background knowledge converse in the same language.

Conventionally, a translation system is known that always attaches supplementary information to the translation of certain specific proper nouns. In a spoken dialogue, however, outputting supplementary information every time a particular named entity appears is rather annoying and hinders smooth conversation.

A system is also known in which a voice guide, while interacting with a user, detects the user's confusion and provides information to resolve it. However, this system does not work when the user shows no sign of confusion, and the voice guide cannot proactively provide information that keeps the user from becoming confused in the first place. Furthermore, this system is not intended for conversations between people with different background knowledge.

JP 2004-220416 A
JP 2000-259177 A

No technique has been known that supports a dialogue by presenting information that supplements the speakers' knowledge according to the dialogue, whether the speakers talk in different languages or in the same language.

An object of the present embodiments is to provide a dialogue support apparatus, method, and program that can support a dialogue by presenting information that supplements the speakers' knowledge according to the dialogue, whether the speakers talk in different languages or in the same language.

According to an embodiment, a dialogue support apparatus includes an input unit, a speech recognition unit, a dialogue history database, an estimation unit, a determination unit, a generation unit, a selection unit, and a presentation unit. The input unit receives speech from a dialogue among a plurality of speakers. The speech recognition unit recognizes the input speech and converts it into corresponding text information. The dialogue history database stores all or part of the text information as a dialogue history. The estimation unit estimates a speech act based on the text information. The determination unit determines, based on the estimated speech act, whether to present supplementary information. The generation unit generates candidates for the supplementary information when it is determined that the supplementary information is to be presented. The selection unit uses the dialogue history to select, from the candidates, the one to be presented. The presentation unit presents the selected supplementary information.

FIG. 1 is a diagram showing a configuration example of the dialogue support apparatus according to the first embodiment.
FIG. 2 is a flowchart showing an example of the processing procedure of the dialogue support apparatus according to the first embodiment.
FIG. 3 is a diagram for explaining a usage situation of the dialogue support apparatus according to the first embodiment.
FIG. 4 is a flowchart showing an example of the processing procedure of the speech recognition unit.
FIG. 5 is a flowchart showing another example of the processing procedure of the speech recognition unit.
FIG. 6 is a flowchart showing an example of the processing procedure of the machine translation unit.
FIG. 7 is a flowchart showing an example of the processing procedure of the speech act estimation unit.
FIG. 8 is a diagram showing examples of speech act tags.
FIG. 9 is a flowchart showing an example of the processing procedure of the presentation necessity determination unit.
FIG. 10 is a diagram showing an example of the supplementary information presentation necessity database.
FIG. 11 is a flowchart showing an example of the processing procedure of the presentation candidate generation unit.
FIG. 12 is a diagram showing an example of the supplementary explanation database.
FIG. 13 is a flowchart showing an example of the processing procedure of the candidate selection unit.
FIG. 14 is a diagram for explaining an operation example of the dialogue support apparatus according to the first embodiment.
FIG. 15 is a diagram showing an example dialogue.
FIG. 16 is a diagram showing another configuration example of the dialogue support apparatus.
FIG. 17 is a diagram showing still another configuration example of the dialogue support apparatus.
FIG. 18 is a diagram showing an example of a supplementary information presentation content database.
FIG. 19 is a diagram showing an example dialogue.
FIG. 20 is a diagram showing a configuration example of the dialogue support apparatus according to the second embodiment.
FIG. 21 is a flowchart showing an example of the processing procedure of the dialogue support apparatus according to the second embodiment.

Hereinafter, a dialogue support apparatus according to embodiments of the present invention will be described in detail with reference to the drawings. In the following embodiments, parts with the same reference numerals perform the same operations, and repeated descriptions are omitted.

(First embodiment)
Two speakers who want to talk may be unable to use a common (natural) language. For example, each may use only their native language and not understand the other's, or one or both may use several languages while sharing none. In such cases, a translator placed between them can help them converse in different languages (for example, their respective native languages). In general, however, speakers often differ in background knowledge. Therefore, even if the translator correctly translates the information uttered by one speaker for the other, that information cannot always be expected to get across correctly. The same can occur when two speakers with different background knowledge converse in a common language.

The first embodiment is described using the following example: machine translation mediates a dialogue, and information supplementing the speakers' knowledge is presented according to the dialogue.

In the first embodiment, the two speakers using the dialogue support apparatus are referred to as the first speaker and the second speaker.

The specific example used below has the following setup:
- The first speaker is a customer, and the second speaker is a staff member who serves customers (for example, a store clerk).
- The first language, which the first speaker understands and speaks (for speech input), is Japanese.
- The second language, which the second speaker understands and speaks (for speech input), is English.

Of course, the present embodiment is not limited to this. The first language may be a language other than Japanese, and the second language may be a language other than English.

The following description also takes as an example the case in which the first speaker is a restaurant customer and the second speaker is a restaurant staff member. Of course, the present embodiment is not limited to this. It is applicable to many kinds of cross-language customer-service communication, such as communication when providing other services or when selling products.

The dialogue support apparatus described below is a customer service support apparatus, which supports communication in which a staff member serves a customer. However, the present embodiment is also applicable to dialogues other than customer service.

The following example shows how the state of a dialogue between the first and second speakers, who use different languages, is used to determine that there is information unknown to one speaker. Supplementary information (for example, an explanatory text) is then presented to that speaker at a predetermined timing when the translation result is output. Here, that speaker is the first speaker, but this is not limiting. It is also possible to present the supplementary information to both speakers rather than to only one.

First, an example usage situation of the customer service support apparatus (dialogue support apparatus) of the present embodiment is described with reference to FIG. 3.

First, in STEP-C1, the second speaker (staff member) asks, in English, for the customer's order (for example, “Are you ready to order?”). Then, in STEP-C2, the customer service support apparatus translates the utterance into Japanese and presents (speaks and/or displays) the translation (for example, “ご注文はお決まりですか？”, "Have you decided on your order?").

In STEP-C3, the first speaker (customer), having heard and/or read the translation, asks in Japanese for a recommended dish (for example, “お勧めの料理は何ですか？”, "What dish do you recommend?"). Then, in STEP-C4, the customer service support apparatus translates the utterance into English and presents (speaks and/or displays) the translation (for example, “Which one do you recommend?”).

In STEP-C5, the second speaker, having heard and/or read the translation, answers in English with a recommended dish (for example, “I recommend Wiener Schnitzel.”). Then, in STEP-C6, the customer service support apparatus translates the utterance into Japanese and presents (speaks and/or displays) the translation (for example, “ウィンナーシュニッチェルがおすすめです。”, "Wiener Schnitzel is recommended.").

In addition, the customer service support apparatus of the present embodiment determines whether supplementary information needs to be presented; this is described in detail later. Suppose it determines that supplementary information is needed. In this example, it then presents (speaks and/or displays) the supplementary information in STEP-C7. The supplementary information is given in Japanese, the language of the first speaker (customer), for example “ウィンナーシュニッチェルは、ウィーン風カツレツで、オーストリアの代表的な料理です。” ("Wiener Schnitzel is a Viennese-style cutlet and a classic Austrian dish.").

FIG. 1 shows an example functional configuration of the customer service support apparatus of the present embodiment.

As shown in FIG. 1, the customer service support apparatus of the present embodiment includes an input unit 101, a speech recognition unit 102, a machine translation unit 103, a speech act estimation unit 104, a presentation necessity determination unit 105, a presentation candidate generation unit 106, a candidate selection unit 107, and a presentation unit 108.

The input unit 101 accepts speech in the first language uttered by the first speaker and speech in the second language uttered by the second speaker. For example, the input unit 101 may capture speech with a microphone, digitize it, and pass it to the speech recognition unit 102.

The speech recognition unit 102 recognizes speech in the input language, which is either the first or the second language, and converts it into text in that input language (pre-translation text).

Based on this text, the machine translation unit 103 translates from the input language (first or second language) into the other language (second or first language) and generates the result as translated text.

The speech act estimation unit 104 estimates the speech act from the translated text and/or the pre-translation text.

The dialogue history database 121 stores a dialogue history of the pre-translation texts obtained by the speech recognition unit 102 and a dialogue history of the translated texts obtained by the machine translation unit 103. The dialogue history database 121 may additionally store a history of the speech acts estimated by the speech act estimation unit 104.

The candidate selection unit 107 refers to the dialogue history database 121 when selecting a candidate.
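The sketch below makes the role of the dialogue history database 121 concrete as a minimal in-memory version. It assumes one record per utterance, holding a speaker tag, the pre-translation text, the translated text, and an optional speech-act tag. The class and field names are hypothetical and are not taken from the patent; a practical system might use persistent storage instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HistoryEntry:
    """One utterance in the dialogue (hypothetical record layout)."""
    speaker: str                      # speaker tag, e.g. "customer" or "staff"
    source_text: str                  # pre-translation text from speech recognition
    translated_text: str              # output of machine translation
    speech_act: Optional[str] = None  # optional estimated speech-act tag


@dataclass
class DialogueHistory:
    """In-memory stand-in for the dialogue history database 121."""
    entries: List[HistoryEntry] = field(default_factory=list)

    def add(self, entry: HistoryEntry) -> None:
        self.entries.append(entry)

    def recent(self, n: int) -> List[HistoryEntry]:
        """Return up to the last n entries, oldest first."""
        return self.entries[-n:] if n > 0 else []

    def mentions(self, term: str) -> bool:
        """True if term occurs (case-insensitively) in any stored text."""
        t = term.lower()
        return any(
            t in e.source_text.lower() or t in e.translated_text.lower()
            for e in self.entries
        )
```

For example, the candidate selection step could call `mentions()` to check whether a dish name has already come up in the conversation.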

The supplementary information presentation necessity database 122 stores information used to determine whether to present supplementary information.

Based on the estimated speech act, the presentation necessity determination unit 105 determines whether there is information unknown to the first speaker and whether supplementary information about that unknown information should be presented.

The supplementary explanation database 123 stores knowledge information (supplementary explanations). This information is used to generate candidates for the supplementary information to be presented (presentation candidates).

When the presentation necessity determination unit 105 determines that supplementary information needs to be presented, the presentation candidate generation unit 106 refers to the supplementary explanation database 123. It then generates (or retrieves) candidates for the supplementary information (or supplementary explanation).
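The presentation necessity determination unit 105 and the presentation candidate generation unit 106 can be sketched as two table lookups. This is only an illustration under stated assumptions:

- The contents and layout of databases 122 and 123 are hypothetical; FIGS. 10 and 12 are not reproduced here.
- Terms are matched by plain substring search.
- The explanations are in English for readability, although the embodiment presents them in the customer's language.

```python
from typing import Dict, List

# Hypothetical stand-in for database 122: speech-act tag -> may supplementary
# information be presented after this kind of utterance?
NECESSITY_DB: Dict[str, bool] = {
    "recommend": True,
    "question": False,
    "greeting": False,
}

# Hypothetical stand-in for database 123: term -> candidate explanations.
EXPLANATION_DB: Dict[str, List[str]] = {
    "wiener schnitzel": [
        "Wiener Schnitzel is a Viennese-style cutlet.",
        "Wiener Schnitzel is a classic Austrian dish.",
    ],
}


def needs_supplement(speech_act: str, text: str) -> bool:
    """Present only if the speech act allows it AND a known term occurs in text."""
    if not NECESSITY_DB.get(speech_act, False):
        return False
    lowered = text.lower()
    return any(term in lowered for term in EXPLANATION_DB)


def generate_candidates(text: str) -> List[str]:
    """Collect every stored explanation for each known term found in text."""
    lowered = text.lower()
    candidates: List[str] = []
    for term, explanations in EXPLANATION_DB.items():
        if term in lowered:
            candidates.extend(explanations)
    return candidates
```

The sketch treats the speech act as a gate and the term lookup as a trigger. Under this design, a dish name mentioned in passing (for example, inside a greeting) produces no supplement, while a recommendation of the same dish does.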

From the generated (or retrieved) candidates for the supplementary information (or supplementary explanation), the candidate selection unit 107 selects the one to be presented (or to be used for the presentation). In doing so, the candidate selection unit 107 refers to the dialogue history database 121.
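This passage says only that the candidate selection unit 107 consults the dialogue history; it does not say how. One plausible heuristic is to prefer the candidate that adds the most words not already spoken in the dialogue, so that the customer is not told what has already been said. This heuristic is assumed here purely for illustration and is not the patent's method.

```python
import re
from typing import Iterable, List, Set


def _words(text: str) -> Set[str]:
    """Lower-cased alphabetic tokens (a crude tokenizer for English text)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_candidate(candidates: List[str], history: Iterable[str]) -> str:
    """Pick the candidate contributing the most words not yet seen in history.

    candidates must be non-empty. On a tie, max() keeps the earliest candidate.
    """
    seen: Set[str] = set()
    for utterance in history:
        seen |= _words(utterance)
    return max(candidates, key=lambda c: len(_words(c) - seen))
```

For example, if the staff member has already said "cutlet", the candidate that adds "classic Austrian dish" is preferred over the one that repeats "cutlet".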

The presentation unit 108 presents the translation result and, at a predetermined timing, the supplementary information. For example, the presentation unit 108 may present the translation result and the supplementary information audibly (for example, spoken through a loudspeaker). It may also present them visually (for example, on a display screen such as a liquid crystal display), either in addition to or instead of audibly. The same presentation method may be used for both, or the supplementary information may be presented differently from the translation result. For example, both may be presented audiovisually, or the translation result may be presented only audibly and the supplementary information only visually. Other combinations are also possible.

When displaying the translation result visually, the presentation unit 108 may also display the pre-translation text.

Furthermore, when presenting supplementary information, the presentation unit 108 may play a sound, blink the display, or vibrate the device to let the user know that supplementary information is being presented.

The present embodiment mainly describes presenting supplementary information to the customer in the first language. However, the apparatus may also present supplementary information to the staff member in the second language, or to both speakers in the first and second languages. In these cases, supplementary information for the customer may be presented audibly. Even then, supplementary information intended only for the staff member may be presented visually so that the customer does not hear it.
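The audience-dependent choice of output modality described above can be expressed as a small rule. The sketch assumes three hypothetical audience labels ("customer", "staff", "both") and a flag that says whether customer-facing supplements are spoken. Its only fixed rule, taken from the text, is that information meant only for the staff member is shown visually so the customer does not hear it.

```python
from enum import Flag, auto


class Channel(Flag):
    AUDIO = auto()   # spoken through a loudspeaker
    VISUAL = auto()  # shown on a display


def channels_for(audience: str, speak_customer_supplements: bool) -> Channel:
    """Choose output channels for one piece of supplementary information.

    audience is a hypothetical label: "customer", "staff", or "both".
    """
    if audience == "staff":
        # Staff-only information stays silent so the customer does not hear it.
        return Channel.VISUAL
    if speak_customer_supplements:
        return Channel.AUDIO | Channel.VISUAL
    return Channel.VISUAL
```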

FIG. 2 shows an example of the overall processing procedure of the present embodiment.

When the users converse, each speaker's speech enters the input unit 101 (step S1). Speech recognition by the speech recognition unit 102 (step S2) and machine translation by the machine translation unit 103 (step S3) are then performed in sequence. The presentation unit 108 may present the machine translation result at any suitable time after step S3 (for example, immediately after step S3).

Next, the speech act estimation unit 104 estimates the speech act (step S4), and the presentation necessity determination unit 105 then determines whether supplementary information needs to be presented (step S5).

If presentation is determined to be necessary (step S5), the presentation candidate generation unit 106 generates candidates for the supplementary information (presentation candidates) (step S6). The candidate selection unit 107 then selects one or more of these candidates to use for the presentation (step S7). A presentation candidate may be used as the supplementary information as-is. Alternatively, in step S7 the candidate selection unit 107 may generate an explanatory text from the presentation candidates and use that text as the supplementary information. The presentation unit 108 then presents the supplementary information at an appropriate timing (step S8), and the process returns to step S1 to repeat the sequence.

If presentation is determined to be unnecessary (step S5), steps S6 to S8 are skipped and the process returns to step S1 to repeat the sequence.
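Steps S1 to S8 can be summarized as one pass of a loop per utterance. In this sketch each unit is passed in as a plain function, so the names and signatures are placeholders rather than the patent's interfaces. The speech act is estimated on the translated text, which is one of the options the embodiment allows. The early return when no candidates are found is an added safeguard, not a step from FIG. 2.

```python
from typing import Callable, List, Optional, Tuple


def process_utterance(
    audio: bytes,                                  # S1: captured speech
    recognize: Callable[[bytes], str],             # S2: speech recognition
    translate: Callable[[str], str],               # S3: machine translation
    estimate_act: Callable[[str], str],            # S4: speech act estimation
    needs_supplement: Callable[[str, str], bool],  # S5: presentation necessity
    generate: Callable[[str], List[str]],          # S6: candidate generation
    select: Callable[[List[str]], str],            # S7: candidate selection
    present: Callable[[str], None],                # output (translation and S8)
    history: List[Tuple[str, str]],
) -> Optional[str]:
    """Run steps S2-S8 for one utterance; return the supplement shown, if any."""
    source = recognize(audio)
    translated = translate(source)
    history.append((source, translated))
    present(translated)  # e.g. right after S3
    act = estimate_act(translated)
    if not needs_supplement(act, translated):
        return None  # S6-S8 skipped; the caller returns to S1
    candidates = generate(translated)
    if not candidates:
        return None
    supplement = select(candidates)
    present(supplement)  # S8
    return supplement
```

Calling `process_utterance` once for each captured utterance, inside a loop, corresponds to the return to step S1.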

The processing of each component is described in more detail below with reference to FIGS. 4 to 13.

(Speech recognition unit 102)
FIG. 4 shows an example of the processing of the speech recognition unit 102.

In step S11, the speech accepted by the input unit 101 is received from the input unit 101.

In step S12, speech recognition is performed on the input speech to convert it into text (the pre-translation text is generated).

In step S13, the pre-translation text is output.

When language determination is needed for speech recognition, the speech recognition unit 102 performs it in step S14, as shown in FIG. 5. Any of the following methods may be used:
- Identify the language directly from the input speech.
- Register in advance the staff member's (store clerk's) voice characteristics and language. Deciding from the characteristics of the input speech whether the speaker is the staff member then also decides whether the utterance is in the staff member's language.
- Use features extracted from the content of the utterance.
- Let the user specify the language with a button or the like, at the same time as, before, or after speaking.
- Provide a dedicated microphone for each speaker in the input unit 101, with a preset correspondence between microphones and languages. The speaker and language are then identified by which microphone captured the speech.
- Associate directions of arrival with speakers, estimate the direction from which an utterance comes, and identify the speaker from that estimate.

Alternatively, buttons for the first speaker and the second speaker may be provided, and a speaker presses a button to select themselves. In that case, for example, when the first (or second) speaker is selected and speech is input, the selection may switch automatically to the second (or first) speaker. This makes button operation unnecessary while the speakers alternate; a speaker needs to press their own button only when speaking twice in a row. Various other methods of identifying or specifying the language are also possible. When language determination is performed, the speech recognition and conversion to text in step S12 follow the language determined in step S14.
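The button scheme with automatic switching described above amounts to a two-state toggle. Below is a minimal sketch; the speaker labels "first"/"second" and the language codes "ja"/"en" are hypothetical choices for illustration.

```python
from typing import Dict, Tuple


class SpeakerSelector:
    """Tracks which speaker (and so which language) the next utterance belongs to.

    After every utterance the selection flips to the other speaker, so a button
    press is needed only when the same speaker talks twice in a row.
    """

    def __init__(self, languages: Dict[str, str], initial: str = "first") -> None:
        self.languages = dict(languages)  # e.g. {"first": "ja", "second": "en"}
        self.current = initial

    def press(self, speaker: str) -> None:
        """A speaker's button overrides the automatic selection."""
        self.current = speaker

    def on_utterance(self) -> Tuple[str, str]:
        """Return (speaker, language) for this utterance, then switch speakers."""
        speaker = self.current
        self.current = "second" if speaker == "first" else "first"
        return speaker, self.languages[speaker]
```

Starting with `initial="second"` matches the FIG. 3 exchange, in which the staff member speaks first.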

(Machine translation unit 103)
FIG. 6 shows an example of the processing of the machine translation unit 103.

In step S21, the pre-translation text produced by the speech recognition unit 102 is input.

In step S22, the language of the input text is determined.

When the speech recognition unit 102 does not perform language determination, step S22 may determine the language in the same way as the speech recognition unit 102 described above. Alternatively, for example, the language may be determined from the pre-translation text.

When the speech recognition unit 102 does perform language determination, its result may be reused and step S22 omitted. Alternatively, step S22 may perform its own language determination.

If the input language is the first language, the text is translated from the first language into the second language in step S23, and the second-language text (translated text) is output in step S24.

If the input language is the second language, the text is translated from the second language into the first language in step S23, and the first-language text (translated text) is output in step S24.

For example, in step S22 it is determined whether the input language is Japanese or English. If it is determined to be English, the input text is translated from English into Japanese and the Japanese translated text is output.
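The direction logic of steps S22 to S24 can be sketched as follows. This is a minimal Python sketch, assuming that the detected language code and a translation function are supplied by the caller; `translate_turn`, the `translate` callable and the stub translator are hypothetical names, not an interface defined in this document.

```python
from typing import Callable, Tuple

# Hypothetical translator interface: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def translate_turn(text: str, detected_lang: str, first_lang: str,
                   second_lang: str, translate: Translator) -> Tuple[str, str]:
    """Return (target_language, translated_text) for one utterance (steps S22-S24)."""
    if detected_lang == first_lang:
        target = second_lang          # first language -> second language (S23)
    elif detected_lang == second_lang:
        target = first_lang           # second language -> first language (S23)
    else:
        raise ValueError(f"unsupported input language: {detected_lang}")
    return target, translate(text, detected_lang, target)   # output (S24)


# Stub translator that only tags the text; it stands in for a real MT engine.
def stub(text: str, src: str, dst: str) -> str:
    return f"[{src}->{dst}] {text}"


print(translate_turn("What is Wiener Schnitzel?", "en", "ja", "en", stub))
# ('ja', '[en->ja] What is Wiener Schnitzel?')
```

Raising an error for a third language keeps the sketch to the two-language setting described here; a system that lets the user choose from three or more languages would need the target chosen per speaker instead.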

(Speech act estimation unit 104)
FIG. 7 shows an example of the processing performed by the speech act estimation unit 104.

In the speech act estimation process (step S32), the speech act estimation unit 104 may always use the translated text, so that it can handle utterances in both languages. Alternatively, it may use the language determination result of step S22 of the machine translation unit 103 or of step S14 of the speech recognition unit 102 to choose between the pre-translation and the post-translation text, so that estimation is always performed on text in the same language.

Here, the case where the text is always input in the second language is described as an example.

In step S31, second-language text is input: either the translated text of an utterance by the first speaker or the pre-translation text of an utterance by the second speaker.

In step S32, the speech act tag is estimated using the text input in step S31.

When estimating the speech act tag, the preceding utterances may also be taken into account by using the dialogue history. That is, in step S31 the history information (utterance contents and speaker tags) stored in the dialogue history database 121 is also input, and in step S32 the speech act tag is estimated using both the text input in step S31 and this history information.

In step S33, the speech act tag obtained as the estimation result is output.

FIG. 8 shows examples of speech act tags. In this example, speech act tags such as “greeting”, “thanks/apology”, “backchannel”, “response”, “question”, “suggestion”, “information transmission”, and “request” are defined.

Various methods can be used to estimate the speech act, such as statistical methods including machine learning, or rule-based methods that rely on cue expressions. For example, the input “How about the Wiener Schnitzel?” can be tagged “suggestion”, the input “Yes” can be tagged “response”, the input “One Wiener Schnitzel, please” can be tagged “request”, and the input “What kind of dish is Wiener Schnitzel?” can be tagged “question”.
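The rule-based option can be illustrated with a small cue-expression tagger. This is a minimal sketch that works on English text only; the patterns in `CUE_RULES`, their order, and the fallback tag are hypothetical choices made to reproduce the four examples above, not rules given in this document.

```python
import re

# Hypothetical cue-expression rules, checked in order; the first match wins.
CUE_RULES = [
    ("suggestion", re.compile(r"^(how about|would you like)\b", re.I)),
    ("question", re.compile(r"^(what|which|where|when|who|how)\b.*\?$", re.I)),
    ("response", re.compile(r"^(yes|no|sure|okay)\b", re.I)),
    ("request", re.compile(r"\b(please|i'?ll have)\b", re.I)),
]


def estimate_speech_act(utterance: str,
                        default: str = "information transmission") -> str:
    """Return the tag of the first matching cue rule, or the default tag."""
    text = utterance.strip()
    for tag, pattern in CUE_RULES:
        if pattern.search(text):
            return tag
    return default


for u in ["How about the Wiener Schnitzel?", "Yes",
          "One Wiener Schnitzel, please.", "What kind of dish is Wiener Schnitzel?"]:
    print(u, "->", estimate_speech_act(u))
```

Rule order matters: “How about ...?” also ends with a question mark, so the suggestion rule is checked first. A statistical classifier would replace the rule list but keep the same input and output.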

(Presentation necessity determination unit 105)
FIG. 9 shows an example of the processing performed by the presentation necessity determination unit 105.

In step S41, speaker determination is performed using the speech act tag output by the speech act estimation unit 104 together with the language determination result of step S22 of the machine translation unit 103 or of step S14 of the speech recognition unit 102.

For example, suppose the first speaker is a customer and the second speaker is a staff member serving the customer. If English, the staff member's language, is set in advance as the second language, then whenever the language determination result is the second language, the staff member (the second speaker) is obtained as the speaker tag.

Alternatively, for example, the voice characteristics of each staff member (clerk) may be registered in advance, and whether the speaker is a staff member may be determined from the characteristics of the input speech.

As described above, if the speech recognition unit 102 or the machine translation unit 103 already performs speaker determination for the purpose of language determination, that result may be used.

Next, in step S42, the speech act tag and the speaker tag are input, and in step S43 the supplementary information presentation necessity database 122 is looked up using the combination of the two tags.

FIG. 10 shows an example of the supplementary information presentation necessity database 122. In this example, multiple triples of “speaker tag”, “speech act tag”, and “presentation necessity” are defined, and whether presentation is necessary can be determined by referring to the “presentation necessity” column for the given combination of speaker tag and speech act tag. For convenience, this example indicates “presentation required” by ○ and “presentation not required” by ×.

Furthermore, the timing at which the supplementary information is to be presented may be recorded for each condition. If a presentation timing is recorded, the supplementary information is presented according to the contents of the “presentation timing” column; if not, it is presented at a predetermined timing (for example, immediately, or after the staff member's next utterance).

In the present embodiment, the supplementary information is described as being presented in the first language, used by the first speaker, but a column specifying the language in which the supplementary information is presented may be provided for each condition. In this case, the supplementary information is presented in the language given in the “language” column. This makes it possible, for example, to specify that supplementary information be presented in the first language for the customer or in the second language for the staff member. The “language” column may contain either the first language or the second language, and it may also be made possible to list both.

In the example of FIG. 10, step S43 obtains the contents of the “presentation necessity” column corresponding to the combination of speech act tag and speaker tag input in step S42 (together with the contents of the “presentation timing” column, if that column is provided).

In step S44, it is determined whether the “presentation necessity” entry in the supplementary information presentation necessity database 122 for the combination of speech act tag and speaker tag input in step S42 indicates that presentation is required.

If presentation is determined to be required, a presentation request is issued in step S45; if it is determined not to be required, a non-presentation request is issued in step S46.

When the “presentation timing” column is provided, its contents may be passed to the presentation unit 108 via the presentation candidate generation unit 106 and the candidate selection unit 107, for example by attaching them to the presentation request. Alternatively, the contents may be passed directly to the presentation unit 108 (and also to the candidate selection unit 107 if that unit uses them).

When the “presentation timing” column is provided, its contents may be obtained, before step S45, only when it is determined in step S44 that presentation is required.
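Steps S42 to S46 amount to a table lookup keyed by the tag pair. The following is a minimal sketch, assuming a Python dictionary stands in for the supplementary information presentation necessity database 122; the rows, tag strings, timing labels and the rule that unlisted pairs need no presentation are hypothetical, loosely modelled on the dialogue examples later in this section rather than on the actual contents of FIG. 10.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PresentationRule:
    required: bool                 # "presentation necessity" column (○ = True, × = False)
    timing: Optional[str] = None   # "presentation timing" column; None = not specified


# Hypothetical rows keyed by (speaker tag, speech act tag); not the actual FIG. 10 contents.
NECESSITY_DB: Dict[Tuple[str, str], PresentationRule] = {
    ("customer", "question"): PresentationRule(True, "after staff answer"),
    ("staff", "suggestion"): PresentationRule(True, "right after staff utterance"),
    ("staff", "response"): PresentationRule(True, "right after staff utterance"),
    ("customer", "request"): PresentationRule(True),
    ("customer", "greeting"): PresentationRule(False),
}

DEFAULT_TIMING = "immediately"  # predetermined timing used when none is recorded


def decide_presentation(speaker_tag: str, act_tag: str) -> Dict[str, str]:
    """Steps S42-S46: look up the tag pair and issue a (non-)presentation request."""
    rule = NECESSITY_DB.get((speaker_tag, act_tag))
    if rule is None or not rule.required:
        # Assumption: pairs missing from the table are treated as "not required".
        return {"request": "non-presentation"}                        # S46
    return {"request": "presentation",                                 # S45
            "timing": rule.timing or DEFAULT_TIMING}


print(decide_presentation("customer", "question"))
```

Attaching the timing to the request dictionary mirrors the option of passing the “presentation timing” contents along with the presentation request.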

(Presentation candidate generation unit 106)
FIG. 11 shows an example of the processing performed by the presentation candidate generation unit 106.

The presentation candidate generation unit 106 performs its processing only when the presentation necessity determination unit 105 has output a presentation request.

In the presentation candidate generation process (step S52), the presentation candidate generation unit 106 may always use the translated text, so that it can handle utterances in both languages. Alternatively, it may use the language determination result of step S22 of the machine translation unit 103 or of step S14 of the speech recognition unit 102 to choose between the pre-translation and the post-translation text, so that processing is always performed on text in the same language. Here, the case where the text is always input in the second language is described as an example.

In step S51, second-language text is input: either the translated text of an utterance by the first speaker or the pre-translation text of an utterance by the second speaker.

Next, in step S52, keywords are extracted from the text.

In step S53, each keyword extracted in step S52 is checked against the supplementary explanation database 123. If one or more supplementary explanations are registered in the supplementary explanation database 123 for a keyword, those explanations are output in step S54. If no supplementary explanation is registered for the keyword, null is output in step S55.

FIG. 12 shows an example of the supplementary explanation database 123. The supplementary explanation database 123 has a “word” column holding the indexed keyword, a “language” column indicating the language of the supplementary explanation, a “priority” column indicating the priority among multiple supplementary explanations, and a “supplementary explanation” column holding the explanation for each language. In this specific example, “Wiener Schnitzel”, an English term in the second language used by the staff member, is registered as a keyword on the assumption that it is unknown to customers who speak Japanese, Italian, and so on. In the example of FIG. 12, a lower value in the “priority” column indicates a higher priority.

For example, with the data of FIG. 12, if the text input in step S51 is always in the second language and “Wiener Schnitzel” is obtained as a keyword in step S52, three supplementary explanations are output in step S54: “Viennese-style cutlet”, “beef fried in oil”, and “a representative Austrian dish”.
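Steps S52 to S55 can be sketched as a keyword lookup over rows shaped like FIG. 12. This is a minimal sketch under stated assumptions: the rows are English glosses of the Japanese entries (the Italian rows are omitted), and `extract_keywords` is a naive dictionary match standing in for a real keyword extractor; both names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Explanation:
    entry_id: str   # row label, e.g. "JP-1"
    word: str       # "word" column (indexed keyword)
    language: str   # "language" column
    priority: int   # "priority" column; a lower value means a higher priority
    text: str       # "supplementary explanation" column


# Illustrative rows modelled on FIG. 12, with English glosses of the Japanese entries.
SUPPLEMENT_DB: List[Explanation] = [
    Explanation("JP-1", "Wiener Schnitzel", "ja", 1, "Viennese-style cutlet"),
    Explanation("JP-2", "Wiener Schnitzel", "ja", 2, "beef fried in oil"),
    Explanation("JP-3", "Wiener Schnitzel", "ja", 3, "a representative Austrian dish"),
]


def extract_keywords(text: str, vocabulary: List[str]) -> List[str]:
    """Step S52 stand-in: naive case-insensitive dictionary matching."""
    lowered = text.lower()
    return [w for w in vocabulary if w.lower() in lowered]


def generate_candidates(text: str, target_lang: str,
                        db: List[Explanation] = SUPPLEMENT_DB
                        ) -> Dict[str, Optional[List[Explanation]]]:
    """Steps S53-S55: map each keyword to its explanations, or None ("null")."""
    vocabulary = sorted({row.word for row in db})
    result: Dict[str, Optional[List[Explanation]]] = {}
    for keyword in extract_keywords(text, vocabulary):
        rows = [r for r in db if r.word == keyword and r.language == target_lang]
        result[keyword] = sorted(rows, key=lambda r: r.priority) or None
    return result


for kw, rows in generate_candidates("How about the Wiener Schnitzel?", "ja").items():
    print(kw, [r.text for r in rows] if rows else None)
```

Sorting by priority here saves the candidate selection step from re-sorting, but the selection unit may still reorder or filter the list.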

(Candidate selection unit 107)
FIG. 13 shows an example of the processing performed by the candidate selection unit 107.

In step S61, the supplementary explanations output by the presentation candidate generation unit 106 are input as presentation candidates.

In step S62, one or more supplementary explanations to be presented are selected from among the candidates.

When selecting candidates, it is desirable to examine the information in the dialogue history database 121 (for example, the utterance contents alone, or the utterance contents together with the speaker tags) and to exclude supplementary explanations similar to content that has already been uttered. Alternatively, information already known to a speaker may be identified from the combinations of speaker tags and speech act tags in the dialogue history database 121, and supplementary explanations similar to that known information may be excluded. For example, if the dialogue history contains an utterance with speaker tag = “customer” and speech act tag = “wish” whose content is something like “I'd like to eat Wiener Schnitzel, but...”, the customer presumably already knows what Wiener Schnitzel is, so it may be decided not to select an explanation based on general knowledge, such as JP-3 in FIG. 12 (Japanese, priority 3), “a representative Austrian dish”.

In this selection, for example, a predetermined number (such as one or two) of supplementary explanations may be selected in descending order of priority.

The contents of the “presentation timing” column may also be taken into account in this selection. For example, if the presentation timing is “immediately after determination”, the supplementary explanation with the highest priority may be selected. Alternatively, if the presentation timing is “immediately after the next utterance”, supplementary explanations similar to the content of that next utterance may be excluded before selection.
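Step S62 can be sketched as “filter by the dialogue history, then take the top-priority candidates”. This is a minimal sketch, assuming English text and a hypothetical word-overlap test: a candidate counts as already uttered when at least half of its words occur in some past utterance. The document does not specify a similarity measure; `is_similar`, its threshold, and `max_items` are assumptions.

```python
import re
from typing import List, Set, Tuple

Candidate = Tuple[str, int, str]   # (entry_id, priority, explanation text)


def tokens(text: str) -> Set[str]:
    """Lower-cased alphabetic words of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def is_similar(explanation: str, utterance: str, threshold: float = 0.5) -> bool:
    """Hypothetical similarity test: share of the explanation's words found in the utterance."""
    exp_words = tokens(explanation)
    if not exp_words:
        return False
    return len(exp_words & tokens(utterance)) / len(exp_words) >= threshold


def select_candidates(candidates: List[Candidate], history: List[str],
                      max_items: int = 2) -> List[Candidate]:
    """Step S62: drop candidates already covered by the dialogue, then take the top-priority ones."""
    remaining = [c for c in candidates
                 if not any(is_similar(c[2], past) for past in history)]
    remaining.sort(key=lambda c: c[1])    # lower value = higher priority
    return remaining[:max_items]


CANDS = [("JP-1", 1, "Viennese-style cutlet"),
         ("JP-2", 2, "beef fried in oil"),
         ("JP-3", 3, "a representative Austrian dish")]
history = ["What is Wiener Schnitzel?", "Our schnitzel is beef fried in oil."]
print([c[0] for c in select_candidates(CANDS, history)])
```

With this history, JP-2 is excluded because the staff member has already said “beef fried in oil”, which matches the behaviour described in dialogue example 2 below. A production system would more likely compare utterances in a shared embedding space than by word overlap.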

In step S63, supplementary information (here, an explanatory sentence) is generated from the one or more selected supplementary explanations.

In step S64, the generated explanatory sentence is output.

For example, when a customer asks “What kind of dish is Wiener Schnitzel?”, the speaker tag is “customer” and the speech act tag is “question”, so the supplementary explanations, or an explanatory sentence generated from them, are scheduled to be presented “after the staff member's answer”. If the staff member then answers “Wiener Schnitzel is beef fried in oil”, the content of supplementary explanation JP-2 (Japanese, priority 2) in the supplementary explanation database 123 has already been given in the answer, so JP-2 is excluded. An explanatory sentence such as “It is a Viennese-style cutlet and a representative Austrian dish” may then be generated from supplementary explanations JP-1 (Japanese, priority 1) and JP-3 (Japanese, priority 3) and output.

Similarly, when a staff member suggests “How about the Wiener Schnitzel?”, the speaker tag is “staff” and the speech act tag is “suggestion”, so supplementary information (an explanatory sentence) is presented “immediately after the staff member's utterance”. In this case, the two highest-priority supplementary explanations (JP-1 and JP-2) may be used to generate and output a sentence such as “It is a Viennese-style cutlet made of beef fried in oil”.
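Step S63 can be approximated with a fixed sentence template. This is a minimal sketch, assuming English output and a hypothetical “It is A, B and C.” pattern; the examples above produce natural Japanese sentences, which would need a language-specific generator rather than this simple join.

```python
from typing import List, Optional


def compose_explanation(texts: List[str]) -> Optional[str]:
    """Step S63 sketch: join the selected explanations into one sentence with a fixed template."""
    if not texts:
        return None                       # nothing selected, nothing to present
    if len(texts) == 1:
        body = texts[0]
    else:
        body = ", ".join(texts[:-1]) + " and " + texts[-1]
    return f"It is {body}."


print(compose_explanation(["a Viennese-style cutlet", "a representative Austrian dish"]))
# It is a Viennese-style cutlet and a representative Austrian dish.
```

Returning `None` for an empty selection lets the caller skip presentation, in the same way that the candidate generation step outputs null when no explanation is registered.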

Alternatively, priorities may be assigned in the supplementary explanation database 123 per speech act tag and/or per speaker tag so that, for example, specific cooking methods are not presented when the speaker tag is “staff” and the speech act tag is “suggestion”.

The selected supplementary explanations may also be used directly as the supplementary information.

(Presentation unit 108)
As described above, the presentation unit 108 presents at least the supplementary information at a predetermined timing.

FIG. 14 shows an example of the operation of the customer service support apparatus of this embodiment.

When a customer asks in Japanese “What is Wiener Schnitzel?”, the English translation is obtained and presented.

Here, the speaker tag is “customer” and the speech act tag is “question”, so the supplementary information is presented “after the staff member's answer”. As the supplementary explanation, for example, JP-1 (Japanese, priority 1), “Viennese-style cutlet”, is selected.

When the staff member answers the question with “It's beef fried in oil.”, the Japanese translation is obtained and presented. At this point, an explanatory sentence such as “It is a Viennese-style cutlet.” is also presented as supplementary information.

FIG. 15 shows examples of dialogue using the customer service support apparatus of this embodiment, together with examples of supplementary explanations.

In dialogue example 1, suppose the customer asks “What is today's recommended course?”. Since the speaker tag is “customer” and the speech act tag is “question”, a presentation request is issued; however, the utterance contains no keyword registered in the supplementary explanation database, so in the end no supplementary information is presented.

Next, suppose the staff member answers “It is ○○○○, Wiener Schnitzel, and ○○○○.”. Since the speaker tag is “staff” and the speech act tag is “answer”, a presentation request is issued, and the presentation timing is, for example, “immediately after the staff member's utterance”. Because the keyword “Wiener Schnitzel”, which is registered in the supplementary explanation database, is present, and it is now immediately after the staff member's utterance, an explanatory sentence such as “Wiener Schnitzel is a Viennese-style cutlet.” is presented as supplementary information at this point.

In dialogue example 2, suppose the customer asks “What is Wiener Schnitzel?”. Since the speaker tag is “customer” and the speech act tag is “question”, a presentation request is issued, and the presentation timing is, for example, “after the staff member's answer”. Because the keyword “Wiener Schnitzel” is registered in the supplementary explanation database, an explanatory sentence will be presented as supplementary information.

Now suppose the staff member answers “Our schnitzel is beef fried in oil.”. Because content similar to supplementary explanation JP-2 (Japanese, priority 2), “beef fried in oil”, has been uttered, JP-2 is excluded from the candidates and, for example, JP-1 (Japanese, priority 1), “Viennese-style cutlet”, is selected. Since it is now after the staff member's answer, an explanatory sentence such as “It is what is known as a Viennese-style cutlet.” is presented as supplementary information at this point.

The above description uses the combination {first speaker = customer, first language = Japanese; second speaker = staff member, second language = English} as an example, but the invention is not limited to it, and any other combination is possible. For example, {first speaker = customer, first language = English; second speaker = staff member, second language = Japanese} may be used, as may a combination of Japanese and Chinese or of English and Chinese.

The language pair may be fixed to two languages in advance, or the user may be allowed to choose freely from three or more languages. Likewise, the language of the customer, of the staff member, or of both may be fixed in advance or freely selectable by the user.

The first speaker may also be the staff member and the second speaker the customer. Furthermore, the two speakers need not be in a staff/customer relationship at all.

In the above, the first speaker is the one for whom the presence of unknown information is judged and to whom supplementary information is presented, but this may instead be the second speaker or both speakers. Whether the target is the first speaker, the second speaker, or both may be fixed in advance or freely selected by the user.

As described above, according to this embodiment, when speakers converse in different languages or in the same language, the conversation can be supported by presenting information that supplements the speakers' knowledge as the conversation proceeds.

(First modification)
FIG. 16 shows a configuration example in which the language is selected with buttons. In this case, a language selection input unit 111 is added in front of the input unit 101 in the configuration described so far, and the input language is designated or selected through the language selection input unit 111.

(Second modification)
FIG. 17 shows a configuration example in which the content to be presented is determined. In this case, a presentation content determination unit 112 is provided instead of the presentation necessity determination unit 105. In addition to the supplementary explanation database 123, a predetermined number of other databases are provided. Here, as an example, a product information database 125, which holds recommendation information about each dish, and a management information database 126, which holds management information such as the number of servings of each dish remaining, are provided.

In addition, a supplementary information presentation content database 124 is used instead of the supplementary information presentation necessity database 122.

FIG. 18 shows an example of the supplementary information presentation content database 124. It differs from the supplementary information presentation necessity database 122 in that it has a “reference database” column instead of the “presentation necessity” column. The “reference database” column specifies the database to be consulted rather than whether presentation is necessary. For example, DB1 denotes the supplementary explanation database 123, DB2 the product information database 125, and DB3 the management information database 126. Every row may be required to name a database in the “reference database” column, or the column may be allowed to be blank, with a blank entry indicating that no presentation is required.

In this configuration, the presentation content determination unit 112 identifies the database to be consulted from the contents of the “reference database” column. If blank entries are allowed, the presentation content determination unit 112 determines whether presentation is necessary and, when it is, which database to consult.
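The presentation content determination unit 112 can be sketched as a table that maps the tag pair to a database ID, followed by a dispatch to a lookup function for that database. In this minimal sketch the rows are hypothetical, loosely following the FIG. 19 examples described next, and the `lookups` mapping of callables stands in for the real databases 123, 125 and 126.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical FIG. 18-style rows: (speaker tag, speech act tag) -> reference database ID.
# None stands for a blank "reference database" cell, i.e. nothing is presented.
CONTENT_DB: Dict[Tuple[str, str], Optional[str]] = {
    ("customer", "question"): "DB1",      # supplementary explanation database 123
    ("staff", "suggestion"): "DB1",
    ("customer", "request"): "DB3",       # management information database 126
    ("staff", "confirmation"): "DB2",     # product information database 125
    ("customer", "greeting"): None,
}

Lookup = Callable[[str], Optional[str]]


def supplementary_info(speaker_tag: str, act_tag: str, keyword: str,
                       lookups: Dict[str, Lookup]) -> Optional[str]:
    """Pick the reference database for the tag pair and query it with the keyword."""
    db_id = CONTENT_DB.get((speaker_tag, act_tag))
    if db_id is None:
        return None                       # blank cell or unlisted pair: no presentation
    return lookups[db_id](keyword)


lookups: Dict[str, Lookup] = {
    "DB1": lambda k: f"explanation of {k}",
    "DB2": lambda k: f"wines for {k}",
    "DB3": lambda k: f"stock of {k}",
}
print(supplementary_info("customer", "request", "Wiener Schnitzel", lookups))
```

Keeping the databases behind a common callable interface means a new database only needs a new table value and a new entry in `lookups`.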

FIG. 19 shows examples of dialogue in this configuration. As in the system described above, supplementary information is presented for a customer's question or a staff member's suggestion. The customer's question (a) “What is Wiener Schnitzel?” and the staff member's suggestion (b) “How about the Wiener Schnitzel?” both consult the supplementary explanation database 123, so both give the same results as in the first embodiment.

By contrast, when the customer makes a request such as (c) “Wiener Schnitzel, please.”, the management information database 126 is consulted. For example, the management information database 126 holds the number of remaining servings for each dish; the remaining number for Wiener Schnitzel can be looked up and supplementary information such as “one serving left” presented, so that orders are not accepted for dishes that have already sold out. In that case, if the second language is specified in the “language” column of the supplementary information presentation content database 124, supplementary information such as “one serving left” may be presented to the staff member in the second language.
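The stock check in example (c) can be sketched as a lookup in a remaining-servings table. This is a minimal sketch, assuming the management information database 126 is modelled as a plain dictionary from dish name to remaining servings; `stock_notice` and the English message wording are hypothetical.

```python
from typing import Dict, Optional


def stock_notice(dish: str, remaining: Dict[str, int]) -> Optional[str]:
    """Build a short notice from a hypothetical remaining-servings table (DB3)."""
    count = remaining.get(dish)
    if count is None:
        return None                       # dish not managed: nothing to present
    if count <= 0:
        return f"{dish}: sold out"        # lets the staff member decline the order
    unit = "serving" if count == 1 else "servings"
    return f"{dish}: {count} {unit} left"


print(stock_notice("Wiener Schnitzel", {"Wiener Schnitzel": 1}))
# Wiener Schnitzel: 1 serving left
```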

When the staff member confirms the order, as in (d) “So that's the Wiener Schnitzel.”, the product information database 125 is consulted, and supplementary information such as recommendations for additional orders (for example, a list of wines recommended with Wiener Schnitzel) may be presented to the staff member and/or the customer.

The databases described here are only examples; supplementary information can be generated and presented using a variety of databases.

One or both of the first and second modifications of the first embodiment can also be combined with the second embodiment.

(Second Embodiment)
The second embodiment is described below, focusing on its differences from the first embodiment.

The first embodiment presents supplementary explanations when machine translation is used for a dialogue between speakers of different languages, whereas the second embodiment presents supplementary explanations when speakers with different background knowledge converse in the same language without machine translation.

FIG. 20 shows an example of the functional configuration of the customer service support apparatus of this embodiment.

As shown in FIG. 20, the customer service support apparatus of this embodiment includes an input unit 101, a speech recognition unit 102, a speaker identification unit 1103, a speech act estimation unit 104, a presentation necessity determination unit 105, a presentation candidate generation unit 106, a candidate selection unit 107, and a presentation unit 108. That is, it has the speaker identification unit 1103 in place of the machine translation unit 103 of the configuration example in FIG. 1.

The input unit 101 accepts speech from the first speaker and from the second speaker in a specific language (a language common to both speakers, such as Japanese or English).

The speech recognition unit 102 recognizes the speech in that language and converts it into text information.

The speaker identification unit 1103 identifies the speaker of each utterance.

The speaker identification unit 1103 may use any of the speaker identification methods exemplified in the first embodiment, except those that identify the speaker by language (both speakers use the same language here), or any other method.

If the speech recognition unit 102 itself has a speaker identification function, that function may serve as the speaker identification unit 1103.

The speech act estimation unit 104 estimates the speech act from the text information.
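Beyond referring back to the first embodiment, this section does not say how the speech act is estimated. Purely as an illustration, the sketch below swaps in a simple rule-based tagger. The tag names (`confirm`, `request`, `question`, `inform`), the regular expressions, and the function `estimate_speech_act` are all hypothetical and are not the patent's method.

```python
import re

# Hypothetical rule table: (pattern, speech act tag). The first matching rule wins.
RULES = [
    (re.compile(r"\b(right|correct)\?\s*$", re.IGNORECASE), "confirm"),
    (re.compile(r"\b(I'?d like|I'll have|can I get)\b", re.IGNORECASE), "request"),
    (re.compile(r"\?\s*$"), "question"),
]


def estimate_speech_act(text: str) -> str:
    """Return a speech act tag for one utterance using surface-level rules."""
    for pattern, tag in RULES:
        if pattern.search(text):
            return tag
    return "inform"  # fallback tag when no rule fires
```

A real estimator might instead use a trained classifier and, as claim 9 allows, the dialogue history. The rule table is only meant to show what a speech act tag looks like as an input to the later units.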

The dialogue history database 121 is the same as in the first embodiment except for the following. In this embodiment, both speakers use a common language and no machine translation is performed, so no dialogue history of translated text from a machine translation unit is stored.
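The following is a minimal sketch of what one dialogue history entry might hold in this embodiment, assuming a simple in-memory store. There is no translated-text field. Following claim 10, the estimated speech act tag is kept next to the text it was estimated from. The class names `Turn` and `DialogueHistory` are hypothetical, as is the `mentions` helper, which a selection rule in the spirit of claim 6 could use to skip content that has already been said.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    speaker: str          # hypothetical speaker label, e.g. "server" or "customer"
    text: str             # recognized text; no translated-text field in this embodiment
    speech_act: str = ""  # estimated speech act tag stored alongside the text


@dataclass
class DialogueHistory:
    turns: List[Turn] = field(default_factory=list)

    def add(self, turn: Turn) -> None:
        """Append one utterance to the history."""
        self.turns.append(turn)

    def mentions(self, phrase: str) -> bool:
        """True if any stored utterance already contains the phrase (case-insensitive)."""
        needle = phrase.lower()
        return any(needle in t.text.lower() for t in self.turns)
```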

The supplementary information presentation necessity database 122 is the same as in the first embodiment.

The presentation necessity determination unit 105 is the same as in the first embodiment. In this embodiment, it can also use the speaker identification result from the speaker identification unit 1103.
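The determination can be pictured as a table lookup, assuming the necessity database 122 reduces to a table keyed by (speech act tag, speaker). The entries below are hypothetical, as are the speaker labels `server` and `customer` and the function `needs_presentation`. Pairs that do not appear in the table default to "do not present".

```python
from typing import Dict, Tuple

# Hypothetical contents of the supplementary information presentation necessity
# database 122: (speech act tag, speaker) -> whether to present supplementary info.
NECESSITY_TABLE: Dict[Tuple[str, str], bool] = {
    ("request", "customer"): True,   # hypothetical: customer names a dish
    ("confirm", "server"): True,     # hypothetical: server confirms an order
    ("greeting", "server"): False,
}


def needs_presentation(speech_act: str, speaker: str,
                       table: Dict[Tuple[str, str], bool] = NECESSITY_TABLE) -> bool:
    """Return whether supplementary information should be presented.

    Pairs missing from the table default to False (do not present).
    """
    return table.get((speech_act, speaker), False)
```

Keying on the speaker as well as the speech act lets the same confirmation lead to a presentation when the server says it but not when the customer does.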

The supplementary explanation database 123 is the same as in the first embodiment except for the following. In this embodiment, both speakers use a common language and no machine translation is performed, so the "language" field is unnecessary and the "word" and "supplementary explanation" fields use the same language.
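This sketch shows a same-language supplementary explanation entry together with a naive candidate lookup. It assumes that candidates are produced by case-insensitive substring matching of registered words against the recognized text. The `Explanation` record, the sample entry, and `generate_candidates` are illustrative assumptions and do not represent the patent's data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Explanation:
    word: str         # headword, in the language shared by both speakers
    explanation: str  # supplementary explanation in the same language (no "language" field)


# Hypothetical sample entry for illustration only.
EXPLANATIONS: List[Explanation] = [
    Explanation("Wiener Schnitzel", "A thin, breaded and pan-fried veal cutlet from Vienna."),
]


def generate_candidates(text: str,
                        db: List[Explanation] = EXPLANATIONS) -> List[Explanation]:
    """Return entries whose headword appears in the utterance (case-insensitive)."""
    lowered = text.lower()
    return [e for e in db if e.word.lower() in lowered]
```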

The presentation candidate generation unit 106 is the same as in the first embodiment.

The candidate selection unit 107 is the same as in the first embodiment.

The presentation unit 108 is the same as in the first embodiment, except that no translated text is presented in this embodiment. The presentation unit 108 may also present only the supplementary information.

FIG. 21 shows an example of the overall processing procedure of this embodiment.

When the users converse, the speech of either speaker is input to the input unit 101 (step S101). Speech recognition by the speech recognition unit 102 (step S102) and speaker identification by the speaker identification unit 1103 (step S103) are then performed in sequence.

Next, the speech act estimation unit 104 estimates the speech act (step S104), and the presentation necessity determination unit 105 determines whether supplementary information needs to be presented (step S105).

If presentation is determined to be necessary (step S105), the presentation candidate generation unit 106 generates candidates for supplementary information (presentation candidates) (step S106). The candidate selection unit 107 then selects one or more of these candidates for presentation (step S107). A presentation candidate may be used as the supplementary information as is. Alternatively, in step S107 the candidate selection unit 107 may generate an explanatory sentence from a presentation candidate and use that as the supplementary information. The presentation unit 108 then presents the supplementary information at an appropriate timing (step S108), and the process returns to step S101 to repeat the sequence.

If presentation is determined to be unnecessary (step S105), steps S106 to S108 are skipped and the process returns to step S101 to repeat the sequence.
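Steps S101 to S108 can be sketched as a single function that wires the units together as pluggable callables. This is a minimal sketch with several simplifying assumptions. Each unit is passed in as a hypothetical function argument, and the dialogue history is a plain list of texts. `drop_already_mentioned` is only one possible selection rule, and presentation (S108) is reduced to printing.

```python
from typing import Callable, List, Optional, Tuple


def process_utterance(
    audio: bytes,
    recognize: Callable[[bytes], str],
    identify_speaker: Callable[[bytes, str], str],
    estimate_act: Callable[[str], str],
    needs_presentation: Callable[[str, str], bool],
    generate: Callable[[str], List[str]],
    select: Callable[[List[str], List[str]], List[str]],
    history: List[str],
) -> Optional[Tuple[str, List[str]]]:
    """One pass of the FIG. 21 loop for a single input utterance (S101).

    Returns (speaker, selected supplementary information), or None when there
    is nothing to present. Presenting the result (S108) is left to the caller.
    """
    text = recognize(audio)                   # S102: speech recognition unit 102
    speaker = identify_speaker(audio, text)   # S103: speaker identification unit 1103
    act = estimate_act(text)                  # S104: speech act estimation unit 104
    result = None
    if needs_presentation(act, speaker):      # S105: presentation necessity determination unit 105
        candidates = generate(text)           # S106: presentation candidate generation unit 106
        chosen = select(candidates, history)  # S107: candidate selection unit 107
        if chosen:
            result = (speaker, chosen)
    history.append(text)                      # store the utterance in the dialogue history
    return result


def drop_already_mentioned(candidates: List[str], history: List[str]) -> List[str]:
    """Example selection rule: skip candidates whose text already appears in the history."""
    return [c for c in candidates if not any(c.lower() in h.lower() for h in history)]


if __name__ == "__main__":
    # Toy stubs standing in for the real units; "audio" is just "speaker|text" bytes here.
    history: List[str] = []
    glossary = {"wiener schnitzel": "Wiener Schnitzel: a breaded, pan-fried veal cutlet."}
    for audio in [b"customer|I'd like the Wiener Schnitzel.", b"server|Wiener Schnitzel, right?"]:
        out = process_utterance(
            audio,
            recognize=lambda a: a.decode().split("|", 1)[1],
            identify_speaker=lambda a, t: a.decode().split("|", 1)[0],
            estimate_act=lambda t: "confirm" if t.endswith("right?") else "other",
            needs_presentation=lambda act, spk: act == "confirm" and spk == "server",
            generate=lambda t: [v for k, v in glossary.items() if k in t.lower()],
            select=drop_already_mentioned,
            history=history,
        )
        if out is not None:                   # S108: presentation unit 108
            print(out)
```

Because each unit is passed in as a callable, the speaker identification stub could be replaced with any method, or with a function built into the speech recognizer, without changing the loop itself.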

The configuration described with reference to FIGS. 20 and 21 may further be combined with one or both of the first and second modifications described in the first embodiment.

When the second modification is applied, the supplementary information presentation content database 124 may be the same as in the first embodiment.

The databases described in the above embodiments and modifications may reside, for example, inside the dialogue support apparatus. Alternatively, some or all of the databases may reside on a network such as a LAN, and the dialogue support apparatus may acquire information from them via that network.

The instructions in the processing procedures shown in the above embodiments can be executed by a software program. A general-purpose computer system that stores this program in advance and reads it can obtain the same effects as the dialogue support apparatus of the above embodiments. The instructions described in the above embodiments are recorded, as a computer-executable program, on a magnetic disk (flexible disk, hard disk, etc.), an optical disc (CD-ROM, CD-R, CD-RW, DVD-ROM, DVD±R, DVD±RW, etc.), a semiconductor memory, or a similar recording medium. Any storage format may be used as long as the recording medium is readable by a computer or an embedded system. When a computer reads the program from the recording medium and has its CPU execute the instructions described in the program, it can operate in the same way as the dialogue support apparatus of the above embodiments. The computer may, of course, also acquire or read the program over a network.
In addition, an OS (operating system) running on the computer may execute part of each process for realizing this embodiment, based on the instructions of the program installed on the computer or embedded system from the recording medium. Database management software or middleware (MW) such as network software may likewise execute part of each process.
Furthermore, the recording medium in this embodiment is not limited to a medium independent of the computer or embedded system. It also includes a recording medium that stores, or temporarily stores, a program downloaded after being transmitted over a LAN, the Internet, or the like.
The number of recording media is also not limited to one. The case where the processing of this embodiment is executed from a plurality of media is also covered by the recording medium of this embodiment, and the media may have any configuration.

The computer or embedded system in this embodiment executes each process of this embodiment based on a program stored in a recording medium. It may have any configuration, such as a single device (a personal computer, a microcomputer, etc.) or a system of multiple devices connected over a network.
The term "computer" in this embodiment is not limited to a personal computer. It also includes arithmetic processing units and microcomputers contained in information processing equipment, and refers generically to any equipment or apparatus that can realize the functions of this embodiment by means of a program.

Although several embodiments of the present invention have been described, these embodiments are presented by way of example and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

DESCRIPTION OF SYMBOLS: 101 input unit; 102 speech recognition unit; 103 machine translation unit; 104 speech act estimation unit; 105 presentation necessity determination unit; 106 presentation candidate generation unit; 107 candidate selection unit; 108 presentation unit; 111 language selection input unit; 112 presentation content determination unit; 1103 speaker identification unit; 121 dialogue history database; 122 supplementary information presentation necessity database; 123 supplementary explanation database; 124 supplementary information presentation content database; 125 management information database; 126 product information database.

Claims

1. A dialogue apparatus comprising:
a generation unit that generates supplementary information from a supplementary explanation database based on a speech act tag and a speaker tag obtained from input text information; and
an output unit that outputs the supplementary information.

2. The dialogue apparatus according to claim 1, further comprising:
a dialogue history database that stores all or part of the text information as a dialogue history;
an estimation unit that estimates the speech act tag based on the text information;
a determination unit that determines whether to output the supplementary information based on the estimated speech act tag and the speaker tag; and
a selection unit that selects, from the supplementary information, one or more presentation candidates to be used for presentation,
wherein the generation unit generates the supplementary information when it is determined that the supplementary information is to be output.

3. The dialogue apparatus according to claim 2, further comprising a speaker identification unit that identifies the speaker of the text information,
wherein the determination unit determines whether to output the supplementary information based on the estimated speech act tag and the identified speaker.

4. The dialogue apparatus according to claim 2 or 3, further comprising a machine translation unit that translates the text information in a first language and generates translated text information in a second language different from the first language,
wherein the dialogue history database stores all or part of the text information before or after translation, in the first language or the second language.

5. The dialogue apparatus according to any one of claims 2 to 4, wherein the determination unit identifies the speaker of the text information or acquires a speaker identification result, and determines whether to output the supplementary information based on the estimated speech act tag and the identified speaker.

6. The dialogue apparatus according to any one of claims 2 to 5, wherein the selection unit excludes from the candidates one or more presentation candidates whose content is already included in the dialogue history.

7. The dialogue apparatus according to any one of claims 2 to 6, wherein the generation unit generates the supplementary information when a keyword registered in advance as unknown information is present in the text information for which it has been determined that the supplementary information is to be output.

8. The dialogue apparatus according to any one of claims 2 to 7, wherein the generation unit performs generation with reference to a reference database, and
the determination unit determines which of a plurality of predetermined reference databases is to be referred to.

9. The dialogue apparatus according to any one of claims 2 to 8, wherein the estimation unit estimates the speech act tag also using the dialogue history.

10. The dialogue apparatus according to any one of claims 2 to 9, wherein the dialogue history database stores the estimated speech act tag together with the text information from which it was estimated.

11. The dialogue apparatus according to claim 10, wherein the selection unit selects the candidate to be presented from the one or more presentation candidates using the dialogue history including the speech act tag.

12. The dialogue apparatus according to any one of claims 2 to 11, wherein, when it is determined that the supplementary information is to be output, the determination unit also determines an output timing for outputting it, and
the output unit outputs the supplementary information at the output timing.

13. A dialogue method for a dialogue apparatus including a generation unit and an output unit, the method comprising:
a step in which the generation unit generates supplementary information from a supplementary explanation database based on a speech act tag and a speaker tag obtained from input text information; and
a step in which the output unit outputs the supplementary information.

14. A program for causing a computer to function as a dialogue apparatus including a generation unit and an output unit, the program causing the computer to implement:
the generation unit, which generates supplementary information from a supplementary explanation database based on a speech act tag and a speaker tag obtained from input text information; and
the output unit, which outputs the supplementary information.