JP7221902B2

JP7221902B2 - Dialogue device, program and method for switching dialogue content according to user's interest level

Info

Publication number: JP7221902B2
Application number: JP2020044600A
Authority: JP
Inventors: 剣明呉; 正樹内藤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-03-13
Filing date: 2020-03-13
Publication date: 2023-02-14
Anticipated expiration: 2040-03-13
Also published as: JP2021144633A

Description

TECHNICAL FIELD The present invention relates to dialogue device technology for realizing natural dialogue with a user.

A dialogue device generally picks up a user's uttered voice with a microphone, converts the voice into text by speech recognition, generates a dialogue sentence according to the text, and outputs from a speaker the dialogue voice generated from that sentence by speech synthesis. Examples of interactive AI (Artificial Intelligence) include dialogue systems such as "Siri (registered trademark)" and "Shabette Concier (registered trademark)", and smart speakers such as "Google Home (registered trademark)" and "Amazon Echo (registered trademark)". Such technology is suited to performing specific tasks, such as playing music or reporting weather forecasts and news.

There are also robot technologies, such as "SOTA (registered trademark)" and "Unibo (registered trademark)", that personify the user's conversation partner. With this technology, a robot extracts a keyword from the user's surroundings, generates a dialogue sentence by embedding the keyword in a template, and utters the sentence to the user. The user's surroundings thereby provide the trigger for dialogue.

Conventionally, there are techniques for continuing a dialogue rather than merely responding to requests from the user. For example, in one technique a plurality of dialogue learning engines are provided, the engine most strongly related to the utterance sentence input by the user and the past dialogue history is selected, and that engine replies (see, for example, Patent Document 1).
In another technique, a keyword list is associated with each topic, a plurality of keywords are extracted from the user's utterance sentence by morphological analysis, and the dialogue is continued on a topic that has a predetermined relationship (a similarity relationship or a hierarchical relationship) with the keyword list (see, for example, Patent Document 2). With this technique, dialogue unrelated to the keywords proceeds according to a dialogue scenario prepared in advance.

Patent Document 1: JP 2007-47488 A
Patent Document 2: JP 2017-49471 A

Non-Patent Document 1: NTT Communication Science Laboratories, "Challenges to Versatile Semantic Analysis Technology", [online], [retrieved March 5, 2020], Internet <URL: https://www.ntt.co.jp/journal/0806/files/jn200806024.pdf>
Non-Patent Document 2: MathWorks, "Face Recognition", [online], [retrieved March 5, 2020], Internet <URL: https://jp.mathworks.com/discovery/face-recognition.html>

The techniques described in Patent Documents 1 and 2 above advance the dialogue according to dialogue scenarios prepared in advance, and therefore suffer from having few topics. In particular, the technique described in Patent Document 2 refers only to the predetermined relationship between the user's utterance sentence and the topic, so the development of topics depends on the keyword list.

In view of this, the inventors considered whether it would be possible to develop a "chat dialogue AI" that, in order to keep the dialogue going, develops the dialogue content around topics in which the user is highly interested.

SUMMARY OF THE INVENTION Accordingly, an object of the present invention is to provide a dialogue device, a program, and a method that switch dialogue content according to the user's interest level, in order to continue chat-like dialogue without boring the user.

According to the present invention, a dialogue device that dialogues with a user using a dialogue learning engine selected from a plurality of dialogue learning engines comprises:
vocabulary extraction means for acquiring dialogue scenarios from all of the dialogue learning engines and extracting, from all of the dialogue scenarios, a plurality of vocabulary items based on a predetermined condition;
thesaurus dictionary creation means for creating a thesaurus dictionary in which the extracted vocabulary items are classified into clusters of similar semantic attributes, and for attaching a marker to the vocabulary item corresponding to the current dialogue content;
user data acquisition means for acquiring multimedia data based on the user in dialogue;
an interest level estimation engine for estimating, from the multimedia data, the user's interest level in the current dialogue content;
dialogue learning engine selection means for selecting the dialogue learning engine whose dialogue scenario has the highest similarity to the marker vocabulary of the thesaurus dictionary; and
marker vocabulary movement control means for moving the marker to another vocabulary item belonging to the same cluster as the marker vocabulary the higher the interest level is, and to a vocabulary item belonging to a cluster different from that of the marker vocabulary the lower the interest level is.

According to another embodiment of the dialogue device of the present invention, it is also preferable that
the vocabulary items and the dialogue scenarios are expressed as vectors such that the closer their semantic attributes are, the shorter the distance between them, and
the thesaurus dictionary creation means classifies vocabulary items into the same cluster the closer their vectors are to one another.

According to another embodiment of the dialogue device of the present invention, it is also preferable that
the marker vocabulary movement control means moves the marker to a vocabulary item belonging to a cluster farther from the marker vocabulary the lower the interest level is.
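The marker movement rule can be sketched as follows. This is only one possible reading, under stated assumptions: the interest level is taken to be a number in [0, 1], a hypothetical `threshold` separates "high" from "low" interest, clusters are ranked by the Euclidean distance from the marker to each cluster centroid, and all names and toy vectors are illustrative rather than taken from the patent.

```python
import math


def centroid(words, vectors):
    """Mean vector of the given words."""
    return [sum(col) / len(words) for col in zip(*(vectors[w] for w in words))]


def move_marker(marker, interest, clusters, vectors, threshold=0.5):
    """Return the next marker word (assumed rule, not the patent's exact one).

    High interest: nearest other word in the marker's own cluster.
    Low interest: a word in a different cluster, farther away the lower the interest.
    """
    home = next(name for name, words in clusters.items() if marker in words)

    def distance_to(word):
        return math.dist(vectors[marker], vectors[word])

    if interest >= threshold:
        same = [w for w in clusters[home] if w != marker]
        return min(same, key=distance_to) if same else marker

    others = [name for name in clusters if name != home]
    if not others:
        return marker
    # Rank the other clusters from nearest to farthest centroid.
    ranked = sorted(
        others,
        key=lambda name: math.dist(vectors[marker], centroid(clusters[name], vectors)),
    )
    # Map lower interest onto a farther cluster in the ranking.
    index = min(len(ranked) - 1, int((1 - interest / threshold) * len(ranked)))
    return min(clusters[ranked[index]], key=distance_to)


# Hypothetical clusters and 2-dimensional vectors, for illustration only.
clusters = {
    "Domestic": ["politics", "society"],
    "International": ["US/EU"],
    "Sports": ["soccer"],
}
vectors = {
    "politics": [1.0, 0.0],
    "society": [0.9, 0.1],
    "US/EU": [0.5, 0.5],
    "soccer": [0.0, 1.0],
}
```

With these toy values, a high interest level keeps the marker inside "Domestic", a slightly low one moves it to the nearest other cluster, and a very low one moves it to the farthest cluster.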

According to another embodiment of the dialogue device of the present invention, it is also preferable that
the plurality of dialogue learning engines include, in addition to a general-purpose dialogue learning engine, a current affairs dialogue learning engine, a television dialogue learning engine, and/or a specialized dialogue learning engine.

According to another embodiment of the dialogue device of the present invention, it is also preferable that
the dialogue device is connected to a camera,
the multimedia data is a feature amount of the user's face image captured by the camera, and
the interest level estimation engine is trained, in a learning stage, on face image feature amounts associated with user interest levels, and, in an estimation stage, receives the face image feature amount as the multimedia data and outputs the user's interest level.

According to another embodiment of the dialogue device of the present invention, it is also preferable that
the face image feature amount used by the interest level estimation engine is based on facial expression, line of sight, and/or gestures.

According to another embodiment of the dialogue device of the present invention, it is also preferable that
the dialogue device is connected to a microphone,
the multimedia data is an utterance sentence obtained by speech recognition from the user's uttered voice picked up by the microphone, and
the interest level estimation engine is trained, in a learning stage, on feature amounts of the user's utterance sentences associated with user interest levels, and, in an estimation stage, receives the feature amount of the utterance sentence as the multimedia data and outputs the user's interest level.

According to the present invention, a program causes a computer mounted on a device that dialogues with a user, using a dialogue learning engine selected from a plurality of dialogue learning engines, to function as:
vocabulary extraction means for acquiring dialogue scenarios from all of the dialogue learning engines and extracting, from all of the dialogue scenarios, a plurality of vocabulary items based on a predetermined condition;
thesaurus dictionary creation means for creating a thesaurus dictionary in which the extracted vocabulary items are classified into clusters of similar semantic attributes, and for attaching a marker to the vocabulary item corresponding to the current dialogue content;
user data acquisition means for acquiring multimedia data based on the user in dialogue;
an interest level estimation engine for estimating, from the multimedia data, the user's interest level in the current dialogue content;
dialogue learning engine selection means for selecting the dialogue learning engine whose dialogue scenario has the highest similarity to the marker vocabulary of the thesaurus dictionary; and
marker vocabulary movement control means for moving the marker to another vocabulary item belonging to the same cluster as the marker vocabulary the higher the interest level is, and to a vocabulary item belonging to a cluster different from that of the marker vocabulary the lower the interest level is.

According to the present invention, in a dialogue method of a device that dialogues with a user using a dialogue learning engine selected from a plurality of dialogue learning engines,
the device has
a thesaurus dictionary created by acquiring dialogue scenarios from all of the dialogue learning engines, extracting from all of the dialogue scenarios a plurality of vocabulary items based on a predetermined condition, and classifying the extracted vocabulary items into clusters of similar semantic attributes, in which a marker is attached to the vocabulary item corresponding to the current dialogue content, and
an interest level estimation engine for estimating, from multimedia data based on the user in dialogue, the user's interest level in the current dialogue content,
and the device repeatedly executes:
a first step of acquiring multimedia data based on the user in dialogue;
a second step of estimating, using the interest level estimation engine, the user's interest level in the current dialogue content from the acquired multimedia data;
a third step of moving the marker to another vocabulary item belonging to the same cluster as the marker vocabulary the higher the interest level is, and to a vocabulary item belonging to a cluster different from that of the marker vocabulary the lower the interest level is; and
a fourth step of selecting the dialogue learning engine whose dialogue scenario has the highest similarity to the marker vocabulary of the thesaurus dictionary.

ADVANTAGE OF THE INVENTION According to the dialogue device, program, and method of the present invention, the dialogue can be advanced according to the user's interest level, so that chat-like dialogue continues without boring the user.

FIG. 1 is a system configuration diagram showing the surrounding environment of a dialogue device according to the present invention.
FIG. 2 is a functional configuration diagram of the dialogue device according to the present invention.
FIG. 3 is an explanatory diagram of the vocabulary extraction unit and the thesaurus dictionary creation unit in the present invention.
FIG. 4 is an explanatory diagram of the interest level estimation engine in the present invention.
FIG. 5 is an explanatory diagram of the marker vocabulary movement control unit and the dialogue learning engine selection unit in the present invention.
FIG. 6 is a sequence diagram showing the dialogue between the dialogue device and the user.

BEST MODE FOR CARRYING OUT THE INVENTION Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a system configuration diagram showing the surrounding environment of the dialogue device according to the present invention.

According to FIG. 1, the dialogue device 1 is a robot presented as a character, and can advance the dialogue according to the user's interest level in order to keep up a chat-like dialogue with the user.
The dialogue device 1 is equipped with a microphone and a speaker as the input/output devices of the user interface for dialogue. The microphone picks up the user's uttered voice, and the speaker utters the dialogue voice to the user.
Alternatively, the device may be equipped with keys and a display as the input/output devices of the user interface for dialogue. In that case, the utterance sentence is obtained from the user's key input, and the dialogue sentence is presented to the user on the display.

The dialogue device 1 acquires multimedia data based on the user in order to estimate the user's interest level during dialogue. There are the following two embodiments of the user interface for acquiring the multimedia data.
<First embodiment: a camera that captures the user's face image>
<Second embodiment: a microphone that picks up the user's voice>
(The microphone is the same one used by the user interface for dialogue.)

FIG. 2 is a functional configuration diagram of the dialogue device according to the present invention.

According to FIG. 2, the dialogue device 1 has a plurality of different dialogue learning engines 101 to 10n, a vocabulary extraction unit 11, a thesaurus dictionary creation unit 12, a user data acquisition unit 13 (a face image recognition unit 131 and a speech recognition unit 132), an interest level estimation engine 14, a marker vocabulary movement control unit 15, a dialogue learning engine selection unit 16, a dialogue execution unit 171, and a speech conversion unit 172. These functional components are realized by executing a program that causes a computer mounted on the dialogue device to function. The processing flow of these functional components can also be understood as a dialogue method of the device.

[Dialogue learning engines 101 to 10n]
The dialogue device 1 comprises a plurality of different dialogue learning engines 101 to 10n. Each dialogue learning engine 10 stores its own "dialogue scenario". The dialogue learning engine 10 advances the dialogue by following the dialogue scenario according to the utterance sentences from the user.
A dialogue scenario is a structure in which dialogue nodes, each containing a dialogue sentence, are traversed as a tree according to the user's utterance sentences. For example, it may be written as an FST (Finite State Transducer) script file that describes response patterns to the user's input.
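A dialogue scenario of this tree-shaped kind can be pictured with the following minimal sketch. It is not the FST script format mentioned above; the `DialogueNode` class, the keyword-matching rule, and the sample sentences are all hypothetical and serve only to show how a node is chosen from the user's utterance.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueNode:
    """One scenario node: a system sentence plus keyword-triggered child nodes."""
    sentence: str
    children: dict = field(default_factory=dict)  # keyword -> DialogueNode


def next_node(node, user_utterance):
    """Follow the first child whose keyword appears in the utterance; otherwise stay."""
    for keyword, child in node.children.items():
        if keyword in user_utterance:
            return child
    return node


# Hypothetical scenario content, for illustration only.
root = DialogueNode(
    "Did you watch anything last night?",
    {
        "soccer": DialogueNode("Which team do you support?"),
        "news": DialogueNode("What story caught your eye?"),
    },
)
```

Calling `next_node(root, "I watched soccer")` returns the soccer node, and an utterance with no matching keyword leaves the dialogue at the current node.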

The dialogue learning engines 10 may include, for example, a current affairs dialogue learning engine, a television dialogue learning engine, and/or a specialized dialogue learning engine in addition to a general-purpose dialogue learning engine. Dialogue learning engines for a plurality of different fields are provided so that, as in a chat between people, the dialogue can branch into topics from other fields while keeping something in common with the topic currently under way. This makes it possible to realize chat-like dialogue that is rich in topics and does not easily become boring.

(General-purpose dialogue learning engine)
The general-purpose dialogue learning engine stores dialogue scenarios that advance, for example, everyday dialogue. For example, it assumes ordinary person-to-person dialogue collected by crowdsourcing.
(Current affairs dialogue learning engine)
The current affairs dialogue learning engine stores dialogue scenarios that advance dialogue based on, for example, news topics. For example, it advances dialogue based on news that is trending on an SNS (Social Networking Service) and on comments (tweets and the like) about that news.
(Television dialogue learning engine)
The television dialogue learning engine stores dialogue scenarios that advance dialogue based on, for example, television program content. For example, it advances topics based on metadata of the program content currently being broadcast (for example, the electronic program guide or narration subtitles).
(Specialized dialogue learning engine)
The specialized dialogue learning engine stores dialogue scenarios that advance dialogue based on a specific specialized field, such as science and technology.
It is preferable to also provide other dialogue learning engines with various characteristics.

FIG. 3 is an explanatory diagram of the vocabulary extraction unit and the thesaurus dictionary creation unit in the present invention.

[Vocabulary extraction unit 11]
The vocabulary extraction unit 11 acquires dialogue scenarios from all of the dialogue learning engines 101 to 10n and extracts, from all of the dialogue scenarios, a plurality of vocabulary items (a group of valid words) based on a predetermined condition. A dialogue scenario describes response sentences to the user's utterance sentences, and is a collection of text.
The vocabulary extraction unit 11 extracts common nouns from the large body of text in these dialogue scenarios by morphological analysis (see FIG. 3). The many extracted common nouns are output to the thesaurus dictionary creation unit 12.

[Thesaurus dictionary creation unit 12]
The thesaurus dictionary creation unit 12 creates a thesaurus dictionary in which the extracted vocabulary items are classified into clusters of similar semantic attributes.

Each "vocabulary item" is represented by a vector (a distributed representation), for example one based on Word2vec, such that the closer two items are in semantic attributes, the shorter the distance (Euclidean distance) between them.
"Word2vec" is a technique that represents words as vectors and compresses their dimensionality in order to capture the meaning and grammar of words. For two vocabulary items, the higher the similarity, the shorter the distance between their vectors, and the lower the similarity, the longer the distance. Naturally, vocabulary items belonging to the same category have high similarity (a short distance between vectors).
The similarity between vocabulary items is expressed as the cosine similarity:
S(a,b) = cos θ = (Va · Vb) / (|Va| |Vb|)
Va: feature vector of the first vocabulary item
Vb: feature vector of the second vocabulary item
S(a,b): 0 to 1 (approaches 1 as the similarity increases)
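The similarity formula above can be sketched in plain Python as follows. The function name and the sample vectors are illustrative assumptions. Note that cosine similarity in general ranges from -1 to 1; the 0-to-1 range stated above applies when the two vectors do not point in opposing directions.

```python
import math


def cosine_similarity(va, vb):
    """S(a,b) = (Va . Vb) / (|Va| |Vb|) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(va, vb))
    norm_a = math.sqrt(sum(a * a for a in va))
    norm_b = math.sqrt(sum(b * b for b in vb))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 2-dimensional word vectors, for illustration only.
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```

Vectors pointing in the same direction score 1 regardless of length, and orthogonal vectors score 0.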

In the "thesaurus dictionary", the distance from a first vocabulary item to a second vocabulary item represents the degree of topic development when the dialogue transitions from a first topic to a second topic.
In the thesaurus dictionary of the present invention, vocabulary items are classified into the same cluster the closer the Euclidean distance between their vectors is (that is, synonyms or related words). For example, a clustering method such as k-means is used to classify the many vocabulary items into a plurality of clusters. According to FIG. 3, one category contains a plurality of vocabulary items. Vocabulary items belonging to the same cluster are close to one another in Euclidean distance.
Note that the thesaurus dictionary may instead be organized, using WordNet, into a sub-tree structure in which a plurality of vocabulary items are placed under each category level.
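The clustering step can be illustrated with a small k-means sketch in plain Python. The assumptions here are labeled: centroids are seeded with the first k vectors to keep the example deterministic (real implementations usually seed randomly or with k-means++), and the 2-dimensional vectors stand in for Word2vec embeddings that the patent does not list.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Cluster vectors by Euclidean distance; returns (labels, centroids)."""
    centroids = [list(v) for v in vectors[:k]]  # simplistic seeding (assumption)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical embeddings, for illustration only.
words = ["politics", "China/Korea/Russia", "society", "people", "US/EU"]
embeddings = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.8, 0.1], [0.0, 0.9]]
labels, centroids = kmeans(embeddings, k=2)
clusters = {}
for word, label in zip(words, labels):
    clusters.setdefault(label, []).append(word)
```

With these toy vectors, "politics", "society", and "people" fall into one cluster and "China/Korea/Russia" and "US/EU" into the other, mirroring the "Domestic" and "International" categories of FIG. 3.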

According to FIG. 3, one category and a plurality of vocabulary items are related as follows.
Category "Domestic" => vocabulary "politics", "society", "people"
Category "International" => vocabulary "China/Korea/Russia", "US/EU", "..."
...
For example, the distances between the vectors of the vocabulary items "politics", "society", and "people" are relatively short. The distance between the vectors of "China/Korea/Russia" and "US/EU" is also relatively short. In contrast, the distance between the vectors of "society" and "China/Korea/Russia" is relatively long.
Categories are also represented as vectors. For example, the vector of the category "Domestic" may be the mean of the vectors of the vocabulary items "politics", "society", and "people".

The thesaurus dictionary creation unit 12 also represents each dialogue scenario as a vector. For example, the vectors of all vocabulary items contained in the dialogue scenario are averaged into a single vector. That is, the similarity between one vocabulary item or category and one dialogue scenario can be determined from the distance between their vectors.
In this way, vocabulary items, categories, and dialogue scenarios are all represented as vectors such that the closer their semantic attributes are, the shorter the distance between them.
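The averaging described above can be sketched as follows. The `mean_vector` helper, the toy embeddings, and the sample scenario word list are assumptions made for illustration; the patent does not specify the vector dimensions or values.

```python
def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    vectors = list(vectors)
    if not vectors:
        raise ValueError("need at least one vector")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical 2-dimensional embeddings, for illustration only.
embeddings = {
    "politics": [1.0, 0.0],
    "society": [0.5, 0.5],
    "people": [0.0, 1.0],
}

# A category vector as the mean of its member vocabulary items.
domestic_vector = mean_vector(
    embeddings[w] for w in ("politics", "society", "people")
)

# A scenario vector as the mean over the vocabulary items found in its text.
scenario_words = ["politics", "society"]
scenario_vector = mean_vector(embeddings[w] for w in scenario_words)
```

Because words, categories, and scenarios all end up in the same vector space, any pair of them can be compared with a single distance or similarity function.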

Furthermore, in the thesaurus dictionary, a marker is attached to the vocabulary item corresponding to the current dialogue content. According to the present invention, moving the marker from one vocabulary item to another within the thesaurus dictionary switches the dialogue to a topic based on the newly marked vocabulary item or its category. Movement of the marker vocabulary is described later in connection with the marker vocabulary movement control unit 15.

Note that a vocabulary item may be judged to be an "unknown word" if the distance between its vector and the vector of its most similar category is equal to or greater than a predetermined threshold, even though it would otherwise belong to that category. In that case, clustering may be performed on all unknown words to construct a plurality of new categories.
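The unknown-word check can be sketched as follows. This is a minimal illustration under assumptions: the "most similar" category is taken to be the one with the nearest vector in Euclidean distance, and the function name, category vectors, and threshold value are hypothetical.

```python
import math


def classify_or_unknown(word_vector, category_vectors, threshold):
    """Return the nearest category name, or None when the word is an unknown word."""
    name, vector = min(
        category_vectors.items(),
        key=lambda item: math.dist(word_vector, item[1]),
    )
    # Even the best category is rejected when it is too far away.
    if math.dist(word_vector, vector) >= threshold:
        return None
    return name


# Hypothetical category vectors, for illustration only.
category_vectors = {
    "Domestic": [0.9, 0.1],
    "International": [0.05, 0.95],
}
known = classify_or_unknown([0.8, 0.2], category_vectors, threshold=0.5)
unknown = classify_or_unknown([5.0, 5.0], category_vectors, threshold=0.5)
```

Words for which the function returns `None` would be collected and clustered separately to form new categories.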

<First embodiment: a camera that captures the user's face image>
[User data acquisition unit 13]
The user data acquisition unit 13 acquires multimedia data based on the user in dialogue. In the first embodiment, the multimedia data is a face image captured by the camera.

(Face image recognition unit 131)
The face image recognition unit 131 receives the user's face images (video) captured by a front-facing camera, and detects the face area of each user in each image. The face area is found by searching for the image portion that matches a template built from facial features. For example, the relative positions and sizes of facial parts and the shapes of the eyes, nose, cheekbones, and chin are used.
The face image recognition unit 131 then extracts, from the time series of images, the time-series changes in the face parameters as a feature amount. The time-series feature amount of the face image is based on facial expression, line of sight, and/or gestures. Various existing methods are available as face recognition algorithms (see, for example, Non-Patent Document 2).
The time-series feature amount of the face image is output to the interest level estimation engine 14.

[Interest level estimation engine 14]
The interest level estimation engine 14 estimates the user's interest level in the current dialogue content from the multimedia data (the face image feature amount).

FIG. 4 is an explanatory diagram of the interest level estimation engine in the present invention.

According to FIG. 4, the interest level estimation engine 14 is trained, in a learning stage, on face image feature amounts associated with user interest levels.
As the face images of the training data in the learning stage, a dataset from IMDb (Internet Movie Database) (for example, 45,723 images) can be used. A time-series feature amount is extracted from each face image, and each feature amount is labeled with a user interest level. For example, the feature amount of a face image with wide-open eyes is given a relatively high interest level, and the feature amount of a face image with downcast eyes is given a relatively low interest level. The training data, in which face image feature amounts are associated with user interest levels, is learned using, for example, a convolutional neural network.
Then, in an estimation stage, the interest level estimation engine 14 receives the feature amount of the user's face image and outputs the estimated interest level. The estimated interest level is output to the marker vocabulary movement control unit 15.

FIG. 5 is an explanatory diagram of the marker vocabulary movement control unit and the dialogue learning engine selection unit in the present invention.

[Dialogue learning engine selection unit 16]
The dialogue learning engine selection unit 16 selects one dialogue learning engine from among the plurality of dialogue learning engines. Specifically, it selects the dialogue learning engine whose dialogue scenario has the highest similarity to the "marker vocabulary" of the thesaurus dictionary.

According to FIG. 5, the thesaurus dictionary represents, for each category, a cluster containing a plurality of vocabularies. Each category is linked to one of the dialogue learning engines as follows.
Current affairs dialogue learning engine <-> categories "Domestic", "International"
Economy dialogue learning engine <-> category "Economy"
Television dialogue learning engine <-> categories "Entertainment", "Sports"
Technology dialogue learning engine <-> categories "IT", "Science"
General-purpose dialogue learning engine <-> category "Life"
Regional dialogue learning engine <-> category "Region"
In the thesaurus dictionary, each category to which a plurality of vocabularies belong is represented as a vector. The entire dialogue scenario of each dialogue learning engine 10 is likewise represented as a vector. Each category can therefore be linked to the dialogue learning engine 10 whose dialogue scenario is most similar to it as a vector representation.
As a result, the dialogue learning engine 10 whose dialogue scenario has the highest similarity to the "marker vocabulary" can be selected.
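The selection described above can be sketched in a few lines. This is only a minimal illustration, not the implementation of the invention: it assumes cosine similarity as the similarity measure (the description says only "similarity"), and the three-dimensional vectors and the names `cosine_similarity`, `select_engine`, and `SCENARIO_VECTORS` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_engine(marker_vector, scenario_vectors):
    """Return the engine name whose scenario vector is most similar to the marker."""
    return max(
        scenario_vectors,
        key=lambda name: cosine_similarity(marker_vector, scenario_vectors[name]),
    )


# Hypothetical vectors: one per dialogue learning engine's whole dialogue scenario.
SCENARIO_VECTORS = {
    "current_affairs": [0.9, 0.1, 0.0],
    "television": [0.1, 0.9, 0.1],
    "general_purpose": [0.0, 0.2, 0.9],
}

# Hypothetical vector for the marker vocabulary "game".
marker_game = [0.2, 0.8, 0.1]
selected = select_engine(marker_game, SCENARIO_VECTORS)
print(selected)
```

With these made-up vectors the marker "game" lies closest to the television scenario, which mirrors the example of FIG. 5; real vectors would come from whatever embedding is used to build the thesaurus dictionary.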

[Marker vocabulary movement control unit 15]
The marker vocabulary movement control unit 15 moves the marker to another vocabulary belonging to the same cluster as the marker vocabulary (a synonym or related word) as the interest level is higher, and to a vocabulary belonging to a cluster different from that of the marker vocabulary (an antonym or unrelated word) as the interest level is lower.

FIG. 6 is a sequence diagram showing a dialogue between the dialogue device and the user.

The sequence of FIG. 6(a) shows the operation of the marker vocabulary movement control unit 15 of FIG. 5 when the user's interest level is high.
(S11) Assume that, at elapsed time t0, the dialogue device 1 has placed the marker in the sequence dictionary on the vocabulary "game". According to FIG. 5, the vocabulary "game" belongs to the category "Entertainment", and the television dialogue learning engine is selected.
(S12) The television dialogue learning engine is selected for the category "Entertainment" of the marker vocabulary "game" (see FIG. 5). From its dialogue scenario, the television dialogue learning engine outputs a dialogue sentence suited to "game": "As for games, AAA is popular right now!"
(S13) In response, the dialogue device 1 captures the user's face image (multimedia data). Assume that a user interest level of 0.9 is estimated from the face image. For example, an interest level at or above a threshold of 0.7 can be judged to be high.
(S14) At elapsed time t1, the dialogue device 1 then moves the marker in the sequence dictionary, by a vector distance corresponding to the degree of relatedness, to the vocabulary "television", which belongs to the same category (cluster) (see FIG. 5).
(S15) The television dialogue learning engine is selected for the category "Entertainment" of the marker vocabulary "television". The television dialogue learning engine acquires the user's utterance "I like BBB, though" and outputs, from its dialogue scenario, a dialogue sentence suited to "television" and "BBB": "I hear entertainer X is good at BBB!"

The sequence of FIG. 6(b) shows the operation of the marker vocabulary movement control unit 15 of FIG. 5 when the user's interest level is low.
(S21) Same as S11 in FIG. 6(a).
(S22) Same as S12 in FIG. 6(a).
(S23) In response, the dialogue device 1 captures the user's face image (multimedia data). Assume that a user interest level of 0.2 is estimated from the face image. For example, an interest level at or below a threshold of 0.3 can be judged to be low.
(S24) At elapsed time t1, the dialogue device 1 then moves the marker in the sequence dictionary, by a vector distance corresponding to the degree of relatedness, to the vocabulary "health", which belongs to a different category (cluster) (see FIG. 5).
(S25) The general-purpose dialogue learning engine is selected for the category "Life" of the marker vocabulary "health". The general-purpose dialogue learning engine acquires the user's utterance "..." and outputs, from its dialogue scenario, a dialogue sentence suited to "health": "Do you exercise every day?"
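The two sequences of FIG. 6 can be traced with a short sketch. It is a simplified illustration under stated assumptions: the thresholds 0.7 and 0.3 are the example values given in S13 and S23, while the tiny `CLUSTERS` table, the function names, and the choice to leave the marker in place for interest levels between the two thresholds are hypothetical (the description does not say what happens in that range). Vector distance is ignored here; only "same cluster" versus "different cluster" is modelled.

```python
HIGH_THRESHOLD = 0.7  # example value from S13
LOW_THRESHOLD = 0.3   # example value from S23

# Hypothetical excerpt of the thesaurus dictionary: category -> vocabularies.
CLUSTERS = {
    "entertainment": ["game", "television"],
    "life": ["health"],
}


def classify_interest(level):
    """Label an estimated interest level as 'high', 'low', or 'neutral'."""
    if level >= HIGH_THRESHOLD:
        return "high"
    if level <= LOW_THRESHOLD:
        return "low"
    return "neutral"  # behaviour in this range is an assumption


def cluster_of(vocabulary):
    """Return the category (cluster) that contains the given vocabulary."""
    for name, members in CLUSTERS.items():
        if vocabulary in members:
            return name
    raise KeyError(vocabulary)


def next_marker(marker, level):
    """Pick the next marker vocabulary from the current marker and interest level."""
    current = cluster_of(marker)
    label = classify_interest(level)
    if label == "high":
        # Stay in the same cluster, but move to another vocabulary.
        for word in CLUSTERS[current]:
            if word != marker:
                return word
    elif label == "low":
        # Jump to a vocabulary in a different cluster.
        for name, members in CLUSTERS.items():
            if name != current and members:
                return members[0]
    return marker  # neutral, or no candidate found


print(next_marker("game", 0.9))  # FIG. 6(a), S14: television
print(next_marker("game", 0.2))  # FIG. 6(b), S24: health
```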

In this way, the marker vocabulary movement control unit 15 moves the marker to a vocabulary belonging to a cluster at a greater Euclidean distance from the marker vocabulary as the interest level is lower. That is, the distance by which the marker is next moved is controlled according to the magnitude of the interest level. The farther the marker moves from its current position, the more the topic of the generated dialogue sentence differs from the current one.
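One way to picture this distance control is the sketch below. It is not taken from the description: the linear mapping from interest level to target distance, the two-dimensional vectors, the extra vocabulary "stocks", and the names `VOCAB_VECTORS` and `move_marker` are all assumptions made for illustration. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math

# Hypothetical 2-D vocabulary vectors; real ones would come from the thesaurus dictionary.
VOCAB_VECTORS = {
    "game": (0.0, 0.0),
    "television": (1.0, 0.0),
    "stocks": (4.0, 3.0),
    "health": (6.0, 8.0),
}


def move_marker(marker, interest, vectors):
    """Move the marker farther away the lower the interest level (0.0 to 1.0) is.

    Assumed rule: target distance = (1 - interest) * largest available distance;
    the vocabulary whose Euclidean distance is closest to that target is chosen.
    """
    origin = vectors[marker]
    distances = {
        word: math.dist(origin, vec)
        for word, vec in vectors.items()
        if word != marker
    }
    if not distances:
        return marker  # nothing to move to
    target = (1.0 - interest) * max(distances.values())
    return min(distances, key=lambda word: abs(distances[word] - target))


for level in (0.9, 0.5, 0.2):
    print(level, move_marker("game", level, VOCAB_VECTORS))
```

Under this rule a high interest level keeps the dialogue near the current vocabulary, and a low one pushes it toward the most distant cluster, which is the qualitative behaviour the paragraph describes.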

[Dialogue execution unit 171 / Voice conversion unit 172]
The dialogue execution unit 171 outputs a dialogue sentence to the user using the dialogue learning engine selected by the dialogue learning engine selection unit 16.
The voice conversion unit 172 converts the dialogue sentence output from the dialogue execution unit 171 into a voice signal by speech synthesis and outputs the voice signal to a speaker.
In this way, the dialogue device 1 carries on a spoken dialogue with the user.

<Second embodiment: case of a microphone that picks up the user's voice>
In the first embodiment, the multimedia data based on the user was described as the user's face image captured by a camera.
In the second embodiment, by contrast, the multimedia data based on the user is the user's voice picked up by a microphone. In this case, according to FIG. 2, the user data acquisition unit 13 functions as the speech recognition unit 132.

(Speech recognition unit 132)
The speech recognition unit 132 outputs an utterance sentence obtained by speech recognition from the user's spoken voice picked up by the microphone.
In this case, in the learning stage, the interest level estimation engine 14 is trained on feature amounts of utterance sentences associated with user interest levels. In the estimation stage, the interest level estimation engine 14 receives an utterance sentence as multimedia data and outputs the user's interest level.

As described above in detail, according to the dialogue device, program, and method of the present invention, the dialogue can be advanced according to the user's interest level so that a chat-like dialogue continues without the user becoming bored.
In experiments with test subjects covering a wide range of topics such as politics and sports, the dialogue device of the present invention sustained chat-like dialogue for two to three times as long as dialogue devices of the prior art.

Various changes, modifications, and omissions within the scope of the technical idea and viewpoint of the present invention can be easily made by those skilled in the art to the various embodiments described above. The foregoing description is merely an example and is not intended to be limiting. The present invention is limited only by the claims and their equivalents.

1 dialogue device
10 dialogue learning engine
11 vocabulary extraction unit
12 thesaurus dictionary creation unit
13 user data acquisition unit
131 face image recognition unit
132 speech recognition unit
14 interest level estimation engine
15 marker vocabulary movement control unit
16 dialogue learning engine selection unit
171 dialogue execution unit
172 voice conversion unit

Claims

1. A dialogue device that conducts a dialogue with a user using a dialogue learning engine selected from a plurality of dialogue learning engines, the dialogue device comprising:
vocabulary extraction means for acquiring dialogue scenarios from all of the dialogue learning engines and extracting a plurality of vocabularies from all of the dialogue scenarios based on predetermined conditions;
thesaurus dictionary creation means for creating a thesaurus dictionary in which the extracted vocabularies are classified into clusters having similar semantic attributes, and for attaching a marker to the vocabulary corresponding to the content of the current dialogue;
user data acquisition means for acquiring multimedia data based on the user during the dialogue;
an interest level estimation engine for estimating, from the multimedia data, the user's interest level in the content of the current dialogue;
dialogue learning engine selection means for selecting the dialogue learning engine having the dialogue scenario with the highest similarity to the marker vocabulary of the thesaurus dictionary; and
marker vocabulary movement control means for moving the marker to another vocabulary belonging to the same cluster as the marker vocabulary as the interest level is higher, and to a vocabulary belonging to a cluster different from that of the marker vocabulary as the interest level is lower.

2. The dialogue device according to claim 1, wherein the vocabularies and the dialogue scenarios are expressed as vectors such that the more similar their semantic attributes are, the shorter the distance between them, and the thesaurus dictionary creation means classifies vocabularies into the same cluster as the distance between their vectors is shorter.

3. The dialogue device according to claim 1, wherein the marker vocabulary movement control means moves the marker to a vocabulary belonging to a cluster at a greater distance from the marker vocabulary as the interest level is lower.

4. The dialogue device according to any one of claims 1 to 3, wherein the plurality of dialogue learning engines includes, in addition to a general-purpose dialogue learning engine, a current affairs dialogue learning engine, a television dialogue learning engine, and/or a specialized dialogue learning engine.

5. The dialogue device according to claim 1, wherein the dialogue device is connected to a camera, the multimedia data is a feature amount of the user's face image captured by the camera, and the interest level estimation engine is trained in the learning stage on feature amounts of face images associated with user interest levels and outputs the user's interest level.

6. The dialogue device according to claim 5, wherein the feature amount of the face image in the interest level estimation engine is based on facial expression, line of sight, and/or gesture.

7. The dialogue device according to claim 1, wherein the dialogue device is connected to a microphone, the multimedia data is an utterance sentence obtained by speech recognition from the user's spoken voice picked up by the microphone, and the interest level estimation engine is trained in the learning stage on feature amounts of the user's utterance sentences associated with user interest levels and, in the estimation stage, receives the feature amount of the utterance sentence as multimedia data and outputs the user's interest level.

8. A program that causes a computer installed in a device that conducts a dialogue with a user, using a dialogue learning engine selected from a plurality of dialogue learning engines, to function as:
vocabulary extraction means for acquiring dialogue scenarios from all of the dialogue learning engines and extracting a plurality of vocabularies from all of the dialogue scenarios based on predetermined conditions;
thesaurus dictionary creation means for creating a thesaurus dictionary in which the extracted vocabularies are classified into clusters having similar semantic attributes, and for attaching a marker to the vocabulary corresponding to the content of the current dialogue;
user data acquisition means for acquiring multimedia data based on the user during the dialogue;
an interest level estimation engine for estimating, from the multimedia data, the user's interest level in the content of the current dialogue;
dialogue learning engine selection means for selecting the dialogue learning engine having the dialogue scenario with the highest similarity to the marker vocabulary of the thesaurus dictionary; and
marker vocabulary movement control means for moving the marker to another vocabulary belonging to the same cluster as the marker vocabulary as the interest level is higher, and to a vocabulary belonging to a cluster different from that of the marker vocabulary as the interest level is lower.

9. A dialogue method for a device that conducts a dialogue with a user using a dialogue learning engine selected from a plurality of dialogue learning engines, wherein
the device has
a thesaurus dictionary created by acquiring dialogue scenarios from all of the dialogue learning engines, extracting a plurality of vocabularies from all of the dialogue scenarios based on predetermined conditions, and classifying the extracted vocabularies into clusters having similar semantic attributes, in which a marker is attached to the vocabulary corresponding to the content of the current dialogue, and
an interest level estimation engine for estimating the user's interest level in the content of the current dialogue from multimedia data based on the user during the dialogue, and
the device repeatedly executes:
a first step of acquiring multimedia data based on the user during the dialogue;
a second step of estimating, using the interest level estimation engine, the user's interest level in the content of the current dialogue from the acquired multimedia data;
a third step of moving the marker to another vocabulary belonging to the same cluster as the marker vocabulary as the interest level is higher, and to a vocabulary belonging to a cluster different from that of the marker vocabulary as the interest level is lower; and
a fourth step of selecting the dialogue learning engine having the dialogue scenario with the highest similarity to the marker vocabulary of the thesaurus dictionary.