JP7176333B2

JP7176333B2 - Dialogue device, dialogue method and dialogue program

Info

Publication number: JP7176333B2
Application number: JP2018185297A
Authority: JP
Inventors: 恵多比良; 岳今井; 徹上和田; 雅晴新井; 利知金岡; 章人吉井; 聡一酒井; 祐輔酒井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2018-09-28
Filing date: 2018-09-28
Publication date: 2022-11-22
Anticipated expiration: 2038-09-28
Also published as: JP2020056824A

Description

The present invention relates to a dialogue device, a dialogue method, and a dialogue program.

In recent years, dialogue systems such as task-oriented dialogue systems and single-question, single-answer chatbots have been used for FAQ (Frequently Asked Questions) systems, web page searches, and the like. For example, a system is known that, in order to develop topics naturally, prepares a plurality of scenarios, selects a scenario according to the content of a user utterance, and carries on a dialogue with the user by making utterances in accordance with the selected scenario.

Japanese Patent Application Laid-Open No. 2003-44088
Japanese Patent Application Laid-Open No. 2003-323388

However, in the above technology, utterances are made according to scenarios assumed in advance. When a user utterance deviates from the scenario, an abrupt topic change or the like occurs and the dialogue ends, so the system lacks flexibility.

It is also conceivable to prepare a variety of utterance content in advance in association with the scenarios, but the amount of content to be prepared is enormous, which increases cost and is not practical. In addition, if utterances only follow the scenario, the dialogue with the user becomes monotonous, user satisfaction decreases, and the dialogue may end partway through.

In one aspect, an object is to provide a dialogue device, a dialogue method, and a dialogue program capable of enhancing the continuity of natural dialogue.

In a first proposal, a dialogue device has a storage unit that stores content defining the utterance content with which to respond to a user's utterance. The dialogue device has a determination unit that determines, based on the user's utterance, a depth that is the degree of detail of the utterance content. The dialogue device has a selection unit that selects content to be uttered from the content stored in the storage unit according to the determined depth. The dialogue device has an utterance execution unit that makes an utterance to the user based on the selected content.

In one aspect, the continuity of natural dialogue can be enhanced.

FIG. 1 is a diagram illustrating an example of the overall configuration of a dialogue system according to the first embodiment.
FIG. 2 is a functional block diagram illustrating the functional configuration of the dialogue device according to the first embodiment.
FIG. 3 is a diagram illustrating an example of the data structure of the content DB according to the first embodiment.
FIG. 4 is a diagram illustrating the hierarchical structure of content according to the first embodiment.
FIG. 5 is a diagram illustrating how utterance candidates increase or decrease with depth.
FIG. 6 is a flowchart showing the flow of processing.
FIG. 7 is a diagram illustrating an example of dialogue when the general technique is used.
FIG. 8 is a diagram illustrating an example of dialogue when the first embodiment is used.
FIG. 9 is a diagram illustrating topic search at low depth.
FIG. 10 is a diagram illustrating content selection by keyword.
FIG. 11 is a flowchart showing the flow of processing according to the second embodiment.
FIG. 12 is a diagram illustrating a hardware configuration example.

Embodiments of the dialogue device, dialogue method, and dialogue program disclosed in the present application are described in detail below with reference to the drawings. The invention is not limited by these embodiments. The embodiments can also be combined as appropriate to the extent that no contradiction arises.

[Overall configuration]
FIG. 1 is a diagram illustrating an example of the overall configuration of a dialogue system according to the first embodiment. As shown in FIG. 1, in this dialogue system a user terminal 1 and a dialogue device 10 are communicably connected via a network N. Examples of dialogue include questions from end users of mail-order sales, online sales, online games, and the like, questions from administrators about system failures and the like, and searches for websites the user wants, such as travel destinations.

Various networks, such as the Internet or a dedicated line, can be adopted as the network N, whether wired or wireless. Although an example with one user terminal 1 is shown, the configuration is not limited to this, and a plurality of user terminals 1 can interact with the dialogue device 10 at the same time.

The user terminal 1 is a computer device such as a smartphone, a mobile terminal, or a personal computer. The user terminal 1 uses a web browser, a dedicated application, or the like to access the dialogue device 10 and carry on a dialogue with it. The user terminal 1 carries on the dialogue through voice input or through text input using a keyboard or the like.

The dialogue device 10 is a computer device that establishes web communication or the like with the user terminal 1 and carries on a dialogue with the user terminal 1. The dialogue device 10 implements a dialogue system such as a task-oriented dialogue system or a single-question, single-answer chatbot, and offers topics to the user from the system side.

In such a dialogue system, the dialogue device 10 holds content, which serves as utterance candidates, organized into a hierarchy according to the degree of detail within a topic. Based on the topic of the dialogue and the depth of the content used in the dialogue, the dialogue device 10 controls topic change and deepening at shallow levels and tightens the restriction on the degree of matching with the user utterance at deep levels. As a result, the dialogue device 10 can develop topics without abruptness, which enhances the continuity of natural dialogue.

[Functional configuration]
FIG. 2 is a functional block diagram illustrating the functional configuration of the dialogue device 10 according to the first embodiment. As shown in FIG. 2, the dialogue device 10 has a communication unit 11, a storage unit 12, and a control unit 20.

The communication unit 11 is a processing unit that controls communication with the user terminal 1, and is, for example, a communication interface. For example, the communication unit 11 establishes web communication or chatbot communication with the user terminal 1 and transmits and receives data related to the dialogue.

The storage unit 12 is an example of a storage device that stores programs and data, and is, for example, a memory or a hard disk. The storage unit 12 stores a content DB 13 and a dialogue history DB 14.

The content DB 13 is a database that stores content to be uttered. Specifically, the content DB 13 stores content in which utterance content and the like are specified, organized into a hierarchy by depth, which indicates the degree of detail of the content, that is, the degree of detail of the defined utterance content. FIG. 3 is a diagram illustrating an example of the data structure of the content DB 13 according to the first embodiment. As shown in FIG. 3, the content DB 13 stores content in a hierarchical structure of level 1 content, level 2 content, and level 3 content.

The level 1 content comprises a plurality of content items, for each of which "utterance content, utterance condition, topic/depth control" are set. "Utterance content" holds what the dialogue device 10 utters to the user terminal 1. "Utterance condition" holds the condition under which the content is selected, for example a level or a keyword. For example, content for which level 1 is set is preferentially selected when no topic is set. Content for which the keyword "cooking" is set is preferentially selected when the latest user utterance contains "cooking". "Topic/depth control" holds how the topic and depth are controlled after the content is selected. For example, for content with the setting "if the response is X1, level 2 of topic T1", when the user utterance made in response to that content contains "X1", the topic is set to "T1" and level 2 content whose utterance condition specifies topic T1 becomes the next utterance target.

The level 2 content comprises a plurality of content items, for each of which "utterance content, utterance condition" are set. "Utterance content" holds what the dialogue device 10 utters to the user terminal 1. "Utterance condition" holds the condition under which the content is selected, for example a combination of level and topic, or a keyword. For example, content for which "level 2 or higher and topic T" is set is preferentially selected when content of level 2 or higher has already been uttered and T is set as the topic. Content for which the keyword "curry" is set is preferentially selected when the latest user utterance contains "curry".

The level 3 content comprises a plurality of content items, for each of which "utterance content, utterance condition" are set. "Utterance content" holds what the dialogue device 10 utters to the user terminal 1. "Utterance condition" holds the condition under which the content is selected, for example a combination of level, topic, and user utterance content, or a keyword. For example, content for which "level 2 or higher, topic T, and user utterance Y" is set is preferentially selected when content of level 2 or higher has already been uttered, T is set as the topic, and the immediately preceding user utterance contains "Y". Content for which the keyword "etymology of curry" is set is preferentially selected when the latest user utterance contains "etymology of curry".

As described above, the content stored in the content DB 13 is divided by level and forms a hierarchical structure. FIG. 4 is a diagram illustrating the hierarchical structure of content according to the first embodiment. As shown in FIG. 4, the hierarchy runs from level 1, which has the shallowest depth (the depth indicating the degree of detail of the content), to level 3, which has the deepest. Level 1 contains content that can be uttered at any time, level 2 contains content whose utterance is restricted by topic, and level 3 contains content that reacts to user utterances.

A specific example of content becoming more detailed with depth follows. Suppose that a dialogue has started and no topic is set, the level 1 content "I like dogs. Are you a dog person or a cat person?" is uttered, and the user responds "dog". Level 2 content for which the topic "dog" is set is then uttered. For example, "In Japan, toy poodles are popular because they are easy to keep in an apartment, aren't they?" is uttered as level 2 content.

After that, with the topic still "dog", when the user responds to the above utterance with an utterance containing "toy poodle", such as "I like toy poodles", level 3 content is uttered in response to that user utterance. For example, "I hear that overseas, poodles with coats of two or more colors are called party colors and are popular." is uttered as level 3 content.
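The three-level content structure and the dog example above can be sketched as plain records. This is only an illustrative sketch: the class names `Content` and `DepthControl`, their field names, and the helper `levels_in` are assumptions made for this example and do not appear in the patent, and the utterance texts are translations of the examples in the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DepthControl:
    """Topic/depth control: if the reply contains `trigger`, go to `topic` at `next_level`."""
    trigger: str
    topic: str
    next_level: int


@dataclass(frozen=True)
class Content:
    """One entry of the content DB (field names are illustrative assumptions)."""
    text: str                                  # utterance content
    level: int                                 # depth: 1 (shallowest) to 3 (deepest)
    topic: Optional[str] = None                # topic condition (levels 2 and 3)
    user_utterance: Optional[str] = None       # user-utterance condition (level 3)
    keywords: Tuple[str, ...] = ()             # keywords that give the entry priority
    controls: Tuple[DepthControl, ...] = ()    # topic/depth control (level 1)


CONTENT_DB = [
    Content(
        text="I like dogs. Are you a dog person or a cat person?",
        level=1,
        controls=(DepthControl(trigger="dog", topic="dog", next_level=2),),
    ),
    Content(
        text="In Japan, toy poodles are popular because they are easy to keep in an apartment.",
        level=2,
        topic="dog",
    ),
    Content(
        text="Overseas, poodles with two or more coat colors are called party colors and are popular.",
        level=3,
        topic="dog",
        user_utterance="toy poodle",
    ),
]


def levels_in(db):
    """Return the sorted list of depth levels present in the DB."""
    return sorted({c.level for c in db})
```

Keeping the topic/depth control only on the level 1 entry mirrors the description, in which level 2 and level 3 content carry utterance conditions but no control field.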

Returning to FIG. 2, the dialogue history DB 14 is a database that stores the history of the dialogue with the user terminal 1. For example, the dialogue history DB 14 stores, in chronological order, the content of user utterances, the content uttered by the dialogue device 10, and the topics that were set. The information stored in the dialogue history DB 14 is retained while the dialogue with the user terminal 1 continues and is deleted when the dialogue ends.
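The retention rule described above (keep the history while the dialogue continues, delete it when the dialogue ends) might look like the following in-memory sketch. The class `DialogueHistory`, the `Turn` record, and the notion of a session id are assumptions for illustration; the patent does not specify how the dialogue history DB 14 is implemented.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Turn(NamedTuple):
    speaker: str            # "user" or "system"
    text: str               # what was said
    topic: Optional[str]    # topic set at that point, if any


class DialogueHistory:
    """In-memory history kept only while a dialogue is in progress."""

    def __init__(self):
        self._turns = defaultdict(list)   # session id -> turns in chronological order

    def record(self, session_id, speaker, text, topic=None):
        """Append one utterance to the history of the given dialogue."""
        self._turns[session_id].append(Turn(speaker, text, topic))

    def turns(self, session_id):
        """Return a copy of the turns recorded so far for the dialogue."""
        return list(self._turns.get(session_id, []))

    def end_dialogue(self, session_id):
        """Delete everything stored for the dialogue once it ends."""
        self._turns.pop(session_id, None)
```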

The control unit 20 is a processing unit that controls the dialogue device 10 as a whole, and is, for example, a processor. The control unit 20 has a content management unit 21, a dialogue input unit 22, a dialogue control unit 23, and a dialogue output unit 24. The content management unit 21, the dialogue input unit 22, the dialogue control unit 23, and the dialogue output unit 24 are examples of electronic circuits of a processor or the like, or examples of processes executed by a processor.

The content management unit 21 is a processing unit that manages the content stored in the content DB 13. Specifically, the content management unit 21 automatically generates hierarchically structured content and stores it in the content DB 13, or acquires hierarchically structured content created by an administrator and stores it in the content DB 13.

For example, the content management unit 21 can acquire content from web pages, map the transitions from the top page onto levels, and generate hierarchically structured content. Taking a travel site as an example, the page for the top-page content "hot spring special" is level 1 content, and the pages for "simple hot spring", "acidic spring", and "open-air bath with a view", which are reached next from the top page, are level 2 content. The pages for individual inns, such as "XX Inn", which are reached next from "simple hot spring", "acidic spring", and "open-air bath with a view", are level 3 content.
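One way to derive levels from page transitions is a breadth-first walk from the top-level page, sketched below. The function `assign_levels`, the `LINKS` graph, and the inn names are hypothetical; the patent only states that transitions from the top page are mapped to levels, not how the pages are crawled.

```python
from collections import deque


def assign_levels(links, root, max_level=3):
    """Breadth-first walk: the root page is level 1, each transition adds one level."""
    levels = {root: 1}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if levels[page] >= max_level:
            continue  # do not descend below the deepest level
        for child in links.get(page, []):
            if child not in levels:       # keep the shallowest level found for a page
                levels[child] = levels[page] + 1
                queue.append(child)
    return levels


# Hypothetical link graph for the travel-site example.
LINKS = {
    "hot spring special": ["simple hot spring", "acidic spring", "open-air bath with a view"],
    "simple hot spring": ["Inn A"],
    "acidic spring": ["Inn B"],
    "open-air bath with a view": ["Inn A", "Inn C"],
}

LEVELS = assign_levels(LINKS, "hot spring special")
```

Breadth-first order means a page reachable by several routes is assigned the shallowest level at which it appears, which is one reasonable reading of "transitions from the top page".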

Here, the way the content management unit 21 sets topic and depth is described. For example, by creating content of depth X whose utterance condition specifies, in addition to a depth condition lower than depth X, another condition such as a user utterance, the content management unit 21 can describe the control information in the content of the destination depth, independently of the immediately preceding content. In addition, to make it easier to deepen a topic when it matches the user's interests, the content management unit 21 can assign keywords to each content item and describe the content so that, when the user utterance contains a keyword of some content, that content is preferentially selected.

The dialogue input unit 22 is a processing unit that acquires a user utterance made from the user terminal 1 and inputs it to the dialogue control unit 23. For example, the dialogue input unit 22 acquires a user utterance such as "Tell me something", inputs it to the dialogue control unit 23, and stores it in the dialogue history DB 14. The dialogue input unit 22 can also perform morphological analysis or the like on the user utterance and input information such as the extracted words to the dialogue control unit 23 as the analysis result.

The dialogue control unit 23 has a topic management unit 23a, a candidate extraction unit 23b, and a selection unit 23c, and is a control unit that identifies the relevant content according to the user utterance and makes an utterance to the user.

The topic management unit 23a is a processing unit that manages the topic of the dialogue based on the state of the dialogue with the user. Specifically, the topic management unit 23a manages a state for the topic and a level for the depth. The topic management unit 23a also acquires and holds information on the content uttered by the dialogue output unit 24. The topic management unit 23a then determines the topic and level based on the immediately preceding content, the depth control information, and the content of the user utterance.

For example, when level 1 content A has been uttered, the topic management unit 23a holds the topic and depth control set for that content, "if the response is X1, level 2 of topic T1". After that, when the user utterance contains "X1", the topic management unit 23a sets the topic to "T1" and sets the dialogue level, which is the degree of detail of the dialogue with the user, to level 2. When the user utterance does not contain "X1", the topic management unit 23a leaves the topic unset and leaves the dialogue level at level 1.

The topic management unit 23a also maintains the dialogue level at level 2 when a user response of the same kind as that specified in the depth control set for the level 1 content A is detected. Level 1 content can also be uttered when the dialogue level is 2 or higher, and if the response to it matches a condition that determines depth, level 2 of the new topic is set. The topic becomes unset when a predetermined time has elapsed in the conversation with the user or when the user's reactions become weak.
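The topic and level update described for the topic management unit 23a can be sketched as a small state transition. The names `DialogueState`, `update_state`, and `reset_topic` are hypothetical, the substring test stands in for whatever matching the device actually performs, and returning to level 1 when the topic is cleared is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# (trigger word, topic to set, level to move to)
Control = Tuple[str, str, int]


@dataclass(frozen=True)
class DialogueState:
    topic: Optional[str] = None   # None means "topic not set"
    level: int = 1                # dialogue level (depth)


def update_state(state: DialogueState, controls: Sequence[Control],
                 user_utterance: str) -> DialogueState:
    """Apply the topic/depth control of the content that was just uttered."""
    for trigger, topic, next_level in controls:
        if trigger in user_utterance:      # naive substring match (assumption)
            return DialogueState(topic=topic, level=next_level)
    return state                           # no match: topic and level stay as they are


def reset_topic() -> DialogueState:
    """Timeout or weak user reaction: topic unset; level 1 is assumed here."""
    return DialogueState()
```

For instance, with the control `("X1", "T1", 2)`, a reply containing "X1" moves an unset state to topic "T1" at level 2, while any other reply leaves the state unchanged.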

The candidate extraction unit 23b is a processing unit that extracts utterance candidate content based on the topic and the utterance conditions. Specifically, when the level of the content already uttered is shallow (low level), the candidate extraction unit 23b extracts utterance candidate content so as to widen the choices and search for a topic. When the level of the content already uttered is deep (high level), the candidate extraction unit 23b extracts utterance candidate content so as to follow the content of the user utterance. The candidate extraction unit 23b then outputs the extraction result to the selection unit 23c.

FIG. 5 is a diagram illustrating how utterance candidates increase or decrease with depth. As shown in FIG. 5, when there is no topic, the candidate extraction unit 23b extracts level 1 content as utterance candidates. When the topic is "X", the candidate extraction unit 23b extracts level 1 content and level 2 content for which the topic "X" is set as utterance candidates. In this case, a dialogue related to X took place when topic X was set, and because the level 2 content contains X-related keywords, it is more likely to be selected than the level 1 candidates. When the topic is "X" and the user utterance is "aa", the candidate extraction unit 23b extracts, in addition to level 1 content and level 2 content for which the topic "X" is set, level 3 content for which the topic "X" and the user utterance "aa" are set. In this way, the deeper the depth, the more likely it is that more detailed content is selected.

More specifically, when the topic is not set or the dialogue level is 1, the candidate extraction unit 23b extracts level 1 content as utterance candidates. After level 1 content A has been uttered, when a user utterance corresponding to the topic and depth control set for that content, "if the response is X1, level 2 of topic T1", is made, the topic management unit 23a sets the topic to "T1". The candidate extraction unit 23b therefore follows the depth control and extracts level 2 content whose topic is "T1", in addition to level 1 content, as utterance candidates.

When the topic is "T", the dialogue level is 2 or higher, and the latest user utterance contains "Y", the candidate extraction unit 23b extracts, from among the level 2 content, content whose utterance condition is "the topic is T, the dialogue level is 2 or higher, and the user utterance contains Y" as utterance candidates. When the user utters a keyword that matches an utterance condition, the candidate extraction unit 23b limits the output candidates to content that matches the keyword.

When the user makes a negative utterance such as "I'm not interested", the candidate extraction unit 23b lowers the level while keeping the topic and selects utterance candidate content. When the topic management unit 23a has changed the topic to unset, the candidate extraction unit 23b extracts level 1 content as utterance candidates. That is, when the topic is changed to unset, the utterance level is lowered and the topic is reconsidered.
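The extraction rules above can be sketched as a filter over the content DB. This is an illustration under stated assumptions, not the patented implementation: the `Content` record and `extract_candidates` are hypothetical names, matching is a plain substring test, the user-utterance condition is applied to level 3 entries as in FIG. 5, and the keyword restriction is applied to the candidates that already satisfy the level and topic conditions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Content:
    text: str
    level: int
    topic: Optional[str] = None            # topic condition (levels 2 and 3)
    user_utterance: Optional[str] = None   # user-utterance condition (level 3)
    keywords: Tuple[str, ...] = ()


def extract_candidates(db: List[Content], topic: Optional[str], level: int,
                       user_utterance: str) -> List[Content]:
    # Level 1 content can be uttered at any time.
    candidates = [c for c in db if c.level == 1]
    if topic is not None and level >= 2:
        # Level 2: restricted by topic.
        candidates += [c for c in db if c.level == 2 and c.topic == topic]
        # Level 3: topic must match and the user must have said the expected phrase.
        candidates += [
            c for c in db
            if c.level == 3 and c.topic == topic
            and c.user_utterance is not None and c.user_utterance in user_utterance
        ]
    # If the user said a keyword attached to some candidate, keep only those candidates.
    keyword_hits = [c for c in candidates if any(k in user_utterance for k in c.keywords)]
    return keyword_hits or candidates


DB = [
    Content("Are you a dog person or a cat person?", level=1),
    Content("Do you like cooking?", level=1, keywords=("cooking",)),
    Content("Toy poodles are popular in Japan.", level=2, topic="dog"),
    Content("Poodles with two or more coat colors are popular overseas.", level=3,
            topic="dog", user_utterance="toy poodle"),
]
```

Under this sketch, a negative reply such as "I'm not interested" could be handled by calling `extract_candidates` with the same topic and a lower level, and an unset topic by passing `None`, which leaves only level 1 content.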

The selection unit 23c is a processing unit that selects the content to be uttered from the utterance candidate content extracted by the candidate extraction unit 23b. Specifically, the selection unit 23c selects the content to be uttered from the utterance candidate content using a method such as keyword matching or random selection, and outputs it to the dialogue output unit 24.

For example, when the topic is not set, the selection unit 23c can select at random from the level 1 content that makes up the utterance candidates. When the number of consecutive utterances of level 1 content is below a threshold, the selection unit 23c selects content that shares a category with content already output, so as to avoid a large topic change. More specifically, when "baseball" content has already been output, the selection unit 23c selects "soccer" content, which is also a sport.

When the dialogue level is 2, the selection unit 23c can preferentially select level 2 content from the utterance candidates. For example, when the topic is "cooking", the selection unit 23c selects content whose utterance content includes a dish name such as "curry" or "sushi", which are subordinate concepts that make "cooking" more specific.

When level 3 content is among the utterance candidates, the selection unit 23c can preferentially select content that follows the user utterance. For example, when the topic is "cooking" and the user utterance history contains "curry", the selection unit 23c selects content with detailed information about curry, such as "how to make curry" or "countries where curry is eaten", so as not to stray from the subject of curry.
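The selection preferences above can be sketched as follows. The sketch simplifies them into "prefer the deepest level among the candidates, and at level 1 prefer the category of the previous output while the streak of level 1 utterances is below a threshold". `Candidate`, `select_content`, the `category` field, and the default threshold of 3 are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    text: str
    level: int
    category: Optional[str] = None   # e.g. "sports"; hypothetical field


def select_content(candidates: Sequence[Candidate],
                   last_category: Optional[str] = None,
                   level1_streak: int = 0,
                   streak_threshold: int = 3,
                   rng: Optional[random.Random] = None) -> Candidate:
    if not candidates:
        raise ValueError("no utterance candidates")
    rng = rng or random.Random()
    # Prefer the most detailed content available (levels 2 and 3 over level 1).
    deepest = max(c.level for c in candidates)
    pool = [c for c in candidates if c.level == deepest]
    # While still searching for a topic at level 1, avoid a large topic change
    # by staying in the category of the content that was output last.
    if deepest == 1 and last_category is not None and level1_streak < streak_threshold:
        same_category = [c for c in pool if c.category == last_category]
        if same_category:
            pool = same_category
    return rng.choice(pool)   # random choice among what remains
```

Passing a seeded `random.Random` as `rng` makes the choice reproducible, which is convenient when testing dialogue flows.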

Returning to FIG. 2, the dialogue output unit 24 is a processing unit that outputs the content selected by the selection unit 23c. The dialogue output unit 24 also outputs information about the content output (uttered) to the user terminal to the topic management unit 23a and stores it in the dialogue history DB 14.

[Processing flow]
FIG. 6 is a flowchart showing the flow of processing. As shown in FIG. 6, when the dialogue input unit 22 receives a user reaction (S101: Yes), the dialogue control unit 23 updates the topic according to the user reaction, setting or deleting the topic as needed (S102).

Subsequently, the dialogue control unit 23 determines the depth (S103), extracts utterance candidate content according to the depth (S104), and selects the content to be uttered from the utterance candidates based on a criterion such as random selection (S105). The dialogue output unit 24 then outputs the utterance content set in the selected content to the user terminal 1 (S106).
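Steps S101 to S106 can be tied together in a single turn handler, sketched below. Everything here is a simplified illustration: `Content`, `State`, and `handle_turn` are hypothetical names, the depth rule in S103 (1 with no topic, 3 when a level 3 condition matches, otherwise 2) is an assumption, matching is a substring test, and the DB is assumed to contain at least one level 1 entry.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Content:
    text: str
    level: int
    topic: Optional[str] = None
    user_utterance: Optional[str] = None               # level 3 condition
    controls: Tuple[Tuple[str, str, int], ...] = ()    # (trigger, topic, next level)


@dataclass
class State:
    topic: Optional[str] = None
    last: Optional[Content] = None    # content uttered in the previous turn


def handle_turn(db: List[Content], state: State, user_utterance: str,
                rng: random.Random) -> str:
    # S102: update the topic using the control of the previously uttered content.
    if state.last is not None:
        for trigger, topic, _next_level in state.last.controls:
            if trigger in user_utterance:
                state.topic = topic
                break

    def matches(c: Content) -> bool:
        if c.level == 1:
            return True                       # level 1 can always be uttered
        if c.topic != state.topic:
            return False                      # levels 2 and 3 are bound to the topic
        return c.level == 2 or (
            c.user_utterance is not None and c.user_utterance in user_utterance)

    # S103: determine the depth (assumed rule, see the lead-in).
    if state.topic is None:
        depth = 1
    elif any(c.level == 3 and matches(c) for c in db):
        depth = 3
    else:
        depth = 2
    # S104: extract candidates up to that depth whose utterance condition holds.
    candidates = [c for c in db if c.level <= depth and matches(c)]
    # S105: choose at random among the deepest candidates.
    deepest = max(c.level for c in candidates)
    chosen = rng.choice([c for c in candidates if c.level == deepest])
    # S106: output the utterance content and remember it for the next turn.
    state.last = chosen
    return chosen.text
```

With a three-entry DB built from the dog example, the replies "Tell me something", "dog", and "I like toy poodles" walk this handler from the level 1 question through the level 2 toy poodle remark to the level 3 remark about coat colors.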

[Comparative example]
Next, a comparison between the general technique and the first embodiment is described. FIG. 7 is a diagram illustrating an example of dialogue when the general technique is used, and FIG. 8 is a diagram illustrating an example of dialogue when the first embodiment is used.

As shown in (1) of FIG. 7, when the user requests a topic, for example with "Tell me something", a general system using the general technique makes an utterance about "night pools" in accordance with a scenario, recent trends, or the like. Next, as shown in (2) of FIG. 7, even when the user repeats a specific utterance that indicates a mere back-channel response, such as "Is that so", which users say when they have little interest or when the dialogue has strayed from the scenario, the general system simply responds in kind by uttering a pre-specified phrase such as "Is that so".

After that, as shown in (3) of FIG. 7, because the user has uttered something like "What is that?", which users say when they are displeased, the general system abruptly changes the topic regardless of the dialogue history so far. As a result, as shown in (4) of FIG. 7, the user's satisfaction decreases and the user may forcibly end the dialogue.

On the other hand, as shown in (1) of FIG. 8, when the user requests a topic, for example by saying "Tell me something," the dialogue device 10 according to Embodiment 1 utters level 1 content that includes topics such as "video games" and "cooking." Then, as shown in (2) of FIG. 8, when a user utterance including "cooking" is input, the dialogue device 10 sets the topic to "cooking" and utters level 2 content, associated with the topic "cooking," that asks for the user's favorite dish. From the response to this utterance, the dialogue device 10 identifies that the favorite dish is "curry."

After that, as shown in (3) of FIG. 8, because "curry," which is more specific than "cooking," has been uttered, the dialogue device 10 utters level 3 content that explains the etymology of "curry" as content describing curry in more depth. As a result, as shown in (4) of FIG. 8, careless topic changes and mechanical, scenario-bound dialogue can be suppressed, so the user's satisfaction improves.

[Effects]
As described above, the dialogue device 10 determines the current topic and depth from the history of content outputs and user responses, extracts output candidates based on the current topic and depth, and selects and outputs content from the output candidates. In this way, the dialogue device 10 holds utterance candidates in a hierarchy organized by the degree of detail within a topic, and each content carries information that determines its utterance conditions and deepening. The dialogue device 10 can therefore impose more matching conditions against the topic and user utterances on utterance candidates in deeper layers, and can control topic changes and deepening based on the topic.

As a result, the dialogue device 10 can realize a dialogue that users find approachable, with less abrupt topic changes and deeper exploration of topics, and can improve the continuity of natural dialogue. In addition, the dialogue device 10 searches for a topic while the hierarchy is shallow and deepens the dialogue once the hierarchy is deep, so the user's satisfaction can be improved. Furthermore, compared with a scenario format that requires a flow, the dialogue device 10 can reduce the cost of creating content that gradually goes into more detail.

Incidentally, Embodiment 1 described an example in which utterance restriction and utterance control that take depth into account are realized by setting utterance conditions on the content side, but the invention is not limited to this. For example, the information set on the content side can be reduced to lighten the burden on the content creator, and the dialogue device 10 can take the lead in content selection and utterance control by using the topic search status and the keyword utterance status.

FIG. 9 is a diagram illustrating topic search at low depth, and FIG. 10 is a diagram illustrating content selection by keyword. As shown in FIG. 9, upon receiving a user utterance, the dialogue control unit 23 of the dialogue device 10 utters either content that is highly relevant to the user utterance or randomly selected content from among the level 1 contents (see (1) in FIG. 9). Highly relevant content is, for example, content that belongs to the same genre or topic as already-uttered content, or content that includes a word contained in already-uttered content or a synonym of that word. Content with low relevance is, for example, content that belongs to a different genre or topic from already-uttered content, or content that includes neither a word contained in already-uttered content nor a synonym of that word.

Subsequently, because the user utterance made in response to the uttered content does not include a word corresponding to a predetermined topic, the dialogue control unit 23 again selects and utters level 1 content (see (2) in FIG. 9). At this time, the dialogue control unit 23 can select content at random or select content that has low similarity to already-uttered content. Similarity can be judged, for example, by computing the degree of word match and checking whether it is at or above a threshold.
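The threshold test on the degree of word match might look like the sketch below. The text does not specify a metric, so the Jaccard overlap of word sets, the threshold of 0.5, and the function names are assumptions chosen for illustration. Whitespace tokenization is also assumed; Japanese text would need a proper tokenizer.

```python
def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the word sets of two texts, from 0.0 to 1.0."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def is_similar(a: str, b: str, threshold: float = 0.5) -> bool:
    """Two texts count as similar when the overlap reaches the threshold."""
    return word_overlap(a, b) >= threshold


def pick_dissimilar(spoken, candidates, threshold: float = 0.5):
    """Keep only candidates that are not similar to any uttered content."""
    return [c for c in candidates
            if not any(is_similar(c, s, threshold) for s in spoken)]
```

With this filter, a topic search that has already uttered "i like curry" would skip "i like ramen" (overlap 0.5) and keep an unrelated candidate, which steers the next level 1 utterance toward a different subject.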

Similarly, because the user utterance made in response to the uttered content does not include a word corresponding to a predetermined topic, the dialogue control unit 23 again selects and utters level 1 content (see (3) in FIG. 9). In this way, the dialogue control unit 23 searches for a topic in which the user shows interest.

After that, upon detecting that the user utterance made in response to the uttered content includes a word Y corresponding to a predetermined topic, the dialogue control unit 23 sets the topic of the dialogue with this user to Y (see (4) in FIG. 9). Here, in addition to the level 1 contents, the dialogue control unit 23 treats level 2 contents that include the word Y corresponding to topic Y as utterance candidates. Level 2 content is selected preferentially; level 1 content is selected when no applicable level 2 content exists or when the applicable level 2 content has already been uttered. Level 2 contents may also be categorized by topic.

Then, the dialogue control unit 23 selects level 2 content that includes the word Y or that corresponds to topic Y, and utters it to the user (see (5) in FIG. 9). In this way, the dialogue device 10 searches for a topic in which the user is interested and, once such a topic is found, utters more detailed content about that topic, enriching the dialogue with the user.
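The topic search of FIG. 9 can be sketched as follows: level 1 content is uttered until a topic word appears, after which unuttered level 2 content for that topic is preferred and level 1 serves as the fallback. The sample contents, the topic word "cook", and the function name `next_utterance` are hypothetical, and keying level 2 contents by topic is only one of the arrangements the text allows.

```python
import random

# Hypothetical contents (assumptions for this sketch).
LEVEL1 = [
    "Seen any good movies lately?",
    "Do you cook at home?",
    "Any travel plans?",
]
LEVEL2 = {  # topic word -> more detailed contents for that topic
    "cook": ["What dish do you cook most often?"],
}


def next_utterance(user_text, topic, spoken, rng=random):
    """Return (topic, text); `spoken` is a set of already-uttered texts."""
    # (4) Set the topic when the user utterance contains a topic word.
    if topic is None:
        for word in LEVEL2:
            if word in user_text.lower():
                topic = word
                break
    # (5) Prefer level 2 content of the topic that has not been uttered yet.
    if topic is not None:
        level2 = [c for c in LEVEL2[topic] if c not in spoken]
        if level2:
            choice = rng.choice(level2)
            spoken.add(choice)
            return topic, choice
    # (1)-(3) Otherwise keep searching with level 1 content.
    level1 = [c for c in LEVEL1 if c not in spoken] or LEVEL1
    choice = rng.choice(level1)
    spoken.add(choice)
    return topic, choice
```

Keeping `spoken` outside the function lets the caller share one utterance history across the whole dialogue, which is what makes the fallback to level 1 possible once the level 2 content for a topic is exhausted.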

As shown in FIG. 10, upon receiving a user utterance, the dialogue control unit 23 randomly selects and utters level 1 content (see (1) in FIG. 10). When the user responds with an utterance that includes the keyword "KY1," the dialogue control unit 23 detects the group of level 3 contents for which the keyword "KY1" is set and utters content selected at random from that group (see (2) in FIG. 10). In this way, the dialogue control unit 23 gives priority to keywords when selecting content, regardless of level. The dialogue control unit 23 also sets, as the current topic, the topic set in the uttered level 3 content or a word included in that level 3 content.

Subsequently, after the level 3 content has been uttered, as long as the user does not respond with a mere acknowledgment or an utterance indicating the end of the dialogue, the dialogue control unit 23 determines that the user is showing interest and continues by selecting and uttering level 3 content that has the keyword "KY1" or level 2 content whose topic matches (see (3) in FIG. 10). At this time, the dialogue control unit 23 is not limited to keywords included in the user utterance; it can also select content similar to already-uttered content, or level 2 or level 3 content belonging to the same category.

After that, when a user utterance such as "I'm not interested" is input in response to the uttered level 3 content, the dialogue control unit 23 determines that the user has lost interest, clears the topic setting, and randomly selects and utters level 1 content (see (4) in FIG. 10).

In this way, even while uttering level 3 content, the dialogue device 10 utters level 1 content again when the user's interest wanes, thereby searching again for a topic in which the user is interested. This suppresses a decrease in the user's satisfaction.
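The keyword-first selection of FIG. 10 can be illustrated with the sketch below. The `Content` fields, the topic name "T1", the sample texts, and the single lost-interest phrase are placeholders invented for the example. Detection of mere acknowledgments and of dialogue-ending utterances is omitted for brevity.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Content:
    level: int
    text: str
    topic: Optional[str] = None
    keywords: frozenset = frozenset()


# Hypothetical contents; "KY1" is the keyword used in the description.
CONTENTS = [
    Content(1, "What have you been up to lately?"),
    Content(1, "Did anything fun happen today?"),
    Content(2, "Here is an overview of topic T1.", "T1"),
    Content(3, "Here is a detailed fact about KY1.", "T1", frozenset({"KY1"})),
]
LOST_INTEREST = ("not interested",)  # assumed phrase list


def select(user_text, topic, rng=random):
    """Return (new_topic, content) for one user utterance."""
    text = user_text.lower()
    # (4) Interest lost: clear the topic and go back to level 1.
    if any(phrase in text for phrase in LOST_INTEREST):
        return None, rng.choice([c for c in CONTENTS if c.level == 1])
    # (2) A keyword match takes priority regardless of level.
    hits = [c for c in CONTENTS
            if any(k.lower() in text for k in c.keywords)]
    if hits:
        chosen = rng.choice(hits)
        return chosen.topic, chosen
    # (3) Interest continues: stay on the topic with level 2 or 3 content.
    if topic is not None:
        on_topic = [c for c in CONTENTS
                    if c.topic == topic and c.level >= 2]
        if on_topic:
            return topic, rng.choice(on_topic)
    # (1) No topic yet: random level 1 content.
    return topic, rng.choice([c for c in CONTENTS if c.level == 1])
```

Because the keyword check runs before the topic check, an utterance containing "KY1" jumps directly from level 1 to level 3, matching the behavior described for (2) in FIG. 10.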

FIG. 11 is a flowchart showing the flow of processing according to Embodiment 2. As shown in FIG. 11, upon receiving a user utterance (S201: Yes), the dialogue control unit 23 of the dialogue device 10 determines whether the utterance matches a pre-specified negative utterance (S202). For example, the dialogue control unit 23 can designate in advance utterances such as "I'm not interested," "Is that so?", "Hmm," and "Isn't there anything else to talk about?" as negative utterances.

Subsequently, when the user utterance is not a negative utterance (S202: No), the dialogue control unit 23 determines whether the utterance conditions of level 3 content are met (S203). For example, the dialogue control unit 23 determines whether a keyword set for level 3 content is included in the user utterance. As another example, the dialogue control unit 23 determines whether the user utterance includes a keyword (word) that is a subordinate concept of a keyword (word) included in the level 2 content uttered immediately before. The dialogue control unit 23 also determines whether level 2 content has been uttered several times in succession.

When the utterance conditions of level 3 content are met (S203: Yes), the dialogue control unit 23 selects, from the level 3 contents, content likely to deepen the user's interest (S204). For example, the dialogue control unit 23 selects content at random, or selects content similar to the content uttered immediately before, content belonging to the same category as the content uttered immediately before, or content that includes a keyword similar to, or a subordinate concept of, a keyword included in the content uttered immediately before.

On the other hand, when the utterance conditions of level 3 content are not met (S203: No), the dialogue control unit 23 determines whether a topic has already been set (S205).

When a topic has already been set (S205: Yes), the dialogue control unit 23 selects content likely to deepen the user's interest from among the level 1 contents and the applicable level 2 contents (S206). For example, the dialogue control unit 23 can make the selection using the same criteria as in S204.

On the other hand, when a topic has not yet been set (S205: No), the dialogue control unit 23 selects not-yet-uttered content from the level 1 contents (S207). S207 is also executed when the user utterance is determined in S202 to be a negative utterance (S202: Yes).

After that, the dialogue control unit 23 utters to the user terminal the utterance content set in the content selected in S204, S206, or S207 (S208).
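The branching of S202 to S207 reduces to a small decision function, sketched below. The function name `choose_step`, the English negative phrases, and the use of a plain keyword test for the level 3 utterance condition are assumptions; the text also allows other conditions, such as subordinate-concept keywords or repeated level 2 utterances.

```python
# Hypothetical negative phrases (English stand-ins for the examples given).
NEGATIVE_UTTERANCES = ("not interested", "is that so", "anything else")


def choose_step(user_text, topic, level3_keywords):
    """Return (step, levels): the selection step and the levels it draws from."""
    text = user_text.lower()
    # S202: a negative utterance leads to unuttered level 1 content.
    if any(phrase in text for phrase in NEGATIVE_UTTERANCES):
        return "S207", (1,)
    # S203: the level 3 utterance condition (here, a keyword match) is met.
    if any(keyword.lower() in text for keyword in level3_keywords):
        return "S204", (3,)
    # S205: a topic has already been set, so use level 1 and level 2.
    if topic is not None:
        return "S206", (1, 2)
    # S205: No. Keep searching for a topic with level 1 content.
    return "S207", (1,)
```

Returning the step label together with the candidate levels keeps the branching separate from the actual content selection, so the selection criteria of S204 and S206 (random, similar, same category) can be swapped without touching the flow.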

As described above, the dialogue device 10 can take the lead in content selection and utterance control by using the topic search status and the keyword utterance status, so the information set on the content side can be reduced and the burden on the user can be lightened.

Embodiments of the present invention have been described so far, but the present invention may be implemented in various forms other than the embodiments described above.

[Levels]
For example, the above embodiments describe a three-level hierarchy from level 1 to level 3, but the hierarchy is not limited to this; a hierarchy of two levels, or of four or more levels, can also be used. In addition, for the sake of explanation, the embodiments describe an example in which a level is explicitly set for each content, but this is not a limitation. For example, contents can be configured so that a hierarchy emerges as a result of setting the utterance content in stages, such as "cooking," "curry," and "how to make curry."

[Contents]
The data structures and setting examples of the contents described in the above embodiments are merely examples and can be changed as desired. For example, multiple topics can be set, and different utterance conditions can be set for each of the multiple topics. Known content such as the dialogue blocks used in dialogue systems can also be adopted.

The dialogue device 10 can also set multiple topics during one dialogue. In this case, the dialogue device 10 can choose among them in any manner, for example by having the user select one of the dialogues, by giving priority to the most recent topic, or by giving priority to the topic the user has spoken about most often.
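Two of the prioritization policies mentioned above, most recent topic and most frequently spoken topic, can be sketched as follows. The representation of the dialogue as a chronological list of topic names and the tie-breaking rule (the more recent topic wins) are assumptions made for this example.

```python
from collections import Counter


def latest_topic(history):
    """Most recently mentioned topic, or None for an empty history."""
    return history[-1] if history else None


def most_frequent_topic(history):
    """Topic mentioned most often; ties go to the more recent topic."""
    if not history:
        return None
    counts = Counter(history)
    best = max(counts.values())
    for topic in reversed(history):
        if counts[topic] == best:
            return topic
```

For the history `["cooking", "games", "cooking", "travel"]`, the first policy yields "travel" and the second yields "cooking".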

[Depth control]
For example, depending on the utterance content, the dialogue device 10 can control the topic and depth without relying on the user's response. Specifically, the dialogue device 10 makes the topic depth shallower when the user is not showing interest. The dialogue device 10 can also delete the topic and select from the level 1 contents when the user utters a preset specific phrase such as "That's enough," "Stop," or "Talk about something else."

The dialogue device 10 can also move to a level one step shallower than the current level when content of the same level has been uttered for a predetermined time or longer, for example three minutes. The dialogue device 10 can also set the user's emotion as a depth condition of content. In the case of voice input, the loudness of the voice, frequencies indicative of emotion, and the like can be used to represent emotion. For example, when a negative emotion such as anger or sadness is detected, the level can be lowered by one step to make the hierarchy shallower, and when a positive emotion such as joy or surprise is detected, the level can be raised by one step to make the hierarchy deeper.
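The time-based and emotion-based level transitions can be combined in one small function, sketched here. Emotion labels are assumed to come from some external detector that is not shown. Checking the timeout before the emotion and clamping the result to the range from 1 to `max_level` are design choices of this sketch that the text does not specify.

```python
NEGATIVE_EMOTIONS = {"anger", "sadness"}
POSITIVE_EMOTIONS = {"joy", "surprise"}


def adjust_level(level, emotion=None, seconds_at_level=0.0,
                 max_level=3, timeout=180.0):
    """Return the next level given the detected emotion and dwell time."""
    if seconds_at_level >= timeout:
        level -= 1   # same level for too long: go one step shallower
    elif emotion in NEGATIVE_EMOTIONS:
        level -= 1   # negative emotion: go one step shallower
    elif emotion in POSITIVE_EMOTIONS:
        level += 1   # positive emotion: go one step deeper
    # Assumption: keep the level within the available hierarchy.
    return max(1, min(max_level, level))
```

For example, `adjust_level(2, "joy")` moves from level 2 to level 3, while `adjust_level(3, seconds_at_level=200)` drops back to level 2 after more than three minutes at the same level.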

[System]
The processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed as desired unless otherwise specified.

Each component of each illustrated device is functionally conceptual and does not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to the illustrated one; all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the content management unit 21 and the dialogue control unit 23 can be integrated.

Furthermore, all or any part of the processing functions performed by each device can be realized by a CPU and a program analyzed and executed by that CPU, or can be realized as hardware based on wired logic.

[Hardware]
FIG. 12 is a diagram illustrating an example hardware configuration. As shown in FIG. 12, the dialogue device 10 has a communication device 10a, an HDD (Hard Disk Drive) 10b, a memory 10c, and a processor 10d. The units shown in FIG. 12 are connected to one another by a bus or the like.

The communication device 10a is a network interface card or the like and communicates with other servers. The HDD 10b stores the programs and DBs for operating the functions shown in FIG. 2.

The processor 10d reads a program that executes the same processing as each processing unit shown in FIG. 2 from the HDD 10b or the like and loads it into the memory 10c, thereby running a process that executes each function described with reference to FIG. 2 and elsewhere. That is, this process executes the same functions as the processing units of the dialogue device 10. Specifically, the processor 10d reads from the HDD 10b or the like a program having the same functions as the content management unit 21, the dialogue input unit 22, the dialogue control unit 23, the dialogue output unit 24, and so on. The processor 10d then runs a process that executes the same processing as the content management unit 21, the dialogue input unit 22, the dialogue control unit 23, the dialogue output unit 24, and so on.

In this way, the dialogue device 10 operates as an information processing device that executes the dialogue method by reading and executing the program. The dialogue device 10 can also realize the same functions as the embodiments described above by reading the program from a recording medium with a medium reading device and executing the read program. The program referred to in these other embodiments is not limited to being executed by the dialogue device 10. For example, the present invention can be applied in the same way when another computer or server executes the program, or when they execute the program in cooperation.

10 dialogue device
11 communication unit
12 storage unit
13 content DB
14 dialogue history DB
20 control unit
21 content management unit
22 dialogue input unit
23 dialogue control unit
23a topic management unit
23b candidate extraction unit
23c selection unit
24 dialogue output unit

Claims

1. A dialogue device comprising:
a storage unit that stores contents, each defining utterance content with which to respond to a user's utterance, for each hierarchical level, the topic of the dialogue and the utterance content for responding in the dialogue being more limited as the level becomes deeper;
a determination unit that determines, based on the user's utterance, the appearance status of a topic of the dialogue and the status of the level being used in responses in the dialogue;
a selection unit that, in the dialogue with the user, selects content of a shallow level used for setting the topic when the topic has not been set, and, when the topic has been set, selects content of a deep level according to the depth for which the topic has been set and the appearance status of keywords in the dialogue; and
an utterance execution unit that makes an utterance to the user using the utterance content set in the selected content.

2. The dialogue device according to claim 1, wherein the storage unit stores, as content to be uttered, content that is more closely related to the user's utterance content the deeper the hierarchical level of the depth.

3. The dialogue device according to claim 1, wherein
the storage unit stores the contents for each depth layered according to the degree of detail of the defined utterance content, and
the selection unit selects the content to be uttered from among a plurality of contents belonging to the depth layer corresponding to the determined level.

4. The dialogue device according to claim 1, wherein
the storage unit stores contents, in each of which utterance content and an utterance condition are set, for each depth layered according to the degree of detail of the defined utterance content, and
the selection unit selects, as the content to be uttered, content that satisfies the utterance condition, based on the user's response to uttered content and the depth layer to which the uttered content belongs.

5. The dialogue device according to claim 4, wherein
the depth is layered by levels indicating the degree of detail of the dialogue with the user,
the utterance condition specifies, for higher levels, more detailed conditions that make the content harder to utter,
the determination unit determines the level based on the user's response to uttered content and the depth layer to which the uttered content belongs, and
the selection unit selects, as the content to be uttered, content that satisfies the utterance condition from among the contents corresponding to the determined level.

6. The dialogue device according to claim 5, wherein
the utterance condition of the contents of each level includes a keyword, and
when the keyword is included in the user's response, the selection unit selects the content for which the keyword is set as the content to be uttered, regardless of the specified level.

7. The dialogue device according to claim 5, wherein, when the user's response includes a negative word or when the duration of the dialogue with the user exceeds a predetermined time, the selection unit selects the content to be uttered from among contents belonging to a level shallower than the specified level.

8. The dialogue device according to claim 1, wherein the storage unit stores, as content to be uttered, content that is less related to already-uttered content the shallower the hierarchical level of the depth.

9. A dialogue method in which a computer executes a process comprising:
determining, based on a user's utterance, the appearance status of a topic of a dialogue and the status of the level being used in responses in the dialogue;
referring to a storage unit that stores contents, each defining utterance content with which to respond to the user's utterance, for each hierarchical level, the topic of the dialogue and the utterance content for responding in the dialogue being more limited as the level becomes deeper, and, in the dialogue with the user, selecting content of a shallow level used for setting the topic when the topic has not been set, and selecting content of a deep level according to the depth for which the topic has been set and the appearance status of keywords in the dialogue when the topic has been set; and
making an utterance to the user using the utterance content set in the selected content.

10. A dialogue program that causes a computer to execute a process comprising:
determining, based on a user's utterance, the appearance status of a topic of a dialogue and the status of the level being used in responses in the dialogue;
referring to a storage unit that stores contents, each defining utterance content with which to respond to the user's utterance, for each hierarchical level, the topic of the dialogue and the utterance content for responding in the dialogue being more limited as the level becomes deeper, and, in the dialogue with the user, selecting content of a shallow level used for setting the topic when the topic has not been set, and selecting content of a deep level according to the depth for which the topic has been set and the appearance status of keywords in the dialogue when the topic has been set; and
making an utterance to the user using the utterance content set in the selected content.