JP2017207693A

JP2017207693A - Dialogue method, dialogue system, dialogue device, and program

Info

Publication number: JP2017207693A
Application number: JP2016101221A
Authority: JP
Inventors: Hiroaki Sugiyama; Toyomi Meguro; Atsushi Yamato; Tomohiro Yamada; Takayoshi Mochizuki; Takahiro Matsumoto; Yasunori Ozaki; Yuichiro Yoshikawa; Hiroshi Ishiguro
Original assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Current assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Priority date: 2016-05-20
Filing date: 2016-05-20
Publication date: 2017-11-24
Anticipated expiration: 2036-05-20
Also published as: JP6601625B2

Abstract

PROBLEM TO BE SOLVED: To provide a dialogue system that can continue a dialogue with a user for a long time by drawing the user's utterance into a range that satisfies the conditions under which the dialogue system can acquire it.
SOLUTION: A dialogue system 10 includes at least an input unit 1 that receives a user's utterance and a presentation unit 5 that presents utterances. A presentation unit 5-1 presents a first utterance. A presentation unit 5-2 presents an action corresponding to the conditions for acquiring the user's utterance in response to the first utterance. The input unit 1 receives the utterance the user makes after the action corresponding to those conditions has been presented.
SELECTED DRAWING: Figure 1

Description

The present invention relates to technology in which a computer converses with a human in natural language. The technology can be applied to robots and similar systems that communicate with people.

In recent years, research and development of robots that communicate with people has progressed, and such robots have been put to practical use in various settings.

In communication therapy, for example, a robot can serve as a conversation partner for a person who feels lonely. In a nursing home for the elderly, a robot that listens to residents can help ease their loneliness. Seeing a resident talk with the robot can also create opportunities for conversation between the resident and the people around them, such as family members and caregivers.

In communication training, a robot can serve as a practice partner. At a foreign language school, for example, a robot that practices with learners can make foreign language learning more efficient.

In an information presentation system, the basic mode is to let people listen to a conversation between robots. By occasionally addressing the people directly, the robots draw them into the conversation without boring them and present information in a form they readily accept. This can support efficient presentation of news, product introductions, trivia and general knowledge, and education (for example, childcare and education for children, liberal-arts instruction for adults, and moral education). Suitable settings include meeting places in town, bus stops, and station platforms where people have time to spare, or homes and classrooms where people have the leisure to join a conversation.

In an information collection system, a robot can collect information while talking with a person. Because communication with the robot preserves the feel of a conversation, the robot can collect information without making the person feel pressured by the sense of being interviewed. Envisioned applications include personal information surveys, market research, product evaluation, and preference surveys for product recommendation.

Human-robot communication is thus expected to have many applications, and robots that converse with users more naturally are anticipated. In addition, the spread of smartphones has led to chat services such as LINE (registered trademark), in which multiple users enjoy conversations by chatting in near real time. Applying robot conversation technology to such a chat service would make it possible to converse with the user more naturally even when no human chat partner is available.

In this specification, the term "agent" refers collectively to two things. The first is hardware that serves as the user's conversation partner in these services, such as a robot or a chat partner. The second is computer software that causes a computer to function as such hardware. Because an agent is the user's conversation partner, it may be anthropomorphized, personified, or given a character or personality, as with a robot or a chat partner.

The key to realizing these services is technology that enables an agent, implemented in hardware or computer software, to converse naturally with humans.

One example of such an agent is a spoken dialogue system, as described in Non-Patent Document 1. This type of system recognizes the user's speech, understands or infers the intent of the utterance, and responds appropriately. Research on spoken dialogue systems has advanced along with progress in speech recognition technology, and such systems have been put to practical use in, for example, automated voice response systems.

Another example of such an agent is a scenario dialogue system, which converses with a user on a specific topic according to a predetermined scenario. A scenario dialogue system can continue the dialogue as long as the dialogue develops along the scenario.

For example, the dialogue system described in Non-Patent Document 2 conducts a dialogue between a user and multiple agents, including interruptions by an agent and exchanges between agents. An agent asks the user a question prepared in the scenario. If the user's answer corresponds to an option prepared in the scenario, the agent makes the utterance associated with that option. A scenario dialogue system is therefore a dialogue system in which the agent makes utterances based on a scenario stored in the system in advance.

In this dialogue system, the user's reply may stray from the intended topic. The agent can still respond so that the user does not perceive the story as breaking down. It does so either by acknowledging the reply with a back-channel such as "I see" regardless of its content, or by having another agent interrupt and change the topic.
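The scenario turn described above can be roughly sketched in Python. The user's answer is matched against the options the scenario anticipates, and the agent falls back to a back-channel when nothing matches. The `ScenarioTurn` class, the substring keyword matching, and the sample options are hypothetical simplifications. They are not the mechanism of Non-Patent Document 2.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioTurn:
    """One question in a hypothetical scenario and the answers it anticipates."""
    question: str
    # Keyword expected in the user's answer -> agent's follow-up utterance.
    options: dict[str, str] = field(default_factory=dict)
    # Back-channel used when no option matches, so the story does not visibly break.
    fallback: str = "I see."


def respond(turn: ScenarioTurn, user_answer: str) -> str:
    """Return the follow-up for the first option whose keyword occurs in the answer,
    otherwise the back-channel fallback."""
    answer = user_answer.lower()
    for keyword, follow_up in turn.options.items():
        if keyword in answer:
            return follow_up
    return turn.fallback


if __name__ == "__main__":
    turn = ScenarioTurn(
        "Do you prefer the sea or the mountains?",
        {"sea": "The sea is nice in summer.", "mountain": "Mountain air is refreshing."},
    )
    print(respond(turn, "Hard to say"))  # off-script answer -> back-channel
```

Because the fallback never depends on what the user said, an off-script reply still produces a plausible turn, which is the property the paragraph above relies on.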

Yet another example of such an agent is a chat dialogue system. In a chat dialogue system, the agent's utterances follow the content of the user's utterances, so that the user and the agent converse naturally. For example, the dialogue system described in Non-Patent Document 3 has the system speak according to rules written in advance. The rules are triggered by words in the user's or the agent's utterances, with greater weight given to words specific to the context of the exchanges so far. In this way the system realizes chat between the user and the system.

The rules used by a chat dialogue system need not be written in advance. They may instead be generated automatically from any of the following:
- the content of the user's utterance;
- the most recent utterance by the user or the agent, or utterances made close to it;
- utterances that include at least the most recent utterance by the user or the agent, or utterances made close to it.

Non-Patent Document 3 describes a technique for generating rules automatically from words that have co-occurrence or dependency relations with words in the user's utterance. As another example, the dialogue system described in Non-Patent Document 4 reduces the cost of creating rules by combining hand-written rules with rules produced by a statistical utterance generation method.

Unlike a scenario dialogue system, a chat dialogue system does not have the agent speak along a prepared scenario. The agent's utterance therefore does not end up failing to correspond to the user's utterance. Instead, the agent can base its utterance on at least one of the sources listed above. That is, a chat dialogue system is a dialogue system in which the agent makes utterances based on at least one of those sources. Such chat dialogue systems can respond explicitly to the user's utterances.
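The following Python sketch illustrates the simplest form of the rule-driven chat response described above: a trigger word in the user's utterance selects a hand-written template. The `RULES` table, the English regex tokenizer, and the default reply are all illustrative assumptions. The sketch omits both the context weighting of Non-Patent Document 3 and its automatic rule generation. Japanese input would need a morphological analyzer in place of the regex.

```python
import re

# Hypothetical hand-written rules: trigger word -> response template,
# where "{w}" is filled with the trigger word itself.
RULES = {
    "ramen": "Do you have a favorite {w} shop?",
    "baseball": "Which {w} team do you support?",
}


def tokenize(text: str) -> list[str]:
    """Naive lowercase word tokenizer (assumption: English input)."""
    return re.findall(r"[a-z]+", text.lower())


def chat_response(user_utterance: str, default: str = "Tell me more.") -> str:
    """Return the response of the first rule triggered by a word in the utterance."""
    for word in tokenize(user_utterance):
        template = RULES.get(word)
        if template is not None:
            return template.format(w=word)
    return default


if __name__ == "__main__":
    print(chat_response("I ate ramen today"))
```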

Non-Patent Document 1: Tatsuya Kawahara, "Spoken dialogue systems using spoken language," Joho Shori (IPSJ Magazine), vol. 45, no. 10, pp. 1027-1031, October 2004.
Non-Patent Document 2: Tsunehiro Arimoto, Yuichiro Yoshikawa, Hiroshi Ishiguro, "Impression evaluation of dialogue without speech recognition by multiple robots," Annual Conference of the Robotics Society of Japan, 2016.
Non-Patent Document 3: Hiroaki Sugiyama, Toyomi Meguro, Ryuichiro Higashinaka, Yasuhiro Minami, "Generation of response sentences using dependency and examples for user utterances with arbitrary topics," Transactions of the Japanese Society for Artificial Intelligence, vol. 30, no. 1, pp. 183-194, 2015.
Non-Patent Document 4: Toyomi Meguro, Hiroaki Sugiyama, Ryuichiro Higashinaka, Yasuhiro Minami, "Construction of a dialogue system based on the fusion of rule-based and statistical utterance generation," Proceedings of the Annual Conference of the Japanese Society for Artificial Intelligence, vol. 28, pp. 1-4, 2014.

However, users make a wide variety of complex utterances, so it is difficult for conventional spoken dialogue systems to understand the meaning and content of every user utterance accurately. If a spoken dialogue system cannot accurately understand a user's utterance, it cannot respond to it appropriately. When the user and the spoken dialogue system converse one-on-one, a failure to respond appropriately makes continuing the dialogue stressful for the user. This can lead the user to abandon the dialogue, or it can cause the dialogue to break down.

In view of the above, an object of the present invention is to provide dialogue technology that draws the user's utterance into a range satisfying the conditions under which the dialogue system can acquire it, so that the dialogue can continue for a long time.

To solve the above problem, a dialogue method according to a first aspect of the present invention is performed by a dialogue system to acquire a user's utterance in response to a given utterance, referred to as the first utterance. The method includes three steps:
- a first presentation step in which a presentation unit presents the first utterance;
- a second presentation step in which the presentation unit presents an action corresponding to a condition for acquiring the user's utterance in response to the first utterance;
- a response acceptance step in which an input unit accepts the utterance the user makes after the action.

A dialogue system according to a second aspect of the present invention acquires a user's utterance in response to a given first utterance. It includes:
- an utterance determination unit that determines the first utterance and an action corresponding to a condition for acquiring the user's utterance in response to the first utterance;
- a presentation unit that presents the first utterance determined by the utterance determination unit and, after presenting it, performs the action determined by the utterance determination unit;
- an input unit that accepts the utterance the user makes after the action.

A dialogue device according to a third aspect of the present invention determines the utterances to be presented by a dialogue system. The dialogue system includes at least an input unit that accepts a user's utterance and a presentation unit that presents utterances and actions, and it acquires the user's utterance in response to a given first utterance. The dialogue device includes an utterance determination unit that determines two things: the first utterance, and an action to be performed after the first utterance is presented that corresponds to a condition for acquiring the user's utterance in response to the first utterance.

According to the present invention, the dialogue system performs an action corresponding to the conditions under which it can acquire an utterance before the user speaks. This draws the user's utterance into a range that satisfies those conditions, making it possible to realize a dialogue system and a dialogue device that can continue a dialogue with the user for a long time.

FIG. 1 is a diagram illustrating the functional configuration of a dialogue system using humanoid robots.
FIG. 2 is a diagram illustrating the processing procedure of the dialogue method according to the first embodiment.
FIG. 3 is a diagram illustrating the processing procedure of the dialogue method according to the second embodiment.
FIG. 4 is a diagram illustrating the functional configuration of a dialogue system using group chat.

Embodiments of the present invention are described in detail below. In the drawings, components having the same function are given the same reference numerals, and duplicate descriptions are omitted.

<First embodiment>
The dialogue system of the first embodiment is an information processing apparatus in which multiple humanoid robots cooperate to converse with a user. As shown in FIG. 1, the dialogue system 10 includes an input unit 1, a speech recognition unit 2, an utterance determination unit 3, a speech synthesis unit 4, and a presentation unit 5. The dialogue method of the first embodiment is realized by the dialogue system 10 performing the processing of each step described below. As also shown in FIG. 1, the portion of the dialogue system 10 consisting of the speech recognition unit 2, the utterance determination unit 3, and the speech synthesis unit 4 is referred to as the dialogue device 11.

It has been observed that when people converse smoothly, their behavior comes to resemble each other's (see, for example, Reference 1). This phenomenon is called entrainment (the pull-in phenomenon). Linguistic entrainment has also been confirmed to occur between humans and robots (see, for example, Reference 2).
[Reference 1] William S. Condon and Louis W. Sander, "Neonate movement is synchronized with adult speech: Interactional participation and language acquisition," Science, vol. 183, no. 4120, pp. 99-101, 1974.
[Reference 2] Takamasa Iio et al., "Lexical entrainment: Can robots entrain human vocabulary?," IPSJ Journal, vol. 51, no. 2, pp. 277-289, 2010.

The dialogue technology of the present invention exploits this entrainment. Before the user speaks, the dialogue system presents the user with an action corresponding to the conditions under which it can acquire an utterance. This draws the user's utterance into a range that satisfies those conditions. As a result, the system avoids situations in which the dialogue is interrupted because it cannot understand the user's utterance, and the dialogue can continue for a long time.

The following example shows how the user's utterance is drawn in when a user converses with multiple agents.
1. A first agent makes an utterance that calls for a reply (for example, a question).
2. A second agent makes an utterance that is easy for the dialogue system to understand (hereinafter, a pull-in utterance) and then waits for the user to speak.
3. The user's subsequent utterance is drawn toward the second agent's immediately preceding utterance and takes on features similar to it.

In this example, the action corresponding to the conditions for acquiring an utterance is an utterance that is easy for the dialogue system to understand. The action is not limited to utterances, however. It may also be non-verbal behavior, such as the direction of the gaze or body, or movements of the limbs.
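The turn order in the example above can be sketched in Python as a small log-producing function. The `Turn` record, the `listen` callback that stands in for the input unit, and the sample utterances are illustrative assumptions rather than part of the described system. The second agent's turn may be either a pull-in utterance or a non-verbal action.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    speaker: str  # "agent1", "agent2", or "user"
    kind: str     # "utterance" or "action" (non-verbal pull-in)
    content: str


def pull_in_exchange(question: str, pull_in: str, listen: Callable[[], str],
                     pull_in_kind: str = "utterance") -> list[Turn]:
    """Agent 1 asks for a reply, agent 2 presents the pull-in utterance or action,
    and only then does the system wait for the user's reply."""
    log = [Turn("agent1", "utterance", question),
           Turn("agent2", pull_in_kind, pull_in)]
    # listen() is a hypothetical stand-in for the input unit; it returns the user's reply.
    log.append(Turn("user", "utterance", listen()))
    return log


if __name__ == "__main__":
    for t in pull_in_exchange("What did you have for lunch?", "I had curry.",
                              listen=lambda: "I had soba."):
        print(t.speaker, t.kind, t.content)
```

In this sketch, the user's reply "I had soba." copies the pattern of the pull-in utterance "I had curry.", which is the entrainment effect the example relies on.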

One way to determine the pull-in utterance is to write it in advance as rules, for example a rule that determines the utterance content by filling a blank in a template with a suitable word. Such rules can be created manually or with a known dialogue breakdown detection technique (see, for example, Reference 3). In the breakdown detection approach, the detector checks whether the dialogue breaks down when the second agent's utterance follows the first agent's utterance. If the dialogue is judged not to have broken down, the second agent's utterance can be regarded as easy for the dialogue system to understand and is therefore suitable as a pull-in utterance.
[Reference 3] Hiroaki Sugiyama, "Detection of chat dialogue breakdown by combining data with different characteristics," 6th Dialogue System Symposium (SIG-SLUD), Japanese Society for Artificial Intelligence, pp. 51-56, 2015.
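The rule-creation approach described above, in which a template is filled and then filtered by breakdown detection, can be sketched in Python. The breakdown detector here is a deliberately crude stand-in: one minus the Jaccard word overlap with the first utterance. It is not the technique of Reference 3. The template syntax, the `threshold` value, and the fillers are all hypothetical.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase word set of an utterance (assumption: English input)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def breakdown_score(prev: str, candidate: str) -> float:
    """Hypothetical stand-in for a breakdown detector: 1 - Jaccard overlap.
    Higher means the candidate is more likely to break the dialogue."""
    a, b = words(prev), words(candidate)
    if not a or not b:
        return 1.0
    return 1.0 - len(a & b) / len(a | b)


def make_pull_in_rules(template: str, fillers: list[str], first_utterance: str,
                       threshold: float = 0.8) -> list[str]:
    """Fill the template's blank with each filler and keep only the candidates
    the detector judges not to break down after the first utterance."""
    candidates = [template.format(slot=f) for f in fillers]
    return [c for c in candidates if breakdown_score(first_utterance, c) < threshold]


if __name__ == "__main__":
    print(make_pull_in_rules("I like {slot}.", ["food", "trains"], "What food do you like?"))
```

In practice, a trained breakdown detector would replace `breakdown_score`. The filtering structure would stay the same.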

Instead of preparing rules in advance, the content of the pull-in utterance may be determined on the fly during the dialogue. In this approach, breakdown detection is applied mid-dialogue to the dialogue history up to that point. The second agent's utterance is then chosen so that the dialogue device's next utterance does not cause a breakdown. Because this approach can use a longer dialogue history, it can produce pull-in utterance content better suited to the current exchange.
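The on-the-fly variant described above can be sketched by scoring candidate pull-in utterances against the recent dialogue history and picking the one least likely to cause a breakdown. The scoring function is again a hypothetical Jaccard-overlap stand-in, not a real breakdown detector, and the five-utterance `window` is an arbitrary choice.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase word set of an utterance (assumption: English input)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def breakdown_score(history: list[str], candidate: str, window: int = 5) -> float:
    """Hypothetical stand-in detector: 1 - Jaccard overlap between the candidate
    and the words of the last `window` utterances in the history."""
    context = set().union(*(words(u) for u in history[-window:]))
    cand = words(candidate)
    if not context or not cand:
        return 1.0
    return 1.0 - len(context & cand) / len(context | cand)


def choose_pull_in(history: list[str], candidates: list[str], window: int = 5) -> str:
    """Pick the candidate pull-in utterance with the lowest breakdown score."""
    return min(candidates, key=lambda c: breakdown_score(history, c, window))


if __name__ == "__main__":
    history = ["Where did you travel last summer?", "I went to Kyoto."]
    print(choose_pull_in(history, ["I went to Okinawa.", "My phone battery died."]))
```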

The dialogue device is, for example, a special apparatus configured by loading a special program into a known or dedicated computer that has a central processing unit (CPU), a main storage device (RAM: random access memory), and other components. The dialogue device executes each process under the control of the central processing unit, for example. Data input to the dialogue device and data obtained in each process are stored, for example, in the main storage device, from which they are read as needed for use in other processing. At least some of the processing units of the dialogue device may be implemented in hardware such as an integrated circuit.

The input unit 1 is an interface through which the dialogue system acquires the user's utterance. In other words, it is the interface through which the user inputs utterances into the dialogue system. For example, the input unit 1 is a microphone that picks up the user's speech and converts it into an audio signal. This audio signal is input to the speech recognition unit 2.

The speech recognition unit 2 converts the audio signal of the user's speech, picked up by the input unit 1, into text representing the content of the user's utterance. This text is input to the utterance determination unit 3. Any existing speech recognition technique may be used, chosen to suit the usage environment and other factors.

The utterance determination unit 3 determines the text of the dialogue system's utterance based on the input text representing the user's utterance. This text is input to the speech synthesis unit 4. When the dialogue system performs non-verbal behavior instead of a pull-in utterance, the utterance determination unit 3 also determines, from the same input text, information representing the non-verbal behavior to present to the user. In that case, this information is input to the presentation unit 5.

The speech synthesis unit 4 converts the text determined by the utterance determination unit 3 into an audio signal representing the utterance content. This audio signal is input to the presentation unit 5. Any existing speech synthesis technique may be used, chosen to suit the usage environment and other factors.

The presentation unit 5 is an interface for presenting to the user the utterance content or non-verbal behavior determined by the utterance determination unit 3. For example, the presentation unit 5 is a humanoid robot built to resemble a human.

The humanoid robot presents utterances and actions in two ways:
- To present an utterance, it plays an audio signal from a speaker mounted, for example, in its head. The speech synthesis unit 4 produces this audio signal from the text determined by the utterance determination unit 3.
- To present an action, that is, to perform non-verbal behavior, it moves its body according to the information representing the non-verbal behavior determined by the utterance determination unit 3.

When the presentation unit 5 is a humanoid robot, one robot is provided for each persona participating in the dialogue. The following description assumes two personas and therefore two humanoid robots, 5-1 and 5-2.

The input unit 1 may be integrated with the presentation unit 5. For example, when the presentation unit 5 is a humanoid robot, a microphone mounted on the robot's head can serve as the input unit 1.
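The data flow described above (input unit 1, then speech recognition unit 2, utterance determination unit 3, speech synthesis unit 4, and presentation unit 5) can be sketched as a small Python pipeline. The three engine callables and the `Robot` recorder are hypothetical stand-ins. The description deliberately leaves the speech recognition and synthesis techniques open, so no real library is implied. `decide` returns an utterance text, a non-verbal action, or both.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Robot:
    """Stand-in for presentation unit 5-n: records what it would say or do."""
    name: str
    log: list = field(default_factory=list)

    def speak(self, audio: bytes) -> None:
        self.log.append(("speak", audio))

    def act(self, action: str) -> None:
        self.log.append(("act", action))


@dataclass
class DialogueDevice:
    """Stand-in for dialogue device 11: units 2, 3, and 4 as pluggable callables."""
    recognize: Callable[[bytes], str]                                 # speech recognition unit 2
    decide: Callable[[str], tuple[Optional[str], Optional[str]]]     # utterance determination unit 3
    synthesize: Callable[[str], bytes]                                # speech synthesis unit 4

    def step(self, mic_signal: bytes, robot: Robot) -> None:
        user_text = self.recognize(mic_signal)
        text, action = self.decide(user_text)
        if text is not None:        # verbal output goes through speech synthesis
            robot.speak(self.synthesize(text))
        if action is not None:      # non-verbal output goes straight to the robot
            robot.act(action)


if __name__ == "__main__":
    device = DialogueDevice(recognize=lambda sig: sig.decode(),
                            decide=lambda t: ("You said " + t, "nod"),
                            synthesize=lambda s: s.encode())
    robot = Robot("5-1")
    device.step(b"hello", robot)
    print(robot.log)
```

Keeping the engines as injected callables mirrors the description's point that any existing speech recognition or synthesis technique may be used.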

The processing procedure of the dialogue method of the first embodiment is described below with reference to FIG. 2.

In step S11, humanoid robot 5-1 outputs from its speaker a voice representing the content of a first utterance. The utterance determination unit 3 may, for example, select the text of the first utterance arbitrarily from predetermined fixed phrases stored in a storage unit (not shown) within the utterance determination unit 3, or may determine it according to the content of the preceding utterances. To determine content according to the preceding utterances, techniques used in conventional dialogue systems can be applied, for example, the scenario dialogue system described in Non-Patent Document 2 or the chat dialogue system described in Non-Patent Document 3 or 4. When the utterance determination unit 3 uses the scenario dialogue technique, for example, it considers a dialogue consisting of roughly the last five utterances and selects a scenario for which the inter-word distance between the words (or focus words) of each utterance and the words (or focus words) of each scenario stored in a storage unit (not shown) within the utterance determination unit 3 is shorter than a predetermined distance; it then determines the text of the first utterance by selecting text contained in the selected scenario. When the utterance determination unit 3 uses the chat dialogue technique, it may, for example, use a word in the user's utterance as a trigger and determine the text of the first utterance according to rules written in advance and stored in a storage unit (not shown) within the utterance determination unit 3, or it may automatically generate rules from words that co-occur with, or stand in a dependency relation to, words in the user's utterance, and determine the text of the first utterance according to those rules.
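
The scenario-selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Non-Patent Document 2: utterances are assumed to be pre-tokenized, "inter-word distance" is approximated by the Euclidean distance between word vectors, and the two-dimensional embeddings, the 0.2 threshold, and the names `select_scenario` and `word_distance` are all hypothetical.

```python
import math

# Toy 2-D word embeddings (hypothetical values chosen only for this sketch).
EMBEDDINGS = {
    "ramen": (0.90, 0.10),
    "soba": (0.85, 0.15),
    "noodles": (0.88, 0.12),
    "movie": (0.10, 0.90),
    "cinema": (0.12, 0.88),
}


def word_distance(a: str, b: str) -> float:
    """Euclidean distance between the toy embeddings of two words."""
    (x1, y1), (x2, y2) = EMBEDDINGS[a], EMBEDDINGS[b]
    return math.hypot(x1 - x2, y1 - y2)


def select_scenario(recent_utterances, scenarios, threshold=0.2, window=5):
    """Return the opening text of the closest scenario whose keywords lie within
    `threshold` of a focus word in the last `window` utterances, else None.

    recent_utterances: list of token lists; scenarios: name -> (keywords, text).
    """
    # Only words with a known embedding are treated as focus words (assumption).
    focus = [w for utt in recent_utterances[-window:] for w in utt if w in EMBEDDINGS]
    best_text, best_dist = None, float("inf")
    for keywords, text in scenarios.values():
        dists = [word_distance(f, k) for f in focus for k in keywords if k in EMBEDDINGS]
        d = min(dists, default=float("inf"))
        if d < threshold and d < best_dist:
            best_text, best_dist = text, d
    return best_text
```

A real system would substitute trained word embeddings and its own scenario store; the default window of five mirrors the "roughly the last five utterances" mentioned above.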

In step S12, humanoid robot 5-2 performs an action corresponding to a condition for acquiring the user's utterance in response to the first utterance (hereinafter referred to as a pull-in action). Pull-in actions include the pull-in utterances described above, and also non-verbal actions such as gaze direction, body orientation, and limb movements. As with the first utterance, the utterance determination unit 3 may, for example, select the content of the pull-in action arbitrarily from predetermined standard actions stored in a storage unit (not shown) within the utterance determination unit 3, or may determine it according to the content of the preceding utterances. The conditions for acquiring the user's utterance can be classified into A, conditions on the non-verbal behavior of the user's utterance, and B, conditions on the content of the user's utterance. Conditions of type A include A1, conditions on the timing of the user's utterance, that is, conditions for preventing the user from speaking before the speech recognition unit 2 is ready to accept the utterance; and A2, conditions on the volume and direction of the user's utterance, that is, conditions for preventing the input unit 1 from picking up the user's voice at a volume too low for the speech recognition unit 2 to recognize it. Conditions of type B serve either to let the speech recognition unit 2 recognize the user's utterance more accurately, or to prevent the scenario from failing because the content of the user's utterance falls outside the range assumed by the scenario being executed.
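
The condition classes A1, A2, and B can be pictured as checks applied to captured speech. The sketch below is hypothetical: the `CapturedSpeech` fields, the dB threshold, and the keyword-overlap test standing in for condition B (which the text defines in terms of recognition accuracy and the scenario's assumed range) are all assumptions, and the direction part of A2 is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class CapturedSpeech:
    onset_time: float  # seconds; when the user started speaking
    level_db: float    # measured input level in dB (hypothetical)
    text: str          # recognition result


def violated_conditions(speech: CapturedSpeech, recognizer_ready_time: float,
                        min_level_db: float,
                        expected_words: Optional[Set[str]] = None) -> List[str]:
    """Return labels of the acquisition conditions the captured speech fails."""
    violations = []
    if speech.onset_time < recognizer_ready_time:
        violations.append("A1")  # spoke before the recognizer could accept input
    if speech.level_db < min_level_db:
        violations.append("A2")  # too quiet to be recognized
    if expected_words is not None and not (expected_words & set(speech.text.lower().split())):
        violations.append("B")   # no expected word: outside the scenario's assumed range
    return violations
```

In this framing, a pull-in action is chosen in advance to make the corresponding label unlikely to appear.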

Specifically, actions corresponding to A1 (the timing of the user's utterance) include A1-1, in which a robot first gives a model answer at the desired timing, and A1-2, in which a robot moves its gaze so that the user speaks at the desired timing. An action corresponding to A2 (the volume and direction of the user's utterance) is, for example, a robot first giving a model answer at a higher volume when the user speaks quietly. Actions corresponding to B (the content of the user's utterance) include: B-1, a robot first makes an utterance whose length is controlled to the desired length; B-2, a robot first makes an utterance whose level of detail is controlled to the desired level; B-3, a robot first makes an utterance whose grammatical difficulty is controlled to the desired level; B-4, a robot first makes an utterance in which the presence or absence of proper nouns is controlled; and B-5, a robot first makes an utterance whose degree of colloquialness is controlled to the desired level.

The specific actions listed above can be combined arbitrarily. For example, to act on both A1 (timing) and B (content), a robot may first make, at the desired timing (A1-1), an utterance whose length is controlled to the desired length (B-1). As another example of acting on B, a robot may first make an utterance in which both the level of detail (B-2) and the presence or absence of proper nouns (B-4) are controlled.
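
One way to picture combined B-type constraints is as filters over a pool of candidate model answers. In this sketch, which is an assumption rather than the patent's implementation, each candidate is pre-annotated with a hypothetical detail level and a proper-noun flag, and length is measured in whitespace-separated words (Japanese text would need morphological analysis instead). Timing (A1-1) is a separate scheduling concern.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    text: str
    detail_level: int      # hypothetical scale: 1 = generic ... 3 = very specific
    has_proper_noun: bool

    @property
    def n_words(self) -> int:
        return len(self.text.split())


def choose_model_answer(candidates: List[Candidate],
                        length_range: Optional[Tuple[int, int]] = None,
                        detail: Optional[int] = None,
                        proper_noun: Optional[bool] = None) -> Optional[Candidate]:
    """Return the first candidate meeting every constraint that is set (None = unconstrained)."""
    for c in candidates:
        if length_range is not None and not (length_range[0] <= c.n_words <= length_range[1]):
            continue  # B-1: length outside the desired range
        if detail is not None and c.detail_level != detail:
            continue  # B-2: wrong level of detail
        if proper_noun is not None and c.has_proper_noun != proper_noun:
            continue  # B-4: proper-noun presence does not match
        return c
    return None
```

Leaving a constraint as `None` means it is not controlled, so the same function covers single actions and combinations.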

Actions corresponding to the conditions for acquiring the user's utterance are described below with specific examples. Here, R denotes a robot and H denotes the user. The number following R identifies the humanoid robot: "R1" indicates that humanoid robot 5-1 speaks, and "R2" indicates that humanoid robot 5-2 speaks. Whom a humanoid robot intends to address may be expressed, for example, by the movement of its head or gaze, or it may be left unexpressed.

A1-1. The following is a specific example in which a robot first gives a model answer at the desired timing. Because the speech recognition unit 2 may start speech recognition late, this is done to avoid, for example, failing to recognize the user's utterance or obtaining a recognition result that is missing the beginning of the utterance.

R1: "What food do you like?" (*question = first utterance)
R2: "Soba." (*model answer = action)
H: "Ramen."
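
The timing idea behind A1-1 can be sketched as a simple turn plan. All numbers here are assumptions: a fixed speaking rate per character for the synthesized voice, a fixed gap before each next speaker starts, and a known time at which the speech recognition unit becomes ready. The function `plan_turns` is hypothetical.

```python
from typing import List, Tuple

SECONDS_PER_CHAR = 0.15  # hypothetical speaking rate of the synthesized voice


def plan_turns(question: str, question_end: float, recognizer_ready: float,
               model_answers: List[str], response_gap: float = 0.3) -> List[Tuple[str, str]]:
    """Return the robot turns to present before the user's turn.

    Assumes each next speaker starts `response_gap` seconds after the previous one ends.
    If the user would otherwise start before the recognizer is ready, R2 gives the
    shortest model answer that pushes the expected user onset past that time.
    """
    turns = [("R1", question)]
    if question_end + response_gap >= recognizer_ready:
        return turns  # the user can answer the question directly
    for answer in sorted(model_answers, key=len):
        answer_end = question_end + response_gap + len(answer) * SECONDS_PER_CHAR
        if answer_end + response_gap >= recognizer_ready:
            turns.append(("R2", answer))
            break
    return turns
```

If no candidate is long enough, the sketch presents only the question; a fuller system might instead delay the question or open recognition earlier.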

A1-2. The following is a specific example in which a robot moves its gaze so that the user speaks at the desired timing. As with A1-1, this is done to avoid problems caused by the speech recognition unit 2 starting speech recognition late.

R1: "What food do you like?" (*question = first utterance)
R2: (turns its gaze toward the user) (*action)
H: "Ramen."

In the above example, robot R2 directs its gaze at the user, but R1, or a robot other than R1 and R2, may perform this action instead.

B-1. The following is a specific example in which a robot first makes an utterance whose length is controlled to the desired length. If the user's utterance is too long or too short, the recognition rate of the speech recognition unit 2 may drop. Therefore, to draw the user into speaking at an appropriate length, a robot utters a model answer of the desired length before the user speaks.

The following is an example in which no pull-in action is performed, as in conventional systems, and the dialogue fails because the user's utterance is too short.

R1: "What food do you like?"
H: "Soba." (*Because the user utters only one word, no context information is available and speech recognition is difficult.)

The following is an example in which no pull-in action is performed, as in conventional systems, and the dialogue fails because the user's utterance is too long.

R1: "What food do you like?"
H: "Ah, lately I'd say a ramen shop called ●● over near Joyo was pretty good, though I had to wait in a long line." (*The user's utterance contains so many words that recognizing all of them without error is difficult.)

The following is an example in which a robot utters a model answer before the user speaks.

R1: "What food do you like?"
R2: "I like ramen."
H: "I like soba." (*The user's utterance is drawn into the pattern of the robot's model answer, so surrounding words are added and the recognition rate improves.)

B-2. The following is a specific example in which a robot first makes an utterance whose level of detail is controlled to the desired level. If the user's utterance is too detailed or too simple, the system may be unable to generate an appropriate response. Therefore, to draw the user into speaking at an appropriate level of detail, a robot utters a model answer at the desired level of detail before the user speaks.

The following is an example in which, in response to the utterance "What are your plans for tonight?", no pull-in action is performed, as in conventional systems, and the dialogue fails because the user's utterance is too simple.

R1: "What are your plans for tonight?"
H: "Drink and sleep."
R1: "Are you going to drink water?" (*Part of the user's utterance was omitted, so its meaning could not be interpreted correctly.)

The following is an example in which no pull-in action is performed, as in conventional systems, and the dialogue fails because the user's utterance is too detailed.

R1: "What are your plans for tonight?"
H: "My mood has been sinking lately, so I'm going to a bar with hostesses to blow off some steam."
R1: "Where will you sink?" (*The system could not identify the focus of the topic of the user's utterance.)

R1: "What are your plans for tonight?"
R2: "I'm going to the cinema to see a movie."
H: "I'm going to a bar to have a drink." (*The user's utterance is drawn into the robot's model answer and contains words that identify the topic at an appropriate granularity, so it can be interpreted correctly.)

The following is an example in which, in response to the utterance "I went on a trip the other day," no pull-in action is performed, as in conventional systems, and the dialogue fails because the user's utterance is too simple.

R1: "I went on a trip the other day."
H: "Around where?"
R1: "It is around." (*The user's utterance contained only generic words, so the focus of the topic could not be found.)

R1: "I went on a trip the other day."
H: "I went to Saariselkä."
R1: (silence) (*The topic of the user's utterance was too detailed, so no appropriate response could be generated.)

R1: "I went on a trip the other day."
R2 (→R1): "You went to America, didn't you?"
R1 (→H): "Yeah. Did you go anywhere?"
H: "I went to Finland." (*The topic of the user's utterance is moderately detailed, so a response can be generated.)

B-3. The following is a specific example in which a robot first makes an utterance whose grammatical difficulty is controlled to the desired level. If the user's utterance does not follow the desired grammar, the system may be unable to generate an appropriate response. Therefore, to draw the user into speaking with grammar of the desired difficulty, a robot utters a model answer with the desired grammar before the user speaks.

The following is an example in which the predicate-argument structure is used as the key for utterance generation. Without pull-in through a model answer, the user's utterance may break down, as in the overly detailed example above, and its content may not be interpretable. In the following examples, NP denotes a noun phrase, Adj an adjective phrase, and VP a verb phrase; segments are glossed in Japanese word order.

R1: "What food do you like?"
R2: "I (NP) / light (Adj) / ramen (NP) / like (VP)" ("I like light ramen.")
H: "I (NP) / refreshing (Adj) / soba (NP) / like (VP)" ("I like refreshing soba.")

The following is an example in which nouns are used as the key for utterance generation.

R1: "What food do you like?"
R2: "Light (Adj) / ramen (NP)"
H: "Refreshing (Adj) / soba, I guess (NP)"
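
The template-following idea in B-3 can be sketched as a comparison of phrase-category sequences. This assumes a parser (not shown) has already split each utterance into (phrase, category) pairs as in the examples above, and that phrases align by position; both function names are hypothetical. The phrases the user substituted are what the system would use as keys for its next utterance.

```python
from typing import List, Optional, Tuple

Segment = Tuple[str, str]  # (phrase, category), e.g. ("ramen", "NP")


def follows_model_grammar(model: List[Segment], user: List[Segment]) -> bool:
    """True if the user's utterance has the same sequence of phrase categories."""
    return [cat for _, cat in model] == [cat for _, cat in user]


def substituted_phrases(model: List[Segment],
                        user: List[Segment]) -> Optional[List[Tuple[str, str]]]:
    """Pair each model phrase with the user phrase that replaced it, position by position.

    Returns None when the user did not follow the model answer's structure.
    """
    if not follows_model_grammar(model, user):
        return None
    return [(m, u) for (m, _), (u, _) in zip(model, user) if m != u]
```

For the predicate-argument example, the substitutions are ("light", "refreshing") and ("ramen", "soba").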

B-4. The following is a specific example in which a robot first makes an utterance in which the presence or absence of proper nouns is controlled. When the user's utterance contains a proper noun, the topic can be identified easily, so the subsequent dialogue is often easier to handle.

The following is an example without proper nouns.

R1: "What kind of ramen do you like?"
R2: "I guess I like the light kind."
H: "For me, the rich kind, I guess."

The following is an example with proper nouns.

R1: "What kind of ramen do you like?"
R2: "I like the rich ramen at the ●● shop."
H: "I guess I like places like the ▲▲ shop."

B-5. The following is a specific example in which a robot first makes an utterance whose degree of colloquialness is controlled to the desired level. Here, markers of colloquialness include, for example, omitted particles, altered sentence endings, more polysemous words, and more colloquial interjections and adverbs. The lower the degree of colloquialness, the higher the accuracy of speech recognition and utterance understanding; conversely, the higher the degree of colloquialness, the more casual an impression the user receives.

The following is an example with a low degree of colloquialness.

R1: "What kind of ramen do you like?"
R2: "I like light ramen."
H: "I like rich ramen."

The following is an example with a high degree of colloquialness.

R1: "What kind of ramen do you like?"
R2: "I kinda like light ones and stuff, I guess."
H: "Well, rich after all, I guess."

In the latter example, robot R2's utterance is highly colloquial: the particle "ga" is dropped, the sentence ending is colloquial, and "ramen" is replaced by "no" ("one"). User H's utterance is likewise highly colloquial: it adds the interjection "maa" ("well") and the comparative adverb "yappari" ("after all"), uses a colloquial ending, and replaces "ramen" with "no".
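
A crude way to score colloquialness, and to pick a model answer at a target level, is to count surface cues. The sketch below is only illustrative: the romanized cue lists and the scoring rule are hypothetical, and it does not detect omitted particles or polysemy, which the text also lists as markers.

```python
from typing import List

# Hypothetical cue lexicons over romanized Japanese tokens.
INTERJECTIONS = {"maa", "aa", "ee"}
COLLOQUIAL_ADVERBS = {"yappari", "yappa", "nanka"}
COLLOQUIAL_ENDINGS = {"kana", "kanaa", "dayo", "jan"}
POLITE_ENDINGS = {"desu", "masu"}


def colloquialness(tokens: List[str]) -> int:
    """Count simple cues: interjections/adverbs, a casual ending, and no polite ending."""
    if not tokens:
        return 0
    score = sum(t in INTERJECTIONS or t in COLLOQUIAL_ADVERBS for t in tokens)
    if tokens[-1] in COLLOQUIAL_ENDINGS:
        score += 1
    if tokens[-1] not in POLITE_ENDINGS:
        score += 1
    return score


def pick_by_level(candidates: List[List[str]], target: int) -> List[str]:
    """Choose the candidate model answer whose score is closest to the target level."""
    return min(candidates, key=lambda toks: abs(colloquialness(toks) - target))
```

With these cues, the polite R2 answer above scores 0 and the casual one scores 2, so a low target selects the former and a high target the latter.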

In step S13, the microphone 1 receives the utterance made by the user after the pull-in action; this utterance is hereinafter called the user utterance. The speech signal of the user utterance picked up by the microphone 1 is recognized by the speech recognition unit 2, and the text obtained as the recognition result is input to the utterance determination unit 3 as text representing the content of the user utterance.

Thereafter, the dialogue between the user and the dialogue system continues with the content of the user utterance as its topic. For example, the dialogue system outputs from the speaker a voice representing a scenario utterance determined by the scenario dialogue technique, so that a dialogue following the scenario selected by that technique proceeds between the user and the system. Alternatively, for example, the dialogue system outputs from the speaker a voice representing a chat utterance determined by the chat dialogue technique on the basis of the user's utterance. The subsequent utterances may be made by any one humanoid robot or by several humanoid robots.
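
The order of steps S11 to S13 can be summarized in a short sketch. The presentation and recognition callables are hypothetical stand-ins for humanoid robots 5-1 and 5-2, microphone 1, and speech recognition unit 2.

```python
from typing import Callable

Present = Callable[[str, str], None]  # (robot id, utterance text or action description)


def run_first_embodiment(present: Present, listen_and_recognize: Callable[[], str],
                         first_utterance: str, pull_in_action: str) -> str:
    """Steps S11-S13; returns the recognized text for utterance determination unit 3."""
    present("R1", first_utterance)   # S11: humanoid robot 5-1 presents the first utterance
    present("R2", pull_in_action)    # S12: humanoid robot 5-2 performs the pull-in action
    return listen_and_recognize()    # S13: microphone 1 + speech recognition unit 2
```

Injecting the I/O as callables keeps the step order testable without robot hardware.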

<Second embodiment>
In the first embodiment, the pull-in phenomenon is used to draw the user's utterance into a range that satisfies the conditions for the dialogue system to acquire it, so that the system can understand the user's utterance accurately. The second embodiment describes a configuration that restricts the user's utterance to a desired range without using the pull-in phenomenon. If the user's utterance can be restricted to the range the dialogue system assumes, the system is more likely to respond appropriately. For example, if the user can always be made to answer affirmatively or negatively ("Yes / No"), the dialogue system can always respond appropriately to the user's utterance.

The processing procedure of the dialogue method of the second embodiment is described below with reference to FIG. 3.

In step S21, the microphone 1 receives an utterance made by the user; this utterance is hereinafter called the first user utterance. The speech signal of the first user utterance picked up by the microphone 1 is recognized by the speech recognition unit 2, and the text obtained as the recognition result is input to the utterance determination unit 3 as text representing the content of the first user utterance.

In step S22, humanoid robot 5-1 outputs from its speaker a voice representing the content of an utterance that the utterance determination unit 3 has determined on the basis of the text of the first user utterance; this utterance is hereinafter called the limited utterance. The limited utterance is an utterance that restricts the user's next utterance to a desired range. Examples of desired ranges include C-1, restricting the user's utterance to a backchannel (a brief acknowledgement such as "yeah"), and C-2, restricting the user's utterance to an affirmation or denial (for example, "Yes / No").

Utterances that restrict the user's utterance to a desired range are described in detail below with specific examples. The notation is the same as in the first embodiment. In the examples, (*1) marks the first user utterance and (*2) marks the limited utterance.

C-1. The following is a specific example of restricting the user's utterance to a backchannel. For example, uttering as the limited utterance a question that contains words representing the content of the first user utterance and confirms that content makes the user more likely to reply with a backchannel.

R: "What do you like?"
H: "I like reading." (*1)
R: "So you like reading books, huh?" (*2)
H: "Yeah."
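
A minimal sketch of the C-1 pattern, assuming the focus word of the first user utterance has already been extracted. The paraphrase table, the backchannel list, and both function names are hypothetical.

```python
# Hypothetical paraphrase table: restate the user's words in slightly different terms.
PARAPHRASES = {"reading": "reading books"}
BACKCHANNELS = {"yeah", "yes", "uh-huh", "right", "mm"}


def confirmation_question(liked_thing: str) -> str:
    """Build a limited utterance that restates what the user said they like."""
    return f"So you like {PARAPHRASES.get(liked_thing, liked_thing)}, huh?"


def is_backchannel(reply: str) -> bool:
    """True if the reply is just a short acknowledgement."""
    return reply.strip().lower().rstrip(".!?") in BACKCHANNELS
```

Because the expected reply set is tiny, the system can handle the user's next turn without open-ended understanding.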

C-2. The following is a specific example of restricting the user's utterance to an affirmation or denial. For example, using as the limited utterance a closed question that contains a word related to the content of the first user utterance makes the user more likely to answer affirmatively or negatively. A closed question is one whose range of answers is limited, such as "Yes / No" or "A or B or C". Conversely, questions that can be answered freely, such as the so-called 5W1H questions (When, Where, Who, What, Why, How), are called open questions.

R: "What do you like?"
H: "I like reading." (*1)
R: "Do you like reading comics too?" (*2)
H: "Yeah."
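
The closed/open distinction and the yes/no reading of the answer can be approximated with English keyword heuristics. This is a sketch under assumptions: the word lists and function names are hypothetical, and a Japanese system would need its own cues (for example, sentence-final question particles).

```python
from typing import Optional

WH_WORDS = {"when", "where", "who", "what", "why", "how"}
AUXILIARIES = {"do", "does", "did", "is", "are", "was", "were", "can", "will", "have", "has"}
AFFIRMATIVE = {"yes", "yeah", "yep", "sure", "uh-huh"}
NEGATIVE = {"no", "nope", "nah"}


def is_closed_question(question: str) -> bool:
    """Heuristic: yes/no questions start with an auxiliary; "A or B" questions contain "or"."""
    words = question.lower().rstrip("?").split()
    if not words or words[0] in WH_WORDS:
        return False  # empty, or a 5W1H (open) question
    return words[0] in AUXILIARIES or "or" in words


def answer_polarity(reply: str) -> Optional[str]:
    """Return "yes", "no", or None if the reply is not a plain affirmation or denial."""
    words = reply.lower().split()
    if not words:
        return None
    first = words[0].strip(",.!?")
    if first in AFFIRMATIVE:
        return "yes"
    if first in NEGATIVE:
        return "no"
    return None
```

A dialogue manager could check `is_closed_question` before presenting a limited utterance and use `answer_polarity` to branch on the reply.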

In step S23, the microphone 1 receives the utterance made by the user after the limited utterance; this utterance is hereinafter called the second user utterance. The speech signal of the second user utterance picked up by the microphone 1 is recognized by the speech recognition unit 2, and the text obtained as the recognition result is input to the utterance determination unit 3 as text representing the content of the second user utterance.

Thereafter, the dialogue between the user and the dialogue system continues with the content of the second user utterance as its topic. For example, the dialogue system outputs from the speaker a voice representing a scenario utterance determined by the scenario dialogue technique, so that a dialogue following the scenario selected by that technique proceeds between the user and the system. Alternatively, for example, the dialogue system outputs from the speaker a voice representing a chat utterance determined by the chat dialogue technique on the basis of the user's utterance. The subsequent utterances may be made by any one humanoid robot or by several humanoid robots.

<Modification>
The embodiments above describe an example in which a robot serves as the agent and the dialogue is conducted by voice; the robot may be a humanoid robot with a body or a robot without one. The dialogue technique of the present invention is not limited to these forms: the dialogue may also be conducted through an agent that, unlike a humanoid robot, has no physical body and no vocal mechanism. One such form is a dialogue with an agent displayed on a computer screen. More specifically, the technique can be applied to a group chat in which multiple accounts converse by text message, such as "LINE" (registered trademark) or "2channel" (registered trademark), with the user's account conversing with the dialogue device's accounts. In this form, the computer whose screen displays the agent must be near the person, but that computer and the dialogue device may be connected over a network such as the Internet. That is, this dialogue system applies not only to face-to-face dialogue between speakers such as a person and a robot, but also to conversations in which the speakers communicate over a network.

As shown in FIG. 4, the dialogue system 20 of the modification includes an input unit 1, an utterance determination unit 3, and a presentation unit 5. In the example of FIG. 4, the dialogue system 20 consists of a single dialogue device 21, which includes the input unit 1, the utterance determination unit 3, and the presentation unit 5.

The dialogue device of the modification is an information processing device such as a mobile terminal (for example, a smartphone or tablet) or a desktop or laptop personal computer. The following description assumes that the dialogue device is a smartphone. The presentation unit 5 is the smartphone's liquid crystal display, which shows a chat application window in which the group chat's messages are displayed in chronological order. A group chat is a chat function in which multiple accounts post text messages to one another to carry on a conversation. The group chat is assumed to include the user's account and multiple virtual accounts, each corresponding to a virtual personality controlled by the dialogue device. In other words, this modification is an example in which the agents are virtual accounts displayed on the liquid crystal display of the smartphone serving as the dialogue device. The user can enter utterance content into the input unit 1 with a software keyboard and post it to the group chat through the user's own account. The utterance determination unit 3 determines the dialogue device's utterance content on the basis of the posts from the user's account and posts it to the group chat through each virtual account. Alternatively, the user may enter utterance content into the input unit 1 by voice, using the smartphone's microphone and speech recognition function. The utterance content obtained from each dialogue system may also be output from the smartphone's speaker, using its speech synthesis function, in a voice corresponding to each virtual account.
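
The group-chat variant can be sketched as a shared, time-ordered message log. The account names and the `determine` callback (standing in for utterance determination unit 3) are hypothetical, and no real chat-service API is used.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Post = Tuple[str, str]  # (account name, message text)


@dataclass
class GroupChat:
    history: List[Post] = field(default_factory=list)  # posts in time order

    def post(self, account: str, text: str) -> None:
        self.history.append((account, text))


def on_user_post(chat: GroupChat, user_text: str,
                 determine: Callable[[List[Post]], List[Post]]) -> None:
    """Record the user's post, then post each reply chosen by `determine`
    through the virtual account it names."""
    chat.post("user", user_text)
    # Pass a copy so the callback sees a stable snapshot of the conversation.
    for account, reply in list(determine(list(chat.history))):
        chat.post(account, reply)
```

The same first-utterance and pull-in sequence from the embodiments maps onto two virtual accounts posting in turn.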

With the configuration described above, the dialogue technique of the present invention has the dialogue system perform, before the user speaks, an action corresponding to the conditions for acquiring the user's utterance. This draws the user's utterance into a range that satisfies those conditions, allowing the user to continue the dialogue with the system for a long time.

Embodiments of the present invention have been described above, but the specific configuration is not limited to these embodiments; design changes made as appropriate without departing from the spirit of the invention are, of course, included in the invention. The various processes described in the embodiments need not be executed only in time series in the order described; they may also be executed in parallel or individually, according to the processing capability of the device executing them or as needed.

[Program, recording medium]
When the various processing functions of the dialogue device described in the above embodiment are implemented by a computer, the processing performed by each function that the dialogue device should have is described by a program. Likewise, when the various processing functions of the dialogue system described in the above modification are implemented by a computer, the processing performed by each function that the dialogue system should have is described by a program. Executing this program on a computer then realizes the various processing functions of the dialogue device and the dialogue system on that computer.

The program describing this processing can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the program it has read. In another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or, each time the program is transferred from the server computer to this computer, it may sequentially execute processing according to the received program. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing functions solely through execution instructions and result acquisition, without transferring the program from the server computer to this computer. The program in this embodiment also includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the device is configured by executing a predetermined program on a computer; however, at least part of this processing may instead be implemented in hardware.

DESCRIPTION OF SYMBOLS
1 Input unit
2 Speech recognition unit
3 Utterance determination unit
4 Speech synthesis unit
5 Presentation unit
10, 20 Dialogue system
11, 21 Dialogue device

Claims

1. A dialogue method performed by a dialogue system to acquire a user's utterance in response to a first utterance, which is a certain utterance, the method comprising:
a first presentation step in which a presentation unit presents the first utterance;
a second presentation step in which the presentation unit presents an action corresponding to a condition for acquiring the user's utterance in response to the first utterance; and
a response acceptance step in which an input unit accepts an utterance made by the user after the action.

2. The dialogue method according to claim 1, wherein
the first presentation step and the second presentation step are performed by a predetermined personality.

3. The dialogue method according to claim 1 or 2, wherein
the condition is a condition that prompts the user to speak, and
the action is an action that prompts the user to speak.

4. The dialogue method according to claim 3, wherein
the action is an utterance that prompts the user to speak.

5. The dialogue method according to any one of claims 1 to 4, further comprising
a third presentation step in which, after the second presentation step, the presentation unit presents an action corresponding to the condition for acquiring the user's utterance in response to the first utterance.

6. The dialogue method according to any one of claims 1 to 5, wherein
the first presentation step is performed by a first personality, which is a certain personality,
the condition includes a condition corresponding to at least one of the length of the utterance, the level of detail of the utterance, the grammatical difficulty of the utterance, the presence or absence of proper nouns in the utterance, and the degree of colloquialism of the utterance, and
the action includes presenting, by a second personality different from the first personality, an utterance corresponding to the condition.

7. The dialogue method according to any one of claims 1 to 5, wherein
the first presentation step is performed by a first personality, which is a certain personality,
the condition includes a condition corresponding to non-verbal behavior associated with the utterance, and
the action includes presenting, by a second personality different from the first personality, an utterance corresponding to the condition.

8. The dialogue method according to any one of claims 1 to 5, wherein
the condition includes a condition corresponding to non-verbal behavior associated with the utterance, and
the action includes an action that prompts the user to speak so that the user starts speaking after the time at which the input unit becomes able to accept the user's utterance in response to the first utterance.

9. The dialogue method according to claim 8, wherein
the action includes an action in which a robot directs its gaze toward the user at, or immediately before, the time at which the input unit becomes able to accept the user's utterance in response to the first utterance.

10. A dialogue system for acquiring a user's utterance in response to a first utterance, which is a certain utterance, the system comprising:
an utterance determination unit that determines the first utterance and an action corresponding to a condition for acquiring the user's utterance in response to the first utterance;
a presentation unit that presents the first utterance determined by the utterance determination unit and, after presenting the first utterance, presents the action determined by the utterance determination unit; and
an input unit that accepts an utterance made by the user after the action.

11. A dialogue device that determines utterances to be presented by a dialogue system that acquires a user's utterance in response to a first utterance, which is a certain utterance, the dialogue system including at least an input unit that accepts the user's utterance and a presentation unit that presents utterances and actions, the dialogue device comprising:
an utterance determination unit that determines the first utterance and an action that is to be presented after the first utterance is presented and that corresponds to a condition for acquiring the user's utterance in response to the first utterance.

12. A program for causing a computer to execute each step of the dialogue method according to claim 1.

13. A program for causing a computer to function as the dialogue device according to claim 11.
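The timing behavior in the last two method claims (the user should start speaking only once the input unit can accept speech, and a robot may turn its gaze to the user at or just before that moment) can be sketched as below. The timing model, the `input_delay` and `lead` parameters, and the rule that the gaze never moves before the first utterance ends are illustrative assumptions, not values or rules taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnTiming:
    """Hypothetical timing model for one turn, in seconds from the start of the first utterance."""
    utterance_end: float  # when presentation of the first utterance finishes
    input_delay: float    # assumed delay before the input unit can accept speech

    @property
    def accept_from(self) -> float:
        # Earliest time at which the input unit can accept the user's utterance.
        return self.utterance_end + self.input_delay


def gaze_time(timing: TurnTiming, lead: float = 0.25) -> float:
    """When the robot turns its gaze to the user: `lead` seconds before accept_from
    (lead=0 means exactly at accept_from), clamped so it never precedes the end
    of the first utterance (an assumption of this sketch)."""
    if lead < 0:
        raise ValueError("lead must be non-negative")
    return max(timing.utterance_end, timing.accept_from - lead)


def onset_captured(timing: TurnTiming, user_onset: float) -> bool:
    """True if a user utterance starting at user_onset falls in the acceptance window."""
    return user_onset >= timing.accept_from
```

With `TurnTiming(2.0, 0.5)`, input is accepted from 2.5 s, the default gaze cue fires at 2.25 s, and a user who starts speaking at 2.4 s would be missed; the gaze cue is meant to keep the user's onset at or after 2.5 s.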