JPWO2017200076A1 - Dialogue method, dialogue system, dialogue apparatus, and program

Info

Publication number: JPWO2017200076A1
Application number: JP2018518375A
Authority: JP
Inventors: Hiroaki Sugiyama; Toyomi Meguro; Junji Yamato; Yuichiro Yoshikawa; Hiroshi Ishiguro
Original assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Current assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Priority date: 2016-05-20
Filing date: 2017-05-19
Publication date: 2018-12-13
Anticipated expiration: 2037-05-19
Also published as: WO2017200076A1; JP6699010B2

Abstract

To improve the user's sense of being responded to and to keep a dialogue going for a long time. The dialogue system 10 includes at least an input unit 1 that accepts a user's utterance and a presentation unit 5 that presents utterances. The presentation unit 5-1 or the presentation unit 5-2 presents an open utterance. The input unit 1 accepts the user's utterance in response to the open utterance. The presentation unit 5-1 presents an utterance that the chat dialogue system 6, which determines utterances based at least on the content of the user's utterance, has determined based on the user's utterance. The presentation unit 5-2 presents an utterance determined by the scenario dialogue system 7, which determines utterances based on a scenario stored in advance, after the utterance determined by the chat dialogue system 6.

Description

The present invention relates to a technique by which a computer conducts a dialogue with a human in natural language, applicable to robots and the like that communicate with people.

In recent years, research and development of robots that communicate with people has progressed, and such robots have been put to practical use in various settings. For example, in communication therapy, a robot may serve as a conversation partner for a person who feels lonely. Specifically, in a nursing home for the elderly, a robot that acts as a listener for a resident can ease the resident's loneliness, and the sight of the resident conversing with the robot can create opportunities for conversation between the resident and the people around them, such as family members and caregivers. In communication training, a robot may serve as a practice partner. Specifically, at a foreign language school, a robot that acts as a practice partner for learners can make foreign language learning more efficient. In an application as an information presentation system, the robots basically let a person listen to a dialogue between robots but occasionally speak to the person, which draws the person into the dialogue without boring them and presents information in a form that is easy to accept. Specifically, efficient presentation of news, product introductions, trivia and knowledge, and education (for example, childcare and education for children, general education for adults, and moral awareness) can be expected when people have time to spare at meeting places in town, bus stops, or station platforms, or when they can afford to join a dialogue at home or in a classroom. In an application as an information collection system, a robot may collect information while talking to a person. Because communication with the robot maintains the feel of a conversation, information can be collected without making the person feel the pressure of being questioned. Specifically, applications to personal information surveys, market research, product evaluation, and preference surveys for product recommendation are envisaged. Thus a variety of applications are envisaged for human-robot communication, and robots that converse more naturally with users are expected. In addition, with the spread of smartphones, services such as LINE (registered trademark) let multiple users enjoy conversation with other people by chatting in near real time. Applying robot conversation technology to such a chat service would make it possible to provide a chat service that converses more naturally with the user even when no human chat partner is available.

In this specification, hardware that serves as a user's dialogue partner in these services, such as a robot or a chat partner, and computer software that causes a computer to function as such hardware, are collectively called an agent. Because an agent is the user's dialogue partner, it may be anthropomorphized or personified, or may have a character or individuality, as a robot or chat partner does.

The key to realizing these services is technology that enables an agent implemented in hardware or computer software to converse naturally with humans.

One example of such an agent is a scenario dialogue system, which converses with the user about a specific topic along a predetermined scenario. A scenario dialogue system can continue the dialogue as long as the dialogue develops along the scenario. For example, the dialogue system described in Non-Patent Document 1 conducts a dialogue between a user and a plurality of agents that includes interruptions by an agent and exchanges between agents. An agent, for example, asks the user a question prepared in the scenario and, when the user's answer corresponds to an option prepared in the scenario, makes the utterance associated with that option. In other words, a scenario dialogue system is a dialogue system in which the agent makes utterances based on a scenario stored in the system in advance. In this dialogue system, when an agent asks the user a question and receives a reply, the agent can brush the reply aside with a backchannel such as “I see” (「そっか」) regardless of what the user said, or another agent can interrupt and change the topic, so that even when the user's utterance strays from the original topic the system can respond without making the user feel that the story has broken down.
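The option-matching behaviour described above can be sketched as follows. This is only an illustration and not the method of Non-Patent Document 1: the function name `scenario_reply`, the keyword-to-reply table, and the case-insensitive substring matching are all assumptions made for the example.

```python
def scenario_reply(options: dict, user_utterance: str, backchannel: str = "I see.") -> str:
    """Return the scripted reply for the first option keyword found in the user's answer.

    If no prepared option matches, fall back to a neutral backchannel,
    which keeps the scenario going regardless of what the user said.
    """
    text = user_utterance.lower()
    for keyword, reply in options.items():
        if keyword.lower() in text:
            return reply
    return backchannel


# Hypothetical scenario node: one question with two prepared options.
options = {
    "ramen": "Ramen is popular, isn't it?",
    "sushi": "Sushi sounds nice.",
}
print(scenario_reply(options, "I like ramen!"))    # matches the "ramen" option
print(scenario_reply(options, "Curry, I guess."))  # no option matches, backchannel is used
```

The second call shows the weakness the text points out: an unanticipated answer receives only the backchannel.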

Another example of such an agent is a chat dialogue system, in which the agent makes utterances that follow the content of the user's utterance so that the user and the agent have a natural dialogue. For example, the dialogue system described in Non-Patent Document 2 realizes a chat dialogue between the user and the system by having the system speak according to rules written in advance, triggered by words contained in the user's or the agent's utterances, while giving greater weight to what is specific to the context of the multi-turn dialogue between the user and the agent. The rules used by a chat dialogue system need not be written in advance; they may be generated automatically based on the content of the user's utterance, based on the immediately preceding utterance by the user or the agent or an utterance made near it, or based on utterances that include at least such an utterance. Non-Patent Document 2 describes a technique for automatically generating rules based on words that have a co-occurrence or dependency relation with words contained in the user's utterance. The dialogue system described in Non-Patent Document 3, for example, reduces the cost of rule creation by combining manually written rules with rules described by a statistical utterance generation method. Unlike a scenario dialogue system, a chat dialogue system does not have the agent speak along a prepared scenario. It therefore avoids the situation in which, depending on the user's utterance, the agent's utterance fails to correspond to it, and the agent can make an utterance based at least on the content of the user's utterance, on the immediately preceding utterance by the user or the agent or an utterance made near it, or on utterances that include at least such an utterance. In other words, a chat dialogue system is a dialogue system in which the agent makes utterances based on at least one of these. Such chat dialogue systems can respond explicitly to the user's utterance.
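A word-triggered rule lookup of the kind described above might look like the sketch below. It is a deliberately simplified assumption: the rule table, the name `chat_reply`, and the English-only tokenizer are hypothetical, and the context weighting and automatic rule generation of Non-Patent Documents 2 and 3 are not modelled.

```python
import re


def chat_reply(rules: dict, user_utterance: str):
    """Return the reply of the first rule whose trigger word occurs in the utterance.

    Returns None when no trigger word is present, so the caller can decide
    what to do instead (for example, hand over to another system).
    """
    words = re.findall(r"[a-z']+", user_utterance.lower())
    for word in words:
        if word in rules:
            return rules[word]
    return None


# Hypothetical hand-written rules: trigger word -> system utterance.
rules = {
    "ramen": "Tonkotsu is good, isn't it?",
    "soccer": "Which team do you support?",
}
print(chat_reply(rules, "I like ramen!"))
print(chat_reply(rules, "Hello there"))
```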

Non-Patent Document 1: 有本庸浩，吉川雄一郎，石黒浩，“複数体のロボットによる音声認識なし対話の印象評価”，日本ロボット学会学術講演会，2016年 (Arimoto, Yoshikawa, and Ishiguro, “Impression evaluation of dialogue without speech recognition by multiple robots,” Annual Conference of the Robotics Society of Japan, 2016)
Non-Patent Document 2: 杉山弘晃，目黒豊美，東中竜一郎，南泰浩，“任意の話題を持つユーザ発話に対する係り受けと用例を利用した応答文の生成”，人工知能学会論文誌，vol. 30(1)，pp. 183-194，2015年 (Sugiyama, Meguro, Higashinaka, and Minami, “Generation of response sentences using dependencies and examples for user utterances on arbitrary topics,” Transactions of the Japanese Society for Artificial Intelligence, vol. 30(1), pp. 183-194, 2015)
Non-Patent Document 3: 目黒豊美，杉山弘晃，東中竜一郎，南泰浩，“ルールベース発話生成と統計的発話生成の融合に基づく対話システムの構築”，人工知能学会全国大会論文集，vol. 28，pp. 1-4，2014年 (Meguro, Sugiyama, Higashinaka, and Minami, “Construction of a dialogue system based on the fusion of rule-based utterance generation and statistical utterance generation,” Proceedings of the Annual Conference of the Japanese Society for Artificial Intelligence, vol. 28, pp. 1-4, 2014)

Conventional chat dialogue systems basically work on a one-question, one-answer basis, so they cannot continue a dialogue with a consistent story. Conventional scenario dialogue systems may be unable to respond when the user's utterance falls outside the range anticipated by the scenario. In the dialogue system of Non-Patent Document 1, the user may come to feel that their utterances are always being brushed aside.

In view of the above, an object of the present invention is to provide a dialogue technique that improves the sense of response to the user's utterances and allows the dialogue to continue for a long time.

To solve the above problem, in the dialogue method according to a first aspect of the present invention, where a dialogue system that determines utterances based on a scenario stored in advance is called a first dialogue system and a dialogue system that determines utterances based at least on the content of the user's utterance is called a second dialogue system, the method includes: a first presentation step in which a presentation unit presents an open utterance; an utterance reception step in which an input unit accepts the user's utterance in response to the open utterance; a second presentation step in which the presentation unit presents an utterance determined by the second dialogue system based on the user's utterance; and a third presentation step in which the presentation unit presents, after the second presentation step, an utterance determined by the first dialogue system.

In the dialogue system according to a second aspect of the present invention, where a dialogue system that determines utterances based on a scenario stored in advance is called a first dialogue system and a dialogue system that determines utterances based at least on the content of the user's utterance is called a second dialogue system, the dialogue system includes: an utterance determination unit that determines an open utterance, determines, after the open utterance, an utterance by means of the second dialogue system based on the user's utterance in response to the open utterance, and determines, after the utterance determined by the second dialogue system, an utterance by means of the first dialogue system; an input unit that accepts the user's utterance in response to the open utterance; and a presentation unit that presents the open utterance determined by the utterance determination unit, presents the utterance that the utterance determination unit determined by means of the second dialogue system after the user's utterance, and presents the utterance that the utterance determination unit determined by means of the first dialogue system after the utterance determined by the second dialogue system.

The dialogue device according to a third aspect of the present invention is a dialogue device that determines the utterances to be presented by a dialogue system including at least an input unit that accepts a user's utterance and a presentation unit that presents utterances. Where a dialogue system that determines utterances based on a scenario stored in advance is called a first dialogue system and a dialogue system that determines utterances based at least on the content of the user's utterance is called a second dialogue system, the dialogue device includes an utterance determination unit that determines an open utterance, determines, after the open utterance, an utterance by means of the second dialogue system based on the user's utterance in response to the open utterance, and determines, after the utterance determined by the second dialogue system, an utterance by means of the first dialogue system.

According to the present invention, a dialogue system that speaks based at least on the content of the user's utterance can respond appropriately to the user's utterance, and thereafter utterances determined by a dialogue system that speaks based on a scenario stored in advance can be made. This makes it possible to realize a dialogue system and a dialogue device that give the user a strong sense of being responded to while conveying a story, and that can sustain a long dialogue with the user.

FIG. 1 is a diagram illustrating the functional configuration of the dialogue system using humanoid robots according to the first embodiment.
FIG. 2 is a diagram illustrating the processing procedure of the dialogue method according to the first embodiment.
FIG. 3 is a diagram illustrating the functional configuration of the dialogue system using humanoid robots according to the second embodiment.
FIG. 4 is a diagram illustrating the processing procedure of the dialogue method according to the second embodiment.
FIG. 5 is a diagram illustrating the functional configuration of the dialogue system using group chat according to the first embodiment.

Embodiments of the present invention are described in detail below. In the drawings, components having the same function are given the same reference numerals, and duplicate description is omitted.

＜First embodiment＞
The dialogue system of the first embodiment is a system in which a plurality of humanoid robots cooperate to converse with a user. That is, it is an example in which the agents are humanoid robots. As shown in FIG. 1, the dialogue system 10 includes an input unit 1, a speech recognition unit 2, an utterance determination unit 3, a speech synthesis unit 4, and a presentation unit 5. The dialogue method of the first embodiment is realized by the dialogue system 10 performing the processing of each step described later. As shown in FIG. 1, the portion of the dialogue system 10 consisting of the speech recognition unit 2, the utterance determination unit 3, and the speech synthesis unit 4 is referred to as the dialogue device 11 of the first embodiment. The utterance determination unit 3 has an interface capable of communicating with the chat dialogue system 6 and the scenario dialogue system 7, which exist externally. The chat dialogue system 6 and the scenario dialogue system 7 may instead be configured inside the dialogue device 11 as processing units with equivalent functions. The chat dialogue system 6 is an example of a dialogue system in which the agent makes utterances based at least on the content of the user's utterance, and the scenario dialogue system 7 is an example of a dialogue system in which the agent makes utterances based on a scenario stored in advance.

To improve the sense of response, the dialogue system 10 of this embodiment presents an utterance that the chat dialogue system 6 determined based on the user's utterance in response to an open utterance, and then presents an utterance determined by the scenario dialogue system 7. Because the chat dialogue system 6 determines its utterance based at least on the content of the user's utterance, it can reply explicitly to the user's utterance. Compared with merely giving a backchannel such as “I see” (「そっか」), this avoids giving the user the feeling that their remarks are always being brushed aside. The user thus gets the impression that the dialogue system 10 is responding earnestly, which makes it possible to continue the dialogue for a long time in the scenario dialogue that follows.

The dialogue device 11 is, for example, a special device configured by loading a special program into a known or dedicated computer having a central processing unit (CPU), a main storage device (RAM: random access memory), and the like. The dialogue device 11 executes each process under the control of, for example, the central processing unit. Data input to the dialogue device 11 and data obtained in each process are stored in, for example, the main storage device, and are read out as needed for use in other processes. At least some of the processing units of the dialogue device 11 may be implemented in hardware such as an integrated circuit.

The input unit 1 is an interface through which the dialogue system 10 acquires the user's utterances. In other words, the input unit 1 is an interface through which the user inputs utterances to the dialogue system 10. For example, the input unit 1 is a microphone that picks up the user's speech and converts it into a speech signal. The input unit 1 passes the speech signal of the user's utterance to the speech recognition unit 2.

The speech recognition unit 2 converts the speech signal of the user's utterance picked up by the input unit 1 into text representing the content of the utterance, and passes that text to the utterance determination unit 3. Any existing speech recognition technique may be used, and one suited to the usage environment may be selected as appropriate.

The utterance determination unit 3 communicates with the chat dialogue system 6 or the scenario dialogue system 7 and, based on the input text, determines text representing the content of the dialogue system 10's utterance in response to the user's utterance. The utterance determination unit 3 passes the text representing the determined utterance content to the speech synthesis unit 4.

The speech synthesis unit 4 converts the text representing the utterance content determined by the utterance determination unit 3 into a speech signal representing that content, and passes the speech signal to the presentation unit 5. Any existing speech synthesis technique may be used, and one suited to the usage environment may be selected as appropriate.

The presentation unit 5 is an interface for presenting to the user the utterance content determined by the utterance determination unit 3. For example, the presentation unit 5 is a humanoid robot built in imitation of the human form. The humanoid robot emits, for example from a speaker mounted in its head, the speech signal that the speech synthesis unit 4 produced from the text determined by the utterance determination unit 3; that is, it presents the utterance. When the presentation unit 5 is a humanoid robot, one humanoid robot is provided for each personality taking part in the dialogue. In the following, two humanoid robots 5-1 and 5-2 are assumed, as an example in which two personalities take part in the dialogue.

The input unit 1 may be integrated with the presentation unit 5. For example, when the presentation unit 5 is a humanoid robot, a microphone mounted in the robot's head can be used as the input unit 1.
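The data flow through the dialogue device 11 (speech signal to text, text to reply text, reply text to speech signal) can be sketched as below. This is an illustrative assumption, not the patented implementation: the class name `DialogueDevice`, its field names, and the toy stand-ins that treat UTF-8 bytes as "audio" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DialogueDevice:
    """Hypothetical stand-in for dialogue device 11 (units 2, 3, and 4)."""
    recognize: Callable[[bytes], str]    # speech recognition unit 2: audio -> text
    determine: Callable[[str], str]      # utterance determination unit 3: user text -> reply text
    synthesize: Callable[[str], bytes]   # speech synthesis unit 4: text -> audio

    def respond(self, user_audio: bytes) -> bytes:
        """Turn audio from input unit 1 into audio for presentation unit 5."""
        user_text = self.recognize(user_audio)
        reply_text = self.determine(user_text)
        return self.synthesize(reply_text)


# Toy stand-ins: "audio" is simply UTF-8 encoded text.
device = DialogueDevice(
    recognize=lambda audio: audio.decode("utf-8"),
    determine=lambda text: f"You said: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)
print(device.respond("I like ramen".encode("utf-8")).decode("utf-8"))
```

Keeping each unit behind a plain callable mirrors the text's remark that any existing recognition or synthesis technique may be substituted.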

The processing procedure of the dialogue method of the first embodiment is described below with reference to FIG. 2.

In step S11, the humanoid robot 5-1 or 5-2 outputs from its speaker speech representing the content of an open utterance selected by the scenario dialogue system 7 or the chat dialogue system 6. An open utterance is an utterance to which the other party can respond freely, and it includes at least one of an open question and an open comment. An open question is a question the other party can answer freely, such as a so-called 5W1H question (when, where, who, what, why, how). Conversely, a question whose range of answers is limited, such as “Yes / No” or “A or B or C”, is called a closed question. An open comment is a remark that does not particularly seek an answer, such as a rambling impression of a topic; that is, an utterance to which the other party can respond freely. The text representing the content of the open utterance is determined along a scenario selected by the scenario dialogue system 7 in response to a request from the utterance determination unit 3. The scenario dialogue system 7 may select a scenario at random from a plurality of scenarios prepared in advance, or may select a scenario according to a predetermined rule based on the dialogue up to that point. One way to select a scenario based on the preceding dialogue is, for example, to take the dialogue consisting of roughly the last five utterances and select a scenario for which the inter-word distance between the words or focus words of those utterances and the words or focus words of the scenario is smaller than a predetermined distance.
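The distance-based scenario selection just described might be sketched as follows. The source does not specify the distance measure, so this example substitutes Jaccard distance between word sets as a stand-in; the names `select_scenario` and `jaccard_distance`, the threshold value, the choice of the nearest qualifying scenario, and the random fallback when nothing qualifies are all assumptions.

```python
import random


def jaccard_distance(a: set, b: set) -> float:
    """1 - |a & b| / |a | b|; defined as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def select_scenario(history, scenarios, max_distance=0.8, window=5, rng=random):
    """Pick a scenario whose word set is close to the last `window` utterances.

    history:   list of utterance strings, oldest first
    scenarios: dict mapping scenario name -> set of words in that scenario
    """
    recent_words = set()
    for utterance in history[-window:]:
        recent_words.update(utterance.lower().split())

    # Keep only scenarios closer than the predetermined distance.
    close = {
        name: jaccard_distance(recent_words, words)
        for name, words in scenarios.items()
        if jaccard_distance(recent_words, words) < max_distance
    }
    if close:
        return min(close, key=close.get)   # assumption: prefer the nearest one
    return rng.choice(sorted(scenarios))   # assumption: otherwise choose at random


scenarios = {
    "food": {"food", "ramen", "eat", "like"},
    "travel": {"trip", "train", "hotel"},
}
history = ["what food do you like", "i like ramen"]
print(select_scenario(history, scenarios))
```

A real system would more plausibly compare focus words with a semantic distance, which this sketch does not attempt.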

In step S12, the microphone 1 accepts the user's utterance in response to the open utterance. The speech recognition unit 2 performs speech recognition on the speech signal of the user's utterance picked up by the microphone 1 and passes the resulting text to the utterance determination unit 3 as text representing the content of the user's utterance.

In step S13, the humanoid robot 5-1 outputs from its speaker speech representing the content of a chat utterance that the chat dialogue system 6 determined based on the text representing the content of the user's utterance. The utterance determination unit 3 determines the text of the chat utterance using the chat dialogue system 6, based on the text obtained by speech recognition of the user's utterance. The humanoid robot that outputs the chat utterance may be the same robot that output the open utterance or a different one.

ステップＳ１４において、人型ロボット５−２は、シナリオ対話システム７が決定したシナリオ発話の内容を表す音声をスピーカから出力する。ステップＳ１４は、ステップＳ１３の後に行われる。すなわち、対話システムは、ユーザの発話に基づいて雑談対話システム６が決定した雑談発話の内容を表す音声をスピーカから出力した後に、シナリオ対話システム７が決定したシナリオ発話の内容を表す音声をスピーカから出力する。シナリオ発話の内容を表すテキストは、発話決定部３の要求に応じてシナリオ対話システム７が選択したシナリオに沿って、発話決定部３が決定する。シナリオ発話の内容を表す音声を出力する人型ロボットは、オープン発話の内容を表す音声を出力した人型ロボットであってもよいし、オープン発話の内容を表す音声を出力した人型ロボットとは異なる人型ロボットであってもよい。以降は、例えば、シナリオに沿った対話がユーザと対話システムとの間で実行されるように、対話システムは、シナリオ対話システム７が決定したシナリオ発話の発話内容を表す音声をスピーカから出力する。以降の発話を行う人型ロボットは、何れか１つの人型ロボットであっても複数の人型ロボットであってもよい。 In step S14, the humanoid robot 5-2 outputs, from its speaker, speech representing the content of the scenario utterance determined by the scenario dialogue system 7. Step S14 is performed after step S13. That is, the dialogue system first outputs speech representing the content of the chat utterance that the chat dialogue system 6 determined based on the user's utterance, and then outputs speech representing the content of the scenario utterance determined by the scenario dialogue system 7. The text representing the content of the scenario utterance is determined by the utterance determination unit 3 in accordance with the scenario that the scenario dialogue system 7 selected in response to a request from the utterance determination unit 3. The humanoid robot that outputs the speech representing the content of the scenario utterance may be the humanoid robot that output the speech representing the content of the open utterance, or may be a different humanoid robot. Thereafter, the dialogue system outputs speech representing the content of the scenario utterances determined by the scenario dialogue system 7 so that, for example, a dialogue following the scenario is carried out between the user and the dialogue system. The subsequent utterances may be made by any one of the humanoid robots or by a plurality of humanoid robots.
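
The ordering of steps S13 and S14 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: `respond_to_open_utterance`, `toy_chat_reply`, `toy_scenario_next`, the speaker labels, and the canned strings are hypothetical stand-ins for the chat dialogue system 6 and the scenario dialogue system 7.

```python
from typing import Callable, List, Tuple

Utterance = Tuple[str, str]  # (speaker label, text)


def respond_to_open_utterance(
    user_text: str,
    chat_reply: Callable[[str], str],
    scenario_next: Callable[[], str],
) -> List[Utterance]:
    """Return the system utterances for steps S13 and S14, in that order."""
    out: List[Utterance] = []
    # Step S13: the chat utterance is determined from what the user said.
    out.append(("R1", chat_reply(user_text)))
    # Step S14: the scenario utterance is presented after the chat utterance.
    out.append(("R2", scenario_next()))
    return out


# Hypothetical stand-ins for the two dialogue systems.
def toy_chat_reply(user_text: str) -> str:
    return "Tonkotsu is good, isn't it?" if "ramen" in user_text.lower() else "I see."


def toy_scenario_next() -> str:
    return "We're robots, though, so we can't eat anything."


turns = respond_to_open_utterance("I like ramen!", toy_chat_reply, toy_scenario_next)
```

In this sketch the chat reply is always emitted first, so the user hears a response to their own words before the prepared scenario resumes.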

以下、第一実施形態の対話内容の具体例を示す。ここで、Ｒは人型ロボットを表し、Ｈはユーザを表す。Ｒの後の括弧内における数字は人型ロボットの識別子である。「Ｒ（１→Ｈ）」は人型ロボット５−１がユーザへ話しかける意図で発話していることを表し、「Ｒ（１→２）」は人型ロボット５−１が人型ロボット５−２へ話しかける意図で発話していることを表す。なお、人型ロボットが誰に話かける意図であるかは、例えば、人型ロボットの頭部や視線の動きにより表出するようにしてもよいが、表出しないでもよい。 A specific example of the dialogue content of the first embodiment is shown below. Here, R represents a humanoid robot and H represents the user. The number in parentheses after R is the identifier of the humanoid robot. “R (1 → H)” indicates that the humanoid robot 5-1 is speaking with the intention of addressing the user, and “R (1 → 2)” indicates that the humanoid robot 5-1 is speaking with the intention of addressing the humanoid robot 5-2. Whom a humanoid robot intends to address may be expressed, for example, by the movement of the robot's head or line of sight, but need not be expressed.

Ｒ（１→Ｈ）：「食べ物だったら何が好き？」（※２）
Ｈ：「ラーメンが好き！」
Ｒ（１→Ｈ）：「豚骨がいいよね。」（※１）
Ｒ（２→１）：「僕たちロボットだから、何も食べられないんだけどね。」（※２）
Ｒ（１→２）：「それはしょうがないよ。」（※２）
Ｒ（２→１）：「そっか。」（※２）
R (1 → H): “What kind of food do you like?” (*2)
H: “I like ramen!”
R (1 → H): “Tonkotsu is good, isn't it?” (*1)
R (2 → 1): “We're robots, though, so we can't eat anything.” (*2)
R (1 → 2): “That can't be helped.” (*2)
R (2 → 1): “I see.” (*2)

ここで、※１は雑談対話システム６により決定された発話内容である。※２はシナリオ対話システム７により決定された発話内容である。なお、※１以降の人型ロボットの発話は、発話する人型ロボットが逆でもよいし、話しかける相手が人間でも他の人型ロボットでも人間と他の人型ロボットの両方でもよい。 Here, *1 marks utterance content determined by the chat dialogue system 6, and *2 marks utterance content determined by the scenario dialogue system 7. For the humanoid robot utterances from *1 onward, the speaking robots may be swapped, and the addressee may be the human, another humanoid robot, or both.

上記のように構成することにより、本形態の対話技術によれば、オープン発話に対するユーザの発話がシナリオの想定外であっても、雑談対話システムによりユーザの発話にふさわしい応答をすることができるため、ユーザが感じる応答感が向上する。これにより、ユーザはシステムとの対話を継続する意欲をかき立てられ、対話を長く続けることができるようになる。 With the above configuration, according to the dialogue technique of this embodiment, even if the user's utterance in response to the open utterance falls outside what the scenario anticipates, the chat dialogue system can give a response appropriate to the user's utterance, which improves the sense of responsiveness felt by the user. This motivates the user to continue the dialogue with the system, so the dialogue can be continued for a long time.

＜第一実施形態の変形例１＞
第一実施形態では、ステップＳ１１においてシナリオ対話システム７が選択したオープン発話をスピーカから出力する例を説明したが、ステップＳ１１においてスピーカから出力するオープン発話はどのように生成されたものでもよい。
<Modification 1 of the first embodiment>
In the first embodiment, an example was described in which the open utterance selected by the scenario dialogue system 7 is output from the speaker in step S11. However, the open utterance output from the speaker in step S11 may be generated in any manner.

例えば、ステップＳ１１の前に行われたユーザの発話を入力部１が収音して音声信号とし、音声認識部２がユーザの発話内容を表すテキストを得て、発話決定部３がユーザの発話内容を表すテキストに少なくとも基づいて雑談対話システム６によってオープン発話の発話内容を表すテキストを決定し、雑談対話システム６が決定したオープン発話の発話内容を表す音声信号をスピーカから出力するようにしてもよい。 For example, a user utterance made before step S11 may be picked up by the input unit 1 as a speech signal; the speech recognition unit 2 may obtain text representing the content of that utterance; the utterance determination unit 3 may determine, by means of the chat dialogue system 6 and based at least on that text, text representing the content of the open utterance; and a speech signal representing the content of the open utterance determined by the chat dialogue system 6 may be output from the speaker.

＜第一実施形態の変形例２＞
第一実施形態では、ステップＳ１３においてユーザの発話に基づいて雑談対話システム６が決定した雑談発話の内容を表す音声をスピーカから出力する例、すなわち、ステップＳ１３において雑談対話システム６が決定した１つの雑談発話の内容を表す音声をスピーカから出力する例を説明したが、ステップＳ１３において雑談対話システム６が決定した複数の雑談発話の内容を表す音声をスピーカから出力してもよい。
<Modification 2 of the first embodiment>
In the first embodiment, an example was described in which speech representing the content of the chat utterance that the chat dialogue system 6 determined based on the user's utterance is output from the speaker in step S13, that is, an example in which speech representing the content of a single chat utterance determined by the chat dialogue system 6 is output in step S13. However, speech representing the content of a plurality of chat utterances determined by the chat dialogue system 6 may be output from the speaker in step S13.

例えば、ステップＳ１３において、まず、人型ロボット５−１が、ユーザの発話に基づいて雑談対話システム６が決定した雑談発話の内容を表す音声をスピーカから出力し、次に、人型ロボット５−２が、スピーカから出力した人型ロボット５−１の発話に基づいて雑談対話システム６が決定した雑談発話の内容を表す音声をスピーカから出力するようにしてもよい。 For example, in step S13, the humanoid robot 5-1 may first output, from its speaker, speech representing the content of the chat utterance that the chat dialogue system 6 determined based on the user's utterance, and the humanoid robot 5-2 may then output, from its speaker, speech representing the content of a chat utterance that the chat dialogue system 6 determined based on the utterance of the humanoid robot 5-1.

＜第一実施形態の変形例３＞
第一実施形態の対話システム１０は、複数台の人型ロボットが協調してユーザとの対話を行うシステムであったが、ユーザとの対話の全てまたは一部を１台の人型ロボットが行うシステムであってもよい。例えば、ステップＳ１１のオープン発話、ステップＳ１３のユーザの発話に基づいて雑談対話システム６が決定した雑談発話、およびステップＳ１４のシナリオ対話システム７が決定したシナリオ発話を同じ１台の人型ロボットが行うようにしてもよい。この場合は、例えば、ステップＳ１４以降のユーザとの対話は、複数台の人型ロボットで協調して行ってもよいし、ステップＳ１４までと同じ１台の人型ロボットが行ってもよい。
<Modification 3 of the first embodiment>
The dialogue system 10 of the first embodiment is a system in which a plurality of humanoid robots cooperate to conduct a dialogue with the user, but it may instead be a system in which a single humanoid robot conducts all or part of the dialogue with the user. For example, the same single humanoid robot may make the open utterance of step S11, the chat utterance of step S13 that the chat dialogue system 6 determined based on the user's utterance, and the scenario utterance of step S14 determined by the scenario dialogue system 7. In this case, for example, the dialogue with the user from step S14 onward may be conducted cooperatively by a plurality of humanoid robots, or by the same single humanoid robot as up to step S14.

＜第二実施形態＞
第二実施形態では、ユーザの話題への参加感を向上するために、シナリオ対話から雑談対話に移行し、その後、雑談対話から再度シナリオ対話へ移行する。シナリオ対話の間に雑談対話を挿入することで、対話の流れが自然になり、話題の決定に自分も参加している感覚をユーザへ与えることができる。これにより、ユーザはその後に続くシナリオ対話においても長く対話を続けることが可能となる。
<Second embodiment>
In the second embodiment, in order to improve the user's sense of participation in the topic, the dialogue shifts from a scenario dialogue to a chat dialogue and then shifts back from the chat dialogue to a scenario dialogue. Inserting a chat dialogue between scenario dialogues makes the flow of the dialogue natural and gives the user the sense of taking part in deciding the topic. As a result, the user can continue the dialogue for a long time in the scenario dialogue that follows.

第二実施形態の対話システム１２は、図３に示すように、入力部１、音声認識部２、発話決定部３、音声合成部４、および提示部５を、第一実施形態と同様に備え、さらに対話制御部８を備える。この対話システム１２が後述する各ステップの処理を行うことにより第二実施形態の対話方法が実現される。なお、図３に示すように、対話システム１２の音声認識部２、発話決定部３、音声合成部４、対話制御部８による部分を第二実施形態の対話装置１３とする。対話制御部８は、対話システム１２が備える他の処理部を制御して、ユーザの発話の受付とユーザへの発話の提示とをそれぞれ少なくとも一回以上実行する対話である対話フローを実行する。 As shown in FIG. 3, the dialogue system 12 of the second embodiment includes an input unit 1, a speech recognition unit 2, an utterance determination unit 3, a speech synthesis unit 4, and a presentation unit 5, as in the first embodiment, and further includes a dialogue control unit 8. The dialogue method of the second embodiment is realized by the dialogue system 12 performing the processing of each step described later. As shown in FIG. 3, the portion of the dialogue system 12 made up of the speech recognition unit 2, the utterance determination unit 3, the speech synthesis unit 4, and the dialogue control unit 8 is referred to as the dialogue device 13 of the second embodiment. The dialogue control unit 8 controls the other processing units of the dialogue system 12 to execute a dialogue flow, which is a dialogue in which accepting a user utterance and presenting an utterance to the user are each executed at least once.

以下、図４を参照して、第二実施形態の対話方法の処理手続きを説明する。 Hereinafter, with reference to FIG. 4, the processing procedure of the dialogue method of the second embodiment will be described.

ステップＳ２１において、人型ロボット５−１または５−２は、シナリオ対話システム７が任意に選択した第一シナリオに含まれる第一シナリオ発話の内容を表す音声をスピーカから出力する。第一シナリオの選択は発話決定部３の要求を契機として行われる。シナリオ対話システム７は、あらかじめ用意された複数のシナリオからランダムに第一シナリオを選択してもよいし、以前の対話内容に基づいてあらかじめ定めたルールに従って第一シナリオを選択してもよい。シナリオ対話システム７が第一シナリオを選択する方法は、第一実施形態のステップＳ１１で説明した方法と同様である。 In step S 21, the humanoid robot 5-1 or 5-2 outputs a voice representing the content of the first scenario utterance included in the first scenario arbitrarily selected by the scenario dialogue system 7 from the speaker. The selection of the first scenario is performed in response to a request from the utterance determination unit 3. The scenario dialogue system 7 may select a first scenario randomly from a plurality of scenarios prepared in advance, or may select a first scenario according to a rule determined in advance based on previous dialogue contents. The method by which the scenario dialogue system 7 selects the first scenario is the same as the method described in step S11 of the first embodiment.

ステップＳ２２において、マイクロホン１は、第一シナリオ発話に対してユーザが発した第一ユーザ発話を受け付ける。音声認識部２は、マイクロホン１が収音したユーザの発話の音声信号を音声認識し、音声認識結果として得られたテキストを第一ユーザ発話の内容を表すテキストとして発話決定部３へ入力する。 In step S22, the microphone 1 accepts a first user utterance uttered by the user in response to the first scenario utterance. The voice recognition unit 2 recognizes the voice signal of the user's utterance collected by the microphone 1 and inputs the text obtained as the voice recognition result to the utterance determination unit 3 as text representing the content of the first user utterance.

ステップＳ２３において、対話システム１２は、シナリオ対話から雑談対話へ切り替える条件を満足したか否かを判定する。条件を満足したと判定した場合には、ステップＳ２４へ処理を進める。条件を満足していないと判定した場合には、ステップＳ２１へ処理を戻し、ステップＳ２１−Ｓ２２の処理を再度実行する。 In step S23, the dialogue system 12 determines whether or not a condition for switching from the scenario dialogue to the chat dialogue is satisfied. If it is determined that the condition is satisfied, the process proceeds to step S24. If it is determined that the condition is not satisfied, the process returns to step S21, and the processes of steps S21 to S22 are executed again.

シナリオ対話から雑談対話へ切り替える条件は、例えば、Ａ１．第一シナリオに沿った対話がすべて完了した場合、Ａ２．第一シナリオの進行が失敗した場合などが挙げられる。シナリオの進行が失敗した場合とは、例えば、Ａ２−１．ユーザへ向けて発話した後のユーザの発話がシナリオ対話システムの想定範囲に含まれていない場合、Ａ２−２．進行中のシナリオに対するユーザの振る舞いからそのシナリオについて話したくない意思やユーザの対話意欲が減退していることが認識された場合などが挙げられる。また、Ａ２−１の場合とＡ２−２の場合とを組み合わせ、ユーザへ向けて発話した後のユーザの反応が芳しくない場合も含まれる。ユーザの発話がシナリオ対話システムの想定範囲に含まれていない場合とは、例えば、ユーザの発話と、シナリオ対話システムが予め記憶されたシナリオに基づいてユーザの発話に対して決定した発話とが整合しない場合である。ユーザの振る舞いには、非言語シグナル、パラ言語シグナル（間の情報も含む）なども含まれる。ユーザの反応が芳しくない場合には、ユーザの振る舞いから話したくない意思が認識された場合や、ユーザの対話意欲の減退が認識された場合が含まれる。話したくない意思の認識や対話意欲の減退は、例えば、ユーザがあらかじめ定めた特定の語句（フレーズ）（例えば、「この話、さっきもしたよ。」など）を発したら、もうその話題について話したくない意思を表していると判断すればよい。 Conditions for switching from the scenario dialogue to the chat dialogue include, for example, A1: the case where the entire dialogue along the first scenario has been completed, and A2: the case where the first scenario has failed to progress. Cases in which a scenario fails to progress include, for example, A2-1: the case where the user's utterance made after an utterance directed to the user is not within the range anticipated by the scenario dialogue system, and A2-2: the case where it is recognized, from the user's behavior toward the ongoing scenario, that the user does not want to talk about the scenario or that the user's motivation for dialogue is declining. A combination of A2-1 and A2-2, in which the user's reaction after an utterance directed to the user is poor, is also included. The case where the user's utterance is not within the range anticipated by the scenario dialogue system is, for example, the case where the user's utterance is not consistent with the utterance that the scenario dialogue system determined for it based on a scenario stored in advance. The user's behavior also includes nonverbal signals and paralinguistic signals (including pause information). The case where the user's reaction is poor includes the case where an unwillingness to talk is recognized from the user's behavior and the case where a decline in the user's motivation for dialogue is recognized. As for recognizing an unwillingness to talk or a decline in motivation, for example, when the user utters a specific predetermined phrase (for example, “We already talked about this.”), it may be judged that the user is expressing an unwillingness to talk about that topic any further.
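
The switching test of step S23 can be illustrated with a short sketch. It is only one possible reading under stated assumptions: the `ScenarioState` structure, the exact-match test standing in for "within the anticipated range", and the phrase list `REFUSAL_PHRASES` are all hypothetical and are not specified in the description.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class ScenarioState:
    finished: bool              # A1: the first scenario has been completed
    expected_replies: Set[str]  # replies the scenario anticipates at this point


# Hypothetical predefined phrases signalling unwillingness to talk (A2-2).
REFUSAL_PHRASES = ("we already talked about this",)


def should_switch_to_chat(state: ScenarioState, user_text: str) -> bool:
    """Return True when the scenario dialogue should hand over to chat."""
    text = user_text.strip().lower()
    if state.finished:                               # A1
        return True
    if any(p in text for p in REFUSAL_PHRASES):      # A2-2
        return True
    if state.expected_replies and text not in state.expected_replies:
        return True                                  # A2-1
    return False
```

A real system would likely also use the nonverbal and paralinguistic signals mentioned above; this sketch looks at the recognized text only.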

ステップＳ２４において、人型ロボット５−１または５−２は、第一ユーザ発話の内容を表すテキストに基づいて雑談対話システム６が決定した雑談発話の内容を表す音声をスピーカから出力する。雑談発話の内容を表す音声を出力する人型ロボットは、第一シナリオに基づく一つ以上の発話のうち最後の発話の内容を表す音声を出力した人型ロボットであってもよいし、その最後の発話の内容を表す音声を出力した人型ロボットとは異なる人型ロボットであってもよい。 In step S 24, the humanoid robot 5-1 or 5-2 outputs a voice representing the content of the chat utterance determined by the chat dialogue system 6 based on the text representing the content of the first user utterance from the speaker. The humanoid robot that outputs the voice representing the content of the chat utterance may be a humanoid robot that outputs the voice representing the content of the last utterance out of one or more utterances based on the first scenario. The humanoid robot may be different from the humanoid robot that outputs the voice representing the content of the utterance.

雑談発話の内容を表すテキストは、発話決定部３が第一シナリオによる対話中の人型ロボットとユーザとの発話の系列の音声認識結果である発話系列の内容を表すテキストに基づいて、雑談対話システム６を用いて決定したものである。雑談対話システム６へ入力する発話系列の範囲は、直前の発話に限定してもよいし、第一シナリオの一部または全部でもよいし、第一シナリオの前に行われた雑談対話またはシナリオ対話すべてを含めてもよい。雑談対話へ切り替える際の最初の発話をシナリオごとに事前に設定しておいてもよい。例えば、食べ物の話をするシナリオの後に「食べること以外で、何が好き？」などの質問を用意することなどが考えられる。 The text representing the content of the chat utterance is based on the text representing the content of the utterance sequence, which is the speech recognition result of the sequence of utterances between the humanoid robot and the user who are talking in the first scenario. This is determined using the system 6. The range of the utterance sequence input to the chat dialogue system 6 may be limited to the immediately preceding utterance, may be part or all of the first scenario, or the chat dialogue or scenario dialogue performed before the first scenario. All may be included. The first utterance when switching to chat conversation may be set in advance for each scenario. For example, it may be possible to prepare a question such as “What do you like except eating?” After a scenario of talking about food.
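
The alternative ranges of the utterance sequence passed to the chat dialogue system can be sketched as a simple selection over the dialogue history. The function name `chat_context`, the mode names, and the index argument are assumptions made for illustration only.

```python
from typing import List


def chat_context(history: List[str], first_scenario_start: int, mode: str) -> List[str]:
    """Select which past utterances are given to the chat dialogue system.

    mode "last":           only the immediately preceding utterance
    mode "first_scenario": the utterances of the first scenario
    mode "all":            everything, including earlier dialogues
    """
    if mode == "last":
        return history[-1:]
    if mode == "first_scenario":
        return history[first_scenario_start:]
    if mode == "all":
        return list(history)
    raise ValueError("unknown mode: " + mode)
```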

ステップＳ２５において、マイクロホン１は、雑談発話に対してユーザが発した第二ユーザ発話を受け付ける。音声認識部２は、マイクロホン１が収音した第二ユーザ発話の音声信号を音声認識し、音声認識結果として得られたテキストを第二ユーザ発話の内容を表すテキストとして発話決定部３へ入力する。 In step S 25, the microphone 1 accepts a second user utterance uttered by the user in response to the chat utterance. The voice recognition unit 2 recognizes the voice signal of the second user utterance picked up by the microphone 1 and inputs the text obtained as the voice recognition result to the utterance determination unit 3 as text representing the content of the second user utterance. .

ステップＳ２６において、対話システム１２は、雑談対話からシナリオ対話へ切り替える条件を満足したか否かを判定する。条件を満足したと判定した場合には、ステップＳ２７へ処理を進める。条件を満足していないと判定した場合には、ステップＳ２４へ処理を戻し、ステップＳ２４−Ｓ２５の処理を再度実行する。 In step S26, the dialogue system 12 determines whether or not a condition for switching from the chat dialogue to the scenario dialogue is satisfied. If it is determined that the condition is satisfied, the process proceeds to step S27. If it is determined that the condition is not satisfied, the process returns to step S24, and the processes of steps S24-S25 are executed again.

雑談対話からシナリオ対話へ切り替える条件は、例えば、Ｂ１．雑談対話の継続が困難になった場合、Ｂ２．雑談対話を通じて次のシナリオを十分な信頼度で選択できる状況となった場合などが挙げられる。雑談対話の継続が困難な場合とは、例えば、Ｂ１−１．ユーザへ向けて発話した後のユーザの発話が雑談対話システムの想定範囲に含まれていない場合、Ｂ１−２．進行中の雑談対話に対するユーザの振る舞いからそのシナリオについて話したくない意思やユーザの対話意欲が減退していることが認識された場合、Ｂ１−３．次のシナリオを十分な信頼度で選択できる状況とならず所定の回数の雑談発話を繰り返した場合などが挙げられる。また、Ｂ１−１の場合とＢ１−２の場合とを組み合わせ、ユーザへ向けて発話した後のユーザの反応が芳しくない場合も含まれる。Ｂ１−３の場合は、Ｂ１−１の場合および／またはＢ１−２の場合と組み合わせることができ、例えば、ユーザの発話が雑談対話システムの想定範囲に含まれない場合および／またはユーザの振る舞いから話したくない意思が認識された場合が所定の回数繰り返し発生した場合に、雑談対話の継続が困難と判断するように構成してもよい。ユーザの発話が雑談対話システムの想定範囲に含まれていない場合とは、例えば、ユーザの発話と、雑談対話システムが少なくともユーザの発話に基づいて決定した発話とが整合しない場合である。ユーザの振る舞いには、非言語シグナル、パラ言語シグナル（間の情報も含む）なども含まれる。ユーザの反応が芳しくない場合には、ユーザの振る舞いから話したくない意思が認識された場合や、ユーザの対話意欲の減退が認識された場合が含まれる。話したくない意思の認識や対話意欲の減退は、例えば、ユーザがあらかじめ定めた特定の語句（フレーズ）（例えば、「この話、さっきもしたよ。」など）を発した場合などが挙げられる。 Conditions for switching from the chat dialogue to the scenario dialogue include, for example, B1: the case where it has become difficult to continue the chat dialogue, and B2: the case where the chat dialogue has made it possible to select the next scenario with sufficient confidence. Cases in which it is difficult to continue the chat dialogue include, for example, B1-1: the case where the user's utterance made after an utterance directed to the user is not within the range anticipated by the chat dialogue system; B1-2: the case where it is recognized, from the user's behavior toward the ongoing chat dialogue, that the user does not want to talk about that scenario or that the user's motivation for dialogue is declining; and B1-3: the case where chat utterances have been repeated a predetermined number of times without reaching a situation in which the next scenario can be selected with sufficient confidence. A combination of B1-1 and B1-2, in which the user's reaction after an utterance directed to the user is poor, is also included. B1-3 can be combined with B1-1 and/or B1-2; for example, the system may be configured to judge that continuing the chat dialogue is difficult when the case where the user's utterance is not within the range anticipated by the chat dialogue system and/or the case where an unwillingness to talk is recognized from the user's behavior has occurred a predetermined number of times. The case where the user's utterance is not within the range anticipated by the chat dialogue system is, for example, the case where the user's utterance is not consistent with the utterance that the chat dialogue system determined based at least on the user's utterance. The user's behavior also includes nonverbal signals and paralinguistic signals (including pause information). The case where the user's reaction is poor includes the case where an unwillingness to talk is recognized from the user's behavior and the case where a decline in the user's motivation for dialogue is recognized. An unwillingness to talk or a decline in motivation is recognized, for example, when the user utters a specific predetermined phrase (for example, “We already talked about this.”).
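
The test of step S26 can be sketched in the same spirit. The numeric thresholds and parameter names below are hypothetical; the description only says "sufficient confidence" and "a predetermined number of times" without giving values.

```python
def should_switch_to_scenario(
    best_confidence: float,
    chat_turns: int,
    trouble_count: int,
    *,
    confidence_threshold: float = 0.6,  # assumed value for "sufficient confidence"
    max_chat_turns: int = 3,            # assumed "predetermined number" for B1-3
    max_troubles: int = 2,              # assumed repeat count for B1-1 / B1-2
) -> bool:
    """Return True when the chat dialogue should hand back to a scenario."""
    if best_confidence >= confidence_threshold:  # B2
        return True
    if trouble_count >= max_troubles:            # B1-1 and/or B1-2, repeated
        return True
    if chat_turns >= max_chat_turns:             # B1-3
        return True
    return False
```

Here `trouble_count` is meant to count chat turns in which the user's reply was out of range or signalled unwillingness; how that count is maintained is left to the caller.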

シナリオ対話へ切り替える際には、直前のユーザの発話を「そっか」といった相槌で受けるとともに、最初の発話の直前に間を挿入し、「ところでさ」「そうそう」「ねぇ」など、話題を変えようとしていることを表す発話を、雑談対話の最後に発話した人型ロボットと異なる人型ロボットに発話させるとよい。これにより、話題の不連続性により生じる違和感を軽減することができる。このとき、次のシナリオを選択した際の信頼度に応じて、発話する内容を変更してもよい。ここで、選択の信頼度とは、雑談対話の内容と選択したシナリオの類似度の高さを表す指標である。例えば、選択の信頼度が比較的高い場合には「そうそう、・・・」などと短い発話を挿入することとし、選択の信頼度が比較的低い場合には「ところでさ、全然関係ないかもしれないけど・・・」などと話題が変わることを明示的に表す内容を発話することが考えられる。選択の信頼度が低い場合の具体例としては、例えば、雑談対話で「ラーメンを食べた」ことを発話した後に、シナリオ対話で「レストランの経営」を話題にするように、話題語・話題述語項間の類似度が低い場合が挙げられる。また、例えば、雑談対話で「スポーツが好きではない」ことを発話した後に、シナリオ対話で「スキーをした」ことを話題にするように、話題の類似度は高いものの、ユーザがその話題に否定的である場合が挙げられる。さらに、例えば、雑談対話で「ドラッグ」についての発話があり、マウスのドラッグの話題か薬のドラッグの話題か判別できない場合のように、話題語が多義語であり、いずれの意味で発話されたのかを判別できない場合が挙げられる。 When switching to the scenario dialogue, it is advisable to acknowledge the user's immediately preceding utterance with a backchannel response such as “I see,” insert a pause immediately before the first utterance, and have a humanoid robot different from the one that spoke last in the chat dialogue make an utterance indicating that the topic is about to change, such as “By the way,” “Oh, right,” or “Hey.” This reduces the sense of incongruity caused by the discontinuity of topics. The content of this utterance may be changed according to the confidence with which the next scenario was selected. Here, the selection confidence is an index representing how similar the selected scenario is to the content of the chat dialogue. For example, when the selection confidence is relatively high, a short utterance such as “Oh, right, ...” may be inserted, and when the selection confidence is relatively low, an utterance that explicitly signals a change of topic, such as “By the way, this may be completely unrelated, but ...,” may be made. One specific example of low selection confidence is the case where the similarity between topic words or between topic predicate-argument structures is low, as when “restaurant management” is raised in the scenario dialogue after “I ate ramen” was said in the chat dialogue. Another is the case where the topic similarity is high but the user is negative about the topic, as when “I went skiing” is raised in the scenario dialogue after “I don't like sports” was said in the chat dialogue. A further example is the case where the topic word is polysemous and it cannot be determined in which sense it was used, as when ドラッグ (which in Japanese can mean either “drag” or “drug”) comes up in the chat dialogue and it cannot be determined whether the topic is dragging with a mouse or medicinal drugs.
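
The choice of bridging utterance according to selection confidence can be sketched as follows. The threshold value, the function name `bridge_to_scenario`, and the assignment of the backchannel to the robot that spoke last are assumptions for illustration; the description only requires that the topic-change utterance come from a different robot than the one that spoke last in the chat dialogue.

```python
from typing import List, Tuple


def bridge_to_scenario(
    confidence: float,
    last_chat_speaker: int,
    robots: Tuple[int, int] = (1, 2),
    threshold: float = 0.5,  # assumed boundary between "high" and "low" confidence
) -> List[Tuple[int, str]]:
    """Return (robot id, text) pairs spoken before the next scenario starts."""
    # The topic-change marker is spoken by the robot that did NOT speak last.
    other = robots[1] if last_chat_speaker == robots[0] else robots[0]
    if confidence >= threshold:
        marker = "Oh, right, ..."
    else:
        marker = "By the way, this may be completely unrelated, but ..."
    # Backchannel first, then the topic-change marker.
    return [(last_chat_speaker, "I see."), (other, marker)]
```

A pause before the marker, as suggested above, would be handled by the presentation side and is not modelled here.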

ステップＳ２７において、人型ロボット５−１または５−２は、シナリオ対話システム７が選択した第二シナリオに含まれる第二シナリオ発話の内容を表す音声をスピーカから出力する。第二シナリオの選択は発話決定部３の要求を契機として行われる。ステップＳ２３において第一シナリオが完了する前にシナリオ対話から雑談対話への切り替えが行われていた場合には、第一シナリオの残りの部分を第二シナリオとしてもよい。以降、第二シナリオに沿った対話がユーザと対話装置との間で実行される。 In step S 27, the humanoid robot 5-1 or 5-2 outputs a voice representing the content of the second scenario utterance included in the second scenario selected by the scenario dialogue system 7 from the speaker. The selection of the second scenario is performed in response to a request from the utterance determination unit 3. If switching from the scenario dialogue to the chat dialogue has been performed before the completion of the first scenario in step S23, the remaining portion of the first scenario may be set as the second scenario. Thereafter, a dialogue according to the second scenario is executed between the user and the dialogue device.

シナリオ対話システム７は、ステップＳ２４およびＳ２５で行われた雑談対話の内容に基づいてあらかじめ定めたルールに従って第二シナリオを選択する。雑談対話中の発話にはユーザが興味を抱き得る話題を表す語句が含まれていると考えられるため、これらを手掛かりとすることで適切な話題に関するシナリオを第二シナリオとして選択することができる。例えば、各シナリオに話題を表すキーワードを設定しておき、雑談対話中の人型ロボットの発話およびユーザの発話のいずれかまたは両方とそのキーワードとの類似度に従ってシナリオの選択を行う。また、例えば、ユーザの発話に対して雑談対話システム６が生成した文がいずれかのシナリオの先頭文に類似している場合はそのシナリオを選択する。 The scenario dialogue system 7 selects the second scenario according to a rule determined in advance based on the contents of the chat dialogue performed in steps S24 and S25. Since it is considered that the utterance during the chat conversation includes a phrase representing a topic that the user may be interested in, a scenario related to an appropriate topic can be selected as a second scenario by using these as clues. For example, a keyword representing a topic is set in each scenario, and the scenario is selected according to the similarity between the keyword and one or both of the utterance of the humanoid robot and the user's utterance during the chat conversation. Further, for example, when the sentence generated by the chat dialogue system 6 with respect to the user's utterance is similar to the head sentence of any scenario, the scenario is selected.
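
Keyword-based selection of the second scenario can be sketched as below. The description does not specify a similarity measure, so the simple keyword-overlap ratio used here is a substitute chosen for illustration, and the scenario names and keyword sets are hypothetical.

```python
import re
from typing import Dict, Iterable, Set, Tuple


def keyword_overlap(utterances: Iterable[str], keywords: Set[str]) -> float:
    """Fraction of a scenario's keywords that appear in the chat utterances."""
    words: Set[str] = set()
    for u in utterances:
        words.update(re.findall(r"[a-z]+", u.lower()))
    return len(words & keywords) / len(keywords) if keywords else 0.0


def select_second_scenario(
    utterances: Iterable[str], scenarios: Dict[str, Set[str]]
) -> Tuple[str, float]:
    """Return the best-matching scenario name and its overlap score."""
    utterances = list(utterances)
    scored = {name: keyword_overlap(utterances, kw) for name, kw in scenarios.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical scenarios and chat utterances.
scenarios = {
    "skiing": {"sports", "ski", "snow"},
    "restaurants": {"restaurant", "menu", "chef"},
}
chat = ["I like sports manga.", "Do you play sports often?"]
best, score = select_second_scenario(chat, scenarios)
```

The returned score could serve as the selection confidence discussed earlier, though that link is an assumption of this sketch rather than something the description states.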

雑談対話中の発話を手掛かりとした選択が行えず、対話の継続が困難と判定した場合、あらかじめ用意された複数のシナリオの中からランダムに第二シナリオを選択する。この場合、事前にシナリオを準備する際にシナリオごとに選択確率を設定してもよい。すなわち、必ずしも均等なランダム選択をしなくともよい。また、これまでの観測情報から算出されている類似度に基づいて選択確率を重み付けしてもよい。 When it is determined that it is difficult to continue the conversation because the utterance during the chat conversation cannot be selected as a clue, the second scenario is selected at random from a plurality of prepared scenarios. In this case, the selection probability may be set for each scenario when preparing the scenario in advance. That is, it is not always necessary to make an equal random selection. Further, the selection probability may be weighted based on the similarity calculated from the observation information so far.
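
The non-uniform random fallback can be sketched with weighted sampling. The way the preset probability and the similarity are combined (a simple product with an offset) is an assumption; the description only says that selection probabilities may be preset and may be weighted by similarity.

```python
import random
from typing import Dict


def pick_fallback_scenario(
    priors: Dict[str, float],
    similarity: Dict[str, float],
    rng: random.Random,
) -> str:
    """Pick a scenario at random, weighted by preset probability and similarity."""
    names = list(priors)
    # Assumed combination rule: prior weight scaled up by observed similarity.
    weights = [priors[n] * (1.0 + similarity.get(n, 0.0)) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

For example, `pick_fallback_scenario({"skiing": 0.7, "music": 0.3}, {"skiing": 0.2}, random.Random(0))` draws one of the two names, favouring "skiing".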

第二実施形態の対話方法における各ステップは対話制御部８の制御により実行される。対話制御部８は、シナリオ対話システム７が決定した第一シナリオに基づいて第一シナリオ発話の内容を提示するステップＳ２１と第一シナリオ発話に対してユーザが発した第一ユーザ発話を受け付けるステップＳ２２とをそれぞれ一回以上実行する第一対話フローと、雑談対話システム６がユーザの発話に基づいて決定した雑談発話の内容を提示するステップＳ２４と雑談発話に対してユーザが発した第二ユーザ発話を受け付けるステップＳ２５とをそれぞれ一回以上実行する第二対話フローとを実行する制御を行うことで、第二実施形態の対話方法を実現する。 Each step in the dialogue method of the second embodiment is executed under the control of the dialogue control unit 8. The dialogue control unit 8 realizes the dialogue method of the second embodiment by performing control to execute a first dialogue flow and a second dialogue flow. In the first dialogue flow, step S21 of presenting the content of the first scenario utterance based on the first scenario determined by the scenario dialogue system 7 and step S22 of accepting the first user utterance made by the user in response to the first scenario utterance are each executed one or more times. In the second dialogue flow, step S24 of presenting the content of the chat utterance that the chat dialogue system 6 determined based on the user's utterance and step S25 of accepting the second user utterance made by the user in response to the chat utterance are each executed one or more times.
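
The two dialogue flows and the tests of steps S23 and S26 can be put together in a small control loop. This is a sketch under stated assumptions: the function `run_two_flows`, its callback parameters, and the sample strings are hypothetical, and speech input and output are replaced by plain strings.

```python
from typing import Callable, Iterator, List


def run_two_flows(
    first_scenario: List[str],
    user_turns: Iterator[str],
    chat_reply: Callable[[str], str],
    to_chat: Callable[[int, str], bool],
    to_scenario: Callable[[int, str], bool],
    second_scenario_opening: str,
) -> List[str]:
    log: List[str] = []
    user_text = ""
    # First dialogue flow: S21 and S22 repeat until the S23 test holds
    # or the first scenario runs out (condition A1).
    for i, line in enumerate(first_scenario):
        log.append("S21 R: " + line)
        user_text = next(user_turns)
        log.append("S22 H: " + user_text)
        if to_chat(i, user_text):        # S23
            break
    # Second dialogue flow: S24 and S25 repeat until the S26 test holds.
    n = 0
    while True:
        log.append("S24 R: " + chat_reply(user_text))
        user_text = next(user_turns)
        log.append("S25 H: " + user_text)
        n += 1
        if to_scenario(n, user_text):    # S26
            break
    log.append("S27 R: " + second_scenario_opening)
    return log


log = run_two_flows(
    ["What kind of food do you like?"],
    iter(["We already talked about this.", "Reading, I guess.", "I don't really like sports."]),
    chat_reply=lambda t: "Tell me more about that.",
    to_chat=lambda i, t: True,
    to_scenario=lambda n, t: n >= 2,
    second_scenario_opening="Come to think of it, I went skiing the other day.",
)
```

In this run the first flow executes once, the second flow twice, and the second scenario then begins.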

以下、第二実施形態の対話内容の具体例を示す。ここで、Ｒは人型ロボットを表し、Ｈはユーザを表す。Ｒの後の括弧内における数字は人型ロボットの識別子である。第一実施形態と同様に、人型ロボットが誰に話かける意図であるかは、例えば、人型ロボットの頭部や視線の動きにより表出するようにしてもよいが、表出しなくてもよい。 A specific example of the dialogue content of the second embodiment is shown below. Here, R represents a humanoid robot and H represents the user. The number in parentheses after R is the identifier of the humanoid robot. As in the first embodiment, whom a humanoid robot intends to address may be expressed, for example, by the movement of the robot's head or line of sight, but need not be expressed.

Ｒ（１→２）：「食べ物だったら何が好き？」（※１：シナリオ対話システム７が選択した第一シナリオに基づいて決定した発話内容である。）
Ｒ（２→１）：「お寿司が好き。」（※１：シナリオ対話システム７が選択した第一シナリオに基づいて決定した発話内容である。）
Ｈ：「この話、さっきもしたよ。」（※「Ａ２−２．進行中のシナリオに対するユーザの振る舞いからそのシナリオについて話したくない意思が認識された場合」の具体例である。）
Ｒ（１→Ｈ）：「食べること以外で、何が好き？」（※１：以前の対話内容である「食べ物」に基づいてシナリオ対話システム７が選択した第一シナリオに含まれる発話の具体例である。）
Ｈ：「読書かな。」
Ｒ（１→Ｈ）：「好きな本のジャンルは何？」（※２：ユーザ発話の「読書」に基づいて雑談対話システム６が決定した発話の具体例である。）
Ｈ：「スポーツ漫画が好きだよ」
Ｒ（２→Ｈ）：「スポーツはよくやるの？」（※２：雑談対話を複数回繰り返す具体例である。ここではユーザ発話の「スポーツ」に基づいて雑談対話システム６が発話内容を決定している。）
Ｈ：「スポーツはあんまり好きじゃないな」
Ｒ（２→Ｈ）：「そっか」（※３：「Ｂ１−２．進行中の雑談対話に対するユーザの振る舞いからそのシナリオについて話したくない意思が認識された場合」に雑談対話の継続が困難と判断する場合の具体例である。ユーザが「あんまり好きじゃない」と話題を否定したため、雑談対話の継続が困難と判断している。ここでは、まず直前のユーザ発話を相槌で受けている。）
Ｒ（１→２）：「そういえば，ぼくはこの前スキーをしてきたよ」（※３：最初に、話題を変えようとしていることを表す「そういえば」を発話し、続いて、雑談対話の内容であった「スポーツ」に基づいてシナリオ対話システム７が選択した第二シナリオに含まれる発話を行っている。）
Ｒ（２→１）：「その体型でスキーは難しいんじゃない？」（※３：雑談対話の内容であった「スポーツ」に基づいてシナリオ対話システム７が選択した第二シナリオに含まれる発話の具体例である。）
R (1 → 2): “What kind of food do you like?” (*1: Utterance content determined based on the first scenario selected by the scenario dialogue system 7.)
R (2 → 1): “I like sushi.” (*1: Utterance content determined based on the first scenario selected by the scenario dialogue system 7.)
H: “We already talked about this.” (* A specific example of “A2-2: the case where it is recognized, from the user's behavior toward the ongoing scenario, that the user does not want to talk about the scenario.”)
R (1 → H): “Other than eating, what do you like?” (*1: A specific example of an utterance included in the first scenario that the scenario dialogue system 7 selected based on “food,” the content of the preceding dialogue.)
H: “Reading, I guess.”
R (1 → H): “What is your favorite genre of books?” (*2: A specific example of an utterance determined by the chat dialogue system 6 based on “reading” in the user's utterance.)
H: “I like sports manga.”
R (2 → H): “Do you play sports often?” (*2: A specific example of repeating the chat dialogue multiple times. Here the chat dialogue system 6 determines the utterance content based on “sports” in the user's utterance.)
H: “I don't really like sports.”
R (2 → H): “I see.” (*3: A specific example of judging that continuing the chat dialogue is difficult under “B1-2: the case where it is recognized, from the user's behavior toward the ongoing chat dialogue, that the user does not want to talk about that scenario.” Because the user rejected the topic by saying “I don't really like,” continuing the chat dialogue is judged to be difficult. Here the robot first acknowledges the user's immediately preceding utterance with a backchannel response.)
R (1 → 2): “Come to think of it, I went skiing the other day.” (*3: The robot first says “Come to think of it,” which indicates that the topic is about to change, and then makes an utterance included in the second scenario that the scenario dialogue system 7 selected based on “sports,” the content of the chat dialogue.)
R (2 → 1): “Isn't skiing hard with a body like yours?” (*3: A specific example of an utterance included in the second scenario that the scenario dialogue system 7 selected based on “sports,” the content of the chat dialogue.)

上記のように構成することにより、本形態の対話技術によれば、あるシナリオ対話が終了した後、雑談対話の内容に従って次のシナリオ対話の話題が決定されるため、ユーザが感じる対話への参加感が向上する。これにより、ユーザは対話システムとの対話を継続する意欲をかき立てられ、対話を長く続けることができるようになる。 With the above configuration, according to the dialogue technique of this embodiment, after one scenario dialogue ends, the topic of the next scenario dialogue is determined according to the content of the chat dialogue, which improves the user's sense of participation in the dialogue. This motivates the user to continue the dialogue with the dialogue system, so the dialogue can be continued for a long time.

＜変形例＞
上述した実施形態では、エージェントとしてロボットを用いて音声による対話を行う例を説明したが、上述した実施形態のロボットは身体等を有する人型ロボットであっても、身体等を有さないロボットであってもよい。また、この発明の対話技術はこれらに限定されず、人型ロボットのように身体等の実体がなく、発声機構を備えないエージェントを用いて対話を行う形態とすることも可能である。そのような形態としては、例えば、コンピュータの画面上に表示されたエージェントを用いて対話を行う形態が挙げられる。より具体的には、「LINE」（登録商標）や「２ちゃんねる」（登録商標）のような、複数アカウントがテキストメッセージにより対話を行うグループチャットにおいて、ユーザのアカウントと対話装置のアカウントとが対話を行う形態に適用することも可能である。この形態では、エージェントを表示する画面を有するコンピュータは人の近傍にある必要があるが、当該コンピュータと対話装置とはインターネットなどのネットワークを介して接続されていてもよい。つまり、本対話システムは、人とロボットなどの話者同士が実際に向かい合って話す対話だけではなく、話者同士がネットワークを介してコミュニケーションを行う会話にも適用可能である。
<Modification>
In the embodiments described above, examples were described in which a robot is used as the agent to conduct a spoken dialogue. The robot of those embodiments may be a humanoid robot having a body or the like, or a robot without a body or the like. The dialogue technique of the present invention is not limited to these, and a dialogue may also be conducted using an agent that, unlike a humanoid robot, has no physical entity such as a body and no vocalization mechanism. One such form is a dialogue conducted using an agent displayed on a computer screen. More specifically, the technique can also be applied to a group chat in which multiple accounts converse by text message, such as “LINE” (registered trademark) or “2channel” (registered trademark), with the user's account conversing with accounts of the dialogue device. In this form, the computer having the screen that displays the agent needs to be near the person, but that computer and the dialogue device may be connected via a network such as the Internet. In other words, the dialogue system is applicable not only to dialogues in which speakers such as a person and a robot actually talk face to face, but also to conversations in which the speakers communicate via a network.

変形例の対話システム２０は、図５に示すように、入力部１、発話決定部３、および提示部５を備える。図５の例では、変形例の対話システム２０は１台の対話装置２１からなり、変形例の対話装置２１は、入力部１、発話決定部３、および提示部５を備える。発話決定部３は、外部に存在する雑談対話システム６およびシナリオ対話システム７と通信可能なインターフェースを備える。雑談対話システム６およびシナリオ対話システム７は同様の機能を持つ処理部として対話装置内に構成しても構わない。 As shown in FIG. 5, the interactive system 20 according to the modification includes an input unit 1, an utterance determination unit 3, and a presentation unit 5. In the example of FIG. 5, the interactive system 20 according to the modified example includes one interactive device 21, and the interactive device 21 according to the modified example includes an input unit 1, an utterance determining unit 3, and a presenting unit 5. The utterance determination unit 3 includes an interface capable of communicating with the chat dialogue system 6 and the scenario dialogue system 7 existing outside. The chat dialogue system 6 and the scenario dialogue system 7 may be configured in the dialogue apparatus as processing units having similar functions.

The dialogue apparatus of the modification is an information processing apparatus, for example a mobile terminal such as a smartphone or tablet, or a desktop or laptop personal computer. The following description assumes that the dialogue apparatus is a smartphone. The presentation unit 5 is the liquid crystal display of the smartphone. A chat application window is displayed on this liquid crystal display, and the contents of a group chat are displayed in the window in chronological order. A group chat is a chat function in which multiple accounts post text messages to one another to carry on a dialogue. It is assumed that a plurality of virtual accounts, corresponding to virtual personalities controlled by the dialogue apparatus, and the user's account participate in this group chat. That is, this modification is an example in which each agent is a virtual account displayed on the liquid crystal display of the smartphone serving as the dialogue apparatus. The user can enter utterance content into the input unit 1 using a software keyboard and post it to the group chat through his or her own account. The utterance determination unit 3 inputs the post from the user's account to the chat dialogue system 6 or the scenario dialogue system 7, and posts the utterance content obtained from each dialogue system to the group chat through the respective virtual account. Alternatively, the microphone and speech recognition function of the smartphone may be used so that the user enters utterance content into the input unit 1 by speaking. Likewise, the speaker and speech synthesis function of the smartphone may be used so that the utterance content obtained from each dialogue system is output from the speaker in a voice corresponding to each virtual account.
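The posting flow described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the invention: the class `GroupChat`, the function `run_turn`, the account names, and the callable signatures standing in for the chat dialogue system 6 and the scenario dialogue system 7 are all hypothetical, since the text does not define their interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the chat dialogue system 6 and the scenario
# dialogue system 7. The source text does not specify these interfaces.
ChatSystem = Callable[[str], str]   # user utterance -> reply based on it
ScenarioSystem = Callable[[], str]  # -> next utterance of a stored scenario


@dataclass
class GroupChat:
    """Chronological log of (account, text) posts shown in the chat window."""
    log: List[Tuple[str, str]] = field(default_factory=list)

    def post(self, account: str, text: str) -> None:
        self.log.append((account, text))


def run_turn(
    chat: GroupChat,
    open_utterance: str,
    user_text: str,
    chat_system: ChatSystem,
    scenario_system: ScenarioSystem,
    first_account: str = "agent_A",   # assumed virtual account name
    second_account: str = "agent_B",  # assumed virtual account name
    user_account: str = "user",       # assumed user account name
) -> None:
    """Post one exchange in the order: open utterance, user post,
    chat-system reply, then scenario-system utterance."""
    # Any virtual account may post the open utterance; the first one is
    # used here purely for simplicity.
    chat.post(first_account, open_utterance)
    # The user's post is received through the input unit.
    chat.post(user_account, user_text)
    # The chat dialogue system answers based on the user's post.
    chat.post(first_account, chat_system(user_text))
    # A different virtual account then continues with the scenario.
    chat.post(second_account, scenario_system())
```

In this sketch the two dialogue systems are passed in as plain callables, so they could equally be wrappers around external services or processing units inside the apparatus, matching the two configurations mentioned for the utterance determination unit 3.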

When the agents' utterances are displayed on the presentation unit 5 (display) simultaneously, the user may perceive the dialogue favorably, as lively or animated. On the other hand, if more text is presented than the user can read at once, the user may find it difficult to continue the dialogue. Therefore, depending on the situation of use, such as a situation in which cognitive load must not be placed on the user or one in which a calm atmosphere is required, the utterances may be displayed sequentially. When the utterances are displayed sequentially, they may be separated by a predetermined time interval. The time interval may be fixed or variable.
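Sequential display with a fixed or variable interval can be illustrated with a small scheduling function. This is a minimal sketch, assuming a hypothetical function `schedule` that returns display offsets in seconds. The length-based rule used for the variable interval in the usage note is likewise an assumption for illustration and is not taken from the text.

```python
from typing import Callable, List, Sequence, Tuple, Union

# A fixed interval in seconds, or a function that derives the interval
# from the utterance just displayed (hypothetical design choice).
Interval = Union[float, Callable[[str], float]]


def schedule(utterances: Sequence[str], interval: Interval) -> List[Tuple[float, str]]:
    """Return (offset_seconds, text) pairs for sequential display.

    The first utterance is shown at offset 0.0. Each later utterance is
    shown after the interval that follows the previous utterance.
    """
    offsets: List[Tuple[float, str]] = []
    t = 0.0
    for text in utterances:
        offsets.append((t, text))
        gap = interval(text) if callable(interval) else interval
        t += gap
    return offsets
```

For a fixed interval one might call `schedule(posts, 1.5)`. For a variable interval, one possible rule is to wait longer after longer text, for example `schedule(posts, lambda s: 0.5 * len(s))`, so that the user has time to read each utterance before the next one appears.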

Embodiments of the present invention have been described above, but the specific configuration is not limited to these embodiments. It goes without saying that design changes and the like made as appropriate without departing from the spirit of the present invention are included in the present invention. The various processes described in the embodiments need not be executed only in time series in the order described. They may also be executed in parallel or individually, according to the processing capability of the apparatus that executes them or as needed.

[Program, recording medium]
When the various processing functions of the dialogue apparatus described in the above embodiments are realized by a computer, the processing contents of the functions that the dialogue apparatus should have are described by a program. Likewise, when the various processing functions of the dialogue system described in the above modification are realized by a computer, the processing contents of the functions that the dialogue system should have are described by a program. By executing this program on a computer, the various processing functions of the dialogue apparatus and the dialogue system are realized on the computer.

The program describing these processing contents can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to another computer via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing a process, the computer reads the program stored in its own storage device and executes the process according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute a process according to that program. Furthermore, each time a program is transferred from the server computer to this computer, the computer may successively execute a process according to the received program. The above-described processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to this computer and the processing functions are realized only through execution instructions and the acquisition of results. Note that the program in this embodiment includes information that is provided for processing by an electronic computer and that is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the present apparatus is configured by executing a predetermined program on a computer, but at least part of these processing contents may instead be realized in hardware.

Claims

A dialogue method, wherein a dialogue system that determines an utterance based on a scenario stored in advance is referred to as a first dialogue system, and a dialogue system that determines an utterance based on at least the content of a user's utterance is referred to as a second dialogue system, the dialogue method including:
a first presentation step in which a presentation unit presents an open utterance;
an utterance receiving step in which an input unit receives the user's utterance in response to the open utterance;
a second presentation step in which the presentation unit presents an utterance determined by the second dialogue system based on the user's utterance; and
a third presentation step in which the presentation unit presents, after the second presentation step, an utterance determined by the first dialogue system.

The dialogue method according to claim 1, wherein
each of the presentation steps is performed by a plurality of robots,
in the first presentation step, any one of the plurality of robots speaks,
in the utterance receiving step, the user's utterance is recognized and received,
in the second presentation step, a first robot that is one of the plurality of robots speaks, and
in the third presentation step, a second robot that is different from the first robot among the plurality of robots speaks.

The dialogue method according to claim 1, wherein
each of the presentation steps is performed by a plurality of accounts in a group chat,
in the first presentation step, any one of the plurality of accounts makes the presentation in the group chat,
in the utterance receiving step, the user's post in the group chat is received,
in the second presentation step, a first account that is one of the plurality of accounts makes the presentation in the group chat, and
in the third presentation step, a second account that is different from the first account among the plurality of accounts makes the presentation in the group chat.

A dialogue system, wherein a dialogue system that determines an utterance based on a scenario stored in advance is referred to as a first dialogue system, and a dialogue system that determines an utterance based on at least the content of a user's utterance is referred to as a second dialogue system, the dialogue system including:
an utterance determination unit that determines an open utterance, determines, after the open utterance, an utterance by means of the second dialogue system based on the user's utterance in response to the open utterance, and determines, after the utterance determined by means of the second dialogue system, an utterance by means of the first dialogue system;
an input unit that receives the user's utterance in response to the open utterance; and
a presentation unit that presents the open utterance determined by the utterance determination unit, presents, after the user's utterance, the utterance that the utterance determination unit determined by means of the second dialogue system, and presents, after the utterance determined by means of the second dialogue system, the utterance that the utterance determination unit determined by means of the first dialogue system.

A dialogue apparatus that determines an utterance to be presented by a dialogue system including at least an input unit that receives a user's utterance and a presentation unit that presents an utterance, wherein
a dialogue system that determines an utterance based on a scenario stored in advance is referred to as a first dialogue system, and a dialogue system that determines an utterance based on at least the content of the user's utterance is referred to as a second dialogue system,
the dialogue apparatus including an utterance determination unit that determines an open utterance, determines, after the open utterance, an utterance by means of the second dialogue system based on the user's utterance in response to the open utterance, and determines, after the utterance determined by means of the second dialogue system, an utterance by means of the first dialogue system.

A program for causing a computer to execute each step of the dialogue method according to any one of claims 1 to 3.

A program for causing a computer to function as the dialogue apparatus according to claim 5.