JPWO2017200075A1

JPWO2017200075A1 - Dialogue method, dialogue system, dialogue scenario generation method, dialogue scenario generation device, and program

Info

Publication number: JPWO2017200075A1
Application number: JP2018518374A
Authority: JP
Inventors: Hiroaki Sugiyama; Toyomi Meguro; Junji Yamato; Yuichiro Yoshikawa; Hiroshi Ishiguro
Original assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Current assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Priority date: 2016-05-20
Filing date: 2017-05-19
Publication date: 2018-11-22
Anticipated expiration: 2037-05-19
Also published as: JP6755509B2; WO2017200075A1

Abstract

A dialogue method performed by a dialogue system includes: an utterance generation step in which the dialogue system generates an utterance; an utterance determination step in which the dialogue system obtains, as a converted utterance, an utterance generated by obfuscating at least part of the utterance generated in the utterance generation step and/or by replacing a word included in that utterance with a word that does not have the meaning of that word; and an utterance presentation step in which the dialogue system presents the converted utterance obtained in the utterance determination step.

Description

The present invention relates to technology by which a computer conducts a dialogue with a human in natural language, applicable to robots and other devices that communicate with people.

In recent years, research and development on robots that communicate with people has advanced, and such robots have been put to practical use in a variety of settings. In communication therapy, for example, a robot can serve as a conversation partner for a person who feels lonely. Specifically, in a care facility for the elderly, a robot that acts as a listener for a resident can ease the resident's loneliness, and the sight of the resident conversing with the robot can prompt conversation between the resident and the people around them, such as family members and caregivers. In communication training, a robot can serve as a practice partner. Specifically, in a foreign-language school, a robot that acts as a practice partner for learners can make foreign-language learning more efficient. In an application as an information presentation system, the system basically lets a person listen to a dialogue between robots while the robots occasionally address the person, which draws the person into the dialogue without boring them and presents information in a form that is easy to accept. Specifically, when people have time on their hands at a meeting spot in town, a bus stop, or a station platform, or when they have time to join a dialogue at home or in a classroom, efficient presentation of information can be expected for news, product introductions, trivia and knowledge, and education (for example, childcare and education for children, liberal-arts instruction for adults, and moral awareness-raising). In an application as an information collection system, a robot can collect information while talking to a person. Because communication with the robot maintains the feel of a conversation, information can be collected without giving the person the oppressive sense of being questioned. Specifically, applications to personal information surveys, market research, product evaluation, and preference surveys for product recommendation are envisioned. Human-robot communication is thus expected to find many applications, and robots that converse more naturally with users are desired.

With the spread of smartphones, chat services such as LINE (registered trademark) have also come into use, in which multiple users chat in near real time and enjoy conversation with one another. Applying user-robot conversation technology to such a chat service makes it possible to realize a chat service that converses naturally with a user even when no human chat partner is available. In this specification, the hardware that serves as the user's dialogue partner in these services, such as a robot or a chat partner, and the computer software that causes a computer to function as such hardware are collectively called an agent. Because an agent is the user's dialogue partner, it may be anthropomorphized or personified, or may have a personality or individuality, as a robot or chat partner does.

The key to realizing these services is technology that allows an agent implemented in hardware or computer software to converse naturally with a human.

Non-Patent Documents 1 and 2 are known as prior art for dialogue systems. In Non-Patent Document 1, utterances are generated according to a predetermined scenario. Non-Patent Document 1 also generates utterances that give a back-channel response or a noncommittal answer, such as "I see" (sokka) or "Hmm" (fūn), regardless of what the person said. In Non-Patent Document 2, the next utterance is generated based only on one or more preceding utterances of the person or the dialogue system.

Non-Patent Document 1: Tsunehiro Arimoto, Yuichiro Yoshikawa, Hiroshi Ishiguro, "Impression Evaluation of Dialogue Without Speech Recognition Using Multiple Robots", Annual Conference of the Robotics Society of Japan, 2016.
Non-Patent Document 2: Hiroaki Sugiyama, Toyomi Meguro, Ryuichiro Higashinaka, Yasuhiro Minami, "Generation of Response Sentences Using Dependency Relations and Examples for User Utterances with Arbitrary Topics", Transactions of the Japanese Society for Artificial Intelligence, 2015, 30(1), 183-194.

Continuing a dialogue between a person and a dialogue system yields effects such as (i) mental health care, (ii) entertainment, (iii) communication practice, and (iv) a greater sense of familiarity with the dialogue system.

However, when utterances are generated according to a predetermined scenario as in Non-Patent Document 1, the system cannot answer unexpected questions and the conversation stalls. In Non-Patent Document 1, moreover, the robot that asked a question responds to the person's answer only with a noncommittal response such as "I see." After the person has been prompted to speak in this way, another robot makes an utterance that shifts the topic slightly. This keeps the person from feeling that their utterance was ignored. If noncommittal responses such as "I see" continue, however, the person comes to feel that their remarks are always being brushed aside, and the conversation stalls. When response sentences are generated as in Non-Patent Document 2, the exchange becomes a series of single question-answer pairs, and the conversation stalls.

An object of the present invention is to provide a dialogue method, a dialogue system, a dialogue scenario generation method, a dialogue scenario generation device, and a program that can increase the number of turns in a dialogue by first making part of an utterance of the dialogue system ambiguous and then inserting utterances for a dialogue that confirms the ambiguous part.

To solve the above problem, according to one aspect of the present invention, a dialogue method performed by a dialogue system includes: an utterance generation step in which the dialogue system generates an utterance; an utterance determination step in which the dialogue system obtains, as a converted utterance, an utterance generated by obfuscating at least part of the utterance generated in the utterance generation step and/or by replacing a word included in that utterance with a word that does not have the meaning of that word; and an utterance presentation step in which the dialogue system presents the converted utterance obtained in the utterance determination step.

To solve the above problem, according to another aspect of the present invention, a dialogue method performed by a dialogue system includes: a first utterance presentation step in which the dialogue system presents a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and a second utterance presentation step in which the dialogue system, after presenting the first utterance, presents a second utterance, which is an utterance from which it can be read that the first utterance has not been uniquely interpreted.

To solve the above problem, according to another aspect of the present invention, a dialogue method performed by a dialogue system includes: a first utterance presentation step in which the dialogue system presents a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and a second utterance presentation step in which the dialogue system, after presenting the first utterance, presents a second utterance, which is an utterance including a question for narrowing the first utterance down to a single meaning.

To solve the above problem, according to another aspect of the present invention, a dialogue method performed by a dialogue system includes: a first utterance presentation step in which the dialogue system presents an utterance at least part of which is obfuscated and/or an utterance including a word that has no meaning; and a second utterance presentation step in which the dialogue system, after the presentation in the first utterance presentation step, presents an utterance including specific content corresponding to the obfuscated part and/or an utterance including a meaningful word corresponding to the meaningless word.

To solve the above problem, according to another aspect of the present invention, in a dialogue scenario generation method, a dialogue scenario generation device generates a dialogue scenario used in a dialogue conducted by a dialogue system. In the dialogue scenario generation method, the dialogue scenario generation device generates a dialogue scenario including: a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and a second utterance, which is an utterance presented after the first utterance is presented and from which it can be read that the first utterance has not been uniquely interpreted.

To solve the above problem, according to another aspect of the present invention, in a dialogue scenario generation method, a dialogue scenario generation device generates a dialogue scenario used in a dialogue conducted by a dialogue system. In the dialogue scenario generation method, the dialogue scenario generation device generates a dialogue scenario including: a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and a second utterance, which is an utterance presented after the first utterance is presented and which includes a question for narrowing the first utterance down to a single meaning.

To solve the above problem, according to another aspect of the present invention, in a dialogue scenario generation method, a dialogue scenario generation device generates a dialogue scenario used in a dialogue conducted by a dialogue system. In the dialogue scenario generation method, the dialogue scenario generation device generates a dialogue scenario including: a first utterance, which is an utterance at least part of which is obfuscated and/or an utterance including a word that has no meaning; and a second utterance, which is an utterance presented after the first utterance is presented and which is an utterance including specific content corresponding to the obfuscated part and/or an utterance including a meaningful word corresponding to the meaningless word.

To solve the above problem, according to another aspect of the present invention, a dialogue system includes: an utterance generation unit that generates an utterance; an utterance determination unit that obtains, as a converted utterance, an utterance generated by obfuscating at least part of the utterance generated by the utterance generation unit and/or by replacing a word included in the utterance generated in the utterance generation step with a word that does not have the meaning of that word; and an utterance presentation unit that presents the converted utterance obtained by the utterance determination unit.

To solve the above problem, according to another aspect of the present invention, a dialogue system includes: a first utterance presentation unit that presents a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and a second utterance presentation unit that, after the first utterance is presented, presents a second utterance, which is an utterance from which it can be read that the first utterance has not been uniquely interpreted.

To solve the above problem, according to another aspect of the present invention, a dialogue system includes: a first utterance presentation unit that presents a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and a second utterance presentation unit that, after the first utterance is presented, presents a second utterance, which is an utterance including a question for narrowing the first utterance down to a single meaning.

To solve the above problem, according to another aspect of the present invention, a dialogue system includes a presentation unit that, after presenting an utterance at least part of which is obfuscated and/or an utterance including a word that has no meaning, presents an utterance including specific content corresponding to the obfuscated part and/or an utterance including a meaningful word corresponding to the meaningless word.

To solve the above problem, according to another aspect of the present invention, a dialogue scenario generation device generates a dialogue scenario used in a dialogue conducted by a dialogue system. The dialogue scenario generation device generates a dialogue scenario including: a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and a second utterance, which is an utterance presented after the first utterance is presented and from which it can be read that the first utterance has not been uniquely interpreted.

To solve the above problem, according to another aspect of the present invention, a dialogue scenario generation device generates a dialogue scenario used in a dialogue conducted by a dialogue system. The dialogue scenario generation device generates a dialogue scenario including: a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and a second utterance, which is an utterance presented after the first utterance is presented and which includes a question for narrowing the first utterance down to a single meaning.

To solve the above problem, according to another aspect of the present invention, a dialogue scenario generation device generates a dialogue scenario used in a dialogue conducted by a dialogue system. The dialogue scenario generation device generates a dialogue scenario including: a first utterance, which is an utterance at least part of which is obfuscated and/or an utterance including a word that has no meaning; and a second utterance, which is an utterance presented after the first utterance is presented and which is an utterance including specific content corresponding to the obfuscated part and/or an utterance including a meaningful word corresponding to the meaningless word.

The present invention has the effect of increasing the number of turns in a dialogue.

Functional block diagram of the dialogue system according to the first embodiment.
Diagram showing an example of the processing flow of the dialogue system according to the first embodiment.
Functional block diagram of the dialogue system according to the second embodiment.
Diagram showing an example of the processing flow of the dialogue system according to the second embodiment.
Functional block diagram of the dialogue system according to the third embodiment.
Diagram showing an example of the processing flow of the dialogue system according to the third embodiment.
Diagram showing the dialogue system according to Modification 3.

Embodiments of the present invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and redundant description is omitted.

<Key Points of the Embodiments of the Present Invention>
In the embodiments of the present invention, a dialogue system that converses with a user and includes a plurality of robots, robot R1 and robot R2, does not utter an utterance sentence generated by the dialogue system (the original utterance sentence) as it is. Instead, the original utterance sentence is converted into a sentence generated by obfuscating at least part of it and/or by replacing a word included in it with a word that does not have the meaning of that word (a sentence generated in either way is hereinafter also called an "obfuscated sentence"), and one robot is made to utter the converted sentence. After that, another robot is made to utter a sentence expressing that the converted sentence has not been uniquely interpreted, and/or the robot that uttered the obfuscated sentence is made to utter the original utterance sentence (that is, to restate it). When a robot makes an utterance expressing that it has not arrived at a unique interpretation, the user can read from that utterance that the robot has not arrived at a unique interpretation. In other words, a sentence expressing a lack of unique interpretation is a sentence from which that lack can be read. This increases the number of robot utterances that make sense to the user without increasing the number of utterance sentences the dialogue system generates, and as a result increases the number of turns in the dialogue between the user and the dialogue system.

An obfuscated sentence is, for example, a sentence in which part of the original utterance sentence is (i) replaced with a demonstrative, (ii) replaced with a misspoken word, or (iii) omitted. In case (i), that is, when a sentence in which part of the original utterance sentence is replaced with a demonstrative is presented without the original utterance sentence being presented, the sentence can be interpreted in two or more ways depending on what the demonstrative refers to. In case (ii), that is, when a sentence in which part of the original utterance sentence is replaced with a misspoken word is presented without the original utterance sentence being presented, the sentence can be interpreted in at least two ways: (a) the originally intended meaning without the slip, recoverable from the surrounding context, and (b) the meaning of the sentence with the misspoken word as uttered. If the misspoken word differs too much from the original word, the feel of the dialogue suffers, so it is desirable to use a similar-sounding word as the misspoken word, such as a meaningful word that differs from the original word by one sound, as in the example below. In case (iii), that is, when a sentence in which part of the original utterance sentence is omitted is presented without the original utterance sentence being presented, the sentence can be interpreted in two or more ways depending on what is supplied for the omitted part. Examples of an original utterance sentence and converted utterance sentences follow.
Original utterance sentence: "With 『cars』 (kuruma), fuel economy matters, doesn't it?"
(i) Sentence with a demonstrative substituted: "With 『those』, fuel economy matters, doesn't it?"
(ii) Sentence with a misspoken word substituted: "With 『walnuts』 (kurumi), fuel economy matters, doesn't it?"
(iii) Sentence with an omission: "『omitted』 fuel economy matters, doesn't it?"
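The three conversions (demonstrative substitution, misspoken-word substitution, omission) can be sketched as plain string operations. The following is a minimal illustration, not the patent's implementation: the function name `obfuscate`, the mode labels, and the simplified English stand-in sentence are all assumptions made for this example, and the similar-sounding word must be supplied by the caller.

```python
import re
from typing import Optional


def _match_case(word: str, like: str) -> str:
    """Capitalize `word` when `like` starts with an uppercase letter."""
    return word[:1].upper() + word[1:] if like[:1].isupper() else word


def obfuscate(utterance: str, target: str, mode: str,
              slip: Optional[str] = None) -> str:
    """Return an obfuscated variant of `utterance` (hypothetical helper).

    mode "demonstrative": replace the target word with "that".
    mode "slip": replace the target word with a similar-sounding word.
    mode "omit": drop the target word and any comma or space after it.
    """
    if target not in utterance:
        raise ValueError(f"{target!r} does not occur in the utterance")
    if mode == "demonstrative":
        return utterance.replace(target, _match_case("that", target), 1)
    if mode == "slip":
        if slip is None:
            raise ValueError("mode 'slip' needs a similar-sounding word")
        return utterance.replace(target, _match_case(slip, target), 1)
    if mode == "omit":
        pattern = re.escape(target) + r"[,\s]*"
        rest = re.sub(pattern, "", utterance, count=1).strip()
        # Re-capitalize in case the omitted word opened the sentence.
        return rest[:1].upper() + rest[1:]
    raise ValueError(f"unknown mode: {mode!r}")


original = "Cars, fuel economy matters, right?"
for mode, slip in (("demonstrative", None), ("slip", "cards"), ("omit", None)):
    print(mode, "->", obfuscate(original, "Cars", mode, slip))
```

In this sketch the original sentence is kept by the caller, so it remains available for the later restatement turn.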

To increase the number of dialogue turns, in the dialogue system of this embodiment an utterance sentence generated for robot R1 to utter is converted into an obfuscated sentence, and robot R1 utters the obfuscated sentence. After robot R1 utters the obfuscated sentence, another robot R2 utters an utterance sentence that confirms the content of the obfuscated sentence. When the dialogue system includes only a single robot R1, however, robot R1 itself may utter the utterance sentence that confirms the content of the obfuscated sentence after uttering the obfuscated sentence.
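The turn sequence described in this paragraph can be laid out as a small data structure. This is only a sketch under stated assumptions: the `Turn` class, the function `clarification_exchange`, and the closing restatement turn (taken from the restating behaviour described in the key points of the embodiments) are illustrative names and choices, not part of the patent text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Turn:
    speaker: str  # agent that presents the utterance
    text: str     # utterance sentence to present


def clarification_exchange(obfuscated: str, confirmation: str, original: str,
                           agents: Sequence[str] = ("R1", "R2")) -> List[Turn]:
    """Order the utterances for one obfuscate-confirm-restate exchange.

    The first agent utters the obfuscated sentence. A second agent, if one
    exists, utters the confirming sentence; with a single agent, that agent
    utters it itself. The first agent then restates the original sentence.
    """
    if not agents:
        raise ValueError("at least one agent is required")
    first = agents[0]
    second = agents[1] if len(agents) > 1 else first
    return [
        Turn(first, obfuscated),
        Turn(second, confirmation),
        Turn(first, original),
    ]


for turn in clarification_exchange(
        "You know, with those, I like sedans.",
        "Are you talking about cars?",
        "You know, with cars, I like sedans."):
    print(f"{turn.speaker}: {turn.text}")
```

One generated sentence thus yields three presented utterances, which is how the approach adds turns without requiring additional generated content.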

An obfuscated utterance may be inserted at any point during the dialogue between the user and the dialogue system, but care must be taken that the dialogue does not become too long. Inserting it is particularly effective when it is judged that the user would have difficulty understanding or empathizing with the robot's utterance if the original utterance were uttered as it is. For example, the robot may be made to utter an obfuscated sentence (A) when the dialogue system changes the topic (for example, starts a scenario dialogue), (B) when the user's reply to an utterance of the dialogue system deviates from the reply the dialogue system predicted, or (C) when the dialogue system detects a change of topic. One way for the dialogue system to detect a change of topic is to use the sentences and words in the dialogue to compute, for example, (a) the distance between topic words using word2vec, (b) the distance between sentences when the word2vec vectors of all words in each sentence are averaged, or (c) the cosine similarity of words, and to judge that the topic has changed when the distance is at or above a predetermined value or the cosine similarity is at or below a predetermined value (in short, when the chosen measure indicates that two utterances are unrelated or only weakly related). Timings (A) to (C) are points at which the user finds it hard to follow what the dialogue system is saying. Having robot R1 utter an obfuscated sentence and inserting a dialogue between robot R1 and robot R2 at such points therefore increases the number of turns in the dialogue between the person and the dialogue system and also helps the person understand what the dialogue system is saying.
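The topic-change test based on averaged word vectors and cosine similarity can be sketched as follows. This is an illustration only: the two-dimensional `TOY_EMBEDDINGS` table stands in for real word2vec vectors, and the function names and the threshold of 0.5 are assumptions chosen for the example rather than values given in the text.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy vectors standing in for word2vec embeddings.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "car": [1.0, 0.1],
    "sedan": [0.9, 0.2],
    "walnut": [0.1, 1.0],
    "cake": [0.2, 0.9],
}


def mean_vector(words: Sequence[str],
                embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the words that have an embedding."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        raise ValueError("no word in the utterance has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_changed(previous: Sequence[str], current: Sequence[str],
                  embeddings: Dict[str, List[float]],
                  threshold: float = 0.5) -> bool:
    """Judge a topic change when the similarity is at or below the threshold."""
    similarity = cosine_similarity(mean_vector(previous, embeddings),
                                   mean_vector(current, embeddings))
    return similarity <= threshold


print(topic_changed(["car", "sedan"], ["sedan"], TOY_EMBEDDINGS))
print(topic_changed(["car", "sedan"], ["walnut", "cake"], TOY_EMBEDDINGS))
```

A distance-based variant would compare a vector distance against a lower bound instead; in either form the threshold would need tuning on real dialogue data.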

As noted above, when the original utterance sentence is converted into a sentence in which part is (i) replaced with a demonstrative, (ii) replaced with a misspoken word, or (iii) omitted, there is no particular restriction on which word is replaced or omitted, but a salient word is a suitable target. For example, based on tf-idf (a weight for each word in a document), a word with a large weight among the words in the original utterance sentence may be selected as the target word. Alternatively, a word that is a superordinate concept of another word in the original utterance sentence may be selected as the target word. For example, when the original utterance sentence contains both "sedan" and "car", "car", which is the superordinate concept of "sedan", can be selected as the target word.
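Both selection strategies can be sketched in a few lines. The code below is a hedged illustration: the smoothed idf formula, the tiny reference corpus, the hypernym table, and the function names `select_target_word` and `select_hypernym` are assumptions introduced for this example, and a practical system would use a large corpus and a proper lexical resource.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence


def select_target_word(words: Sequence[str],
                       corpus: Sequence[Sequence[str]]) -> str:
    """Pick the word of the utterance with the largest tf-idf weight."""
    counts = Counter(words)
    n_docs = len(corpus)

    def idf(word: str) -> float:
        # Smoothed inverse document frequency (one common variant).
        doc_freq = sum(1 for doc in corpus if word in doc)
        return math.log((1 + n_docs) / (1 + doc_freq)) + 1.0

    scores = {w: (c / len(words)) * idf(w) for w, c in counts.items()}
    return max(scores, key=scores.get)


def select_hypernym(words: Sequence[str],
                    hypernym_of: Dict[str, str]) -> Optional[str]:
    """Pick a word that is the superordinate concept of another word present."""
    present = set(words)
    for word in words:
        parent = hypernym_of.get(word)
        if parent in present:
            return parent
    return None


# Hypothetical reference corpus and hypernym table for illustration.
corpus: List[List[str]] = [
    ["i", "like", "music"],
    ["i", "like", "tea"],
    ["the", "weather", "is", "nice"],
    ["i", "like", "the", "park"],
]
print(select_target_word(["i", "like", "the", "car"], corpus))
print(select_hypernym(["i", "like", "car", "sedan"], {"sedan": "car"}))
```

The word returned by either function would then be the one that is replaced or omitted when the obfuscated sentence is produced.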

Examples of dialogue are shown below. In each example, the utterances are made in the order t(1), t(2), and so on. X → Y means that X is speaking to Y, and the text in quotation marks indicates a demonstrative, a slip of the tongue, or an omission.

(Example 1: Demonstrative)
Utterance t(1): Robot R1 → Robot R2: You know, "that thing", I like sedans.
Utterance t(2): Robot R2 → Robot R1: Are you talking about cars?
Utterance t(3): Robot R1 → Robot R2: Yes, cars. When it comes to cars, I like sedans.

(Example 2: Omission)
Utterance t(1): Robot R1 → Robot R2: You know, "(omitted)" I like sedans.
Utterance t(2): Robot R2 → Robot R1: What are you talking about?
Utterance t(3): Robot R1 → Robot R2: Yeah, cars. When it comes to cars, I like sedans.

(Example 3: Slip of the tongue)
Utterance t(1): Robot R1 → Robot R2: You know, "kurumi" (walnuts), I like sedans.
Utterance t(2): Robot R2 → Robot R1: Huh, what are you talking about?
Utterance t(3): Robot R1 → Robot R2: Sorry, kuruma (cars). When it comes to cars, I like sedans.

In Examples 1 to 3, the utterance that the dialogue system makes immediately after the obscured utterance t(1) (in these examples, utterance t(2) of robot R2) is an utterance containing a word that narrows the obscured part of the first utterance t(1) down to a single meaning. However, the utterance made immediately after the obscured utterance t(1) is not limited to this kind. Any utterance that expresses that the obscured utterance could not be interpreted uniquely, that is, any utterance from which a listener can tell that it was not interpreted uniquely, will do. For example, the following utterance may be used.

(Example 4: Slip of the tongue)
Utterance t(1): Robot R1 → Robot R2: You know, "kurumi" (walnuts), I like sedans.
Utterance t(2): Robot R2 → Robot R1: Sorry, I don't understand.
Utterance t(3): Robot R1 → Robot R2: Sorry, kuruma (cars). When it comes to cars, I like sedans.

In this example, robot R2's utterance "Sorry, I don't understand." cannot be said to contain a word that narrows the obscured part of the first utterance t(1) down to a single meaning. It is, however, an utterance that compels robot R1, the robot that made the utterance to which R2's utterance t(2) responds, to utter a word that pins down the obscured utterance. In summary, the utterance t(2) that the dialogue system makes immediately after the obscured utterance t(1) in Examples 1 to 4 is an utterance expressing that the obscured utterance could not be interpreted uniquely. Put another way, it is an utterance from which a listener can tell that the obscured utterance was not interpreted uniquely; put yet another way, it is an utterance intended to elicit an utterance containing a word that narrows the meaning down to one.

The dialogue examples above state to whom each utterance is addressed, but the addressee need not be restricted. For example, Example 1 is a dialogue between robot R1 and robot R2, but it may instead be a dialogue among robot R1, robot R2, and a person. When the addressee is to be restricted, the robot may indicate who is being addressed through, for example, movement of its head or line of sight.

<First embodiment>
FIG. 1 is a functional block diagram of the dialogue system 100 according to the first embodiment, and FIG. 2 shows the processing flow of the dialogue system 100 according to the first embodiment.

The dialogue system 100 includes robots R1 and R2 and a dialogue device 190. The dialogue device 190 includes a speech synthesis unit 110, an utterance generation unit 150, and an utterance determination unit 120. Robot R1 includes a presentation unit 101-1, and robot R2 includes a presentation unit 101-2. The presentation units 101-1 and 101-2 emit acoustic signals around robots R1 and R2 and are, for example, loudspeakers.

The dialogue system 100 allows a human user to converse with the two robots R1 and R2: robots R1 and R2 utter the speech (synthesized speech data) generated by the dialogue device 190. The flow of operations performed by the dialogue system 100 is described below.

The utterance generation unit 150 generates an utterance sentence (text data) (S1) and outputs it to the utterance determination unit 120 and the speech synthesis unit 110. Hereinafter, this utterance sentence is also called the original utterance. The utterance generation unit 150 contains a dialogue system that, like the "chat dialogue system" described in Non-Patent Document 2, takes an input word as a trigger and generates and outputs utterance text according to rules written in advance. This internal dialogue system generates and outputs the original utterance based on a word set in advance.

Alternatively, the utterance generation unit 150 contains a dialogue system that, like the "scenario dialogue system" described in Non-Patent Document 1, selects and outputs utterance text stored in advance for a scenario option when the preset word corresponds to that option of a scenario stored in the dialogue system. This internal dialogue system selects the original utterance from the text it has stored and outputs it. Although the description here generates the original utterance from a word set in advance, the word need not be set in advance. For example, when the original utterance is generated partway through an ongoing dialogue, a word from the dialogue preceding that point (such as the topic) may be used instead of a preset word.

The utterance determination unit 120 receives the original utterance from the utterance generation unit 150, obtains as the converted utterance (text data) the sentence generated by obscuring at least part of the original utterance (S2), and outputs it to the speech synthesis unit 110. Obscuring an utterance corresponds to the processing described earlier, namely (i) replacing at least part of the utterance with a demonstrative, (ii) replacing it with a misspoken word, or (iii) omitting it. Replacing at least part of an utterance with a misspoken word can also be described as replacing a word in the utterance with a word that does not have the meaning of that word.
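The three conversions of step S2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `obfuscate`, the mode labels, and the default replacement words are all hypothetical, and the utterance is assumed to be tokenized already.

```python
def obfuscate(tokens, target, mode, demonstrative="that", wrong_word="walnut"):
    """Return a copy of `tokens` in which the first occurrence of `target` is obscured.

    mode "demonstrative": replace the target with a demonstrative    (conversion i)
    mode "slip":          replace the target with an unrelated word  (conversion ii)
    mode "omission":      drop the target                            (conversion iii)
    """
    if target not in tokens:
        raise ValueError(f"target word {target!r} is not in the utterance")
    index = tokens.index(target)
    before, after = tokens[:index], tokens[index + 1:]
    if mode == "demonstrative":
        return before + [demonstrative] + after
    if mode == "slip":
        return before + [wrong_word] + after
    if mode == "omission":
        return before + after
    raise ValueError(f"unknown mode {mode!r}")
```

For example, `obfuscate(["I", "like", "cars", "especially", "sedans"], "cars", "omission")` returns the utterance without the word "cars", leaving the listener to wonder what kind of sedan-bearing thing is meant.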

The speech synthesis unit 110 performs speech synthesis on the converted utterance (text data) input from the utterance determination unit 120 to obtain synthesized speech data (S3), and outputs the synthesized speech data to the presentation unit 101-1 of robot R1.

The presentation unit 101-1 plays back the speech corresponding to the synthesized speech data of the converted utterance input from the speech synthesis unit 110; that is, it presents the converted utterance as an utterance of robot R1 (S4). When the addressee of the synthesized speech is the very robot that plays it back, the processing may be performed so that the robot appears to be talking to itself.

The utterance generation unit 150 generates an utterance sentence that checks the content of the converted utterance input from the utterance determination unit 120 (hereinafter also called the confirmation utterance) (S6) and outputs it to the speech synthesis unit 110. The confirmation utterance contains a question aimed at narrowing the converted utterance down to a single meaning.

A confirmation utterance is, for example, (i) an utterance that checks by naming the correct content, (ii) an utterance that checks without naming any content, or (iii) an utterance that checks by naming incorrect content. An example of (i) is "Do you mean XX?" (where XX is the correct content, a word that narrows the converted utterance down to a single meaning). An example of (ii) is "What do you mean?". Examples of (iii) are "Do you mean YY?", "Did you say YY?", and "What is YY?" (where YY is incorrect). Which of types (i) to (iii) the utterance generation unit 150 generates, and how exactly, may be fixed in advance inside the utterance generation unit 150 or may be made specifiable by the operator of the dialogue system from outside the unit. The correct content is determined from the original utterance generated by the utterance generation unit 150 and the converted utterance generated by the utterance determination unit 120, by taking from the original utterance the word corresponding to the part that the utterance determination unit 120 obscured. The incorrect content may be generated from that same word, obtained in the same way. Note that the confirmation utterance contains a question aimed at narrowing the converted utterance down to a single meaning, but does not itself narrow it down to a single meaning.
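The three types of confirmation utterance can be sketched as simple templates. This is a minimal illustration under stated assumptions: the function name, the type labels, and the English templates are hypothetical, and the source leaves open how the incorrect word YY is derived, so it is passed in by the caller here.

```python
def confirmation_utterance(kind, correct_word=None, wrong_word=None):
    """Build a confirmation utterance of type (i), (ii), or (iii).

    kind "correct":     names the correct content XX   (type i)
    kind "unspecified": names no content at all        (type ii)
    kind "wrong":       names an incorrect content YY  (type iii)
    """
    if kind == "correct":
        if correct_word is None:
            raise ValueError("type (i) needs the correct word")
        return f"Do you mean {correct_word}?"
    if kind == "unspecified":
        return "What do you mean?"
    if kind == "wrong":
        if wrong_word is None:
            raise ValueError("type (iii) needs an incorrect word")
        return f"Do you mean {wrong_word}?"
    raise ValueError(f"unknown kind {kind!r}")
```

The `kind` argument plays the role of the setting that, according to the text, is either fixed inside the utterance generation unit or chosen by the operator.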

The speech synthesis unit 110 performs speech synthesis on the confirmation utterance input from the utterance generation unit 150 to obtain synthesized speech data (S7), and outputs the synthesized speech data to the presentation unit 101-2 of robot R2.

The presentation unit 101-2 plays back the speech corresponding to the synthesized speech data of the confirmation utterance input from the speech synthesis unit 110; that is, it presents the confirmation utterance as an utterance of robot R2 (S8).

The utterance generation unit 150 further generates an utterance sentence that responds to the confirmation utterance (hereinafter also called the response utterance) (S9) and outputs it to the speech synthesis unit 110. The response utterance answers the question contained in the confirmation utterance and contains a word that narrows the converted utterance down to a single meaning.

The speech synthesis unit 110 performs speech synthesis on the response utterance input from the utterance generation unit 150 to obtain synthesized speech data (S10), and outputs the synthesized speech data to the presentation unit 101-1 of robot R1. When the confirmation utterance is of type (i), checking by naming the correct content, the response utterance affirms the confirmation and repeats the correct content, for example "Yeah, XX". When the confirmation utterance is of type (ii), checking without naming any content, or of type (iii), checking by naming incorrect content, the response utterance states the correct content, for example "XX".
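The dependence of the response utterance on the type of confirmation utterance can be sketched as follows. This is a minimal illustration; the function name, the type labels, and the English wording are hypothetical.

```python
def response_utterance(confirmation_kind, correct_word):
    """Build the response that resolves the obscured part.

    After a type (i) confirmation the response affirms and repeats the
    correct word; after type (ii) or (iii) it simply states the correct word.
    """
    if confirmation_kind == "correct":
        return f"Yeah, {correct_word}."
    if confirmation_kind in ("unspecified", "wrong"):
        return f"{correct_word}."
    raise ValueError(f"unknown confirmation kind {confirmation_kind!r}")
```

In every branch the response contains the correct word, which is the property the text requires of a response utterance.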

The presentation unit 101-1 plays back the speech corresponding to the synthesized speech data of the response utterance input from the speech synthesis unit 110; that is, it presents the response utterance as an utterance of robot R1 (S11).

The speech synthesis unit 110 performs speech synthesis on the original utterance input from the utterance generation unit 150 to obtain synthesized speech data (S12), and outputs the synthesized speech data to the presentation unit 101-1 of robot R1.

The presentation unit 101-1 plays back the speech corresponding to the synthesized speech data of the original utterance input from the speech synthesis unit 110; that is, it presents the original utterance as an utterance of robot R1 (S13).

<Processing of each unit>
The following description focuses on the processing of each unit of the dialogue system 100. The example given here performs speech synthesis of every utterance before the dialogue starts.

[Robots R1, R2]
Robots R1 and R2 are for conversing with the user. They are placed near the user and utter the utterances generated by the dialogue device 190.

[Utterance generation unit 150]
The utterance generation unit 150 generates the original utterance and outputs it to the utterance determination unit 120 and the speech synthesis unit 110.

The utterance generation unit 150 also uses the converted utterance obtained by the utterance determination unit 120 and the original utterance to find the part that the utterance determination unit 120 obscured, generates a confirmation utterance for checking that part, and outputs it to the speech synthesis unit 110. The obscured part can be found from the difference between the converted utterance and the original utterance. Alternatively, the utterance generation unit 150 may be configured to receive information indicating the obscured part from the utterance determination unit 120.
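Finding the obscured part from the difference between the two utterances can be sketched with Python's standard `difflib` module. This is a minimal illustration; the function name is hypothetical and both utterances are assumed to be tokenized already.

```python
import difflib


def obscured_parts(original_tokens, converted_tokens):
    """Return (original_span, converted_span) pairs where the two utterances differ.

    A replaced word gives a non-empty converted span; an omitted word gives
    an empty converted span.
    """
    matcher = difflib.SequenceMatcher(a=original_tokens, b=converted_tokens, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            parts.append((original_tokens[i1:i2], converted_tokens[j1:j2]))
    return parts
```

The first element of each pair is the candidate for the correct content XX used in the confirmation and response utterances.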

Furthermore, the utterance generation unit 150 generates a response utterance to the confirmation utterance and outputs it to the speech synthesis unit 110.

When the original utterance, the confirmation utterance, and the response utterance are output to the speech synthesis unit 110, information indicating the utterance order is attached to each of them. For example, the utterance order of the confirmation utterance is N+2, that of the response utterance is N+3, and that of the original utterance is N+4, where N is any integer greater than or equal to 0. The utterance orders of the confirmation utterance, the response utterance, and the original utterance need not be consecutive, but their relative order must not change. The utterance generation unit 150 may also decide which robot utters the confirmation utterance, the response utterance, and the original utterance; in that case, it also outputs information indicating the uttering robot to the speech synthesis unit 110.

[Utterance determination unit 120]
The utterance determination unit 120 receives the original utterance generated by the utterance generation unit 150, obtains as the converted utterance the sentence generated by obscuring at least part of the original utterance, and outputs it to the speech synthesis unit 110. The utterance determination unit 120 also outputs the converted utterance, or information indicating the obscured part, to the utterance generation unit 150.

When the converted utterance is output to the speech synthesis unit 110, information indicating the utterance order is attached to it. The utterance order of the converted utterance is, for example, N+1, which precedes the confirmation utterance, the response utterance, and the original utterance. The utterance determination unit 120 may also decide which robot utters the converted utterance; in that case, it also outputs information indicating the uttering robot to the speech synthesis unit 110.

[Speech synthesis unit 110]
The speech synthesis unit 110 performs speech synthesis on the confirmation utterance, the response utterance, and the original utterance input from the utterance generation unit 150 and on the converted utterance input from the utterance determination unit 120 to obtain synthesized speech data, and outputs the synthesized speech data to the presentation unit 101-1 of robot R1 or the presentation unit 101-2 of robot R2. The speech synthesis unit 110 outputs the synthesized speech data in accordance with the information indicating the utterance order. In this embodiment, therefore, the synthesized speech data is output in the order of the converted utterance, the confirmation utterance, the response utterance, and the original utterance. When information indicating the robot that is to utter an utterance is input together with the utterance, the synthesized speech data is output to the presentation unit of the robot corresponding to that information.
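The use of utterance-order information and robot information can be sketched as follows. This is a minimal illustration under stated assumptions: the `UtteranceItem` record and `playback_plan` function are hypothetical, text stands in for synthesized speech data, and N is taken to be 0.

```python
from dataclasses import dataclass


@dataclass
class UtteranceItem:
    order: int   # utterance-order information attached by units 120 and 150
    robot: str   # robot whose presentation unit should play the utterance
    text: str    # stands in for the synthesized speech data


def playback_plan(items):
    """Return (robot, text) pairs sorted by utterance order."""
    return [(item.robot, item.text) for item in sorted(items, key=lambda it: it.order)]


# The four utterances of Example 1 with N = 0, deliberately listed out of order.
items = [
    UtteranceItem(order=4, robot="R1", text="When it comes to cars, I like sedans."),
    UtteranceItem(order=2, robot="R2", text="Are you talking about cars?"),
    UtteranceItem(order=1, robot="R1", text="You know, that thing, I like sedans."),
    UtteranceItem(order=3, robot="R1", text="Yes, cars."),
]
plan = playback_plan(items)
```

Sorting on the order field alone is enough because the text only requires that the relative order be preserved, not that the order numbers be consecutive.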

[Presentation units 101-1, 101-2]
The presentation units 101-1 and 101-2 play back the speech corresponding to the synthesized speech data input from the speech synthesis unit 110. The user thereby hears the utterances of robot R1 or R2, and a dialogue between the user and the dialogue system 100 is realized.

<Effect>
With the above configuration, the number of turns in the dialogue can be increased.

In a conversation between a dialogue system and a person, an utterance of the dialogue system may be taken as belonging to a context beyond what the person can predict or sympathize with. An example is an utterance so abrupt that its intent cannot be grasped immediately. In this embodiment, part of a sentence is first made ambiguous, and another robot is made to interject an utterance that opens an exchange resolving that ambiguity. Because the dialogue system interjects such an utterance, it becomes easier for the person to understand the intent of the dialogue system's utterance.

<Second embodiment>
FIG. 3 is a functional block diagram of the dialogue system 100 according to the second embodiment, and FIG. 4 shows the processing flow of the dialogue system 100 according to the second embodiment.

Like the dialogue system 100 of the first embodiment, the dialogue system 100 of the second embodiment includes robots R1 and R2 and a dialogue device 190. The dialogue device 190 of the second embodiment differs from that of the first embodiment in that it also includes an utterance end detection unit 140. Robot R1 of the second embodiment differs from robot R1 of the first embodiment in that it also includes an input unit 102-1, and robot R2 of the second embodiment differs from robot R2 of the first embodiment in that it also includes an input unit 102-2. The input units 102-1 and 102-2 pick up acoustic signals emitted around the robots and are, for example, microphones. Since it suffices that the speech uttered by the user can be picked up, one of the input units 102-1 and 102-2 may be omitted. Alternatively, a microphone installed somewhere other than on robots R1 and R2, such as near the user, may serve as the input unit, and both input units 102-1 and 102-2 may be omitted.

The flow of operations performed by the dialogue system 100 of the second embodiment is described below, focusing on the differences from the first embodiment.

First, the dialogue system 100 of the second embodiment performs steps S1 to S4.

After the converted utterance is presented in step S4, the speech data corresponding to the user's utterance picked up by at least one of the input units 102-1 and 102-2 is output to the utterance end detection unit 140.

Using the speech data picked up by at least one of the input units 102-1 and 102-2, the utterance end detection unit 140 detects either the end of the user's utterance or a timeout, that is, the elapse of a predetermined time without any utterance by the user (S5), and outputs to the utterance generation unit 150 a control signal reporting the end of the utterance or the timeout.
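Step S5 can be sketched as follows. The source does not say how the end of an utterance is detected, so everything here is an assumption: the input is a sequence of per-frame speech/non-speech flags from some voice activity detector, the end of an utterance is declared after a run of `hangover_frames` silent frames following speech, and a timeout is declared when no speech has started within `timeout_frames` frames. The function name and both parameters are hypothetical.

```python
def detect_end_or_timeout(frames, timeout_frames=50, hangover_frames=10):
    """Scan per-frame speech flags and report how the listening period ends.

    Returns ("end", index) when the user spoke and then stayed silent for
    `hangover_frames` frames, ("timeout", index) when no speech began within
    `timeout_frames` frames, and (None, None) when neither has happened yet.
    """
    started = False
    silence_run = 0
    for index, is_speech in enumerate(frames):
        if is_speech:
            started = True
            silence_run = 0
            continue
        silence_run += 1
        if started and silence_run >= hangover_frames:
            return ("end", index)
        if not started and index + 1 >= timeout_frames:
            return ("timeout", index)
    return (None, None)
```

Either non-`None` result corresponds to the control signal that lets the utterance generation unit proceed to the confirmation utterance.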

When the control signal from the utterance end detection unit 140 is input to the utterance generation unit 150, the dialogue system 100 of the second embodiment performs steps S6 to S13.

That is, although this embodiment provides time for the user to speak after the converted utterance is presented, the dialogue system 100 does not perform speech recognition on the user's utterance; it presents the confirmation utterance when the user's utterance ends or when the predetermined time has elapsed. Whether the user's utterance contains the correct content of the obscured part or incorrect content, the confirmation utterance and the response utterance presented by the dialogue system 100 may be the same as in case (i) above. For example, the dialogue system 100 presents "Do you mean XX?" as the confirmation utterance and "Yeah, XX" as the response utterance.

An example of dialogue in this embodiment is shown below.
(Example 5)
Utterance t(1): Robot R1 → User: "That thing", what type do you like?
Utterance t(2): User → Robot R1: Huh, what?
Utterance t(3): Robot R2 → Robot R1: Are you talking about cars?
Utterance t(4): Robot R1 → Robot R2: Yes, cars. What type of car do you like?
In Example 5, robot R1 utters the converted utterance t(1), after which a period is provided for accepting the user's utterance. When the user's utterance t(2) ends, robot R2 utters the confirmation utterance t(3). Robot R1 then utters the response utterance followed by the original utterance as utterance t(4).

In this embodiment, the confirmation utterance and the response utterance presented by the dialogue system 100 do not depend on what the user says, so the dialogue system 100 of this embodiment need not have a speech recognition function.

<Third embodiment>
FIG. 5 is a functional block diagram of the dialogue system 100 according to the third embodiment, and FIG. 6 shows the processing flow of the dialogue system 100 according to the third embodiment.

Like the dialogue system 100 of the second embodiment, the dialogue system 100 of the third embodiment includes robots R1 and R2 and a dialogue device 190. The dialogue device 190 of the third embodiment differs from that of the second embodiment in that it includes a speech recognition unit 141 instead of the utterance end detection unit 140.

The flow of operations performed by the dialogue system 100 of the third embodiment is described below, focusing on the differences from the second embodiment.

First, the dialogue system 100 of the third embodiment performs steps S1 to S4.

After the converted utterance is presented in step S4, the speech data corresponding to the user's utterance picked up by at least one of the input units 102-1 and 102-2 is output to the speech recognition unit 141.

The speech recognition unit 141 performs speech recognition on the speech data picked up by at least one of the input units 102-1 and 102-2 to obtain the recognized text (the utterance sentence corresponding to the user's utterance) (S51), and outputs the recognized text to the utterance generation unit 150.

The utterance generation unit 150 determines whether the recognized text has the same content as the confirmation utterance it generated (S52). If it does, the dialogue system 100 of the third embodiment skips steps S6 to S8 and performs steps S9 to S13. If it does not, the dialogue system 100 performs steps S6 to S13. That is, when the user utters a sentence that checks the content of the obscured sentence, the dialogue system 100 of the third embodiment does not utter its own confirmation utterance and instead utters the response utterance after the user's utterance.
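The branch at step S52 can be sketched as follows. The source does not define how "the same content" is judged, so this sketch assumes a simple comparison after lowercasing and stripping ASCII punctuation; a real system would need a more tolerant semantic comparison. The function names are hypothetical, and the step labels are returned as strings purely for illustration.

```python
import string


def normalize(text):
    """Lowercase, strip ASCII punctuation, and collapse whitespace (assumed criterion)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def steps_after_user_utterance(recognized_text, confirmation_text):
    """Decide which steps follow the user's utterance (S52).

    If the user already asked the confirmation question, the system's own
    confirmation (S6 to S8) is skipped and the response follows directly.
    """
    if normalize(recognized_text) == normalize(confirmation_text):
        return ["S9", "S10", "S11", "S12", "S13"]
    return ["S6", "S7", "S8", "S9", "S10", "S11", "S12", "S13"]
```

With this criterion, a user who says "do you mean cars" after the obscured utterance gets the response utterance immediately, whereas "Huh, what?" still triggers robot R2's confirmation utterance.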

In this embodiment, an example was described in which the user's utterance is accepted after the dialogue system 100 presents the post-conversion utterance sentence; however, the user's utterance may be accepted after the dialogue system 100 presents any utterance sentence. Next, consider the case where the user makes an utterance other than those the dialogue system 100 anticipated in advance, for example when the speech recognition result of the user's utterance after presentation of the post-conversion utterance sentence does not have the same content as the generated confirmation utterance sentence. In such a case, the dialogue system 100 may utter a sentence that is none of the confirmation utterance sentence, the response utterance sentence, and the original utterance sentence described in the first embodiment. For example, if the speech recognition result has content that may be affirmed, the utterance generation unit 150 uses "Yes, XX" as the response utterance sentence. If, on the other hand, the speech recognition result has content that needs to be denied, the utterance generation unit 150 generates "Sorry, XX" as the response utterance sentence. The utterance generation unit 150 then presents whichever response utterance sentence was generated as an utterance of the robot R1.
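The handling of an unanticipated user utterance can be sketched as below. The "Yes, XX" and "Sorry, XX" templates come from the text; how the system decides whether the recognized content should be affirmed or denied is not described, so the `Polarity` value is simply passed in, and both `Polarity` and `fallback_response` are hypothetical names.

```python
from enum import Enum


class Polarity(Enum):
    """Assumed outcome of judging the recognized user utterance."""
    AFFIRM = "affirm"  # content that may be affirmed
    DENY = "deny"      # content that needs to be denied


def fallback_response(polarity: Polarity, content: str) -> str:
    """Build the response utterance sentence for an unanticipated utterance.

    `content` stands for the "XX" placeholder in the text.
    """
    prefix = "Yes" if polarity is Polarity.AFFIRM else "Sorry"
    return f"{prefix}, {content}"
```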

When the dialogue system 100 accepts the user's utterance, it may prompt the user to speak through a movement such as turning the robot's head or gaze toward the user.

<Modification 1>
In the embodiments described above, the dialogue system generates each utterance sentence of the robot (original utterance sentence, post-conversion utterance sentence, confirmation utterance sentence, response utterance sentence) before that utterance. Alternatively, generation and speech synthesis may be performed before the first utterance, with the synthesized speech data stored in a storage unit (not shown); during the actual dialogue, each piece of synthesized speech data is then played back at a predetermined timing by the presentation unit 101-1 or 101-2. As another alternative, the utterance sentences of the robot may be generated before the first utterance and stored in a storage unit (not shown); during the actual dialogue, each utterance sentence is then speech-synthesized at a predetermined timing to obtain synthesized speech data, which is played back by the presentation unit 101-1 or 101-2.
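The two storage configurations in this modification differ only in when synthesis happens. The sketch below illustrates that difference under stated assumptions: `fake_synth` is a stand-in that returns bytes instead of real audio, and the class names are hypothetical, not taken from the specification.

```python
from typing import Callable, List

Synth = Callable[[str], bytes]


def fake_synth(sentence: str) -> bytes:
    """Stand-in for a speech synthesizer (hypothetical; returns UTF-8 bytes)."""
    return sentence.encode("utf-8")


class PresynthesizedStore:
    """Synthesize every utterance before the first one; store the audio."""

    def __init__(self, sentences: List[str], synth: Synth) -> None:
        self._audio = [synth(s) for s in sentences]

    def audio_for(self, index: int) -> bytes:
        # Playback only needs a lookup at presentation time.
        return self._audio[index]


class TextStore:
    """Store the utterance sentences; synthesize at presentation time."""

    def __init__(self, sentences: List[str], synth: Synth) -> None:
        self._sentences = list(sentences)
        self._synth = synth

    def audio_for(self, index: int) -> bytes:
        # Synthesis is deferred until the utterance is about to be played.
        return self._synth(self._sentences[index])
```

Both stores return the same data for the same index; the first trades storage for lower latency during the dialogue, the second the reverse.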

<Modification 2>
In the embodiments described above, a dialogue system including two robots was described. However, as noted above, there are also forms in which the utterance determination unit 120 does not determine which robot speaks. In such forms the dialogue system 100 does not necessarily need two robots and may include only one. Also as noted above, there is a form in which the utterance determination unit 120 determines two robots as the speaking robots. This form may be operated in a configuration in which the dialogue system 100 includes three or more robots.

<Modification 3>
In a configuration in which the dialogue system 100 includes a plurality of robots, the number of presentation units need not equal the number of robots, provided the user can tell which robot is speaking. The presentation units also need not be installed in the robots. Well-known techniques may be used to let the user tell which robot is speaking, such as giving the synthesized speech a different voice quality for each robot, or using multiple speakers to give each robot a different sound localization.

<Modification 4>
In the embodiments described above, an example was described in which spoken dialogue is carried out using robots as agents. The robots of those embodiments may be humanoid robots that have a body or the like, or robots that do not. The dialogue technique of this invention is not limited to these; it is also possible to carry out dialogue using an agent that, unlike a robot, has no physical entity such as a body and has no vocalization mechanism. One example is a form in which dialogue is carried out using an agent displayed on a computer screen. More specifically, the present dialogue system can also be applied to a group chat, such as "LINE" or "2channel (registered trademark)", in which multiple accounts converse by text messages, with the user's account conversing with the account of the dialogue device. In this form, the computer having the screen that displays the agent needs to be near the person, but that computer and the dialogue device may be connected via a network such as the Internet. In other words, the present dialogue system is applicable not only to dialogue in which speakers such as a person and a robot actually talk face to face, but also to conversation in which the speakers communicate via a network.

As shown in FIG. 7, the dialogue device of this modification includes at least an utterance generation unit 150, an utterance determination unit 120, and a presentation unit 101. The utterance determination unit 120 has an interface capable of communicating with a chat dialogue system and a scenario dialogue system that exist externally. The chat dialogue system and the scenario dialogue system may instead be configured inside the dialogue device as processing units having equivalent functions. Further, the utterance generation unit 150 and the utterance determination unit 120 may have an interface capable of communicating with an external information processing device, and part of each unit, or a processing unit having an equivalent function, may be configured in an information processing device outside the dialogue device.

The dialogue device of this modification is an information processing device such as a mobile terminal, for example a smartphone or tablet, or a desktop or laptop personal computer. The following description assumes that the dialogue device is a smartphone. The presentation unit 101 is the liquid crystal display of the smartphone. A chat application window is displayed on this display, and the content of the group chat is displayed in chronological order in the window. Group chat is a chat function in which multiple accounts post text messages to one another to carry on a conversation. It is assumed that a plurality of virtual accounts, corresponding to virtual personalities controlled by the dialogue device, and the user's account participate in this group chat. That is, this modification is an example in which the agents are virtual accounts displayed on the liquid crystal display of the smartphone serving as the dialogue device. In the dialogue device of this modification corresponding to the second or third embodiment, a software keyboard displayed on the smartphone's liquid crystal display serves as the input unit 102, so that the user can enter utterance content and post it to the group chat through the user's own account. Alternatively, a microphone mounted on the smartphone may function as the input unit 102, with the user entering utterance content by voice. In that configuration, the dialogue device either includes the utterance end detection unit 140 or the speech recognition unit 141, or includes an interface capable of communicating with an external information processing device in which a processing unit having the same function as the utterance end detection unit 140 or the speech recognition unit 141 is configured. Further, using the speaker and speech synthesis function of the smartphone, the utterance content obtained from each dialogue system may be output from the speaker in a voice corresponding to each virtual account.
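The group chat described here, with several virtual accounts and one user account posting to a shared chronological timeline, can be modelled with a small data structure. This is only an illustrative sketch; the class and field names (`GroupChat`, `Post`, `virtual_accounts`, and so on) are assumptions and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Post:
    account: str  # name of the posting account
    text: str     # text message (utterance content)


@dataclass
class GroupChat:
    virtual_accounts: List[str]  # accounts controlled by the dialogue device
    user_account: str            # the human user's account
    timeline: List[Post] = field(default_factory=list)

    def post(self, account: str, text: str) -> None:
        """Append a message; only participating accounts may post."""
        if account != self.user_account and account not in self.virtual_accounts:
            raise ValueError(f"unknown account: {account}")
        self.timeline.append(Post(account, text))

    def render(self) -> str:
        """Return the conversation in chronological (posting) order."""
        return "\n".join(f"{p.account}: {p.text}" for p in self.timeline)
```

For example, posting "I ate that thing yesterday." from a virtual account and then "What thing?" from the user account yields a two-line transcript in that order, which is what the chat window would display.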

<Modification 5>
As described in Modification 1, the utterance generation unit 150 and the utterance determination unit 120 can obtain a plurality of utterance sentences for making a plurality of robots converse. Likewise, the utterance generation unit 150, the utterance determination unit 120, and the speech synthesis unit 110 can obtain synthesized speech data of a plurality of utterances for making a plurality of robots converse. And as described in Modification 4, the generated utterance sentences may be presented not by robots but by agents without a vocalization mechanism, such as agents displayed on a computer screen. That is, a device comprising the utterance generation unit 150 and the utterance determination unit 120 can function as a dialogue scenario generation device that generates a plurality of utterance sentences for making a plurality of agents converse. Similarly, a device comprising the utterance generation unit 150, the utterance determination unit 120, and the speech synthesis unit 110 can function as a dialogue scenario generation device that generates synthesized speech data of a plurality of utterances for making a plurality of agents converse.
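A dialogue scenario of the kind described, consisting of a post-conversion utterance, a confirmation utterance, and a response utterance, might be generated as in the following sketch. Everything specific here is an assumption made for illustration: the English templates, the placeholder phrase "that thing", the agent names R1 and R2, the alternation between them, and the function name `generate_scenario`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    agent: str  # which agent presents the utterance
    text: str   # utterance sentence


def generate_scenario(original: str, target: str,
                      vague: str = "that thing") -> List[Line]:
    """Build a three-line scenario from an original utterance sentence.

    The word `target` is replaced by a word that does not carry its meaning,
    a second agent asks what was meant, and the first agent answers.
    """
    if target not in original:
        raise ValueError("target word must occur in the original utterance")
    converted = original.replace(target, vague, 1)  # post-conversion utterance
    return [
        Line("R1", converted),
        Line("R2", f"What do you mean by '{vague}'?"),  # confirmation utterance
        Line("R1", f"I mean {target}."),                # response utterance
    ]
```

Calling `generate_scenario("I ate sushi yesterday.", "sushi")` produces the lines "I ate that thing yesterday.", "What do you mean by 'that thing'?", and "I mean sushi.", which could then be passed to speech synthesis or posted as text.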

<Other modifications>
The present invention is not limited to the embodiments and modifications described above. For example, the various processes described above, other than the order of the utterances presented by the presentation unit, may be executed not only in time series as described, but also in parallel or individually, according to the processing capability of the device executing the processes or as needed. Other changes may be made as appropriate without departing from the spirit of the present invention.

<Program and recording medium>
The various processing functions of each device described in the above embodiments and in Modifications 1 to 3 and 5 may be implemented by a computer. In that case, the processing content of the functions each device should have is described by a program. The various processing functions of the dialogue system described in Modification 4 may likewise be implemented by a computer. In that case, the processing content of the functions the dialogue system should have is described by a program. By executing this program on a computer, the various processing functions of each device are implemented on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer temporarily in its own storage unit. When executing a process, the computer reads the program stored in its own storage unit and executes the process according to the program it has read. As another form of executing the program, the computer may read the program directly from the portable recording medium and execute processing according to it. Further, each time a program is transferred from the server computer to this computer, the computer may execute processing according to the received program in sequence. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to this computer and the processing functions are realized only through execution instructions and the retrieval of results. The program here includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

Although each device is described as being configured by executing a predetermined program on a computer, at least part of the processing content may be implemented in hardware.

Claims

A dialogue method performed by a dialogue system, comprising:
an utterance generation step in which the dialogue system generates an utterance;
an utterance determination step in which the dialogue system obtains, as a post-conversion utterance, an utterance generated by obfuscating at least part of the utterance generated in the utterance generation step and/or by replacing a word included in the utterance generated in the utterance generation step with a word that does not have the meaning of that word; and
an utterance presentation step in which the dialogue system presents the post-conversion utterance obtained in the utterance determination step.

A dialogue method performed by a dialogue system, comprising:
a first utterance presentation step in which the dialogue system presents a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and
a second utterance presentation step in which, after presenting the first utterance, the dialogue system presents a second utterance, which is an utterance from which it can be read that the first utterance cannot be uniquely interpreted.

A dialogue method performed by a dialogue system, comprising:
a first utterance presentation step in which the dialogue system presents a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and
a second utterance presentation step in which, after presenting the first utterance, the dialogue system presents a second utterance, which is an utterance including a question for identifying a single meaning of the first utterance.

The dialogue method according to claim 2 or 3, further comprising:
a third utterance presentation step in which, after presenting the second utterance, the dialogue system presents a third utterance, which is an utterance that responds to the second utterance and includes a word that identifies a single meaning of the first utterance.

A dialogue method performed by a dialogue system, comprising:
a first utterance presentation step in which the dialogue system presents an utterance at least part of which is obfuscated and/or an utterance including a word that has no meaning; and
a second utterance presentation step in which, after the presentation in the first utterance presentation step, the dialogue system presents an utterance including specific content corresponding to the obfuscated part and/or an utterance including a word having a meaning corresponding to the part of the word that has no meaning.

A dialogue scenario generation method in which a dialogue scenario generation device generates a dialogue scenario used for a dialogue performed by a dialogue system, wherein
the dialogue scenario generation device generates a dialogue scenario including:
a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and
a second utterance, which is an utterance to be presented after the first utterance is presented and from which it can be read that the first utterance cannot be uniquely interpreted.

A dialogue scenario generation method in which a dialogue scenario generation device generates a dialogue scenario used for a dialogue performed by a dialogue system, wherein
the dialogue scenario generation device generates a dialogue scenario including:
a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and
a second utterance, which is an utterance to be presented after the first utterance is presented and which includes a question for identifying a single meaning of the first utterance.

The dialogue scenario generation method according to claim 6 or 7, wherein
the dialogue scenario generation device generates a dialogue scenario that further includes an utterance to be presented after the second utterance is presented, which responds to the second utterance and includes a word that identifies a single meaning of the first utterance.

A dialogue scenario generation method in which a dialogue scenario generation device generates a dialogue scenario used for a dialogue performed by a dialogue system, wherein
the dialogue scenario generation device generates a dialogue scenario including:
a first utterance, which is an utterance at least part of which is obfuscated and/or an utterance including a word that has no meaning; and
a second utterance, which is an utterance to be presented after the first utterance is presented and which is an utterance including specific content corresponding to the obfuscated part and/or an utterance including a word having a meaning corresponding to the part of the word that has no meaning.

A dialogue system comprising:
an utterance generation unit that generates an utterance;
an utterance determination unit that obtains, as a post-conversion utterance, an utterance generated by obfuscating at least part of the utterance generated by the utterance generation unit and/or by replacing a word included in the utterance generated by the utterance generation unit with a word that does not have the meaning of that word; and
an utterance presentation unit that presents the post-conversion utterance obtained by the utterance determination unit.

A dialogue system comprising:
a first utterance presentation unit that presents a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and
a second utterance presentation unit that, after the first utterance is presented, presents a second utterance, which is an utterance from which it can be read that the first utterance cannot be uniquely interpreted.

A dialogue system comprising:
a first utterance presentation unit that presents a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and
a second utterance presentation unit that, after the first utterance is presented, presents a second utterance, which is an utterance including a question for identifying a single meaning of the first utterance.

The dialogue system according to claim 11 or 12, further comprising:
a third utterance presentation unit that, after the second utterance is presented, presents a third utterance, which is an utterance that responds to the second utterance and includes a word that identifies a single meaning of the first utterance.

A dialogue system comprising:
a presentation unit that presents an utterance at least part of which is obfuscated and/or an utterance including a word that has no meaning, and that presents an utterance including specific content corresponding to the obfuscated part and/or an utterance including a word having a meaning corresponding to the part of the word that has no meaning.

A dialogue scenario generation device that generates a dialogue scenario used for a dialogue performed by a dialogue system, wherein
the dialogue scenario generation device generates a dialogue scenario including:
a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and
a second utterance, which is an utterance to be presented after the first utterance is presented and from which it can be read that the first utterance cannot be uniquely interpreted.

A dialogue scenario generation device that generates a dialogue scenario used for a dialogue performed by a dialogue system, wherein
the dialogue scenario generation device generates a dialogue scenario including:
a first utterance, which is an utterance generated by obfuscating at least part of a predetermined utterance and/or by replacing a word included in the predetermined utterance with a word that does not have the meaning of that word; and
a second utterance, which is an utterance to be presented after the first utterance is presented and which includes a question for identifying a single meaning of the first utterance.

The dialogue scenario generation device according to claim 15 or 16, wherein
the dialogue scenario generation device generates a dialogue scenario that further includes an utterance to be presented after the second utterance is presented, which responds to the second utterance and includes a word that identifies a single meaning of the first utterance.

A dialogue scenario generation device that generates a dialogue scenario used for a dialogue performed by a dialogue system, wherein
the dialogue scenario generation device generates a dialogue scenario including:
a first utterance, which is an utterance at least part of which is obfuscated and/or an utterance including a word that has no meaning; and
a second utterance, which is an utterance to be presented after the first utterance is presented and which is an utterance including specific content corresponding to the obfuscated part and/or an utterance including a word having a meaning corresponding to the part of the word that has no meaning.

A program for causing a computer to function as the dialogue system according to claim 10.

A program for causing a computer to function as the dialogue scenario generation device according to any one of claims 15 to 18.