JP2017162268A - Dialog system and control program

Info

Publication number: JP2017162268A
Application number: JP2016047006A
Authority: JP
Inventors: Kohei Ogawa; Miki Watanabe; Hiroshi Ishiguro
Original assignee: Osaka University NUC
Current assignee: Osaka University NUC
Priority date: 2016-03-10
Filing date: 2016-03-10
Publication date: 2017-09-14

Abstract

CONSTITUTION: In a dialog system 10, a robot R utters according to a script preset in a dialog database 14, and one or more choices with which a person H can respond to the robot's utterance are displayed on a touch display 12 according to the script. When the person H selects a choice by touch, the content of the selected response is uttered from a speaker associated with the touch display 12. The choices displayed on the touch display 12 are set according to the phase of the dialog, such as information collection, relationship building, persuasion toward a decision, decision making, or idle talk with no particular purpose.

EFFECT: Because utterances follow a script and the person selects a response item, the person can continue a natural dialog with the robot while feeling that he or she has expressed his or her own intention.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a dialog system and a control program, and more particularly to a novel dialog system and control program in which an agent such as a robot, or a person, converses with at least one person through a touch display.

With recent advances in speech recognition technology, dialog systems such as the one in Patent Document 1 have been proposed. The system of Patent Document 1 aims to achieve sustained, natural interaction between a person and a robot by synchronizing the two.

Patent Document 1: JP 2012-181697 A [G06F 3/16 ...]

Even with the technique of Patent Document 1, processing based on speech recognition has its limits, and it is not easy to give a person the feeling of conversing with another person or to convey specific content appropriately through dialog.

Therefore, a primary object of the present invention is to provide a novel dialog system and control program.

Another object of the present invention is to provide a dialog system and a control program that can sustain a person's sense of participating in a dialog.

Another object of the present invention is to provide a dialog system and a control program that can convey appropriate content in response to a question.

To solve the above problems, the present invention employs the following configurations. The reference numerals in parentheses and the supplementary explanations indicate correspondence with the embodiments described later in order to aid understanding of the invention, and do not limit the invention in any way.

The first invention is a dialog system comprising: a dialog database in which scripts are set in advance; an utterance agent that utters according to a script in the dialog database; display means for displaying, according to the script, one or more options with which a person should reply to the utterance of the utterance agent; and utterance means for uttering, according to the script, the content of an option displayed on a display when the person selects that option.

In the first invention, a dialog system (10: a reference numeral indicating the corresponding part in the embodiments; the same applies hereinafter) includes a dialog database (14) in which scripts are set in advance. An utterance agent such as a robot (R) utters, for example to a person (H), according to a script in the dialog database under the control of control means such as a dialog control manager (16). The person replies to this utterance, and display means (16, 24) cause a display (12, 122) to show, according to the script in the dialog database, one or more options with which the person should reply. When the person looks at the displayed options and selects one of them, utterance means (24, 26) utter the content of the selected option according to the script.

According to the first invention, the utterances of the utterance agent and the replies of the person all follow the script, so the dialog does not break down. Moreover, because the person selects an option personally and its content is uttered aloud, the person can be made to perceive the option (reply) as reflecting his or her own intention, even though it follows the script.

The second invention depends on the first invention and is a dialog system in which the options displayed on the display means include two or more options with similar meanings.

In the second invention, in the dialog phase of the person's final decision making, for example when the dialog agent has proposed that the person purchase a product, the system can guide the decision by presenting at the final stage only replies that all indicate agreement, such as "I like it, so I'll buy it" and "Yes, I'll do that."

According to the second invention, a person's decision making can be guided.

The third invention depends on the first or second invention and is a dialog system in which only one option is displayed on the display means.

In the third invention, the reply options are limited to one in order to guide the person toward the decision the system ultimately wants, thereby forcibly steering the person in the direction the dialog system assumes.

According to the third invention, limiting the options to one forcibly guides the person's reply, so that the person's subsequent decision making can be steered in the direction the dialog system assumes.

The fourth invention is a control program executed by a computer of a dialog system that conducts a dialog according to a script set in advance in a dialog database, the control program causing the computer to function as: an utterance agent that utters according to the script in the dialog database; display means for displaying, according to the script, one or more options with which a person should reply to the utterance of the utterance agent; and utterance means for uttering, according to the script, the content of an option displayed on a display when the person selects that option.

The fourth invention can be expected to provide the same effects as the first invention.

According to the present invention, the utterances of the utterance agent and the replies of the person all follow the script, so the dialog does not break down, and the person can moreover be made to perceive that he or she gave a reply reflecting his or her own intention.

The above object, other objects, features, and advantages of the present invention will become more apparent from the following detailed description of the embodiments given with reference to the drawings.

FIG. 1 is a schematic diagram showing an outline of a dialog system according to one embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the configuration of the dialog control manager in the FIG. 1 embodiment.
FIG. 3 is a schematic diagram showing an example of the robot in the FIG. 1 embodiment.
FIG. 4 is a block diagram showing an example of the configuration of the robot controller that controls the robot of FIG. 3.
FIG. 5 is a block diagram showing an example of the configuration of the touch display I/F in the FIG. 1 embodiment.
FIG. 6 is a flowchart showing an example of the operation of the dialog control manager shown in FIG. 1.
FIG. 7 is a flowchart showing an example of the operation of the robot controller shown in FIG. 1.
FIG. 8 is a flowchart showing an example of the operation of the touch display I/F shown in FIG. 1.
FIG. 9 is an illustrative view showing examples of the options (response items) displayed on the touch display in the FIG. 1 embodiment.
FIG. 10 is a schematic diagram showing an outline of a dialog system according to another embodiment of the present invention.

Referring to FIG. 1, a robot R serving as a dialog agent and one person H are present at the dialog location of the dialog system 10 of this embodiment. However, there may be two or more robots R or persons H. In the dialog system 10 of this embodiment, the robot R and the person H converse through a touch display 12.

In brief, a dialog control manager 16 causes the robot R to utter, for example a question to the person H, according to a dialog or script prepared in advance in the dialog database 14, and displays on the touch display 12 one or more response items that the person H can select when replying.

The person H selects one of the response items on the touch display 12. In response, the touch display 12 utters the utterance corresponding to the selected response item as the reply of the person H. The dialog system 10 of the embodiment is a novel dialog system that repeats this kind of exchange.

When a person speaks, continuing the dialog tends to become difficult unless the other party's utterance is generated in response to that speech. A robot whose speech recognition and natural language processing are imperfect therefore cannot easily keep producing appropriate utterances in response to human speech, and the dialog tends to break down.

Therefore, in this embodiment, when the robot utters, the person is not necessarily made to reply by speaking but replies using only the touch display 12. This avoids dialog breakdown caused by the robot's limited capabilities. Moreover, by displaying on the touch display 12 response items suitable as replies of the person H to the utterance of the robot R and having the person H select one, the system can sustain the person H's sense of participating in the dialog and can also return an appropriate reply to a question from the robot R.

A dialog system 10 such as that of this embodiment can be used, for example, as a system for an information collection service, an information provision service, an advertising service, or a sales service.

The dialog system 10 includes a dialog database 14 used, as described above, to cause the robot R to utter to the person H and to display response items for the person H's reply on the touch display 12. Here, a "dialog" means a sequence of commands for the utterances and nonverbal actions to be performed during a dialog, and the dialog database 14 is a collection of dialogs (for example, it contains command sequences for dialogs on individual topics, such as a dialog about what one likes about bananas or a dialog about what matters in robot caregiving). A "script" is a character string representing such a command sequence, and script data is a character string representing a single command. A sequence of script data therefore forms a script. In this embodiment, scripts are written in State Chart XML (SCXML: w3c.org) and are all stored in the dialog database 14.

As an example, in the script "T=XX, RH SPEAKER=R L=HUMAN TEXT=……, NONVERVAL=……", "T=XX" indicates the time XX at which the script should be executed, "RH SPEAKER=R L=HUMAN" means that the robot R speaks to the person H, and "TEXT=……" indicates the text to be uttered. "NONVERVAL=……" indicates a nonverbal action of the robot R, such as looking at the person H, nodding, shaking its head, or tilting its head.
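The fields of the example script can be pictured as a small record. The following is only a sketch: the class name `ScriptData`, its field names, and the sample values are assumptions made for illustration, and the patent itself stores scripts as SCXML rather than as Python objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScriptData:
    """One command of a dialog script (hypothetical field names)."""
    time: float                      # T: when the command should run
    speaker: str                     # SPEAKER: who utters, e.g. "R" for robot R
    listener: str                    # L: who is addressed, e.g. "HUMAN"
    text: str                        # TEXT: sentence to utter
    nonverbal: Optional[str] = None  # NONVERVAL: gesture such as "nod"


def describe(cmd: ScriptData) -> str:
    """Render a command as a readable one-line summary."""
    gesture = f" while performing '{cmd.nonverbal}'" if cmd.nonverbal else ""
    return f"t={cmd.time}: {cmd.speaker} says to {cmd.listener}: {cmd.text!r}{gesture}"


# Sample values are invented for illustration only.
greeting = ScriptData(time=0.0, speaker="R", listener="HUMAN",
                      text="Hello.", nonverbal="nod")
print(describe(greeting))
```

A script would then correspond to an ordered list of such records.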

Such scripts are sent by the dialog control manager 16 from a dialog queue 18 to a robot controller 20 that controls the robot R.

The dialog control manager 16 shown in FIG. 2 includes a CPU 16a, to which a communication device 16c is connected via an internal bus 16b. The communication device 16c includes, for example, a network interface card (NIC), and through it the CPU 16a can communicate and exchange data with the robot controller 20, a touch display I/F 24, and other components.

A memory 16d and an input device 16e are also connected to the CPU 16a via the internal bus 16b. The memory 16d includes ROM and RAM. Scripts (dialogs) are read from the dialog database 14 through a memory I/F 16f and temporarily stored in the memory 16d.

Programs required by the dialog control manager 16 (such as an OS and a sensor signal acquisition program) are stored in the memory 16d. The dialog control manager 16 operates according to the programs stored in the memory 16d.

The dialog queue 18 is also, for example, a region of the memory 16d. Script data loaded in a next dialog candidate pool 22 can be loaded into the dialog queue 18 in the form of a queue so that the robot R can execute it next.
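The relationship between the candidate pool and the queue can be sketched as a first-in, first-out transfer. This is an illustrative assumption about the data structure, not the patent's implementation; the function name `load_into_queue` and the sample strings are hypothetical.

```python
from collections import deque
from typing import Deque, List


def load_into_queue(pool: List[str], queue: Deque[str]) -> None:
    """Move every candidate from the pool into the FIFO queue, keeping order."""
    while pool:
        queue.append(pool.pop(0))


# Hypothetical script data, written as plain strings for brevity.
next_candidate_pool = ["robot: ask question", "display: show response items"]
dialog_queue: Deque[str] = deque()

load_into_queue(next_candidate_pool, dialog_queue)
first = dialog_queue.popleft()  # the command that would be executed next
```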

FIG. 3 shows the external appearance of the robot R of the embodiment. The robot R is mounted on a base 30 so that it can rotate forward and backward and left and right relative to the base 30. That is, a torso 32 has two degrees of freedom.

A right arm 34R and a left arm 34L are attached at the left and right positions of the torso 32 corresponding to human shoulders, each through a shoulder joint (not shown) so as to be rotatable forward and backward and left and right. That is, the right arm 34R and the left arm 34L each have two degrees of freedom.

A neck 36 is provided at the center of the upper end of the torso 32, and a head 38 is provided on top of it. The neck 36, and hence the head 38, has three degrees of freedom relative to the torso 32: roll (tilting left and right), pitch (tilting forward and backward), and yaw (rotating left and right).

A right eye 40R and a left eye 40L are provided on the front of the head 38, that is, on the surface corresponding to a human face, and are provided with eyeballs 42R and 42L, respectively. The right eye 40R and the left eye 40L can close and open their eyelids, and the eyeballs 42R and 42L can each rotate up, down, left, and right. That is, the right eye 40R and the left eye 40L, namely the eyelids, have one degree of freedom, and the eyeballs 42R and 42L have two degrees of freedom.

The face also has a mouth 44 that can close and open. That is, the mouth 44 has one degree of freedom.

At the position on the torso 32 corresponding to a human chest are a speaker 46, which produces the utterances that the person H is to hear in the dialog system 10, and a microphone 48, which picks up sounds in the environment, in particular the speech of the person H.

A camera 50 capable of capturing moving or still images is built into the part of the head 38 corresponding to the forehead. The camera 50 can capture the person H facing the robot, and the camera signal (video signal) from the camera 50 may be input to the CPU 22a via the sensor I/F of the sensor manager 18, in the same way as for the environment camera 16 (FIG. 1).

FIG. 4 is a block diagram showing the robot controller 20, which controls the actions of the robot R (such as utterances and nonverbal actions). Referring to FIG. 4, the robot controller 20 includes a CPU 20a, to which a communication device 20c is connected via an internal bus 20b. The communication device 20c includes, for example, a network interface, and through it the CPU 20a can communicate and exchange data with the dialog control manager 16 and also with external computers and other robots (neither shown).

A memory 20d is also connected to the CPU 20a via the internal bus 20b. The memory 20d includes ROM and RAM. Control data and script data sent from the dialog control manager 16 are temporarily stored in the memory 20d.

Programs required for robot control (such as an OS and a sensor signal acquisition program) are stored in the memory 20d. The robot controller 20 controls the actions of the robot R according to the programs stored in the memory 20d.

That is, an output board 20e, composed for example of a DSP, is also connected to the CPU 20a of the robot controller 20, and this output board 20e controls the actuators provided in the parts of the robot R described above (FIG. 3). The specific motion of each part of the robot R is not relevant to the embodiment, so a detailed description is omitted here.

The speaker 46 of the robot R shown in FIG. 3 is connected to the output board 20e of the robot controller 20. The CPU 20a therefore causes the speaker 46 to utter according to the script data supplied from the dialog control manager 16 and stored in the memory 20d as needed.

Further, signals from the microphone 48 and the camera 50 of the robot R shown in FIG. 3 are input to the CPU 20a through an input board 20f. The CPU 20a sends this input data to the dialog control manager 16. The dialog control manager 16 stores the audio data captured by the microphone 48 in the memory 16d (FIG. 2) and performs speech recognition as needed. The dialog control manager 16 also processes the camera signal from the camera 50 to sense the situation at the dialog location.

The robot R used in the dialog system 10 of the FIG. 1 embodiment is not limited to the robot described above with reference to FIG. 3; it suffices that the robot can at least utter according to a script.

FIG. 5 is a block diagram showing the touch display I/F 24, which controls the display and utterances of the touch display 12. The touch display I/F 24 includes a CPU 24a, to which a communication device 24c is connected via an internal bus 24b. The communication device 24c includes, for example, a network interface, and through it the CPU 24a can communicate and exchange data with the dialog control manager 16 and other components.

A memory 24d is also connected to the CPU 24a via the internal bus 24b. The memory 24d includes ROM and RAM. Control data and script data sent from the dialog control manager 16 are temporarily stored in the memory 24d.

Programs required for controlling the touch display 12 (such as an OS and a sensor signal acquisition program) are stored in the memory 24d. The touch display I/F 24 controls the operation of the touch display 12 according to the programs stored in the memory 24d.

That is, an output board 24e, composed for example of a DSP, is also connected to the CPU 24a of the touch display I/F 24. The output board 24e outputs audio from a speaker 26 provided on the touch display 12 and outputs display data to the touch display 12.

The speaker 26 may be provided on the touch display 12, or it may be provided separately from the touch display 12 in its vicinity.

An input board 24f is also connected to the CPU 24a of the touch display I/F 24. The input board 24f acquires coordinate data (touch coordinate data) from a coordinate detection circuit 28 provided in the touch display 12 and inputs it to the CPU 24a. The CPU 24a sends the input coordinate data to the dialog control manager 16. From the coordinate data, the dialog control manager 16 can determine which response item on the touch display 12 the person H touched.
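Determining the touched response item from coordinate data amounts to a hit test. The sketch below assumes that each response item occupies an axis-aligned rectangle on the screen; the patent does not specify the layout, so `ResponseItem`, `item_at`, and the sample geometry are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ResponseItem:
    """A response item and the screen rectangle it occupies (assumed layout)."""
    label: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, tx: int, ty: int) -> bool:
        """Return True if the touch point lies inside this item's rectangle."""
        return (self.x <= tx < self.x + self.width
                and self.y <= ty < self.y + self.height)


def item_at(items: List[ResponseItem], tx: int, ty: int) -> Optional[ResponseItem]:
    """Return the first item containing the touch point, or None if none does."""
    for item in items:
        if item.contains(tx, ty):
            return item
    return None
```

A touch that falls between items returns `None`, which a caller could treat as no selection having been made.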

In the dialog system 10 of FIG. 1, the robot R at the dialog location advances the dialog according to the scripts described above, and it is the dialog control manager 16 that controls, in an integrated manner, the utterances of the robot R and the utterances made through the touch display 12.

The flowchart in FIG. 6 shows the operation of the CPU 16a (FIG. 2) of the dialog control manager 16. The operation of FIG. 6 is executed repeatedly, for example at roughly the frame rate.

In the first step S1, the CPU 16a performs initialization, such as reading script data of the kind described above from the dialog database 14 (FIG. 1).

In the subsequent step S3, the CPU 16a updates sensor data, such as the camera signal and microphone signal captured from the robot controller 20.

In the next step S5, the CPU 16a of the dialog control manager 16 determines, based on the sensor data updated in step S3, whether a person H is present at the dialog location. If "NO" in step S5, the process returns to step S3 and waits.

If "YES" is determined in step S5, the CPU 16a reads the next candidate script data from the next candidate pool 22 in step S7. Then, in step S9, it determines whether there is a next candidate script.

If there is a next candidate script, in step S11 the CPU 16a determines whether it is a script for controlling the robot R. If "YES" in step S11, the robot controller 20 executes the control operation of the robot R in step S13. If "NO" in step S11, the CPU 16a further determines in step S15 whether the next candidate script is for the touch display 12. If "YES" in step S15, the touch display I/F 24 executes the control operation of the touch display 12 in step S17.

That is, based on the information registered in the next candidate pool 22, the dialog control manager 16 decides, according to the dialog control rules, whether to control the robot R or to display on the touch display 12, and transmits a control signal to the robot controller 20 or the touch display I/F 24. For example, when the robot R is to be controlled, in step S11 a control signal suited to the content to be uttered and the information on the audio to be played are transmitted to the robot controller 20. When the touch display 12 is to be controlled, information on the response items to be displayed on the touch display 12 for the person H to select from is transmitted.
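The decision flow of steps S5 to S17 can be sketched as a single dispatch function. This is an illustration under stated assumptions: scripts are modeled as dictionaries with a hypothetical `target` key, and the robot controller and touch display I/F are stood in for by plain callables; none of these names come from the patent.

```python
from typing import Callable, Dict, List, Optional

Handler = Callable[[dict], None]


def dispatch_next(person_present: bool, pool: List[dict],
                  handlers: Dict[str, Handler]) -> Optional[str]:
    """Run one pass of the S5-S17 flow; return the target handled, or None."""
    if not person_present:           # S5: nobody at the dialog location
        return None
    if not pool:                     # S9: no next candidate script
        return None
    script = pool.pop(0)             # S7: read the next candidate
    target = script.get("target")    # S11 / S15: robot or touch display?
    handler = handlers.get(target)
    if handler is None:              # unknown target: script is dropped
        return None
    handler(script)                  # S13 / S17: hand over to the controller
    return target
```

In use, `handlers` would map `"robot"` to something that forwards the script to the robot controller and `"display"` to something that forwards it to the touch display I/F.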

Referring to FIG. 7, the robot controller 20 performs initialization in step S31. The robot R performs two kinds of actions under the control of the robot controller 20: idle actions and actions that follow a control signal. Performing idle actions means that, even when no explicit control signal has been sent from the dialog control manager 16, the robot keeps performing basic actions such as blinking, breathing, and looking toward a nearby person without stopping. In doing so, independently of the operation flow of the robot controller 20, information such as whether the person H is present and where the person is located is acquired using sensors such as the microphone 48 and the camera 50 of the robot R, which yields more socially appropriate behavior (step S33).

Next, if a control signal is received from the dialog control manager 16 while idle actions are being performed, "YES" is determined in step S35, and in step S37 the CPU 20a of the robot controller 20 causes the robot R to perform the action specified by the control signal. Specifically, the robot utters the specified line (text) and performs the accompanying action.

When the operation of step S37 ends, the CPU 20a of the robot controller 20 sends a robot end flag (not shown), indicating that the robot R's action has finished, to the dialog control manager 16, and control passes to the dialog control manager 16 of FIG. 6. The dialog control manager 16 then moves on to the display task on the touch display 12. In other words, the dialog control manager 16 waits for the robot end flag from the robot controller 20 before moving on to control of the touch display 12.
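One iteration of the robot controller flow (steps S33 to S37 plus the end flag) might look like the following sketch. The function name, the dictionary keys, and the use of a list of strings in place of real actuator and speaker commands are all assumptions for illustration.

```python
from typing import List, Optional


def robot_controller_step(control_signal: Optional[dict],
                          actions: List[str]) -> bool:
    """Run one iteration; return True when the robot end flag should be sent."""
    actions.append("idle")                  # S33: blink, breathe, look around
    if control_signal is None:              # S35: no control signal received
        return False
    actions.append("say: " + control_signal["text"])    # S37: utter the line
    gesture = control_signal.get("nonverbal")
    if gesture:                             # S37: accompanying nonverbal action
        actions.append("gesture: " + gesture)
    return True                             # action finished: send end flag
```

The boolean return value plays the role of the robot end flag that the dialog control manager waits for before moving on to the touch display.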

Referring to FIG. 8, the operation of the touch display 12 is controlled by the touch display I/F 24 shown in FIGS. 1 and 5. The CPU 24a of the touch display I/F 24 performs initialization in step S41. In step S43, it determines whether a control signal has been received from the dialog control manager 16. When "YES" is determined in step S43, in step S45 it displays, according to the received control signal, the one or more options (response items) to be displayed in that state. In the embodiment there are also options with which the person H asks the robot R a question; for convenience, such options may also be called "response items".

Thereafter, when it is determined in step S47 that the user has touched one of the displayed response items, the selected response item is read aloud in step S49.

When a person selects one of the options (response items) displayed on the screen of the touch display 12 of their own will, as in this embodiment, the person H can be made to feel that the selected response is their own opinion, even though what was offered was a finite set of options set by the system. In addition, because the person H selects from what is displayed on the touch display 12, the robot R can carry on the dialog without using speech recognition.

Furthermore, if the option (response item) selected by the person H is read aloud (uttered) from the touch display 12 or from the speaker 26 provided in its vicinity, the person H feels as if they had uttered the response themselves, and the sense of natural dialog is not impaired.

Which response item the person H selected (touched) is determined, as described above, by the CPU 24a evaluating the touch position detected by the coordinate detection circuit 28 (FIG. 5) provided in association with the touch display 12. The result of this determination (the selected response item) is returned to the dialog control manager 16 so that the subsequent dialog can proceed.
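The determination of the touched response item from a touch position can be sketched as a simple hit test. This is an assumption-laden illustration, not the patent's implementation: the patent does not say how the response items are laid out, so the rectangular regions, the `ResponseItem` type, and the `find_touched_item` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ResponseItem:
    """Hypothetical on-screen response item with a rectangular touch region."""
    text: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        # The region includes its top-left edge and excludes its bottom-right edge.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def find_touched_item(items: Sequence[ResponseItem],
                      px: int, py: int) -> Optional[ResponseItem]:
    """Return the item whose region contains the touch position, or None."""
    for item in items:
        if item.contains(px, py):
            return item
    return None


items = [
    ResponseItem("I'm from Osaka", x=0, y=0, width=200, height=50),
    ResponseItem("Somewhere in Japan", x=0, y=60, width=200, height=50),
]
selected = find_touched_item(items, 10, 70)  # falls inside the second region
```

A touch that lands outside every region returns `None`, which corresponds to continuing to wait in step S47.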

Thereafter, after the end of the read-aloud speech is determined (step S51), the process returns to FIG. 6 and control transitions to the dialog control manager 16.

When the operation of step S49 ends, the CPU 24a of the touch display I/F 24 sends a touch display end flag (not shown), which indicates that the operation on the touch display 12 has ended, to the dialog control manager 16, and the process transitions to control of the robot R by the dialog control manager 16 of FIG. 6.

In this way, the robot R starts its next utterance only after the spoken response selected by the person H on the touch display 12 has finished. This gives the impression that the robot R made its next utterance after understanding the speech from the touch display 12, that is, from the person H.
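The strict alternation between the robot and the touch display can be sketched as a loop in which each side finishes before the other starts. This is a simplified, synchronous illustration: the function `run_dialog`, the script format (a list of utterance and option pairs), and the `choose` callback standing in for the touch of the person H are assumptions made for this example, and the end flags are modeled simply by each step returning before the next begins.

```python
def run_dialog(script, choose):
    """Play a script turn by turn and return the sequence of spoken events.

    script: list of (robot_utterance, options) pairs (hypothetical format).
    choose: callable that picks one entry from a non-empty list of options,
            standing in for the touch of the person H.
    """
    trace = []
    for utterance, options in script:
        # Robot turn. Returning from this step plays the role of the
        # robot end flag: the display task starts only afterwards.
        trace.append(("robot", utterance))
        if not options:
            # No response is required for this utterance.
            continue
        # Display turn: show the options, wait for a selection, read it aloud.
        selected = choose(options)
        # Returning from this step plays the role of the touch display
        # end flag: the next robot utterance starts only afterwards.
        trace.append(("display", selected))
    return trace


script = [
    ("Where did you come from?", ["I'm from Osaka", "Somewhere in Japan"]),
    ("Welcome.", []),
]
trace = run_dialog(script, choose=lambda options: options[0])
```

Because the loop is sequential, the robot never speaks while the selected response is still being read aloud, which is the property the paragraph above relies on.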

The process by which the dialog control manager 16 determines the operation command to send to the touch display I/F 24, that is, to the touch display 12, is described next.

First, a series of dialogs is classified into five phases: "information collection", "relationship building", "persuasion toward decision making", "decision making", and "purposeless idle talk". In the following, the "partner" refers to the robot R as seen from the person H.

The purpose of "information collection" is to collect information such as the thoughts and preferences of the person H through everyday conversation. Specifically, questions such as "Where did you come from today?" and "What is your name?" are conceivable. The information collected here can be used later, for example in "persuasion toward decision making".

When collecting information, the information about the person H must be acquired accurately, so four options with different meanings are displayed on the touch display. The wording (dialog or script) is designed so that whatever applies to the person H always matches one of the options. For example, for the question "Where did you come from?", the options (response items) "I'm from Osaka", "I'm from Hyogo", "I'm from Kyoto", and "Somewhere in Japan" are displayed on the touch display 12, as shown in FIG. 9(A).

The number of response items (options) may be four or more, but four are used in the embodiment. The reason is that with five or more options, understanding the displayed options (their content) is thought to take time and impair the sense of dialog.

Next, the purpose of "relationship building" is to deepen the relationship between the person H and the partner (the robot R). There are various ways to deepen a relationship, and one effective way is for both parties to confirm that they feel goodwill toward each other. However, because emotions are ambiguous, it is difficult to present candidates that match the emotion of the person H with the four options used in "information collection". In the "relationship building" phase, therefore, the touch display 12 automatically reads aloud, at an appropriate time, wording (a text sentence) prepared in advance in the dialog system 10, that is, in the dialog database 14. Specifically, in response to the remark "(Robot R) You look very happy somehow.", a response such as "(Person H) Yes, I'm having a great time." is read aloud automatically. This attributes to the person the emotion of being happy at that moment. Conversely, the question "(Person H) You enjoy talking with me, don't you?" is read aloud automatically, and the robot R is made to answer automatically "(Robot R) Yes, I do.", which builds a situation in which both parties feel goodwill toward each other. Emotions are vaguely defined, and it is difficult even for people themselves to understand clearly what emotion they are feeling. For that reason, even a response that is read aloud automatically is considered likely to be accepted by the person as an emotion they actually have.

Next, "persuasion toward decision making" refers to obtaining the agreement of the person H to a certain question in order to guide the person H in the direction of the decision that is ultimately desired. Specifically, by limiting to one the response options for a question from the partner that affects a later decision, the person H is forcibly guided in the direction the system intends. For the person H to make a certain decision, some grounds for that decision are necessary. For example, when the purpose of the dialog between the robot R and the person H is the purchase of a product, abruptly saying "please buy it" is unlikely to lead to a decision to purchase. On the other hand, if it has been established beforehand through conversation that the color of the recommended product matches the preference of the person H, that conversation becomes grounds for the purchase.

For example, if the person has answered "(Person H) Yes, I like it." to the question "(Robot R) This color suits you perfectly. You like it too, don't you?", that exchange is expected to influence the subsequent decision. For a question that affects a later decision in this way, the response of the person H is forcibly guided by limiting the options to one, as shown in FIG. 9(B). This makes it possible to guide the subsequent decision in the direction the system intends.

Next, "decision making" refers to the expression of the final decision by the person H. Specifically, when the person H is prompted to make a final decision, multiple options are presented that all express agreement with the proposal from the robot R but are worded differently. As a result, the person H can be made to feel that they made the decision themselves, even though the response is being guided.

For example, in response to "(Robot R) Would you like to buy this product?", presenting only responses that both express agreement, such as "I like it, so I'll buy it" and "Yes, I will", as shown in FIG. 9(C), guides the decision. The reason for displaying multiple synonymous responses is that prompting the person H to decide on one of them creates responsibility for that selection. Making a final decision carries heavier responsibility and requires more deliberation than the selections in the earlier "persuasion toward decision making" and "relationship building" phases. For that reason, methods such as "reading aloud automatically" or "presenting only one option" make it difficult to give the person H the sense of having decided for themselves. The "decision making" phase therefore adopts the method of having the person select from multiple options that are synonymous but worded differently.
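The per-phase presentation strategies described above can be summarized in a small lookup function. The sketch below is one possible reading, not the patent's implementation: the names `Phase`, `ResponsePlan`, and `plan_response` are hypothetical, the candidate responses are assumed to come from the script, and the idle talk phase is left out because its strategy is not spelled out in this passage.

```python
from enum import Enum
from typing import NamedTuple, Optional, Sequence, Tuple


class Phase(Enum):
    INFORMATION_COLLECTION = "information collection"
    RELATIONSHIP_BUILDING = "relationship building"
    PERSUASION = "persuasion toward decision making"
    DECISION_MAKING = "decision making"


class ResponsePlan(NamedTuple):
    displayed: Tuple[str, ...]  # options shown on the touch display
    auto_read: Optional[str]    # text read aloud without any selection


def plan_response(phase: Phase, candidates: Sequence[str]) -> ResponsePlan:
    """Decide what to show or read aloud for the given phase (a sketch)."""
    if not candidates:
        raise ValueError("at least one candidate response is required")
    if phase is Phase.INFORMATION_COLLECTION:
        # Options with different meanings; the embodiment shows four.
        return ResponsePlan(tuple(candidates[:4]), None)
    if phase is Phase.RELATIONSHIP_BUILDING:
        # Nothing is displayed; a prepared response is read automatically.
        return ResponsePlan((), candidates[0])
    if phase is Phase.PERSUASION:
        # A single option guides the response.
        return ResponsePlan((candidates[0],), None)
    if phase is Phase.DECISION_MAKING:
        # Several synonymous options, all expressing agreement.
        if len(candidates) < 2:
            raise ValueError("decision making needs two or more options")
        return ResponsePlan(tuple(candidates), None)
    raise ValueError(f"unsupported phase: {phase}")


plan = plan_response(Phase.DECISION_MAKING,
                     ["I like it, so I'll buy it", "Yes, I will"])
```

Keeping the displayed options and the automatically read text in one return value lets the caller treat "show nothing and speak" and "show options and wait" uniformly.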

A condition for natural conversation between the robot R and the person H is conversational turn taking (speaker alternation). That is, each party needs to speak at an appropriate time relative to the other party's utterance. To that end, the timing at which the person H should respond must be conveyed appropriately to the person H on the touch display 12. In the embodiment, nothing is displayed on the touch display 12 when no response from the person H is required, and options are displayed on the touch display 12 only when a response is required. At that time, a beep or similar sound is played to direct the attention of the person H to the displayed options.

In addition, giving nonverbal information such as the gaze direction of the robot R, that is, eye contact, can notify the person H of the turn-taking timing even more effectively.

Furthermore, because the person H has to understand the content of the options displayed on the touch display 12, responding may take time. Depending on the content of the dialog, however, an utterance at a socially appropriate time may be expected, such as "responding immediately" or "deliberately hesitating".

In such cases, no options are displayed on the touch display 12, and predetermined content is read aloud automatically at an appropriate time. This makes natural conversation more effective. A possible problem is that automatic reading may give the person H the feeling that the response was not their own choice. However, because the person selects most of the conversation themselves, an automatic response only when necessary is unlikely to make them feel that their intention was overridden.

The embodiment described above is a dialog system using the robot R, which is a physical agent. However, the present invention is not limited to such a physical agent and may also be applied to a dialog system between the person H and an agent such as an avatar or character displayed on a display screen. In that case, the robot controller 20 of FIG. 1 is replaced by a display I/F (not shown) for displaying such an avatar or character.

Furthermore, the partner of the person H need not be an agent; the partner may also be a person. An embodiment for that case is shown in FIG. 10.

In this embodiment, the person H1 corresponds to the robot R of the FIG. 1 embodiment, and the person H2 corresponds to the person H of the FIG. 1 embodiment. That is, because the person H1 takes the place of the robot R, the dialog control manager 16 causes the utterance text of the robot R of the FIG. 1 embodiment to be uttered from the touch display 121 for the person H1 or from a speaker provided in its vicinity (not shown; corresponds to the speaker 26 of FIG. 5). In response to such an utterance from the touch display 121 of the person H1, the person H2 touches and selects an option (response item) displayed on their own touch display 122, in the same way as the person H of the FIG. 1 embodiment. The selected item is then uttered from the touch display 122 or from a speaker provided in its vicinity (not shown; corresponds to the speaker 26 of FIG. 5).

According to the embodiment of FIG. 10, even people who find it difficult to speak, such as people who feel psychological hesitation about speaking or people with a disability that makes speech difficult, can take part in a dialog between people. In addition, because the dialog system reads all utterances aloud on behalf of the people, each utterance can be made to feel as if it follows the intention of the person.

In the FIG. 10 embodiment, conversely to the above, the person H2 may correspond to the robot R of the FIG. 1 embodiment, and the person H1 may correspond to the person H of the FIG. 1 embodiment.

For example, the robot R in the embodiment of FIG. 1 utters utterance text according to the script from its own speaker 46, and in the embodiment of FIG. 10 utterance text according to the script is uttered from the speaker provided in association with the touch display 121 or 122. The dialog control manager 16, the robot controller 20, the touch display I/F 24, and so on, which produce utterances using the robot R or the touch display 121 (122), can therefore be called utterance agents.

In the embodiments described above, the response items (options) for the person H, or for the person H2 or H1, are displayed on a touch display and selected by touch. However, the invention is not limited to a touch display: another display may be used for display, and another pointing device may be used for selection.

Furthermore, although the dialog control manager 16, the robot controller 20, and the touch display I/F 24 are illustrated in FIG. 1 as separate units, they may be implemented by a single computer.

10 ... Dialog system
R ... Robot
H, H1, H2 ... Person
12, 121, 122 ... Touch display
14 ... Dialog database
16 ... Dialog control manager
18 ... Dialog queue
20 ... Robot controller
22 ... Next-candidate pool
24 ... Touch display I/F
26, 46 ... Speaker
28 ... Coordinate detection circuit

Claims

A dialog system comprising:
a dialog database in which a script is preset;
an utterance agent that utters according to the script of the dialog database;
display means for displaying, according to the script, one or more options with which a person is to respond to the utterance by the utterance agent; and
utterance means for uttering, according to the script, the content of an option displayed on the display means when the person selects that option.

The dialog system according to claim 1, wherein the options displayed on the display means include two or more options having similar meanings.

The dialog system according to claim 1, wherein the number of options displayed on the display means is one.

A control program executed by a computer of a dialog system that conducts a dialog according to a script preset in a dialog database, the control program causing the computer to function as:
an utterance agent that utters according to the script of the dialog database;
display means for displaying, according to the script, one or more options with which a person is to respond to the utterance by the utterance agent; and
utterance means for uttering, according to the script, the content of an option displayed on the display means when the person selects that option.