JP2023001299A - Dialogue system and program

Info

Publication number: JP2023001299A
Application number: JP2022179640A
Authority: JP
Inventors: 雄一郎吉川; Yuichiro Yoshikawa; 尊優飯尾; Takamasa Iio; 浩石黒; Hiroshi Ishiguro
Original assignee: Osaka University NUC
Current assignee: Osaka University NUC
Priority date: 2018-07-30
Filing date: 2022-11-09
Publication date: 2023-01-04
Anticipated expiration: 2038-07-30
Also published as: JP7432960B2; JP2020020846A

Abstract

PROBLEM TO BE SOLVED: To provide a dialogue system capable of avoiding breakdown of a dialogue with an agent as far as possible.
SOLUTION: Two agents (R1, R2) are placed at a dialogue place, and these agents converse with a person according to a script (dialog data). One agent (R1) utters a question to the person, for example, "Where do you want to go on your day off?" When the person utters a response, it is determined whether the response sentence hits a keyword. When a keyword is hit, the other agent utters a pre-phrase that further develops the dialogue around that keyword (for example, "As expected, Umeda, I suppose"), and the other agent further utters a recognition response. If there is no response utterance, the other agent utters a proxy response.
[Selection drawing] Fig. 1

Description

The present invention relates to a dialogue system and program, and more particularly to a dialogue system and program in which, for example, at least one agent converses with at least one person at a dialogue place.

As the field of robotics expands, the focus of robot research is shifting to robots that work in everyday situations, and the development of robots that can converse with people in the environments where people live is attracting attention.

With recent advances in speech recognition technology, robots capable of communicating with humans in spoken language have been developed. However good speech recognition technology becomes, though, it has not been easy to keep giving a person who talks with a robot a feeling equivalent to the "sense of dialogue" (the feeling of taking part in a dialogue) that a person has when taking part in a dialogue with other people. In other words, people have sometimes clearly lost the sense of dialogue.

Patent Literature 1, an example of the background art, discloses that in a dialogue between a robot and a person, the robot infers the person's emotion and determines the response sentence to utter and the action that accompanies the utterance.

Patent Literature 2 discloses a dialogue system between a person and a robot that attempts to realize sustained, natural interaction by synchronizing the two.

Patent Literature 1: JP 2004-90109 A [B25J 13/00…]
Patent Literature 2: JP 2012-181697 A [G06F 3/16…]

In both the technique of Patent Literature 1 and that of Patent Literature 2, processing based on speech recognition has its limits, and it is not easy for a person to sustain the "sense of dialogue" described above. In other words, the dialogue tends to break down.

Therefore, a primary object of the present invention is to provide a novel dialogue system and program.

Another object of the present invention is to provide a dialogue system and program that can avoid breakdown of a dialogue with a person as far as possible.

To solve the above problems, the present invention adopts the following configurations. The reference signs and supplementary notes in parentheses indicate correspondence with the embodiment described later to aid understanding of the invention, and do not limit the invention in any way.

A first invention is a dialogue system comprising at least one agent at a dialogue place, in which the agent converses with a person at the dialogue place according to a dialog, the dialogue system comprising: a question sentence utterance unit that causes the agent to utter a question sentence; a first determination unit that determines whether the person has uttered a response sentence to the question sentence; a second determination unit that, when the first determination unit determines that a response sentence has been uttered, determines whether the response sentence hits a predetermined keyword; a pre-phrase utterance unit that, when the second determination unit determines that the response sentence hits a specific keyword, causes the agent to utter a pre-phrase that invites the agent's next utterance following the person's response; and a recognition response sentence utterance unit that, following the utterance of the pre-phrase by the pre-phrase utterance unit, causes a recognition response sentence to be uttered that creates dialogue context for the pre-phrase.

In the first invention, a dialogue system (10: a non-limiting reference sign exemplifying the corresponding part in the embodiment; the same applies hereinafter) comprises at least one agent (R1, R2) at a dialogue place (12), and the agent converses with a person (H) at the dialogue place according to a dialog. The question sentence utterance unit (20a, S7) causes the agent to utter a question sentence. The person (H) utters a response sentence to the question sentence, and the first determination unit (20a, S9) determines whether the person has uttered a response sentence to the question sentence. When the first determination unit (20a, S9) determines that a response sentence has been uttered, the second determination unit (20a, S11, S13) determines whether the response sentence hits a predetermined keyword. When the second determination unit (20a, S11, S13) determines that the response sentence hits a specific keyword, the pre-phrase utterance unit (20a, S17) causes the agent to utter a pre-phrase that invites the agent's next utterance following the person's response. The recognition response sentence utterance unit (20a, S19) causes a recognition response sentence to be uttered that creates dialogue context for the pre-phrase.

According to the first invention, having the recognition response sentence uttered makes it possible to avoid breakdown of the dialogue as far as possible.
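The flow of the first invention can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation disclosed in the specification: the names `KeywordEntry`, `KEYWORDS`, `find_keyword`, and `keyword_branch` are hypothetical, keyword hits are approximated by substring matching, and the recognition response text is invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeywordEntry:
    pre_phrase: str            # invites the agent's next utterance
    recognition_response: str  # creates dialogue context for the pre-phrase


# Hypothetical keyword table; the recognition response wording is invented.
KEYWORDS = {
    "umeda": KeywordEntry(
        pre_phrase="As expected, Umeda, I suppose.",
        recognition_response="Umeda has plenty of places to visit.",
    ),
}


def find_keyword(response: str) -> Optional[str]:
    """Second determination (S11, S13): does the response hit a keyword?"""
    lowered = response.lower()
    for keyword in KEYWORDS:
        if keyword in lowered:  # substring match is an assumption
            return keyword
    return None


def keyword_branch(response: Optional[str]) -> List[str]:
    """Return the agent utterances that follow the person's response."""
    if not response:            # first determination (S9): nothing uttered
        return []
    keyword = find_keyword(response)
    if keyword is None:         # no keyword hit: this branch does nothing
        return []
    entry = KEYWORDS[keyword]
    # Pre-phrase (S17) first, then the recognition response (S19).
    return [entry.pre_phrase, entry.recognition_response]
```

Calling `keyword_branch("I want to go to Umeda")` yields the pre-phrase followed by the recognition response; a response without a registered keyword yields an empty list.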

A second invention is a dialogue system comprising at least one agent at a dialogue place, in which the agent converses with a person at the dialogue place according to a dialog, the dialogue system comprising: a question sentence utterance unit that causes the agent to utter a question sentence; a first determination unit that determines whether the person has uttered a response sentence to the question sentence; a proxy response sentence utterance unit that, when the first determination unit does not determine that a response sentence has been uttered, causes the agent to utter a proxy response sentence that answers the question sentence on the person's behalf; and a recognition response sentence utterance unit that, following the utterance of the proxy response sentence by the proxy response sentence utterance unit, causes a recognition response sentence to be uttered that creates dialogue context for the proxy response sentence.

In the second invention, when the first determination unit (20a, S9) does not determine that a response sentence has been uttered, the proxy response sentence utterance unit (20a, S27) causes the agent to utter a proxy response sentence that answers the question sentence on the person's behalf, and the recognition response sentence utterance unit (20a, S29) then, following the utterance of the proxy response sentence by the proxy response sentence utterance unit, causes a recognition response sentence to be uttered that creates dialogue context for the proxy response sentence.

According to the second invention, even when no response sentence is uttered, having utterances such as a proxy response utterance and a recognition response utterance made makes it possible to avoid breakdown of the dialogue as far as possible.
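The no-response branch of the second invention can be sketched in the same spirit. This is a hedged illustration only: the `Agent` class and the function `handle_no_response` are hypothetical names, and having a single agent speak both sentences is a simplifying assumption, since this passage does not fix which agent speaks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    name: str
    spoken: List[str] = field(default_factory=list)

    def say(self, text: str) -> None:
        # Stand-in for speech output; a real robot would synthesize audio.
        self.spoken.append(text)


def handle_no_response(response_detected: bool, agent: Agent,
                       proxy_response: str,
                       recognition_response: str) -> bool:
    """Run the proxy branch; return True if it was taken."""
    if response_detected:           # S9 found a response: not this branch
        return False
    agent.say(proxy_response)       # S27: answer on the person's behalf
    agent.say(recognition_response)  # S29: create context for that answer
    return True
```

The return value lets a caller decide whether the keyword branch still needs to run for the same question.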

A third invention is a program for a dialogue system that comprises at least one agent at a dialogue place and in which the agent converses with a person at the dialogue place according to a dialog, the program causing a computer of the dialogue system to function as: a question sentence utterance unit that causes the agent to utter a question sentence; a first determination unit that determines whether the person has uttered a response sentence to the question sentence; a second determination unit that, when the first determination unit determines that a response sentence has been uttered, determines whether the response sentence hits a predetermined keyword; a pre-phrase utterance unit that, when the second determination unit determines that the response sentence hits a specific keyword, causes the agent to utter a pre-phrase that invites the agent's next utterance following the person's response; and a recognition response sentence utterance unit that, following the utterance of the pre-phrase by the pre-phrase utterance unit, causes a recognition response sentence to be uttered that creates dialogue context for the pre-phrase.

According to the third invention, effects similar to those of the first invention can be expected.

A fourth invention is a program for a dialogue system that comprises at least one agent at a dialogue place and in which the agent converses with a person at the dialogue place according to a dialog, the program causing a computer of the dialogue system to function as: a question sentence utterance unit that causes the agent to utter a question sentence; a first determination unit that determines whether the person has uttered a response sentence to the question sentence; a proxy response sentence utterance unit that, when the first determination unit does not determine that a response sentence has been uttered, causes the agent to utter a proxy response sentence that answers the question sentence on the person's behalf; and a recognition response sentence utterance unit that, following the utterance of the proxy response sentence by the proxy response sentence utterance unit, causes a recognition response sentence to be uttered that creates dialogue context for the proxy response sentence.

According to the fourth invention, effects similar to those of the second invention can be expected.

According to the present invention, breakdown of a dialogue between an agent and a person can be avoided as far as possible.

The above object and other objects, features, and advantages of the present invention will become more apparent from the following detailed description of the embodiment, given with reference to the drawings.

FIG. 1 is a schematic diagram showing an overview of a dialogue system according to one embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the configuration of the sensor manager in the FIG. 1 embodiment.
FIG. 3 is a block diagram showing an example of the configuration of the group manager in the FIG. 1 embodiment.
FIG. 4 is a schematic diagram showing an example of the robot in the FIG. 1 embodiment.
FIG. 5 is a block diagram showing an example of the configuration of a robot controller that controls the robot of FIG. 4.
FIG. 6 is a flow diagram showing an example of the operation of the group manager shown in FIG. 1.

Referring to FIG. 1, a first robot R1, a second robot R2, and one person H are present at dialogue place 12 of dialogue system 10 of this embodiment. However, there may be one robot, or three or more, and there may be two or more people. In the following, the first robot R1 and the second robot R2 may simply be called robot R when there is no particular need to distinguish between them.

In dialogue system 10 of this embodiment, as indicated by arrow A in FIG. 1, robot R1 or R2 is made to utter a question sentence to person H according to a dialog (script) prepared in advance. Then, in cases such as when person H's response utterance to the question sentence is appropriate, or when person H makes no response utterance to it, robot R1 or R2 is made, as indicated by arrow B, to direct a pre-phrase utterance, a proxy response utterance, or the like at the robot R1 or R2 that uttered the question sentence, again according to the dialog.

Even if robot R1 or R2 utters a question sentence to person H, when person H makes no response utterance, or makes one that is negative, such as "I'm not sure," "I don't know," "I forgot," "I don't remember," or "I don't want to answer," it tends to become difficult for person H and robot R1 or R2 to continue the dialogue at all. In other words, unless person H produces utterances in response to robot R1 or R2, the dialogue tends to break down. In this embodiment, therefore, when there is no appropriate response utterance from person H, robot R1 or R2 (which may be the robot that uttered the question or a different robot) is made to speak to the robot R1 or R2 that uttered the question sentence to person H, so that the dialogue can continue as far as possible.

On the other hand, when there is an appropriate response utterance from person H, robot R1 or R2 (which may be the robot that uttered the question or a different robot) is made to utter a pre-phrase to the robot R1 or R2 that uttered the question sentence to person H, which encourages the dialogue on that main topic to continue. In other words, by continuing to ask person H questions, the system expresses, from robot R's side, robot R's desire to share experiences with person H, and thereby gives person H a sense of dialogue.

A dialogue system such as that of this embodiment can be used, for example, as a tool for drawing out speech from elderly people.

In this embodiment, dialogue place 12 of dialogue system 10 is provided with a microphone 14 as an auditory sensor and a camera 16 as a visual sensor. Microphone 14 picks up the speech of robot R and person H, or captures environmental sound, and may be a microphone array if necessary. Camera 16 captures the situation at dialogue place 12, in particular the facial expressions and movements of person H, as moving or still images. Two or more cameras 16 may be installed if necessary.

In addition to microphone 14 and camera 16, other sensors (not shown) may be provided, such as a wearable posture sensor, an acceleration sensor, a biosensor that detects biological signals such as heartbeat, breathing, and body movement, and a motion capture system.

Sensor signals, such as the audio signal acquired by microphone 14 and the image signal captured by camera 16, are input to a sensor manager 18. Sensor manager 18 acquires these sensor signals, determines the situation at dialogue place 12, and outputs the determination result to a group manager 20 as sensing data.

Referring to FIG. 2, sensor manager 18 includes a CPU (central processing unit) 18a, to which a communication device 18c is connected via an internal bus 18b. Communication device 18c includes, for example, a network interface controller (NIC). Through communication device 18c, CPU 18a can communicate with group manager 20 and other devices and exchange data with them.

A memory 18d is further connected to CPU 18a via internal bus 18b. Memory 18d includes ROM and RAM. Sensor signals, including the audio signal from microphone 14 (FIG. 1) and the image signal from camera 16 (FIG. 1), are input through a sensor I/F (interface) 18e configured by, for example, a DSP (digital signal processor). Memory 18d temporarily stores the sensor signals.

Sensor manager 18 is a kind of determiner: CPU 18a determines the state of dialogue place 12 based on the sensor data stored in memory 18d. Sensor manager 18 then sends data indicating the determined state to group manager 20.

The programs required by sensor manager 18 (the OS, the sensor signal acquisition program, and so on) are stored in memory 18d. Sensor manager 18 operates according to the programs stored in memory 18d.

Although not shown, a keyboard and a display may also be attached to CPU 18a.

Group manager 20 controls the speech operation (verbal operation) and behavior (nonverbal operation) of each of the two robots R1 and R2, for example according to the flow diagram of FIG. 6 described later.

Group manager 20 includes a CPU 20a, to which a communication device 20c is connected via an internal bus 20b. Communication device 20c includes, for example, a network interface controller (NIC). Through communication device 20c, CPU 20a can communicate with sensor manager 18, robot R, and other devices and exchange data with them.

A memory 20d is further connected to CPU 20a via internal bus 20b. Memory 20d includes ROM and RAM. Script data is read from a dialog database 22 through a memory I/F 20e and temporarily stored in memory 20d.

The programs required by group manager 20 (the OS, the sensor signal acquisition program, and so on) are stored in memory 20d. Group manager 20 operates according to the programs stored in memory 20d.

As described above, CPU 20a of group manager 20 controls the actions, that is, the behavior, of each robot. A history of that behavior is accumulated in memory 20d and provided to sensor manager 18 as needed.

A memory 20d and an input device 20e are further connected to CPU 20a via internal bus 20b. Memory 20d includes ROM and RAM. A script (dialog) is read from dialog database 22 through a memory I/F 20f and temporarily stored in memory 20d.

Here, a "dialog" means a sequence of commands for the utterances and nonverbal actions to be performed during a dialogue, and dialog database 22 is a collection of dialogs (containing, for example, the command sequences for dialogue on each main topic, such as childhood stories, travel stories, and health stories). A "script" is the character string representing such a command sequence, and script data is the character string representing a single command. A sequence of script data therefore forms a script. Group manager 20 sends such scripts from dialog queue 23c to robots R1 and R2.

Script data includes nonverbal data as well as verbal data. Verbal data is script data that instructs robots R1 and R2 to speak. Nonverbal data is script data that instructs nonverbal actions by robots R1 and R2, such as looking at person H, nodding, shaking the head, or tilting the head.
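One way to picture the relationship between a script and its script data is the following Python sketch. The specification describes script data as character strings and does not disclose their format, so the `ScriptData` record, the `Kind` enumeration, and the action names are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Kind(Enum):
    VERBAL = "verbal"        # instructs a robot to speak
    NONVERBAL = "nonverbal"  # instructs a nonverbal action


@dataclass(frozen=True)
class ScriptData:
    robot: str    # "R1" or "R2"
    kind: Kind
    content: str  # text to speak, or a hypothetical action name


# A script is a sequence of script data (one entry per command).
script: List[ScriptData] = [
    ScriptData("R1", Kind.NONVERBAL, "look_at_person"),
    ScriptData("R1", Kind.VERBAL, "Where do you want to go on your day off?"),
    ScriptData("R2", Kind.NONVERBAL, "nod"),
]


def utterances(items: List[ScriptData]) -> List[str]:
    """Collect only the verbal commands, in script order."""
    return [item.content for item in items if item.kind is Kind.VERBAL]
```

Keeping verbal and nonverbal commands in one sequence lets a single script interleave speech with gaze and head movements.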

Dialogue system 10 shown in FIG. 1 further includes a next dialog candidate pool 23a. Next dialog candidate pool 23a is a storage area that holds a group of scripts to be uttered next. These are candidates that are selected dynamically according to person H's response to the script currently in progress, which is stored in dialog queue 23c. The candidates are prepared individually according to, in particular, whether person H has uttered a response sentence and whether a keyword contained in the response sentence hits a specific keyword set in advance.

Non-response dialog pool 23b pools the dialogs that robot R1 and/or R2 should utter when, for example, person H utters no response sentence to a question sentence uttered by robot R1. This covers not only the case where no response speech is input to microphone 14, but also cases where speech is input to microphone 14 yet the response sentence cannot be recognized, or where it is recognized but is a negative utterance. In other words, non-response dialog pool 23b is a storage area that holds a group of scripts for the utterances or nonverbal actions to be performed to handle exceptional cases that were not anticipated by the group of future dialogs loaded into next dialog candidate pool 23a or by the in-progress dialog loaded into dialog queue 23c.
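The three situations treated as "no response" can be sketched as a small classifier. This is a hedged illustration, not the disclosed method: the names `ResponseStatus`, `classify`, and `uses_non_response_pool` are hypothetical, and detecting a negative utterance by matching a fixed list of English phrases is an assumption.

```python
from enum import Enum
from typing import Optional

# Negative phrases drawn from the examples in the description (translated).
NEGATIVE_PHRASES = (
    "i'm not sure",
    "i don't know",
    "i forgot",
    "i don't remember",
    "i don't want to answer",
)


class ResponseStatus(Enum):
    NO_INPUT = "no speech reached the microphone"
    UNRECOGNIZED = "speech was input but could not be recognized"
    NEGATIVE = "speech was recognized but is a negative utterance"
    ANSWERED = "an appropriate response was recognized"


def classify(audio_detected: bool, recognized: Optional[str]) -> ResponseStatus:
    if not audio_detected:
        return ResponseStatus.NO_INPUT
    if not recognized:
        return ResponseStatus.UNRECOGNIZED
    lowered = recognized.lower()
    if any(phrase in lowered for phrase in NEGATIVE_PHRASES):
        return ResponseStatus.NEGATIVE
    return ResponseStatus.ANSWERED


def uses_non_response_pool(status: ResponseStatus) -> bool:
    """All statuses except ANSWERED draw on the non-response pool."""
    return status is not ResponseStatus.ANSWERED
```

Separating classification from pool selection keeps the three failure cases visible even though they lead to the same pool.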

Dialog queue 23c is likewise an area in, for example, memory 20d. Script data loaded in next dialog candidate pool 23a or in non-response dialog pool 23b can be loaded into dialog queue 23c in the form of a queue, so that robot R1 and/or R2 can execute it immediately.

A script carries, for example as a header, the execution time (tnext) of the script data it contains. In dialog queue 23c the script data is always sorted by this execution time (tnext), and group manager 20 sends script data to each robot controller 24 so that script data with the same execution time is executed simultaneously. Thus, for example, robots R1 and R2 can perform the same action at the same time, such as looking at person H, and a single robot R1 or R2 can, for example, look at the other robot or at person H while speaking.
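The time-ordered behavior of the dialog queue can be sketched with a priority queue. This is a minimal sketch under stated assumptions: the class `DialogQueue`, its methods, and the command strings are hypothetical, and the specification does not say how the sorting is implemented. Entries that share the earliest execution time are released together, which corresponds to simultaneous execution.

```python
import heapq
from itertools import count
from typing import List, Optional, Tuple


class DialogQueue:
    """Keeps script data ordered by execution time (tnext)."""

    def __init__(self) -> None:
        self._heap: list = []
        self._seq = count()  # tie-breaker that preserves insertion order

    def push(self, t_next: float, robot: str, command: str) -> None:
        heapq.heappush(self._heap, (t_next, next(self._seq), robot, command))

    def pop_batch(self) -> Tuple[Optional[float], List[Tuple[str, str]]]:
        """Pop every entry that shares the earliest execution time."""
        if not self._heap:
            return None, []
        t_next = self._heap[0][0]
        batch: List[Tuple[str, str]] = []
        while self._heap and self._heap[0][0] == t_next:
            _, _, robot, command = heapq.heappop(self._heap)
            batch.append((robot, command))
        return t_next, batch


queue = DialogQueue()
queue.push(2.0, "R1", "say: Where do you want to go on your day off?")
queue.push(1.0, "R1", "look_at_person")
queue.push(1.0, "R2", "look_at_person")
first = queue.pop_batch()   # both robots look at the person together
second = queue.pop_batch()  # then R1 asks the question
```

A heap keeps insertion cheap while guaranteeing that the earliest batch is always at the front, which matches the "always sorted" description without re-sorting the whole queue.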

The dialogs in this embodiment consist of main topics (or main categories) and several subtopics (or subcategories) within each main topic. In this embodiment, dialogs are prepared for a relatively small number of main topics (or categories), such as childhood stories, travel stories, and health stories, so that the dialogue with person H can go deeper.

Within the main topic (major topic) of childhood stories, subtopics (middle topics) such as play, meals, and life and housing are set. The subtopic "play" has further subtopics (small topics) such as places, parks, toys, hide-and-seek, playing house, tag, footraces, and Daruma-san ga Koronda. The subtopic "meals" has subtopics such as school lunches, snacks, side dishes, brown rice, likes, dislikes, milk, cake, fish and meat, rice and bread, and curry and sushi. The subtopic "life and housing" has subtopics such as where the person lived, the house, the well, father and mother, brothers and sisters, work, chickens, cows, horses, dogs and cats, happy memories, and hard times.

Within the main topic of travel stories, subtopics such as hot springs, Mt. Fuji, and means of transport (airplane, Shinkansen) are set, and finer subtopics are prepared for each of them.

Within the main topic of health stories, subtopics such as exercise and golf are set, and finer subtopics are prepared for each of them.
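The topic hierarchy described above maps naturally onto a nested structure. The following Python sketch is illustrative only: the variable `TOPICS` and the helper `topic_paths` are hypothetical, the lists are abridged, empty lists stand for finer subtopics that the description does not enumerate, and treating "airplane" and "Shinkansen" as finer subtopics of "means of transport" is an assumption.

```python
from typing import Dict, List, Tuple

# main topic -> subtopic -> finer subtopics (abridged from the description)
TOPICS: Dict[str, Dict[str, List[str]]] = {
    "childhood": {
        "play": ["places", "parks", "toys", "hide-and-seek",
                 "playing house", "tag", "footraces",
                 "Daruma-san ga Koronda"],
        "meals": ["school lunches", "snacks", "likes", "dislikes"],
        "life and housing": ["where you lived", "the house", "work"],
    },
    "travel": {
        "hot springs": [],
        "Mt. Fuji": [],
        "means of transport": ["airplane", "Shinkansen"],  # assumption
    },
    "health": {
        "exercise": [],
        "golf": [],
    },
}


def topic_paths(tree: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, ...]]:
    """Flatten the hierarchy into (main, sub[, finer]) paths."""
    paths: List[Tuple[str, ...]] = []
    for main, subtopics in tree.items():
        for sub, finer in subtopics.items():
            if not finer:  # finer subtopics not enumerated
                paths.append((main, sub))
            for leaf in finer:
                paths.append((main, sub, leaf))
    return paths
```

Each path could serve as a lookup key for the dialog prepared for that topic, though the specification does not state how dialogs are indexed.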

Referring to FIG. 4, which shows the external appearance of robot R of this embodiment, robot R is mounted on a base 30 so that it can rotate forward and backward and left and right relative to base 30. That is, torso 32 has two degrees of freedom.

A right arm 34R and a left arm 34L are attached at the left and right positions of torso 32 that correspond to human shoulders, each by a shoulder joint (not shown), so that they can rotate forward and backward and left and right. That is, right arm 34R and left arm 34L each have two degrees of freedom.

A neck 36 is provided at the center of the upper end of torso 32, and a head 38 is provided on top of it. Neck 36, and hence head 38, is attached so that it can rotate forward and backward and left and right relative to torso 32. That is, neck 36, and hence head 38, has three degrees of freedom: roll (tilting left and right), pitch (tilting forward and backward), and yaw (turning left and right).

A right eye 40R and a left eye 40L are provided on the front of head 38, that is, on the surface corresponding to a human face, and right eye 40R and left eye 40L are provided with eyeballs 42R and 42L. The eyelids of right eye 40R and left eye 40L can close and open, and eyeballs 42R and 42L can each rotate up and down and left and right. That is, right eye 40R and left eye 40L, namely the eyelids, have one degree of freedom, and eyeballs 42R and 42L have two degrees of freedom.

The face is further provided with a mouth 44, which can close and open. That is, mouth 44 has one degree of freedom.

胴体３２の、人間の胸の位置には、対話システム１０において人Ｈに聞かせるための発話を行うスピーカ４６および環境特に人Ｈの発話音声を聞き取るマイク４８が設けられる。 A speaker 46 for uttering speech to be heard by the person H in the dialogue system 10 and a microphone 48 for listening to the environment, especially the speech voice of the person H, are provided on the torso 32 at the chest position of the person.

なお、頭部３８の顔の額に相当する部分には動画または静止画を撮影できるカメラ５０が内蔵される。このカメラ５０は、対面する人Ｈを撮影でき、このカメラ５０からのカメラ信号（映像信号）は、環境カメラ１６（図１）と同様に、センサマネージャ１８のセンサＩ／Ｆを介してＣＰＵ２２ａに、入力されてもよい。 A camera 50 capable of capturing moving images or still images is incorporated in a portion of the head 38 corresponding to the forehead of the face. This camera 50 is capable of photographing a person H facing him, and a camera signal (video signal) from this camera 50 is sent to the CPU 22a via the sensor I/F of the sensor manager 18, as with the environmental camera 16 (FIG. 1). , may be entered.

FIG. 5 is a block diagram showing a robot controller 24 that is built into the robot R and controls the actions of the robot R (utterances, gestures, and so on). Referring to FIG. 5, the robot controller 24 includes a CPU 20a, to which a communication device 24c is connected via an internal bus 24b. The communication device 24c includes, for example, a network interface controller (NIC). Through the communication device 24c, the CPU 20a can communicate and exchange data with the sensor manager 18, the group manager 20, and also external computers and other robots (neither shown).

A memory 24d is also connected to the CPU 20a via the internal bus 24b. The memory 24d includes ROM and RAM. Control data and script data sent from the group manager 20 are temporarily stored in the memory 24d.

Programs required for robot control (the OS, a sensor signal acquisition program, and so on) are also stored in the memory 24d. The robot controller 24 controls the actions of the robot R according to the programs stored in the memory 24d.

That is, an actuator control board 24e, composed of a DSP for example, is further connected to the CPU 20a of the robot controller 24. As described below, the actuator control board 24e controls the operation of the actuators provided in the parts of the robot R described above.

The two-degree-of-freedom movement of the torso 32, that is, its forward, backward, left, and right rotation, is controlled by the CPU 20a controlling a torso actuator 52 through the actuator control board 24e.

The two-degree-of-freedom movement of the right arm 34R and the left arm 34L, that is, their forward, backward, left, and right rotation, is controlled by the CPU 20a controlling an arm actuator 54 through the actuator control board 24e.

The three-degree-of-freedom movement of the neck 36, and hence the head 38, that is, its forward, backward, left, and right rotation, is controlled by the CPU 20a controlling a head actuator 56 through the actuator control board 24e.

The opening and closing of the right eye 40R and the left eye 40L, namely the eyelids, is controlled by the CPU 20a controlling an eyelid actuator 58 through the actuator control board 24e. The two-degree-of-freedom movement of the eyeballs 42R and 42L, that is, their forward, backward, left, and right rotation, is controlled by the CPU 20a controlling an eyeball actuator 60 through the actuator control board 24e. The opening and closing of the mouth 44 is controlled by the CPU 20a controlling a mouth actuator 62 through the actuator control board 24e.
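
The correspondence between body parts, degrees of freedom, and actuators described above can be summarized as a small lookup table. The following Python sketch is illustrative only: the reference numerals and degree-of-freedom counts come from the description, while the `Actuator` class, the `make_command` helper, and the command format are hypothetical names introduced here, not part of the disclosed controller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actuator:
    numeral: int  # reference numeral used in the description
    dof: int      # degrees of freedom stated in the description


# Body part -> actuator, as listed for FIG. 4 and FIG. 5.
ACTUATORS = {
    "torso": Actuator(52, 2),
    "arm": Actuator(54, 2),
    "head": Actuator(56, 3),
    "eyelid": Actuator(58, 1),
    "eyeball": Actuator(60, 2),
    "mouth": Actuator(62, 1),
}


def make_command(part, values):
    """Build a command for one actuator (hypothetical format).

    Rejects a command whose number of values does not match the
    degrees of freedom of the targeted part.
    """
    actuator = ACTUATORS[part]
    if len(values) != actuator.dof:
        raise ValueError(
            f"{part} expects {actuator.dof} value(s), got {len(values)}"
        )
    return {"actuator": actuator.numeral, "values": tuple(values)}
```

For instance, a head command needs three values (roll, pitch, yaw), while a mouth command needs one.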

The speaker 46 of the robot R shown in FIG. 4 is connected to the CPU 24a of the robot controller 24. The CPU 24a causes the speaker 46 to produce speech (utterances) according to script data supplied from the group manager 20 and stored in the memory 24d as needed.

With such a robot controller 24, the head and arms of the robot R make the necessary movements when the dialogue system 10 requires them, for example when a script calls for a non-verbal action. In the following description, however, the specific control of each actuator is not necessarily explained, since it can easily be inferred from the description above.

As shown in FIG. 1, each of the robots R1 and R2 is provided with a built-in robot sensor 26, just as it has a built-in robot controller 24. The robot sensor 26 includes posture sensors, acceleration sensors, and the like for detecting the state of each movable component of the robots R1 and R2, and the sensor signals from these sensors are input to the sensor manager 18. The sensor manager 18 can therefore sense the states of the robots R1 and R2 based on the sensor signals from the robot sensors 26.

The signals from the microphone 48 and the camera 50 of the robot R shown in FIG. 4 are input to the sensor manager 18 via the robot sensor 26. The sensor manager 18 stores the voice data captured from the microphone 48 in a memory 18d (FIG. 2) and performs speech recognition processing as needed. The sensor manager 18 also processes the camera signal from the camera 50 to sense the situation at the dialogue place 12.

Although only one sensor manager 18 is shown in the embodiment of FIG. 1, any number of sensor managers, two or more, may be provided, in which case the sensor managers can divide the sensing items among themselves.

Similarly, two or more group managers 20 may be used if necessary, or conversely the sensor manager 18 and the group manager 20 may be implemented on a single computer.

The robot R used in the dialogue system 10 of the FIG. 1 embodiment is not limited to the robot described above with reference to FIG. 4; it need only have at least the ability to speak according to a script.

Referring to FIG. 6, the CPU 20a of the group manager 20 of the dialogue system 10 of FIG. 1 performs initialization, such as reading dialog data (script data) of the kind described earlier from the dialog database 26 (FIG. 1). The operation of FIG. 6 is executed repeatedly, for example at about the frame rate.

In the next step S3, the CPU 20a determines whether to change the main topic (major topic) of the dialog. This is decided in step S3 by checking, for example, whether a predetermined time has elapsed when the topic is changed according to a time schedule. The specific description of the embodiment below takes as an example the case of following the dialog for the main topic "talk about travel".

Other conceivable conditions for changing the main topic in step S3 include the following: a predetermined number N of repetitions or more have passed since the main topic was last changed (N is the maximum number of repetitions of the same main topic, set so that the dialogue does not become boring because talk on the same main topic goes on too long); a response utterance from the person H hits a keyword of a main topic other than the one currently under discussion; or response utterances from the person H could not be recognized a predetermined number of times.
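
The conditions for changing the main topic in step S3 can be combined into a single predicate. The following Python sketch shows one way to do so. The `TopicState` fields and the threshold values (`max_seconds`, `max_turns`, `max_unrecognized`) are assumptions made for illustration; the description names the conditions but does not give concrete values.

```python
from dataclasses import dataclass


@dataclass
class TopicState:
    seconds_on_topic: float  # time since the main topic last changed
    turns_on_topic: int      # repetitions of the current main topic
    hit_other_topic: bool    # response hit a keyword of another main topic
    unrecognized_count: int  # responses that could not be recognized


def should_change_main_topic(state, max_seconds=300.0, max_turns=5,
                             max_unrecognized=3):
    """Return True if any of the step S3 conditions holds.

    All thresholds are hypothetical defaults, not values from the source.
    """
    return (
        state.seconds_on_topic >= max_seconds        # time schedule
        or state.turns_on_topic >= max_turns         # N repetitions reached
        or state.hit_other_topic                     # keyword of another topic
        or state.unrecognized_count >= max_unrecognized
    )
```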

If "NO" is determined in step S3, processing continues as is; if "YES" is determined, the main topic is changed in step S5. Then, following the script read from the next-dialog candidate pool 23a, the robot R1, for example, utters a question to the person H, such as "Where would you like to go on your day off?" Here, "where to go" can be regarded as a subtopic of the main topic "talk about travel". The CPU 20a that executes this step S7 functions as a question sentence utterance unit.

In step S9, the CPU 20a determines, based on the voice data from the microphone 14 detected by the sensor manager 16, whether the person H has made a response utterance to the question uttered in step S7 by, for example, the robot R1. The CPU 20a that executes this step S9 functions as a first determination unit that determines whether there is a response utterance from the person.

When it determines that there was a response utterance, the CPU 20a determines in the next step S11 whether the response utterance from the person H detected in step S9 could be recognized, that is, whether the response utterance hit a keyword set in advance in, for example, the dialog database 22. In other words, it determines whether the response sentence of the person H contains a preset keyword. This can easily be done by processing the voice data from the microphone 14 with any speech recognition technique.

However, even when a keyword is hit, step S11 determines "NO" if the response sentence of the person H hits several keywords at the same time, because it is then hard to decide which keyword the dialogue should proceed with. The CPU 20a that executes this step S11 (which may include the next step S13) functions as a second determination unit that determines whether the response sentence from the person hit a keyword.
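
The keyword determination of step S11, including the rule that several simultaneous hits count as "NO", can be sketched as follows. This is a minimal Python illustration that assumes the speech recognizer has already produced a text string and that a keyword "hits" when it appears as a substring; the function name `find_single_keyword` is hypothetical.

```python
def find_single_keyword(response_text, keywords):
    """Return the one keyword contained in the response, or None.

    None covers both "no keyword found" and "several keywords found",
    because step S11 answers "NO" in both cases.
    """
    hits = [kw for kw in keywords if kw in response_text]
    return hits[0] if len(hits) == 1 else None
```

With the keywords "Umeda", "Expoland", and "Hokkaido", a response that mentions only Umeda returns "Umeda", while a response that mentions both Umeda and Hokkaido returns None.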

In step S13, the CPU 20a determines whether the keyword detected in step S11 is a negative remark. As described earlier, a negative remark is one that seems to refuse to answer the question.

If "NO" is determined in step S13, the next step S15 determines whether the keyword is one defined within the main topic currently under discussion or one set within a different main topic.

If "YES" is determined in step S15, the condition described in connection with step S3, namely that a response utterance from the person H touched on a keyword of a main topic other than the one currently under discussion, is satisfied. Processing therefore returns to step S5, the main topic is changed, and processing proceeds to step S7 again.
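
The checks of steps S13 and S15 amount to classifying the detected keyword. The Python sketch below shows one possible classification. The labels, the `classify_keyword` name, and the dictionary layout of topics are assumptions for illustration; the handling of a negative remark is left to the caller.

```python
def classify_keyword(keyword, current_topic, topic_keywords, negative_keywords):
    """Classify a detected keyword along the lines of steps S13 and S15.

    topic_keywords maps each main topic to its keywords (hypothetical layout).
    """
    if keyword in negative_keywords:          # step S13: negative remark
        return "negative"
    if keyword in topic_keywords.get(current_topic, ()):
        return "current_topic"                # continue to the pre-phrase (S17)
    for topic, words in topic_keywords.items():
        if topic != current_topic and keyword in words:
            return "other_topic"              # go back and change the main topic (S5)
    return "unknown"
```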

When step S11 determines that a keyword was hit and step S15 determines that the keyword belongs to the main topic currently in progress, in the following step S17 the CPU 20a causes a robot other than the one that uttered the question, in this example the robot R2 rather than the robot R1, to utter a pre-phrase such as "I guess it's Umeda after all." Here, a pre-phrase is an utterance meant to lead into the next utterance of the robot R1 (or the robot R2). If the person H actually intended to say the detected keyword, the pre-phrase uttered by the robot R2 merely contributes to the sense of being listened to (the feeling that the robot R is listening attentively to the person H). If the pre-phrase is not what the person H intended, however, it has the effect of creating a context for the reply from the robot R1 that follows.

In other words, the pre-phrase serves as a kind of lead-in ("hanashi no furi", a term meaning to offer a topic so that a dialogue or discussion proceeds smoothly). In this embodiment, because the robot R2 has set up the topic with the keyword ("Umeda" in this example), the robot R1 picks it up and says, for example, "Umeda is convenient, isn't it?" As a result, the person H does not find it strange that the talk has turned to Umeda.

Other dialogues envisaged include Example 1 and Example 2 below.
<Example 1>
Robot R1: Where would you like to go on your day off? (step S7)
Person H: Umeda would be my first choice, I guess. (recognized as "Umeda") (step S11)
Robot R2: I guess it's Umeda after all. (step S17)
Robot R1: Umeda is convenient, isn't it? (step S19)
<Example 2>
Robot R1: Where would you like to go on your day off? (step S7)
Person H: Ome, I'd say. (Japanese "Ome da na", recognized as "O, Umeda na") (step S11)
Robot R2: I guess it's Umeda after all. (step S17)
Robot R1: Umeda is convenient, isn't it? (step S19)
In this way, the pre-phrase "I guess it's Umeda after all" uttered by the robot R2 provides the context for the reply that the robot R1 utters in the next step S19.

Here, in step S17 the pre-phrase is uttered by the robot R2, which is different from the robot R1 that uttered the question in step S7, and in step S19 the subsequent recognition response utterance (an utterance that acknowledges the pre-phrase from the robot R2) is uttered by the other robot, R1. That is, the two robots R1 and R2 take turns uttering the question, the pre-phrase, and the recognition response, but the order may be reversed. Furthermore, the question, the pre-phrase, and the recognition response may all be uttered by the same robot, R1 or R2. Alternatively, yet another robot (R3), not shown, may make the recognition response utterance of step S19.

Another conceivable dialog is one in which, when the robot R2 is made to utter a pre-phrase such as "Expoland" in step S17, the robot R1 (or R3) is made to utter a recognition response such as "Expo is popular, isn't it?" in step S19.

Likewise, when the robot R2 is made to utter a pre-phrase such as "Oh right, there's Hokkaido" in step S17, the robot R1 (or R3) may be made to utter a recognition response such as "Crab is recommended in Hokkaido" in step S19.

Words such as "Umeda", "Expoland", and "Hokkaido" contained in these pre-phrases can be regarded as further subtopics of the subtopic "where to go" under the main topic "talk about travel".
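
The pairing of a pre-phrase (step S17) with a recognition response (step S19) for each keyword can be pictured as a lookup in the script data. The Python sketch below is a hedged illustration: the `SCRIPT` table uses English renderings of the three examples in the text, and the `follow_up_turns` function and its parameters are hypothetical. The speaker parameters reflect the statement that the speakers may be swapped or that a third robot may give the recognition response.

```python
# Keyword -> (pre-phrase, recognition response), from the examples in the text.
SCRIPT = {
    "Umeda": ("I guess it's Umeda after all.", "Umeda is convenient, isn't it?"),
    "Expoland": ("Expoland.", "Expo is popular, isn't it?"),
    "Hokkaido": ("Oh right, there's Hokkaido.", "Crab is recommended in Hokkaido."),
}


def follow_up_turns(keyword, pre_speaker="R2", responder="R1"):
    """Return the two turns that follow a keyword hit: steps S17 and S19."""
    pre_phrase, recognition = SCRIPT[keyword]
    return [(pre_speaker, pre_phrase), (responder, recognition)]
```

Because the turns depend only on the recognized keyword, Example 1 and Example 2 yield the same two robot utterances even though the person said something different in each case.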

The CPU 20a that executes step S17 functions as a pre-phrase utterance unit.

After causing the robot R2 to make the recognition response utterance in step S19, the CPU 20a determines in the next step S21 whether to end the dialogue. Here it checks conditions for ending the dialogue, for example that a certain time (15 minutes, for example) has elapsed since the dialogue started, or that the image from the camera 16 (FIG. 1) shows that the person H is no longer present.
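
The end-of-dialogue check of step S21 can be written as a short predicate. In this Python sketch the 15-minute limit is the example value given in the text, while the function name and the `person_present` flag (standing in for the result of analyzing the camera image) are assumptions.

```python
def should_end_dialogue(elapsed_seconds, person_present, max_seconds=15 * 60):
    """Step S21: end when the time limit is reached or the person has left."""
    return elapsed_seconds >= max_seconds or not person_present
```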

If the dialogue is to end, termination processing is performed in step S23 and the operation ends. Termination processing includes, for example, saving a log of the dialogue.

If the dialogue is not to end, processing returns to step S3.

If "NO" in the earlier step S9, that is, if no reply utterance from the person H could be acquired through the microphone 14, the CPU 20a, in the next step S23, randomly selects a keyword within the main topic currently in progress from the non-response dialog pool 23b. Then, in step S25, it causes the robot R2, which is different from the robot R1 that uttered the question, to utter a proxy response sentence based on the selected keyword (for example, "As for me, I guess it's Umeda after all"). Unlike the pre-phrase of step S17, the robot here utters a word indicating the speaker, such as "as for me" ("boku wa"), to make clear that this is the robot R2's own response given in place of the person. The CPU 20a that executes this step S25 (which may include step S23) functions as a proxy response sentence utterance unit.

Then, in step S29, the CPU 20a causes the robot R1 to make a recognition response utterance similar to that of step S19. As stated earlier, however, steps S7, S27, and S29 may all be performed by the same robot, R1 or R2.

Having the robot R2 utter the proxy response sentence in step S27 keeps the dialogue from breaking down for the moment even when there is no response utterance from the person H, and the recognition response utterance in step S29 can be expected to restore the person H's willingness to talk. For example, even if the person H cannot answer the question of step S7 right away, the proxy response sentence of step S27 may prompt the person H to think of a response. In that case, the person H can continue the dialogue on the current main topic, for example "talk about travel". In that sense, the proxy response sentence of step S27 can have the same effect as the pre-phrase of step S17 (creating a context for the dialogue).
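
The proxy response described above has two parts: a keyword is chosen at random from the current main topic, and a sentence is built that names the speaker explicitly. The Python sketch below illustrates this under stated assumptions: the sentence template is an English rendering of the example in the text, and the `proxy_response` function and its `rng` parameter are hypothetical names.

```python
import random


def proxy_response(topic_keywords, rng):
    """Pick a random keyword of the current main topic and build a proxy response.

    The leading "As for me" marks the robot as the subject of the answer,
    which is what distinguishes a proxy response from a pre-phrase.
    """
    keyword = rng.choice(topic_keywords)
    return keyword, f"As for me, I guess it's {keyword} after all."


# Usage: a seeded generator makes the choice reproducible in tests.
keyword, sentence = proxy_response(["Umeda", "Expoland", "Hokkaido"], random.Random(0))
```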

The main topic may also be changed in steps S3 and S5 when the number of times that "YES" is determined in step S13 and the keyword is changed in step S25 reaches a certain number or more.

After step S29, processing proceeds to step S21 to determine whether to end.

If "NO" is determined in step S11, the CPU 20a, in step S31, causes the robot R2, for example, to utter an ambiguous response sentence selected from the non-response dialog pool 23b (for example, "I'd like to go somewhere"). Whereas the proxy response of step S27 means "the robot R2 answers the robot R1's question in place of the person H", this ambiguous response means "the robot R1 or R2 replies to the utterance of the person H". That is, when a response utterance from the person H was detected in step S9 but the response sentence could not be recognized in step S11, the dialogue could break down if nothing were done. Having a robot utter an ambiguous response sentence can draw out the next utterance from the person H, which creates the possibility of avoiding a breakdown of the dialogue.

The CPU 20a that executes step S31 functions as an ambiguous response sentence utterance unit. After step S31, processing proceeds to step S21.
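
Taken together, the flow after the question offers three ways to keep the dialogue going. The Python sketch below summarizes the branching in simplified form. The branch labels and the `choose_branch` function are hypothetical, and the sketch assumes that a detected keyword is not negative and belongs to the current main topic.

```python
def choose_branch(response_detected, keyword):
    """Select how the robots continue after the question of step S7.

    keyword is the single keyword found in step S11, or None if the
    response could not be matched to exactly one keyword.
    """
    if not response_detected:
        return "proxy_response"      # no reply: a robot answers in the person's place
    if keyword is None:
        return "ambiguous_response"  # reply not recognized: step S31
    return "pre_phrase"              # keyword hit: steps S17 and S19
```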

The description above explained that the main topic can be changed according to the passage of time, an utterance from the person H, the absence of an utterance, and so on. A subtopic can be changed in the following cases.

When, within the same subtopic as the current one, there is a not-yet-uttered dialog (scenario) that is highly related to the preceding utterance of the person H, the dialogue moves to that highly related subtopic. For example, it is determined whether the utterance contains a word that is close in distance to a keyword registered in advance for each scenario. The distance is evaluated using a technique such as Word2Vec. However, even if such a word is contained, the evaluation is discounted if the utterance also contains another word that is similarly close to a different keyword.
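
The relatedness evaluation with discounting can be illustrated with cosine similarity over word vectors. The Python sketch below is only a toy: it does not call a real Word2Vec model, the small hand-made vectors stand in for learned embeddings, and the `scenario_score` function, the subtraction formula, and the `discount` weight are assumptions, since the description does not specify how the discount is computed.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def scenario_score(words, keyword, other_keywords, vectors, discount=0.5):
    """Score how well an utterance matches a scenario keyword.

    The best similarity to `keyword` is reduced when some word in the
    utterance is also close to a keyword of another scenario.
    """
    known = [w for w in words if w in vectors]
    if not known:
        return 0.0
    own = max(cosine(vectors[w], vectors[keyword]) for w in known)
    rival = max(
        (cosine(vectors[w], vectors[k]) for w in known for k in other_keywords),
        default=0.0,
    )
    return own - discount * max(rival, 0.0)


# Toy two-dimensional vectors (hypothetical, not learned embeddings).
VECTORS = {
    "Umeda": (1.0, 0.0),
    "Hokkaido": (0.0, 1.0),
    "Osaka": (0.9, 0.1),
    "crab": (0.1, 0.9),
}
```

With these toy vectors, an utterance containing only "Osaka" scores higher for the "Umeda" scenario than one containing both "Osaka" and "crab", because "crab" is close to the rival keyword "Hokkaido".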

The subtopic may also be changed when the number of times that no keyword is hit in step S11 and processing proceeds to step S31 reaches a predetermined number N or more.
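
Counting the misses that lead to step S31 can be sketched with a small counter. In this Python illustration the `MissCounter` class is hypothetical, and resetting the count after a keyword hit or after a subtopic change is an assumption; the description only states that the subtopic may change once the count reaches N.

```python
class MissCounter:
    """Track keyword misses and signal when the subtopic should change."""

    def __init__(self, limit):
        self.limit = limit  # the predetermined number N
        self.misses = 0

    def record(self, keyword_hit):
        """Record one exchange; return True when the subtopic should change."""
        if keyword_hit:
            self.misses = 0  # assumption: a hit resets the count
            return False
        self.misses += 1
        if self.misses >= self.limit:
            self.misses = 0  # assumption: start counting afresh on the new subtopic
            return True
        return False
```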

In the embodiment described above, the dialogs for the robots are stored in advance in the dialog database 22. Instead of using the database 22, however, the necessary dialogs (script data) may be supplied to the group manager 20 one after another, for example from a network.

Furthermore, although the embodiment described above is a dialogue system using robots, which are physical agents, the invention is not limited to such physical agents and can also use agents such as avatars or characters displayed on a display screen. In this case, the robot controller 24 and the robot sensor 26 of FIG. 1 are replaced by a display controller (not shown) for displaying such avatars or characters, and the dialogue place can be taken to be the vicinity of the display showing the agents.

Furthermore, instead of the robot agents or CG agents described above, voice-only agents can also be adopted as agents for dialogue with a person. For example, if a car navigation system has speakers on the left and right of the vehicle, the owner of the voice heard from the left can be treated as R1 (corresponding to the robot R1 of the embodiment) and the owner of the voice on the right as R2 (corresponding to the robot R2 of the embodiment). In this case, the dialogue place is the inside of the car, and the robot controller 24 of FIG. 1 is replaced by an audio controller (not shown) that controls the utterances of such voice agents.

In short, the invention is a system for dialogue with a person using agents of any kind.

10 ... dialogue system
12 ... dialogue place
R1, R2 ... robots
18 ... sensor manager
20 ... group manager
22 ... dialog database
24 ... robot controller

Claims

A dialogue system in which at least one agent is placed at a dialogue place and the agent interacts with a person at the dialogue place according to a dialog, the dialogue system comprising:
a question sentence utterance unit that causes the agent to utter a question sentence;
a first determination unit that determines whether the person has uttered a response sentence to the question sentence;
a second determination unit that, when the first determination unit determines that the response sentence has been uttered, determines whether the response sentence hits a predetermined keyword;
a pre-phrase utterance unit that, when the second determination unit determines that the response sentence hits the specific keyword, causes the agent to utter, after the response from the person, a pre-phrase that leads into the agent's next utterance; and
a recognition response sentence utterance unit that, following the utterance of the pre-phrase by the pre-phrase utterance unit, causes a recognition response sentence that creates a dialogue context for the pre-phrase to be uttered.

A dialogue system in which at least one agent is placed at a dialogue place and the agent interacts with a person at the dialogue place according to a dialog, the dialogue system comprising:
a question sentence utterance unit that causes the agent to utter a question sentence;
a first determination unit that determines whether the person has uttered a response sentence to the question sentence;
a proxy response sentence utterance unit that, when the first determination unit does not determine that the response sentence has been uttered, causes the agent to utter a proxy response sentence that responds to the question sentence on behalf of the person; and
a recognition response sentence utterance unit that, following the utterance of the proxy response sentence by the proxy response sentence utterance unit, causes a recognition response sentence that creates a dialogue context for the proxy response sentence to be uttered.

A program for a dialogue system in which at least one agent is placed at a dialogue place and the agent interacts with a person at the dialogue place according to a dialog, the program causing a computer of the dialogue system to function as:
a question sentence utterance unit that causes the agent to utter a question sentence;
a first determination unit that determines whether the person has uttered a response sentence to the question sentence;
a second determination unit that, when the first determination unit determines that the response sentence has been uttered, determines whether the response sentence hits a predetermined keyword;
a pre-phrase utterance unit that, when the second determination unit determines that the response sentence hits the specific keyword, causes the agent to utter, after the response from the person, a pre-phrase that leads into the agent's next utterance; and
a recognition response sentence utterance unit that, following the utterance of the pre-phrase by the pre-phrase utterance unit, causes a recognition response sentence that creates a dialogue context for the pre-phrase to be uttered.

A program for a dialogue system in which at least one agent is placed at a dialogue place and the agent interacts with a person at the dialogue place according to a dialog, the program causing a computer of the dialogue system to function as:
a question sentence utterance unit that causes the agent to utter a question sentence;
a first determination unit that determines whether the person has uttered a response sentence to the question sentence;
a proxy response sentence utterance unit that, when the first determination unit does not determine that the response sentence has been uttered, causes the agent to utter a proxy response sentence that responds to the question sentence on behalf of the person; and
a recognition response sentence utterance unit that, following the utterance of the proxy response sentence by the proxy response sentence utterance unit, causes a recognition response sentence that creates a dialogue context for the proxy response sentence to be uttered.