JP2011097531A

JP2011097531A - System for continuing listening interaction

Info

Publication number: JP2011097531A
Application number: JP2009252276A
Authority: JP
Inventors: Tomoko Yonezawa; 朋子米澤; Yuichi Kamiyama; 祐一神山; Hirotake Yamazoe; 大丈山添; Shinji Abe; 伸治安部
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2009-11-02
Filing date: 2009-11-02
Publication date: 2011-05-12
Anticipated expiration: 2029-11-02
Also published as: JP5407069B2

Abstract

PROBLEM TO BE SOLVED: To provide a videophone system that sustains a dialogue between persons with communication disabilities. SOLUTION: A listening dialogue sustaining system 100 includes a monitor 16a, a PC 14a to which a microphone 20a and a monitor camera 22a are connected, and a robot 10a that includes a belly camera 12a. The PC 14a determines the behavior of a user A from images of user A captured by the monitor camera 22a and the belly camera 12a and from the sound of user A collected by the microphone 20a, and stores the result in a memory. The PC 14a recognizes the state of user A from the behavior data for a first predetermined time. When the user's state is recognized as "active talk monitor", the PC 14a instructs the robot 10a to perform simulated listening toward user A. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a listening dialogue sustaining system, and more particularly to a listening dialogue system in which a dialogue is conducted using, for example, a videophone.

In the interactive annotation system disclosed in Patent Document 1, a videophone used by a care recipient who is a dementia patient is connected via a network to a computer used by a caregiver. A server that relays commands between the computer and the videophone is connected to the network, and a database storing photograph image data is coupled to the server. By opening the server's URL on the computer and designating a photograph, the caregiver can have the photograph displayed on the videophone used by the other party (the care recipient). As part of care, the caregiver and the care recipient can thus converse using the videophone.

Patent Document 1: JP 2007-150955 A [H04N 7/14]

In recent years, listening activities, which involve listening to patients with mild dementia, elderly people, and people with speech impairments and providing them with emotional support, have attracted attention. As part of these activities, listening volunteers act as the caregiver of Patent Document 1 and converse with care recipients such as patients with mild dementia. However, there are far fewer listening volunteers than people seeking a conversation partner. Because patients with mild dementia and similar users are themselves able to hold a dialogue, activities in which such patients who want a conversation partner talk with one another are being promoted.

However, in a dialogue between patients with mild dementia, the patients often lose concentration on the conversation, and the dialogue frequently fails to continue.

Therefore, a main object of the present invention is to provide a novel listening dialogue sustaining system.

Another object of the present invention is to provide a listening dialogue sustaining system that sustains a dialogue between persons with communication disabilities.

To solve the above problems, the present invention adopts the following configurations. Reference numerals and supplementary explanations in parentheses indicate correspondence with the embodiments described below to aid understanding of the invention, and do not limit the invention in any way.

A first invention is a listening dialogue sustaining system that includes a videophone, which includes a first camera and a microphone, and a robot. The system comprises determination means for determining the user's behavior based on an image captured by the first camera and sound collected by the microphone; recognition means for recognizing the user's state from the behavior determined by the determination means over a first predetermined time; and action giving means for operating the robot so as to sustain the dialogue based on the user's state recognized by the recognition means.

In the first invention, the listening dialogue sustaining system (100) includes a videophone (14a, 14b), to which a first camera (22a, 22b) provided near a monitor (16a, 16b) and a microphone (20a, 20b) that collects the user's speech are connected, and a stuffed-toy robot (10a, 10b, 14a, 14b).

The determination means (64, S65, S67) determines the user's behavior based on the result of applying predetermined template matching and face recognition processing to the image captured by the first camera, and on the sound level of the audio collected by the microphone. The recognition means (64, S75-S81) recognizes the user's state from the behavior over a first predetermined time of, for example, 30 seconds. The action giving means (64, S171, S173, S177, S179) sustains the dialogue by operating the robot according to the user's state, for example so that it pseudo-listens to the user's speech, regulates the user's speech, or attracts the user's attention.
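The split between per-frame determination and window-level recognition can be illustrated with a minimal Python sketch. It assumes that behavior samples arrive with timestamps in seconds and are stored as dictionaries of boolean flags; the class name `BehaviorWindow` and the flag names are hypothetical, and only the 30-second window length comes from the text above.

```python
from collections import deque

WINDOW_SEC = 30.0  # "first predetermined time"; 30 s is the example given in the text


class BehaviorWindow:
    """Holds the behavior samples from the most recent WINDOW_SEC seconds.

    Each sample is (timestamp, flags), where flags is a dict of per-frame
    determination results. The dict layout is an assumption for illustration.
    """

    def __init__(self, window_sec: float = WINDOW_SEC):
        self.window_sec = window_sec
        self.samples = deque()

    def add(self, t: float, flags: dict) -> None:
        self.samples.append((t, flags))
        # Discard samples that are no longer inside the window ending at t.
        while self.samples and self.samples[0][0] <= t - self.window_sec:
            self.samples.popleft()

    def fraction(self, key: str) -> float:
        """Share of samples in the window whose `key` flag is True."""
        if not self.samples:
            return 0.0
        hits = sum(1 for _, flags in self.samples if flags.get(key))
        return hits / len(self.samples)


# Usage: one sample per second for 40 s; the user starts speaking at t = 30 s.
window = BehaviorWindow()
for t in range(40):
    window.add(float(t), {"speaking": t >= 30})
print(len(window.samples), round(window.fraction("speaking"), 3))  # → 30 0.333
```

Recognition then works only on the window's summary statistics, so the state can be re-evaluated at any time without rescanning the whole session.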

According to the first invention, the robot acts to sustain the dialogue according to the user's state, so a dialogue between persons with communication disabilities can be sustained.

A second invention depends on the first invention. The recognition means includes activeness recognition means for recognizing whether the user is active or inactive in the dialogue, and the action giving means includes listening action giving means for making the robot pseudo-listen to the user based on the recognition result of the activeness recognition means.

In the second invention, the activeness recognition means (64, S75) recognizes whether the user is active (active state) or inactive (passive state) in the dialogue based on, for example, the time the user spends looking at the other party (WT), the time spent gazing while leaning forward (FT), and the frequency of back-channel responses (RF). The listening action giving means (64, S171, S173, S201, S225, S265, S285, S305, S367, S435, S441, S443) operates the robot so that it pseudo-listens to the user's speech, for example when the user is active in the dialogue.
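The text names the three features WT, FT and RF but gives neither thresholds nor a combination rule. The following Python sketch shows one plausible rule under loudly stated assumptions: the threshold values and the "look at the partner AND (lean forward OR respond)" logic are hypothetical, not the patent's method.

```python
from dataclasses import dataclass

# Hypothetical thresholds for a 30-second window; the source names the
# features WT, FT and RF but does not give threshold values or a rule.
WT_MIN_SEC = 15.0  # time spent looking at the other party
FT_MIN_SEC = 6.0   # time spent gazing while leaning forward
RF_MIN = 2         # back-channel responses within the window


@dataclass
class WindowFeatures:
    wt: float  # WT, seconds
    ft: float  # FT, seconds
    rf: int    # RF, count per window


def recognize_activeness(f: WindowFeatures) -> str:
    """Return "active" or "passive" for one window.

    Assumed rule: the user is active if he or she looked at the other party
    long enough AND either leaned forward or gave back-channel responses often.
    """
    if f.wt >= WT_MIN_SEC and (f.ft >= FT_MIN_SEC or f.rf >= RF_MIN):
        return "active"
    return "passive"
```

Keeping the features in a small record makes it easy to swap the rule for a learned classifier later without changing the callers.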

According to the second invention, the robot's pseudo-listening makes the user feel that someone is listening to what he or she says, so the dialogue is sustained.

A third invention depends on the first or second invention. The recognition means further includes speaker state recognition means for recognizing a listener-side state and a speaker-side state in the dialogue. The system further comprises measuring means for measuring the utterance time of the user or of the other party, and the action giving means further includes utterance control action giving means for operating the robot so as to control the user's speech based on the utterance time measured by the measuring means.

In the third invention, the speaker state recognition means (64, S77) recognizes listening (listen state) and speaking (talk state) in the dialogue based on whether the user is speaking. The measuring means (64, S141, S143, S145) measures the utterance time of the user or of the other party, for example while the sound level is at or above a predetermined value. The utterance control action giving means (64, S207, S227, S297, S319) operates the robot so as to control the user's speech, for example when one of the users has been speaking unilaterally and the utterance time reaches or exceeds a threshold.
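The measuring means counts time only while the sound level is at or above a predetermined value. The Python sketch below adds one assumption the text does not state: a side's utterance time keeps accumulating across its own pauses and is reset only when the other side speaks alone, which is one way to capture "speaking unilaterally". The level threshold and class name are hypothetical.

```python
LEVEL_MIN = 0.1  # hypothetical "predetermined value" for the sound level


class UtteranceTimer:
    """Accumulates how long the user and the partner have each held the floor.

    Assumption (not stated in the source): a side's utterance time keeps
    growing across its own short pauses and is reset only when the other
    side speaks alone.
    """

    def __init__(self):
        self.seconds = {"user": 0.0, "partner": 0.0}

    def update(self, dt: float, user_level: float, partner_level: float) -> None:
        user_on = user_level >= LEVEL_MIN
        partner_on = partner_level >= LEVEL_MIN
        # Count time only while the sound level is at or above LEVEL_MIN.
        if user_on:
            self.seconds["user"] += dt
        if partner_on:
            self.seconds["partner"] += dt
        # When only one side is speaking, the other side's run ends.
        if user_on and not partner_on:
            self.seconds["partner"] = 0.0
        elif partner_on and not user_on:
            self.seconds["user"] = 0.0


# Usage: the user speaks for 5 s in 0.5 s frames, then pauses briefly.
timer = UtteranceTimer()
for _ in range(10):
    timer.update(0.5, user_level=0.3, partner_level=0.0)
timer.update(0.5, user_level=0.0, partner_level=0.0)
print(timer.seconds["user"])  # → 5.0
```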

According to the third invention, controlling speech leads both parties to speak in a balanced manner, so the dialogue is sustained.

A fourth invention depends on the third invention. The utterance control means includes utterance suppression action giving means for operating the robot so that the user's speech is suppressed when the user is in an active speaking state, that is, recognized as active by the activeness recognition means and as being in the speaker state by the speaker state recognition means, and the utterance time reaches or exceeds the threshold.

In the fourth invention, when the user is actively speaking in the dialogue and the utterance time reaches or exceeds the threshold, the utterance suppression action giving means (64, S297, S405, S411, S417, S423, S425) operates the robot so as to suppress the user's speech.

According to the fourth invention, the user's speech is suppressed when the user is talking too much, unilaterally.

A fifth invention depends on the fourth invention. The utterance suppression action giving means includes attention guiding means for operating the robot so as to guide the user's attention.

In the fifth invention, the attention guiding means (64, S425, S455, S461) guides the user's attention by operating the robot so that, for example, it looks alternately at the user and at the monitor (16) on which the other party is displayed.

According to the fifth invention, the attention of a user who is speaking unilaterally is redirected, so that user's speech is suppressed. In addition, when the user's attention is guided toward the other party, the other party gets a chance to speak, and the dialogue is sustained longer.

A sixth invention depends on any one of the third to fifth inventions. The utterance control means further includes utterance promotion action giving means for operating the robot so that the user's speech is promoted when the user is in an active listening state, that is, recognized as active by the activeness recognition means and as being in the listener-side state by the speaker state recognition means.

In the sixth invention, when the user is actively listening to the other party, the utterance promotion action giving means (64, S227, S319, S471) operates the robot so that the user's speech is promoted, for example once the other party's utterance time reaches or exceeds the threshold.
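The fourth and sixth inventions combine the activeness state, the talk/listen state and the measured utterance time into one decision. A minimal Python sketch of that decision follows; the threshold value and the returned action labels are hypothetical, since the text only speaks of "a threshold" and of suppressing or promoting speech.

```python
from typing import Optional

UTTERANCE_THRESHOLD_SEC = 60.0  # hypothetical value; the source only says "threshold"


def utterance_control(activeness: str, speaker_state: str,
                      user_sec: float, partner_sec: float) -> Optional[str]:
    """Choose an utterance-control action for the robot, or None.

    activeness: "active" or "passive" (activeness recognition)
    speaker_state: "talk" or "listen" (speaker state recognition)
    user_sec / partner_sec: measured utterance times of each side
    """
    if activeness != "active":
        return None  # inactive users are handled by other actions
    if speaker_state == "talk" and user_sec >= UTTERANCE_THRESHOLD_SEC:
        return "suppress"  # fourth invention: user has talked unilaterally too long
    if speaker_state == "listen" and partner_sec >= UTTERANCE_THRESHOLD_SEC:
        return "promote"  # sixth invention: user has only listened for too long
    return None
```

Returning a label instead of driving the robot directly keeps the policy testable apart from the motion code.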

According to the sixth invention, the user's speech is promoted when the user has been actively listening to the other party for a long time.

A seventh invention depends on any one of the second to fifth inventions. The action giving means further includes participation action giving means for operating the robot so as to bring the user into the dialogue when the activeness recognition means recognizes the user as inactive.

In the seventh invention, when the user is inactive in the dialogue, the participation action giving means (64, S177, S231, S319, S321) operates the robot so as to bring the user into the dialogue.

According to the seventh invention, the dialogue is sustained by bringing an inactive user into it.

An eighth invention depends on the seventh invention. The participation action giving means includes attention attracting means for operating the robot so as to attract the user's attention.

In the eighth invention, the attention attracting means (64, S321, S335, S341, S343) attracts the user's attention by operating the robot so that, for example, it looks at or talks to the user.

According to the eighth invention, an inactive user is brought into the dialogue by attracting his or her attention.

A ninth invention depends on the seventh or eighth invention. The participation action giving means further includes speech prompting means for operating the robot so as to prompt the user to speak.

In the ninth invention, the speech prompting means (64, S319, S471) prompts the user to speak by operating the robot so that, for example, it asks the user a question.

According to the ninth invention, an inactive user is brought into the dialogue by being prompted to speak.
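The seventh to ninth inventions list two participation actions for an inactive user, attracting attention and prompting speech, without saying how to choose between them. The Python sketch below assumes one ordering for illustration only: attract attention while the user is not looking at the robot, then ask a question. The labels and the ordering are hypothetical.

```python
from typing import Optional


def participation_action(activeness: str, interest_target: str) -> Optional[str]:
    """Pick a participation action for the robot, or None.

    Assumed ordering (not specified in the source): while an inactive user is
    not looking at the robot, attract attention (eighth invention, e.g. look
    at or talk to the user); once the user looks at the robot, prompt speech
    (ninth invention, e.g. ask a question).
    """
    if activeness == "active":
        return None
    if interest_target != "robot":
        return "attract_attention"
    return "ask_question"
```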

A tenth invention depends on any one of the first to ninth inventions and further comprises a network to which the robot is connected, a server connected to the network, transmission means for transmitting the user's behavior determined by the determination means to the server, and acquisition means for acquiring the other user's behavior from the server. The action giving means further includes partner listening action giving means for making the robot pseudo-listen to the other user based on the other user's behavior acquired by the acquisition means and on the user's own behavior.

In the tenth invention, the robot and the server (24) are connected to the network (200), for example via a wireless LAN. The transmission means (64, S69) transmits the user's behavior to the server at fixed intervals. The acquisition means (64, S269, S289, S309) acquires the other user's behavior that has been transmitted to the server. When the other user is speaking, the partner listening action giving means (64, S275, S295, S315, S385, S391, S393) operates the robot so that it performs a pseudo-listening action, for example toward the monitor on which the other user is displayed.
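The partner listening decision uses both the locally determined behavior and the partner's behavior fetched from the server. In the minimal Python sketch below, the record field `speaking` and the rule "partner speaks while the local user does not" are assumptions; the text only says the robot pseudo-listens when the other user is speaking.

```python
from typing import Optional


def partner_listening_action(user: dict, partner: dict) -> Optional[str]:
    """Decide whether the robot should pseudo-listen to the remote partner.

    `user` is the local behavior record; `partner` is the record fetched from
    the server. The "speaking" key is a hypothetical field name.
    Assumed rule: act only while the partner speaks and the local user does not.
    """
    if partner.get("speaking") and not user.get("speaking"):
        # e.g. turn the robot toward the monitor showing the partner and nod
        return "pseudo_listen_to_monitor"
    return None
```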

According to the tenth invention, even if the user is not listening to the other user, the robot listens to the other user, so the other user can feel that he or she is being listened to.

An eleventh invention depends on any one of the first to tenth inventions. The recognition means further includes interest target recognition means for recognizing the user's object of interest, and the action giving means further includes interest action giving means for operating the robot, based on the recognition result of the interest target recognition means, so that the user becomes interested in the dialogue.

In the eleventh invention, the interest target recognition means (64, S79) recognizes what the user is looking at as the user's object of interest. If the user's object of interest is not, for example, the robot, the interest action giving means (64, S179, S241, S321, S335, S341, S343, S405) operates the robot so that the user becomes interested in the dialogue.

According to the eleventh invention, the dialogue is sustained by making the user interested in it.

A twelfth invention depends on the eleventh invention. The robot includes a second camera, and the system further comprises face recognition means for performing face recognition processing on the images from the first camera and the second camera. The interest target recognition means includes first recognition rate calculation means for calculating a first recognition rate for the first camera from the face recognition results over a second predetermined time, second recognition rate calculation means for calculating a second recognition rate for the second camera from the face recognition results over the second predetermined time, and setting means for setting the recognition result based on the first and second recognition rates.

In the twelfth invention, the robot is provided with the second camera (12), for example in its abdomen. The face recognition means (64, S41) recognizes the user's face in the images captured by the first and second cameras. The first recognition rate calculation means (64, S123) calculates the first recognition rate (M) of the user's face captured by the first camera, and the second recognition rate calculation means (64, S125) calculates the second recognition rate (S) of the user's face captured by the second camera. The setting means (64, S131, S133, S135) then sets the user's object of interest based on the first and second recognition rates.

According to the twelfth invention, the object of interest is set from the face recognition rates M and S, so the object the user is looking at can be recognized accurately.

A thirteenth invention depends on the twelfth invention and further comprises image transmission means for transmitting the image from either the first camera or the second camera based on the object of interest set by the setting means.

In the thirteenth invention, the image transmission means (64, S489, S491) transmits, for example, the first camera's image if the user's object of interest is the videophone, and the second camera's image if it is the robot.

According to the thirteenth invention, even when the user is talking to the robot, the other party receives an image showing the user's face from the front. The other party therefore feels directly addressed, and the dialogue is sustained.
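The interest-target setting of the twelfth invention and the camera selection of the thirteenth can be sketched together in Python. The recognition rates M and S are computed here as the share of frames with a recognized face; the minimum-rate threshold, the "higher rate wins" rule, and the fallback to the monitor camera for other targets are assumptions, not the patent's steps S131 to S135.

```python
from typing import Sequence

RATE_MIN = 0.3  # hypothetical minimum recognition rate for "looking at"


def recognition_rate(face_found: Sequence[bool]) -> float:
    """Share of frames in the second predetermined time with a face recognized."""
    return sum(face_found) / len(face_found) if face_found else 0.0


def interest_target(monitor_cam: Sequence[bool], abdominal_cam: Sequence[bool]) -> str:
    m = recognition_rate(monitor_cam)    # M: first camera (monitor camera 22)
    s = recognition_rate(abdominal_cam)  # S: second camera (abdominal camera 12)
    if max(m, s) < RATE_MIN:
        return "other"  # neither camera sees the user's face often enough
    return "monitor" if m >= s else "robot"


def camera_to_send(target: str) -> str:
    """Send the camera that faces the user's current object of interest.

    The source covers the videophone and robot cases; falling back to the
    monitor camera for "other" is an assumption.
    """
    return "abdominal_camera" if target == "robot" else "monitor_camera"


# Usage: the face is seen mostly by the abdominal camera, so the user is
# looking at the robot and the abdominal camera's frontal view is sent.
target = interest_target([True] * 2 + [False] * 8, [True] * 9 + [False])
print(target, camera_to_send(target))  # → robot abdominal_camera
```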

A fourteenth invention depends on any one of the first to thirteenth inventions. The determination means includes posture determination means for determining the user's posture, utterance determination means for determining whether the user is speaking, head direction determination means for determining the direction of the user's head, gaze direction determination means for determining the user's gaze direction, nod determination means for determining whether the user nods, and back-channel determination means for determining the user's back-channel responses. The user's behavior is determined from the results of the posture, utterance, head direction, gaze direction, nod, and back-channel determinations.

In the fourteenth invention, the posture determination means (64, S9), head direction determination means (64, S25), gaze direction determination means (64, S33), and nod determination means (64, S49) make their determinations from images of the user. The utterance determination means (64, S17) makes its determination from the user's voice. The back-channel determination means (64, S59) makes its determination from the results of the utterance and nod determinations. The user's behavior is then decided from these determination results.

According to the fourteenth invention, the user's behavior data contains multiple determination results, so the user's state can be recognized accurately.
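A behavior record that bundles the six determination results might look like the following Python sketch. The field names and value types are hypothetical, and so is the back-channel rule: the text says only that the back-channel determination uses the utterance and nod results, so here a nod or a brief utterance below an assumed cut-off counts as a back-channel.

```python
from dataclasses import dataclass

SHORT_UTTERANCE_SEC = 1.0  # hypothetical cut-off for a short vocal response


@dataclass(frozen=True)
class Behavior:
    """One behavior record built from the six determinations (field names hypothetical)."""
    posture: str         # posture determination (S9)
    speaking: bool       # utterance determination (S17)
    head_direction: str  # head direction determination (S25)
    gaze_direction: str  # gaze direction determination (S33)
    nodding: bool        # nod determination (S49)
    backchannel: bool    # back-channel determination (S59)


def is_backchannel(nodding: bool, speaking: bool, utterance_sec: float) -> bool:
    """Assumed rule: a nod, or a brief utterance such as "uh-huh", is a back-channel."""
    return nodding or (speaking and utterance_sec < SHORT_UTTERANCE_SEC)


def make_behavior(posture: str, speaking: bool, utterance_sec: float,
                  head: str, gaze: str, nodding: bool) -> Behavior:
    # The back-channel flag is derived from the utterance and nod results;
    # the exact rule used here is an assumption.
    return Behavior(posture, speaking, head, gaze, nodding,
                    is_backchannel(nodding, speaking, utterance_sec))
```

Freezing the record keeps stored behavior data immutable once it enters the recognition window.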

According to the present invention, the robot acts to sustain the dialogue according to the user's state, so a dialogue between persons with communication disabilities can be sustained.

The above object, other objects, features, and advantages of the present invention will become more apparent from the following detailed description of embodiments with reference to the drawings.

FIG. 1 is an illustrative view showing an outline of a listening dialogue sustaining system according to an embodiment of the present invention.
FIG. 2 is an illustrative view showing an example of the side-view positional relationship among the monitor camera, the monitor, the robot, and the user shown in FIG. 1, and of the imaging ranges of the monitor camera and the abdominal camera.
FIG. 3 is an illustrative view showing the appearance of the robot shown in FIG. 1 from the front.
FIG. 4 is a block diagram showing an example of the electrical configuration of the robot shown in FIG. 1.
FIG. 5 is a block diagram showing an example of the electrical configuration of the PC shown in FIG. 1.
FIG. 6 is an illustrative view showing an example of an action table stored in the memory shown in FIG. 4.
FIG. 7 is an illustrative view showing an example of face recognition results from the monitor camera and the abdominal camera shown in FIG. 1.
FIG. 8 is a block diagram showing an example of the electrical configuration of the server shown in FIG. 1.
FIG. 9 is an illustrative view showing an example of a memory map of the memory of the PC shown in FIG. 4.
FIG. 10 is an illustrative view showing an example of the data storage area shown in FIG. 9.
FIG. 11 is an illustrative view showing an example of the configuration of the situation recognition program shown in FIG. 9.
FIG. 12 is an illustrative view showing an example of the configuration of the robot control program shown in FIG. 9.
FIG. 13 is a flowchart showing image/sound acquisition processing of the processor of the PC shown in FIG. 5.
FIG. 14 is a flowchart showing posture determination processing of the processor of the PC shown in FIG. 5.
FIG. 15 is a flowchart showing utterance determination processing of the processor of the PC shown in FIG. 5.
FIG. 16 is a flowchart showing head direction determination processing of the processor of the PC shown in FIG. 5.
FIG. 17 is a flowchart showing gaze direction determination processing of the processor of the PC shown in FIG. 5.
FIG. 18 is a flowchart showing face recognition processing of the processor of the PC shown in FIG. 5.
FIG. 19 is a flowchart showing nod determination processing of the processor of the PC shown in FIG. 5.
FIG. 20 is a flowchart showing back-channel determination processing of the processor of the PC shown in FIG. 5.
FIG. 21 is a flowchart showing synchronization processing of the processor of the PC shown in FIG. 5.
FIG. 22 is a flowchart showing state recognition processing of the processor of the PC shown in FIG. 5.
FIG. 23 is a flowchart showing Ac/Pa (active/passive) recognition processing of the processor of the PC shown in FIG. 5.
FIG. 24 is a flowchart showing Ta/Li (talk/listen) recognition processing of the processor of the PC shown in FIG. 5.
FIG. 25 is a flowchart showing interest target recognition processing of the processor of the PC shown in FIG. 5.
FIG. 26 is a flowchart showing utterance time measurement processing of the processor of the PC shown in FIG. 5.
FIG. 27 is a flowchart showing the overall processing of the processor of the PC shown in FIG. 5.
FIG. 28 is a flowchart showing active talk processing of the processor of the PC shown in FIG. 5.
FIG. 29 is a flowchart showing active listen processing of the processor of the PC shown in FIG. 5.
FIG. 30 is a flowchart showing inactive processing of the processor of the PC shown in FIG. 5.
FIG. 31 is a flowchart showing "other" processing of the processor of the PC shown in FIG. 5.
FIG. 32 is a flowchart showing utterance continuation processing of the PC shown in FIG. 5.
FIG. 33 is a flowchart showing speech suppression processing of the processor of the PC shown in FIG. 5.
FIG. 34 is a flowchart showing speech promotion processing of the processor of the PC shown in FIG. 5.
FIG. 35 is a flowchart showing attention attracting processing of the processor of the PC shown in FIG. 5.
FIG. 36 is a flowchart showing side-participant pseudo-listening processing of the processor of the PC shown in FIG. 5.
FIG. 37 is a flowchart showing first active pseudo-listening processing of the processor of the PC shown in FIG. 5.
FIG. 38 is a flowchart showing user utterance suppression processing of the processor of the PC shown in FIG. 5.
FIG. 39 is a flowchart showing second active pseudo-listening processing of the processor of the PC shown in FIG. 5.
FIG. 40 is a flowchart showing attention guidance processing of the processor of the PC shown in FIG. 5.
FIG. 41 is a flowchart showing user utterance promotion processing of the processor of the PC shown in FIG. 5.
FIG. 42 is a flowchart showing camera control processing of the processor of the PC shown in FIG. 5.

Referring to FIG. 1, the listening dialogue sustaining system 100 of this embodiment is used for a dialogue between a user A with a mild brain disorder, such as a dementia patient, and a user B at a remote location. To that end, the system 100 includes: a stuffed-toy robot (hereinafter simply "robot") 10a including an abdominal camera 12a (second camera), a PC 14a, a monitor 16a, a speaker 18a, a microphone 20a, and a monitor camera 22a (first camera), installed in room 1 where user A is; a robot 10b including an abdominal camera 12b, a PC 14b, a monitor 16b, a speaker 18b, a microphone 20b, and a monitor camera 22b, installed in room 2 (the remote location) where user B is; and a server 24 connected to a network 200. Note that in this specification, when the corresponding devices and people in room 1 and room 2 are described without distinction, they may be referred to by reference numeral alone, without the letter suffix.

ロボット１０はＰＣ１４による制御信号に基づいて傾聴動作や、発話を行う。ロボット１０の腹部に設けられた腹部カメラ１２はユーザを撮影し、ロボット１０を介して画像をＰＣ１４に出力する。ＰＣ１４は、ロボット１０に対して制御信号を出力するとともに、腹部カメラ１２およびモニタカメラ２２によって撮影された画像と、マイク２０によって集音される音声とが入力される。そして、ＰＣ１４は、入力された画像と音声とに基づいてユーザの行動および状態を判定および認識し、その結果をネットワーク２００を介してサーバ２４に送信する。 The robot 10 performs listening motions and speaks based on control signals from the PC 14. The abdominal camera 12 on the robot 10's abdomen photographs the user and sends the images to the PC 14 through the robot 10. The PC 14 outputs control signals to the robot 10. It also receives the images captured by the abdominal camera 12 and the monitor camera 22, and the sound collected by the microphone 20. From these images and sound, the PC 14 determines the user's actions and recognizes the user's state. It then sends the results to the server 24 via the network 200.

ＰＣ１４、モニタ１６、スピーカ１８、マイク２０およびモニタカメラ２２はテレビ電話機として機能する。たとえば、ＰＣ１４ａは、ユーザＢ側のＰＣ１４ｂから送信されたユーザＢの画像および音声を受信する。そのため、モニタ１６ａはユーザＢの画像を表示し、スピーカ１８はユーザＢの音声を出力する。さらに、マイク２０はユーザＡの音声を集音してＰＣ１４に出力し、モニタカメラ２２はユーザＡの画像を撮影してＰＣ１４に出力する。そして、ＰＣ１４は、ユーザＡの画像と音声とを、ネットワーク２００を介してＰＣ１４ｂに送信する。 The PC 14, the monitor 16, the speaker 18, the microphone 20 and the monitor camera 22 together function as a videophone. For example, the PC 14a receives user B's images and voice from the PC 14b on user B's side. The monitor 16a then displays user B's image, and the speaker 18a outputs user B's voice. The microphone 20a collects user A's voice and sends it to the PC 14a. The monitor camera 22a captures images of user A and sends them to the PC 14a. The PC 14a then transmits user A's images and voice to the PC 14b via the network 200.

サーバ２４は、ＰＣ１４ａおよびＰＣ１４ｂから送信される、ユーザＡおよびユーザＢの行動や状態のデータを受信すると、データベース（ＤＢ）に蓄積させる。そして、ＰＣ１４から行動および状態のデータを取得する要求がある場合に、その要求に応じてデータをＰＣ１４に送信する。 When the server 24 receives action and state data for users A and B from the PCs 14a and 14b, it stores the data in a database (DB). When a PC 14 requests action and state data, the server 24 sends the data to that PC 14.

なお、他の実施例では、ロボット１０とＰＣ１４とが有線接続ではなく、無線接続であってもよい。また、ＰＣ１４およびサーバ２４のネットワーク２００との接続も、有線接続であってもよいし、無線接続であってもよい。 In another embodiment, the robot 10 and the PC 14 may be wirelessly connected instead of wiredly connected. Further, the connection between the PC 14 and the server 24 with the network 200 may be a wired connection or a wireless connection.

図２は図１に示す実施例を側面から見た実施例である。図２から分かるように、モニタカメラ２２はモニタ１６の上に置かれ、ロボット１２とモニタ１６とは机の上に置かれる。ユーザは、机の上に置かれるモニタ１６およびモニタカメラ２２に対面する状態で、腹部カメラ１２およびモニタカメラ２２によって撮影される。さらに、ロボット１０は、ユーザとモニタ１６との間に配置されるため、モニタカメラ２２はロボット１０とユーザとを同時に撮影する。これにより、ロボット１０は、ユーザＡに対して疑似的な傾聴動作（疑似傾聴動作）を行ったり、ユーザＢが表示されるモニタ１６ａに対して疑似傾聴動作を行ったりする。 FIG. 2 is a side view of the embodiment shown in FIG. 1. As FIG. 2 shows, the monitor camera 22 sits on top of the monitor 16, and the robot 10 and the monitor 16 are placed on a desk. The user faces the monitor 16 and the monitor camera 22 on the desk and is photographed by both the abdominal camera 12 and the monitor camera 22. Because the robot 10 is placed between the user and the monitor 16, the monitor camera 22 captures the robot 10 and the user at the same time. With this arrangement, the robot 10 can perform a simulated listening motion (pseudo-listening motion) toward user A, or toward the monitor 16a on which user B is displayed.

なお、ロボット１０は、モニタカメラ２２によって撮影され、かつユーザを撮影可能な位置であれば、机の上に置かれていなくてもよい。 Note that the robot 10 does not have to be placed on the desk as long as it is captured by the monitor camera 22 and can capture the user.

図３にはロボット１０の外観が図示される。このロボット１０は、頭部２６とそれを支える胴体２８とを含む。胴体２８の上部（人間の肩に相当）の左右に左腕３０Ｌおよび右腕３０Ｒが設けられ、胴体２８の腹部には腹部カメラ１２が設けられる。この腹部カメラ１２には、たとえばCCDやCMOSのような固体撮像素子を用いるカメラを採用することができる。また、頭部２６には、前面に口３２が配置され、その口３２の上方には眼球３４が設けられる。そして、頭部２６の上部側面には耳３６が取り付けられている。 FIG. 3 illustrates the appearance of the robot 10. The robot 10 includes a head 26 and a body 28 that supports the head 26. The left arm 30L and the right arm 30R are provided on the left and right of the upper part of the body 28 (corresponding to a human shoulder), and the abdomen camera 12 is provided on the abdomen of the body 28. As the abdomen camera 12, for example, a camera using a solid-state imaging device such as CCD or CMOS can be employed. The head 26 is provided with a mouth 32 on the front surface, and an eyeball 34 is provided above the mouth 32. An ear 36 is attached to the upper side surface of the head 26.

頭部２６は、胴体２８によって旋回・俯仰可能に支持され、眼球３４も稼働的に保持されている。また、胴体２８は、腰の部分を中心として左右方向に傾くことが可能である。さらに、口３２にはスピーカ５６（図４）が内蔵され、耳３６にはマイク５８（図４）が内蔵される。 The head 26 is supported by the body 28 so that it can turn (pan) and tilt up and down, and the eyeballs 34 are also movably held. The body 28 can lean left and right about the waist. A speaker 56 (FIG. 4) is built into the mouth 32, and a microphone 58 (FIG. 4) is built into the ear 36.

なお、マイク５８を両方の耳３６にそれぞれ内蔵すれば、ステレオマイクとして機能し、それによって、そのステレオマイクに入力された音声の位置を必要に応じて特定することができる。また、ロボット１０の外見は、熊だけに限らず、他の動物や、人型であってもよい。 If a microphone 58 is built into each ear 36, the pair works as a stereo microphone, so the position of a sound source can be located when needed. The robot 10 need not look like a bear; it may look like another animal or a human.

図４にはロボット１０の電気的な構成を示すブロック図が示される。ロボット１０には、マイクロコンピュータ或いはCPUとも呼ばれる、プロセッサ３８が内蔵されており、通信路の一例であるバス４０を介して、腹部カメラ１２、メモリ４２、モータ制御ボード４４、音声入力／出力ボード５４、センサ入力／出力ボード６０およびＩ／Ｏ６２に接続される。 FIG. 4 is a block diagram showing the electrical configuration of the robot 10. The robot 10 contains a processor 38, also called a microcomputer or CPU. The processor 38 is connected via a bus 40 (an example of a communication path) to the abdominal camera 12, a memory 42, a motor control board 44, an audio input/output board 54, a sensor input/output board 60 and an I/O 62.

メモリ４２は、図示しないROMやRAMが組み込まれており、ROMには主として、ロボット１０による傾聴動作や、発話を行うためのプログラムや、発話を行う際にスピーカ５６から出力される音声データなどが予め記憶されている。また、RAMは一時記憶メモリとして用いられるとともに、ワーキングメモリとして利用される。 The memory 42 contains a ROM and a RAM (not shown). The ROM is preloaded mainly with the programs the robot 10 uses to perform listening motions and to speak, and with the voice data output from the speaker 56 during speech. The RAM serves as both temporary storage and working memory.

モータ制御ボード４４は、たとえばDSP(Digital Signal Processor)で構成され、図３に示すロボット１０の各腕や頭部の各軸モータを制御する。すなわち、モータ制御ボード４４は、プロセッサ３８からの制御データを受け、右腕３０Ｒ（図３）を前後や左右に動かすことができるように、Ｘ，ＹおよびＺ軸のそれぞれの角度を制御する３つのモータ（図４ではまとめて、「右腕モータ」として示す。）４６Ｒの回転角度を調節する。また、モータ制御ボード４４は、左腕３０Ｌの３つのモータ（図４ではまとめて、「左腕モータ」として示す。）４６Ｌの回転角度を調節する。モータ制御ボード４４は、また、頭部２６の旋回角や俯仰角を制御する３つのモータ（図４ではまとめて、「頭部モータ」として示す。）４８の回転角度を調節する。モータ制御ボード４４は、また、眼球３４を動かす眼球モータ５０および胴体２８を傾ける腰モータ５２も制御する。 The motor control board 44 is built from, for example, a DSP (Digital Signal Processor) and controls the axis motors of the arms and head of the robot 10 shown in FIG. 3. On receiving control data from the processor 38, it:

- adjusts the rotation angles of the three motors 46R (shown together as “right arm motor” in FIG. 4) that set the X, Y and Z axis angles of the right arm 30R (FIG. 3), so that the arm can move forward/backward and left/right;
- adjusts the rotation angles of the three motors 46L (shown together as “left arm motor” in FIG. 4) of the left arm 30L;
- adjusts the rotation angles of the three motors 48 (shown together as “head motor” in FIG. 4) that set the turning and tilt angles of the head 26;
- controls an eyeball motor 50, which moves the eyeballs 34, and a waist motor 52, which tilts the body 28.

なお、上述のモータは、制御を簡単化するために、それぞれステッピングモータまたはパルスモータであるが、直流モータであってもよい。 Note that the motors described above are stepping motors or pulse motors in order to simplify the control, but may be direct current motors.

スピーカ５６には音声入出力ボード５４を介して、プロセッサ３８から合成音声データが与えられ、それに応じて、スピーカ５６からはそのデータに従った音声または声が出力される。そして、マイク５８によって集音された音声は、音声入出力ボード５４を介して、プロセッサ３８に取り込まれる。 The processor 38 sends synthesized voice data to the speaker 56 through the audio input/output board 54, and the speaker 56 outputs the corresponding sound or speech. Sound collected by the microphone 58 reaches the processor 38 through the same board 54.

センサ入力／出力ボード６０は、モータ制御ボード４４と同様に、DSPで構成され、腹部カメラ１２からの信号を取り込んで、プロセッサ３８に与える。腹部カメラ１２からの映像信号が、必要に応じてセンサ入力／出力ボード６０で所定の処理を施してからプロセッサ３８に入力される。 Similar to the motor control board 44, the sensor input / output board 60 is composed of a DSP, takes in a signal from the abdominal camera 12, and provides it to the processor 38. The video signal from the abdominal camera 12 is input to the processor 38 after being subjected to predetermined processing by the sensor input / output board 60 as necessary.

Ｉ／Ｏ６２は、各々入力／出力の制御が可能なディジタルポートであり、出力ポートからは映像信号が出力され、ＰＣ１４に与えられる。一方、ＰＣ１４からは、制御信号が出力され、入力ポートに与えられる。 The I / O 62 is a digital port that can control input / output, and a video signal is output from the output port and applied to the PC 14. On the other hand, a control signal is output from the PC 14 and applied to the input port.

図５にはＰＣ１４の電気的な構成を示すブロック図が示される。ＰＣ１４には、ロボット１０と同様、マイクロコンピュータ或いはCPUとも呼ばれる、プロセッサ６４が内蔵されており、バス６６を介して、メモリ６８、視線サーバ７０、音声入力／出力ボード７２およびＩ／Ｏ７４に接続される。なお、プロセッサ６４には、日時情報を出力するＲＴＣ(Real Time Clock)６４ａが内蔵されている。 FIG. 5 is a block diagram showing the electrical configuration of the PC 14. Like the robot 10, the PC 14 contains a processor 64, also called a microcomputer or CPU. The processor 64 is connected via a bus 66 to a memory 68, a line-of-sight server 70, an audio input/output board 72 and an I/O 74. It also has a built-in RTC (Real Time Clock) 64a that outputs date and time information.

メモリ６８は、図示しないROM、RAMおよびHDDが組み込まれており、ROMには主として、電話機能を実現するためのプログラムや、後述のフローチャート（図１３−図４２）で表現されるプログラムが記憶される。また、RAMには主として、腹部カメラ１２およびモニタカメラ２２によって撮影された画像や、マイク２０によって集音された音声などが一時的に記憶されるバッファなどが設定されている。そして、HDDには主として、ユーザの行動を判断した結果や、状態を認識した結果などが随時記憶される。 The memory 68 contains a ROM, a RAM and an HDD (not shown). The ROM mainly stores the program that provides the telephone function and the programs shown in the flowcharts described later (FIGS. 13 to 42). The RAM mainly holds buffers that temporarily store images from the abdominal camera 12 and the monitor camera 22, sound collected by the microphone 20, and similar data. The HDD mainly stores the results of determining the user's actions and recognizing the user's state as they are produced.

視線サーバ７０は、腹部カメラ１２およびモニタカメラ２２によって撮影されたユーザの顔の画像から、ユーザの視線方向や位置をリアルタイムで検出する。そして、プロセッサ２０は、視線サーバ７０が特定または検出したユーザの視線方向や位置を示すデータを、バス６６を通して刻々受け取ることで、ユーザが注視する対象を判定する。 The line-of-sight server 70 detects the user's gaze direction and position in real time from images of the user's face captured by the abdominal camera 12 and the monitor camera 22. The processor 64 continuously receives this gaze direction and position data from the line-of-sight server 70 over the bus 66 and uses it to determine what the user is looking at.

なお、視線サーバ７０によるユーザの視線方向や位置の検出方法については、本件出願人が先に出願し既に公開された、特開２００８−１１３８７５号公報に開示されているので、ここでは記述を省略する。 The method the line-of-sight server 70 uses to detect the user's gaze direction and position is disclosed in Japanese Patent Application Laid-Open No. 2008-113875, filed earlier by the present applicant and already published, so it is not described here.

スピーカ１８には、音声入力／出力ボード７２を介して、プロセッサ６４から相手ユーザの音声データが与えられ、それに応じて、スピーカ１８からはそのデータに従った音声が出力される。そして、マイク２０によって集音された相手ユーザの音声は、音声入力／出力ボード７２を介して、プロセッサ６４に取り込まれる。 The processor 64 sends the partner user's voice data to the speaker 18 through the audio input/output board 72, and the speaker 18 outputs the corresponding sound. The user's voice collected by the microphone 20 reaches the processor 64 through the same board 72.

Ｉ／Ｏ７４は、ロボット１０のＩ／Ｏ６２と同様に、各々入力／出力の制御が可能なディジタルポートであり、出力ポートからは、制御信号がロボット１０に出力され、画像信号がモニタ１６に出力される。また、ロボット１０およびモニタカメラ２２からは、映像信号が出力され、入力ポートに与えられる。なお、スピーカ１８が出力する音声データおよびマイク２０から入力される音声データは、Ｉ／Ｏ７４を介して音声入力／出力ボード７２に入出力されるようにされてもよい。 Like the I/O 62 of the robot 10, the I/O 74 consists of digital ports whose input and output can each be controlled. Its output ports send control signals to the robot 10 and image signals to the monitor 16. Its input ports receive video signals from the robot 10 and the monitor camera 22. The audio data output by the speaker 18 and the audio data input from the microphone 20 may also pass to and from the audio input/output board 72 through the I/O 74.

また、プロセッサ６４は、バス６６を介して通信ＬＡＮボード７６に接続される。この通信ＬＡＮボード７６は、たとえばDSPで構成され、プロセッサ６４から与えられた送信データを無線通信装置７８に与える。無線通信装置７８は送信データを、ネットワーク２００を介して外部のコンピュータ（サーバ２４および相手のＰＣ１４）に送信する。また、通信ＬＡＮボード７６は、無線通信装置７８を介してデータを受信し、受信したデータをプロセッサ６４に与える。 The processor 64 is connected to the communication LAN board 76 via the bus 66. The communication LAN board 76 is constituted by, for example, a DSP, and supplies transmission data given from the processor 64 to the wireless communication device 78. The wireless communication device 78 transmits the transmission data to an external computer (the server 24 and the partner PC 14) via the network 200. The communication LAN board 76 receives data via the wireless communication device 78 and gives the received data to the processor 64.

たとえば、送信データとしては、テレビ電話機として必要なコマンド、画像データおよび音声データや、ユーザの行動を判定した結果およびユーザの状態を認識した結果であったりする。また、受信データとしては、テレビ電話機として得られる相手の画像データおよび音声データや、相手ユーザの行動を判定した結果および状態を認識した結果であったりする。 Transmitted data include, for example, the commands, image data and audio data the videophone needs, plus the results of determining the user's actions and recognizing the user's state. Received data include the partner's image and audio data from the videophone, plus the results of determining the partner's actions and recognizing the partner's state.

図６にはＰＣ１４のメモリ６８に記憶される、行動テーブルが示される。この行動テーブルとは、ユーザの行動が判定された結果が行動データにされ、その行動データが一定時間（たとえば、１秒）毎に刻々と記録されるテーブルである。 FIG. 6 shows the action table stored in the memory 68 of the PC 14. The results of determining the user's actions are stored in this table as action data, with one row recorded at each fixed interval (for example, every second).

図６を参照して、行動テーブルは、左側から「時刻」、「前傾姿勢」、「発話」、「頭部方向（腹部）」、「視線方向（腹部）」、「顔認識（腹部）」、「頭部方向（モニタ）」、「視線方向（モニタ）」、「顔認識（モニタ）」、「頷き」および「相槌」の列で構成されている。そして、各行動データは、「時刻」の列に同期して、各欄に記録される判定結果から構成される。 Referring to FIG. 6, the action table has these columns, from left to right: “time”, “forward-leaning posture”, “utterance”, “head direction (abdomen)”, “gaze direction (abdomen)”, “face recognition (abdomen)”, “head direction (monitor)”, “gaze direction (monitor)”, “face recognition (monitor)”, “nod” and “backchannel”. Each row of action data holds the determination results for every column, aligned with its “time” entry.
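One way to hold a row of the action table in software is sketched below. This is not the patent's own data format. The class name, the field names and the use of booleans for “present”/“absent” and “success”/“failure” are all hypothetical choices for illustration.

```python
from dataclasses import dataclass

# Values used in the head/gaze direction columns of FIG. 6.
DIRECTIONS = {"monitor", "robot", "other"}

@dataclass
class ActionRow:
    """One per-interval row of the action table (hypothetical field names)."""
    time: str               # "HH:MM:SS" from the RTC 64a
    forward_lean: bool      # forward-leaning posture: True = "present"
    utterance: bool         # utterance: True = "present"
    head_dir_abdomen: str   # head direction (abdomen)
    gaze_dir_abdomen: str   # gaze direction (abdomen)
    face_ok_abdomen: bool   # face recognition (abdomen): True = "success"
    head_dir_monitor: str   # head direction (monitor)
    gaze_dir_monitor: str   # gaze direction (monitor)
    face_ok_monitor: bool   # face recognition (monitor): True = "success"
    nod: bool               # nod: True = "present"
    backchannel: bool       # backchannel: True = "present"

    def __post_init__(self):
        # Reject direction labels other than the three the table uses.
        for value in (self.head_dir_abdomen, self.gaze_dir_abdomen,
                      self.head_dir_monitor, self.gaze_dir_monitor):
            if value not in DIRECTIONS:
                raise ValueError(f"unknown direction: {value!r}")

# The table itself: one row appended per fixed interval (e.g. every second).
action_table = [
    ActionRow("10:00:30", True, False, "monitor", "monitor", False,
              "monitor", "monitor", True, True, False),
]
```

A dataclass keeps every column named and typed. The validation in `__post_init__` catches direction labels outside the three values listed in the table.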

「時刻」の列に記録される数値は、ＲＴＣ６４ａが出力する日時情報であり、たとえば「10:00:30」は「１０時００分３０秒」を表す。 The values in the “time” column are the date and time information output by the RTC 64a. For example, “10:00:30” means 10 hours, 00 minutes, 30 seconds.

「前傾姿勢」の列には、ユーザがモニタ１６に対して前傾姿勢を取っているか否かを示す判定結果が記録される。たとえば、「前傾姿勢」の列に「あり」が記録されていればユーザがモニタ１６に対して前傾姿勢を取っており、「なし」が記録されていれば、ユーザがモニタ１６に対して前傾姿勢を取っていないことを示す。そして、前傾姿勢の「あり」／「なし」は、腹部カメラ１２およびモニタカメラ２２によって撮影された画像に対して、テンプレートマッチング処理を加えて判定される。たとえば、ユーザの姿勢が前傾視線のテンプレートと一致する場合に前傾姿勢が「あり」と判定され、テンプレートと一致しない場合に前傾姿勢が「なし」と判定される。なお、ロボット１０に超音波センサなどの距離を計測可能なセンサを取りつけて、ユーザの姿勢を判定するようにしてもよい。 The “forward-leaning posture” column records whether the user is leaning toward the monitor 16. “Present” means the user is leaning toward the monitor 16, and “absent” means the user is not. The value is determined by template matching on the images from the abdominal camera 12 and the monitor camera 22. For example, it is “present” when the user's posture matches a forward-leaning template and “absent” when it does not. Alternatively, a distance sensor such as an ultrasonic sensor may be mounted on the robot 10 to determine the user's posture.

「発話」の列には、ユーザが発話しているか否かを示す判定結果が記録される。たとえば、「発話」の列に「あり」が記録されていれば、ユーザが発話していることを示し、「なし」が記録されていれば、ユーザが発話していないことを示す。そして、発話の「あり」／「なし」は、マイク２０によって集音された音声データの音声レベルから判定される。たとえば、音声データの音声レベルが決められた値以上であれば「あり」と判定され、決められた値未満であれば「なし」と判定される。 The “utterance” column records whether the user is speaking. “Present” means the user is speaking, and “absent” means the user is not. The value is determined from the voice level of the audio data collected by the microphone 20. For example, it is “present” if the voice level is at or above a set value and “absent” if it is below that value.
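The voice-level rule can be sketched as follows. The text does not say how the “voice level” is measured. This sketch assumes the root-mean-square (RMS) amplitude of one frame of samples. The threshold value is also a hypothetical parameter.

```python
import math

def rms_level(samples):
    """Root-mean-square level of one frame of audio samples (assumed measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def utterance_present(samples, threshold):
    """True ("present") if the level is at or above the set value, else False ("absent")."""
    return rms_level(samples) >= threshold

# Example: a frame with amplitude 0.5 against a hypothetical threshold of 0.1.
print(utterance_present([0.5, -0.5, 0.5, -0.5], 0.1))
```

The comparison uses `>=` so that a level exactly equal to the set value counts as “present”, as the rule above states.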

「頭部方向（腹部）」および「頭部方向（モニタ）」の列には、ユーザの頭部が向いている方向に有る物が記録される。たとえば、「モニタ」が記録されていれば、ユーザの頭部が向く方向にはモニタ１６が有ることを示し、「ロボット」が記録されていれば、ユーザの頭部が向く方向にはロボット１０が有ることを示し、「アザー」が記録されていれば、ユーザの頭部が向く方向にはロボット１０およびモニタ１６が無いことを示す。そして、「頭部方向（腹部）」の列については、腹部モニタ１２によって撮影された画像に対してテンプレートマッチング処理を加えて判定される。一方、「頭部方向（モニタ）」の列については、モニタカメラ２２によって撮影された画像に対してテンプレートマッチング処理を加えて判定される。たとえば、ユーザの頭部の形がモニタ１６の方向を向くテンプレートと一致していれば「モニタ」と判定され、ユーザの頭部の形がロボット１０の方向を向くテンプレートと一致していれば「ロボット」と判定され、ユーザの頭部の形がいずれのテンプレートとも一致しなければ「アザー」と判定される。 The “head direction (abdomen)” and “head direction (monitor)” columns record what lies in the direction the user's head is facing:

- “monitor”: the monitor 16 lies in that direction;
- “robot”: the robot 10 lies in that direction;
- “other”: neither the robot 10 nor the monitor 16 does.

The “head direction (abdomen)” column is determined by template matching on the image from the abdominal camera 12. The “head direction (monitor)” column is determined by template matching on the image from the monitor camera 22. For example, the result is “monitor” if the user's head shape matches a template facing the monitor 16, “robot” if it matches a template facing the robot 10, and “other” if it matches neither.
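The text does not define what counts as a template “match”. One common reading is sketched below: take the best-scoring template and accept it only if its similarity reaches a threshold. The score dictionary and `min_score` are hypothetical inputs, for example the peak similarities from a template-matching routine run once per template.

```python
def classify_direction(scores, min_score):
    """Map template-matching scores to "monitor", "robot" or "other".

    scores: hypothetical {label: similarity} for the monitor-facing and
    robot-facing head templates; min_score: hypothetical match threshold.
    """
    if not scores:
        return "other"
    best = max(scores, key=scores.get)  # label of the best-matching template
    # A template counts as matched only if its similarity reaches the
    # threshold; if no template matches, the direction is "other".
    return best if scores[best] >= min_score else "other"

print(classify_direction({"monitor": 0.82, "robot": 0.40}, 0.7))  # monitor
```

Choosing the highest score settles the case where both templates clear the threshold, which the text does not address.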

「視線方向（腹部）」および「視線方向（モニタ）」の列には、ユーザの視線が向いている方向に有る物が記録される。また、「視線方向（腹部）」および「視線方向（モニタ）」の列には、「頭部方向（腹部）」の列と同様に、「モニタ」、「ロボット」および「アザー」が記録される。そして、「視線方向（腹部）」および「視線方向（モニタ）」の列における、「モニタ」、「ロボット」および「アザー」は視線サーバ７０の出力に基づいて判定される。たとえば、ユーザの視線方向がモニタ１６に向いていれば「モニタ」と判定され、ユーザの視線方向がロボット１０に向いていれば「ロボット」と判定され、ユーザの視線方向がロボット１０およびモニタ１６のいずれの方向にも向いていなければ、「アザー」と判定される。 The “gaze direction (abdomen)” and “gaze direction (monitor)” columns record what lies in the direction of the user's gaze. Like the “head direction (abdomen)” column, they hold “monitor”, “robot” or “other”, determined from the output of the line-of-sight server 70. The result is “monitor” if the user is looking toward the monitor 16, “robot” if toward the robot 10, and “other” if toward neither.

「顔認識（腹部）」および「顔認識（モニタ）」の列には、ユーザの顔の認識結果が記録される。たとえば、ユーザの顔が認識されていれば「成功」が記録され、ユーザの顔が認識されていなければ「失敗」が記録される。そして、「顔認識（腹部）」の列については、腹部モニタ１２によって撮影された画像に対して顔認識処理が実施されることで判定される。一方、「顔認識（モニタ）」の列については、モニタカメラ２２によって撮影された画像に対して所定の顔認識処理が実施されることで判定される。 The “face recognition (abdomen)” and “face recognition (monitor)” columns record whether the user's face was recognized: “success” if it was and “failure” if it was not. The “face recognition (abdomen)” column comes from running face recognition on the image from the abdominal camera 12. The “face recognition (monitor)” column comes from running a predetermined face recognition process on the image from the monitor camera 22.

図７（Ａ）にはモニタカメラ２２による顔認識結果の成功列が示され、図７（Ｂ）には腹部カメラ１２による顔認識結果の成功例が示される。まず、図７（Ａ）を参照して、左側が腹部カメラ１２による画像であり、右側がモニタカメラ２２による画像であり、どちらの画像も同じ時刻に撮影された画像である。このとき、ユーザはモニタカメラ２２を注視している状態である。そのため、モニタカメラ２２による画像では、ユーザの顔が正面に写っているため、顔認識が成功している。一方、腹部カメラ１２による画像では、ユーザの顔は傾いて写っているため、顔認識が失敗している。 FIG. 7(A) shows an example of successful face recognition by the monitor camera 22, and FIG. 7(B) shows an example of successful face recognition by the abdominal camera 12. In FIG. 7(A), the left image is from the abdominal camera 12 and the right image is from the monitor camera 22; both were captured at the same moment. The user is looking at the monitor camera 22, so the face appears frontally in that camera's image and recognition succeeds. In the abdominal camera 12 image the face appears at an angle, so recognition fails.

次に、図７（Ｂ）を参照して、図７（Ａ）と同様に、左側が腹部カメラ１２による画像であり、右側がモニタカメラ２２による画像であり、同じ時刻に撮影された画像である。このとき、ユーザは腹部カメラ１２を注視しているため、腹部カメラ１２による画像では顔認識が成功している。一方、モニタカメラ２２による画像では顔認識が失敗している。 Next, referring to FIG. 7B, as in FIG. 7A, the left side is an image by the abdominal camera 12, the right side is an image by the monitor camera 22, and is an image taken at the same time. is there. At this time, since the user is gazing at the abdomen camera 12, the face recognition is successful in the image by the abdomen camera 12. On the other hand, face recognition has failed in the image by the monitor camera 22.

図６に戻って、「頷き」の列には、ユーザによる頷きの有無が記録される。たとえば、「頷き」の列に「あり」が記録されていればユーザが頷いたことを示し、「なし」が記録されていればユーザが頷かなかったことを示す。そして、ユーザの頷きの有無は、腹部カメラ１２およびモニタカメラ２２によって撮影された画像に対して、所定のテンプレートマッチング処理を加えることで判定する。たとえば、ユーザの頭部の形が頷いている状態のテンプレートと一致していれば「あり」と判定され、テンプレートと一致していなければ「なし」と判定される。 Returning to FIG. 6, the “nod” column records whether the user nodded: “present” means the user nodded, and “absent” means the user did not. This is determined by applying a predetermined template matching process to the images from the abdominal camera 12 and the monitor camera 22. For example, it is “present” if the user's head shape matches a nodding template and “absent” if it does not.

「相槌」の列には、ユーザによる相槌の有無が記録される。たとえば、「相槌」の列に「あり」と記録されていればユーザが頷いたことを示し、「なし」と記録されていればユーザが頷かなかったことを示す。そして、相槌の有無については、頷きの判定結果と、音声データにおける「あー」や「うん」などの相槌らしい音を判定可能な、音声プロソディの判定結果とに基づいて判断する。たとえば、ユーザの頷きが「あり」ときに相槌らしい音があれば、相槌の判定結果は「あり」と判定される。 The “backchannel” column records whether the user gave a backchannel response (aizuchi): “present” means the user did, and “absent” means the user did not. It is decided from two inputs: the nod result, and a voice-prosody result that can detect backchannel-like sounds such as “ah” or “un” in the audio data. For example, if a backchannel-like sound occurs while the nod result is “present”, the backchannel result is “present”.
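The combination rule can be sketched column-wise as below. The text gives only the condition under which the result is “present”. This sketch assumes that condition is also the only way to get “present”, so every other row is “absent”. The per-row `backchannel_sound_column` input, meaning that prosody analysis found a backchannel-like sound in that row, is hypothetical.

```python
def backchannel_column(nod_column, backchannel_sound_column):
    """Per-row backchannel flags from the nod column and a hypothetical
    per-row prosody flag ("ah", "un", ... detected)."""
    if len(nod_column) != len(backchannel_sound_column):
        raise ValueError("columns must cover the same rows")
    # "present" only when a nod and a backchannel-like sound coincide.
    return [nod and sound
            for nod, sound in zip(nod_column, backchannel_sound_column)]

print(backchannel_column([True, True, False], [True, False, True]))
```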

なお、上述したテンプレートマッチング処理および顔認識処理は、広く一般的な手法が用いられているため、詳細な説明は省略する。 Note that the template matching process and the face recognition process described above use widely general methods, and thus detailed description thereof is omitted.

そして、ＰＣ１４では、これらの処理によって一定時間毎に記録された行動データから、ユーザの状態を第１所定時間（たとえば、３０秒）毎に認識する。具体的には、第１所定時間毎の行動データに基づいて、Ａｃ／Ｐａ認識、Ｔａ／Ｌｉ認識および興味対象認識の処理を実行し、各処理の認識結果からユーザの状態を決定する。 The PC 14 then uses the action data recorded at each fixed interval to recognize the user's state once every first predetermined time (for example, 30 seconds). Specifically, it runs Ac/Pa recognition, Ta/Li recognition and interest-object recognition on the action data for each first predetermined time, and determines the user's state from the three results.

まず、Ａｃ／Ｐａ認識とは、ユーザが対話に積極的に参加し、対話に集中している「アクティブ：Ａｃｔｉｖｅ（Ａｃ）」状態か、ユーザが対話に非積極的であり、他のものに集中または注意力が散漫な「パッシブ：Ｐａｓｓｉｖｅ（Ｐａ）」状態かを認識する処理である。そのため、本実施例のＡｃ／Ｐａ認識では、第１所定時間分のユーザの行動データのうち、視線方向、前傾姿勢、頷きおよび相槌に基づいて、アクティブ状態らしさを示すＡＣ値を算出することで、アクティブ状態またはパッシブ状態を認識する。 Ac/Pa recognition decides between two states:

- “Active (Ac)”: the user is actively taking part in the dialogue and concentrating on it.
- “Passive (Pa)”: the user is not engaged in the dialogue and is focused on something else or distracted.

In this embodiment, Ac/Pa recognition calculates an AC value, which indicates how likely the active state is. The AC value uses the gaze direction, forward-leaning posture, nod and backchannel entries in the user's action data for the first predetermined time.

たとえば、相手を見る時間をＷＴ、体を倒して近づく前傾姿勢の時間をＡＴ、相槌の頻度をＲＦで示す場合に、相手を見る時間ＷＴは、行動テーブルにおける「視線方向（腹部）」および「視線方向（モニタ）」の列で、第１所定時間分の行のうち、「あり」と判定された回数をカウントすることで求めることができる。また、前傾姿勢の時間ＦＴは、「前傾姿勢」の列で、第１所定時間分の行のうち、「あり」と判定された回数をカウントすることで求めることができる。さらに、相槌の頻度ＲＦは、「相槌」の列で「あり」と判定された回数を、「頷き」の列で「あり」と判定された回数で割ることで求めることができる。 Let WT be the time spent looking at the partner, FT the time spent leaning the body forward toward the partner, and RF the backchannel frequency. Each is computed over the rows for the first predetermined time:

- WT: count the rows in which the “gaze direction (abdomen)” and “gaze direction (monitor)” columns show the user looking at the partner.
- FT: count the rows in which the “forward-leaning posture” column is “present”.
- RF: divide the number of rows with “backchannel” “present” by the number of rows with “nod” “present”.
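The counting rules above can be sketched over one window of per-second rows. This sketch makes two assumptions that the text does not state. First, “looking at the partner” means either gaze column reads “monitor”, where user B is displayed. Second, RF is 0 when no nods were recorded, to avoid dividing by zero. The dictionary keys are hypothetical.

```python
def ac_features(window):
    """Return (WT, FT, RF) for one window of per-second rows (dicts).

    Assumptions: the partner counts as "looked at" when either gaze column
    reads "monitor"; RF is 0.0 when the window contains no nods.
    """
    wt = sum(1 for r in window
             if r["gaze_abdomen"] == "monitor" or r["gaze_monitor"] == "monitor")
    ft = sum(1 for r in window if r["forward_lean"])
    nods = sum(1 for r in window if r["nod"])
    backchannels = sum(1 for r in window if r["backchannel"])
    rf = backchannels / nods if nods else 0.0
    return wt, ft, rf
```

With 1-second rows, WT and FT come out in seconds. RF is the ratio of backchannels to nods.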

また、算出した相手を見る時間ＷＴ、体を倒して近づく時間ＡＴおよび相槌の頻度ＲＦから、数１に示す式に基づいてＡＣ値を算出する。 The AC value is then calculated from WT, FT and RF using Equation 1.

［数１］ [Equation 1]

$$\mathrm{AC} = C_1 \cdot \mathrm{WT} + C_2 \cdot \mathrm{FT} + C_3 \cdot \mathrm{RF}$$

Ｃ１，Ｃ２，Ｃ３：定数 (C1, C2 and C3 are constants.)

そして、このように算出されたＡＣ値を閾値Ａに基づいて判断することで、アクティブ状態またはパッシブ状態を認識することができる。つまり、数１に示す式で算出されたＡＣ値が閾値Ａより大きければアクティブ状態と認識され、ＡＣ値が閾値以下であればパッシブ状態と認識される。なお、定数Ｃ１，Ｃ２およびＣ３の値を変化させることで、各パラメータに重みを付けることができる。 The AC value is compared with a threshold A to recognize the active or passive state. If the AC value from Equation 1 is greater than threshold A, the state is active; if it is less than or equal to threshold A, the state is passive. Changing the constants C1, C2 and C3 changes the weight given to each parameter.
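Equation 1 and the threshold rule translate directly into code. In this sketch, the constants C1 to C3 and the threshold A are caller-supplied parameters, because the text gives no values for them. The numbers in the usage line are made up.

```python
def ac_value(wt, ft, rf, c1, c2, c3):
    """Equation 1: AC = C1*WT + C2*FT + C3*RF."""
    return c1 * wt + c2 * ft + c3 * rf

def recognize_ac_pa(wt, ft, rf, c1, c2, c3, threshold_a):
    """Active if AC > A, passive if AC <= A."""
    return "active" if ac_value(wt, ft, rf, c1, c2, c3) > threshold_a else "passive"

# Hypothetical numbers: WT = 20 s, FT = 10 s, RF = 0.5, weights (1, 1, 10).
print(recognize_ac_pa(20, 10, 0.5, 1.0, 1.0, 10.0, threshold_a=30.0))  # active
```

RF is a ratio while WT and FT are counts of seconds. A larger C3 is one way to put RF on a comparable scale, which is the kind of weighting the constants allow.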

次に、Ｔａ／Ｌｉ認識とは、ユーザが話し手となり発話している「トーク：Ｔａｌｋ（Ｔａ）」状態（話手側状態）であるか、相手ユーザの話を傾聴している「リッスン：Ｌｉｓｔｅｎ（Ｌｉ）」状態（聴取側状態）であるかを認識する処理である。そのため、Ｔａ／Ｌｉ認識では、第１所定時間分のユーザの行動のうち、ユーザの発話に基づいて認識する。この実施例では、ユーザの発声時間を発話量とし、この発話量が閾値Ｔに基づいて、トーク状態またはリッスン状態を認識する。 Next, Ta/Li recognition decides whether the user is in the “Talk (Ta)” state (speaker side), speaking as the speaker, or the “Listen (Li)” state (listener side), listening to the partner. It relies on the user's utterances over the first predetermined time. In this embodiment, the user's speaking time is the utterance amount, and comparing it with a threshold T gives the talk or listen state.

たとえば、発話量をＴａとする場合に、発話量Ｔａは、行動テーブルにおける「発話」の列で、第１所定時間分の行のうち、「あり」と判定された回数をカウントすることで求めることができる。そして、このように算出された発話量Ｔａを閾値Ｔと比較することで、トーク状態またはパッシブ状態を認識することができる。つまり、発話量Ｔａが閾値Ｔより大きければトーク状態と認識され、発話量Ｔａが閾値Ｔ以下であれば、リッスン状態と認識される。 Let Ta be the utterance amount. Ta is the number of rows, within the first predetermined time, in which the “utterance” column of the action table is “present”. Comparing Ta with the threshold T gives the state: talk if Ta is greater than T, listen if Ta is less than or equal to T.
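The Ta/Li rule is short enough to sketch directly. In this sketch, the input is the “utterance” column for one window as a list of booleans (True for “present”). The threshold T is a caller-supplied value, since the text gives none.

```python
def recognize_ta_li(utterance_column, threshold_t):
    """Talk if the utterance amount Ta exceeds T, otherwise listen.

    utterance_column: per-row booleans ("present" = True) for the first
    predetermined time; threshold_t: threshold T (no value given in the text).
    """
    ta = sum(1 for present in utterance_column if present)  # utterance amount Ta
    return "talk" if ta > threshold_t else "listen"

# 30 one-second rows with 16 "present": Ta = 16 > T = 15, so "talk".
print(recognize_ta_li([True] * 16 + [False] * 14, 15))
```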

そして、興味対象認識とは、ユーザの興味が「ロボット」、「モニタ」および「アザー」のいずれであるかを認識する処理である。そのため、興味対象認識では、第２所定時間（たとえば、１０秒）分のユーザの行動データのうち、ユーザの顔認識および視線方向に基づいて、モニタカメラ２２側の顔の認識率Ｍと、腹部カメラ１２側の顔の認識率Ｓとを算出することで、ユーザの興味が有る物を認識する。 Interest-object recognition decides whether the user's interest lies in the “robot”, the “monitor” or “other”. It uses the face-recognition and gaze-direction entries in the user's action data for a second predetermined time (for example, 10 seconds). From these it computes a face recognition rate M for the monitor camera 22 and a face recognition rate S for the abdominal camera 12, and uses M and S to identify the object of interest.

たとえば、認識率Ｍは、行動テーブルにおける「顔認識（モニタ）」および「視線方向（モニタ）」の列で、第１所定時間分の行のうち、「成功」および「モニタ」と判定された回数をカウントすることで求めることができる。一方、認識率Ｓは、行動テーブルにおける「顔認識（腹部）」および「視線方向」の列で、「成功」および「ロボット」と判定された回数をカウントすることで求めることができる。そして、このように算出された認識率Ｍおよび認識率Ｓと閾値Ｉｎとを比較することでユーザの興味対象を認識することができる。つまり、認識率Ｍおよび認識率Ｓが共に閾値Ｉｎ以下であれば、興味対象が「アザー」と認識される。また、認識率Ｍおよび認識率Ｓが共に閾値Ｉｎより大きく、かつ認識率Ｍが認識率Ｓより大きければ、興味対象が「モニタ」と認識され、認識率Ｓが認識率Ｍより大きければ、興味対象が「ロボット」と認識される。 The recognition rate M is the number of rows within the second predetermined time in which “face recognition (monitor)” is “success” and “gaze direction (monitor)” is “monitor”. The recognition rate S is the number of rows in which “face recognition (abdomen)” is “success” and the “gaze direction” column is “robot”. M and S are then compared with a threshold In:

- If both M and S are at or below In, the object of interest is “other”.
- If both exceed In and M is greater than S, the object of interest is “monitor”.
- If both exceed In and S is greater than M, the object of interest is “robot”.
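The counting and comparison rules can be sketched as two functions. The row keys are hypothetical. For S, the text says only “gaze direction”; this sketch assumes the abdomen-side gaze column. The text gives no rule when only one rate exceeds In, or when M equals S, so the sketch returns `None` in those cases rather than guessing.

```python
def recognition_counts(window):
    """Return (M, S) for one window of per-second rows (dicts, hypothetical keys)."""
    m = sum(1 for r in window
            if r["face_monitor"] and r["gaze_monitor"] == "monitor")
    # Assumption: the abdomen-side gaze column is the one paired with S.
    s = sum(1 for r in window
            if r["face_abdomen"] and r["gaze_abdomen"] == "robot")
    return m, s

def recognize_interest(m, s, threshold_in):
    """Apply the comparison rules; None where the text specifies no rule."""
    if m <= threshold_in and s <= threshold_in:
        return "other"
    if m > threshold_in and s > threshold_in:
        if m > s:
            return "monitor"
        if s > m:
            return "robot"
    return None  # only one rate above In, or M == S: unspecified

print(recognize_interest(8, 6, 5))  # monitor
```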

The recognition rates S and M may also be obtained from the face recognition results alone. When the face recognition result is expressed as a percentage, the product of the per-second recognition results may be used as the recognition rate. Instead of recognition rates, interest recognition may also be based on a value derived from three time proportions: the proportion of time spent in a forward-leaning posture, the proportion of time for each gaze direction, and the proportion of time for each head direction. The first and second predetermined times may be of equal length.

After Ac/Pa recognition, Ta/Li recognition and interest recognition have been performed, the user's state is determined by combining the three results. There are twelve user states in total:

- Four in which the user's object of interest is the monitor 16: "active talk monitor", "active listen monitor", "passive talk monitor" and "passive listen monitor".
- Four in which it is the robot 10: "active talk robot", "active listen robot", "passive talk robot" and "passive listen robot".
- Four in which it is neither the robot 10 nor the monitor 16: "active talk other", "active listen other", "passive talk other" and "passive listen other".
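
The twelve states are the product of three independent results, which the short sketch below makes explicit. The boolean arguments mirror the Ac/Pa flag 356 (on = active) and the Ta/Li flag 358 (on = talk) described for the data storage area. The lowercase string format follows the "active talk robot" example given for the state data 354 and is otherwise an assumption.

```python
from itertools import product
from typing import List

INTERESTS = ("monitor", "robot", "other")


def compose_state(ac_pa_flag: bool, ta_li_flag: bool, interest: str) -> str:
    """Combine the three recognition results into one state string.

    ac_pa_flag: True = active, False = passive (cf. Ac/Pa flag 356)
    ta_li_flag: True = talk, False = listen (cf. Ta/Li flag 358)
    """
    if interest not in INTERESTS:
        raise ValueError(f"unknown object of interest: {interest!r}")
    activity = "active" if ac_pa_flag else "passive"
    role = "talk" if ta_li_flag else "listen"
    return f"{activity} {role} {interest}"


# 2 (Ac/Pa) x 2 (Ta/Li) x 3 (interest) = 12 possible states.
ALL_STATES: List[str] = [
    compose_state(ac, ta, i)
    for i, ac, ta in product(INTERESTS, (True, False), (True, False))
]
```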

The following describes example conditions for recognizing the user's state as "active talk monitor", "active listen monitor", "passive talk monitor" or "passive listen monitor". The examples assume a fixed sampling interval of 1 second and a first predetermined time of 30 seconds, that is, 30 samples per window. The states for "robot" and "other" differ from those for "monitor" only in the object of interest, so their detailed description is omitted.

First, with the user's object of interest recognized as "monitor", the state is recognized as "active talk monitor" if all of the following hold: the utterance duration is 15 seconds (50%) or more, the gaze direction is judged "monitor" in 18 or more of the 30 samples (60%), and the user nods two or more times.

Next, with the object of interest recognized as "monitor", the state is recognized as "active listen monitor" if all of the following hold: the utterance duration is 9 seconds (30%) or less, the gaze direction is judged "monitor" in 15 or more of the 30 samples (50%), and a forward-leaning posture is judged present ("yes") in 21 or more of the 30 samples (70%).

Next, with the object of interest recognized as "monitor", the state is recognized as "passive talk monitor" if all of the following hold: the utterance duration is 12 seconds (40%) or less, the gaze direction is judged "monitor" in 6 or more of the 30 samples (20%), and the user does not nod at all.

Finally, with the object of interest recognized as "monitor", the state is recognized as "passive listen monitor" if all of the following hold: the utterance duration is 9 seconds (30%) or less, the user does not nod at all, and a forward-leaning posture is judged present in 6 or fewer of the 30 samples (20%).
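
The four example rules for the "monitor" case can be written as a single classifier over per-window counts. This is a sketch of the thresholds above, assuming the counts have already been tallied from a 30-sample window. The text does not say how overlapping rules are resolved; for instance, a window can satisfy both the passive talk and passive listen conditions. The checking order here is therefore an assumption. Returning `None` when no rule matches is likewise a choice made for illustration.

```python
from typing import Optional


def classify_monitor_state(speech_s: float, gaze_monitor: int,
                           nods: int, lean_yes: int) -> Optional[str]:
    """Return one of the four "monitor" states, or None if no rule matches.

    speech_s:     total utterance duration in the 30 s window (seconds)
    gaze_monitor: samples (of 30) whose gaze direction was judged "monitor"
    nods:         number of nods detected in the window
    lean_yes:     samples (of 30) judged as a forward-leaning posture
    """
    # Assumption: the rules can overlap, so they are checked in this order.
    if speech_s >= 15 and gaze_monitor >= 18 and nods >= 2:      # 50%, 60%
        return "active talk monitor"
    if speech_s <= 9 and gaze_monitor >= 15 and lean_yes >= 21:  # 30%, 50%, 70%
        return "active listen monitor"
    if speech_s <= 12 and gaze_monitor >= 6 and nods == 0:       # 40%, 20%
        return "passive talk monitor"
    if speech_s <= 9 and nods == 0 and lean_yes <= 6:            # 30%, 20%
        return "passive listen monitor"
    return None
```

A learned classifier such as the SVM mentioned below would replace these hand-set thresholds.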

The user state recognized in this way is stored in the memory 68 as state data 354 (FIG. 10) and is also transmitted to the server 24. A support vector machine (SVM) may alternatively be used to recognize the user state.

FIG. 8 is a block diagram showing the electrical configuration of the server 24. Like the PC 14, whose processor is 64, the server 24 contains a processor 80, also called a microcomputer or CPU. The processor 80 is connected via a bus 82 to a memory 84, a first robot information DB 86, a second robot information DB 88 and a communication LAN board 90.

The memory 84 includes a ROM and a RAM (not shown). The ROM stores in advance mainly programs such as one for data communication between the server 24 and the PCs 14a and 14b. The RAM is used both as temporary storage and as working memory.

The first robot information DB 86 is a database that accumulates the action data and state data of user A transmitted from the PC 14a. The second robot information DB 88 is a database that accumulates the action data and state data of user B transmitted from the PC 14b. Both databases are built on a storage medium such as an HDD or SSD.

Like the communication LAN board 76 of the PC 14, the communication LAN board 90 consists of, for example, a DSP. It passes transmission data supplied by the processor 80 to a wireless communication device 92, which sends the data over the network 200 to external computers (the PCs 14a and 14b). The communication LAN board 90 also receives data via the wireless communication device 92 and passes the received data to the processor 80.

For example, when the received data is user A's action data sent from the PC 14a, the processor 80 stores it in the first robot information DB 86. When the received data is a request from the PC 14b to obtain user A's action data, the processor 80 passes user A's action data to the communication LAN board 90 as transmission data.
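
The server's role in this exchange is essentially store-and-forward, as the sketch below shows. `RelayServer`, `store` and `fetch` are hypothetical names, and plain in-memory lists stand in for the HDD/SSD-backed robot information DBs 86 and 88. Networking is left out entirely.

```python
from typing import Any, Dict, List


class RelayServer:
    """In-memory stand-in for server 24 (hypothetical, not the patent's code)."""

    def __init__(self) -> None:
        # One record list per user, in place of robot information DBs 86 and 88.
        self._db: Dict[str, List[Any]] = {}

    def store(self, user: str, record: Any) -> None:
        """Handle data received from a PC: append it to that user's store."""
        self._db.setdefault(user, []).append(record)

    def fetch(self, user: str) -> List[Any]:
        """Handle an acquisition request: return a copy of the user's records."""
        return list(self._db.get(user, []))
```

For instance, PC 14a would call `store("A", row)` each interval, and PC 14b would call `fetch("A")` to obtain the partner's action data.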

In the listening interaction continuation system of the present invention, the behavior of the robot 10 is controlled so that the dialogue between the two users continues. The control is based on the user's own action data and state data and on the partner user's action data and state data. To keep the dialogue between user A and user B going, the robot 10 performs three types of behavior: "pseudo-listening", "utterance control" and "attention attraction".

Pseudo-listening is behavior in which, while user A and user B are actively conversing, the robot acts as if it were listening to one of them. Utterance control is used when one user is talking one-sidedly. To balance the two users' speech, the robot either suppresses speech by looking at the user or encourages speech by addressing the user. Attention guidance and attraction is used when a user is not concentrating on the dialogue: the robot addresses that user to draw their attention. The specific behavior of the robot 10 is described with reference to the flowcharts in FIGS. 27 to 41.

FIG. 9 is an illustrative view showing an example of a memory map 300 of the memory 68 in the PC 14 shown in FIG. 5. As shown in FIG. 9, the memory 68 includes a program storage area 302 and a data storage area 304. The program storage area 302 stores the programs that operate the PC 14, including a data communication program 312, a situation recognition program 314, a robot control program 316, a camera control program 318, an utterance time measurement program 320 and a random number generation program 322.

The data communication program 312 performs data communication with the server 24. The situation recognition program 314 determines the user's actions and recognizes the user's state. The robot control program 316 determines the behavior of the robot 10. The camera control program 318 determines which camera image is sent to the partner. The utterance time measurement program 320 measures the utterance times of user A and user B. The random number generation program 322 is executed when the behavior of the robot 10 is to be chosen at random.

Although not shown, the programs for operating the PC 14 also include a program that implements the videophone function.

Referring to FIG. 10, the data storage area 304 is provided with the following buffers: a time buffer 330, a monitor camera buffer 332, an abdominal camera buffer 334, an audio buffer 336, a determination result buffer 338, a face recognition result buffer 340, an interest recognition result buffer 342, a data communication buffer 344, a partner action data buffer 346, a state data buffer 348 and a random number buffer 350. The data storage area 304 also stores action table data 352 and state data 354. It is further provided with an Ac/Pa flag 356, a Ta/Li flag 358, a state counter 360 and an utterance counter 362.

The time buffer 330 temporarily stores the date and time information output by the RTC 64a. The monitor camera buffer 332 temporarily stores images captured by the monitor camera 22. The abdominal camera buffer 334 temporarily stores images captured by the abdominal camera 12. The audio buffer 336 temporarily stores audio collected by the microphone 20.

The determination result buffer 338 temporarily stores the result of each of the following determinations:

- posture determination: whether the user is leaning forward;
- utterance determination: whether the user is speaking;
- head direction determination: the direction of the user's head;
- gaze direction determination: the direction of the user's gaze;
- nod determination: whether the user nodded;
- backchannel determination: whether the user gave a backchannel response (aizuchi).

The face recognition result buffer 340 temporarily stores the results of face recognition performed on images captured by the monitor camera 22 and the abdominal camera 12. The interest recognition result buffer 342 temporarily stores the recognized object of the user's interest, for example data indicating "robot", "monitor" or "other".

The data communication buffer 344 temporarily stores data obtained through communication with the server 24, such as the partner's action data and state data. The partner action data buffer 346 temporarily stores the partner's action data. The state data buffer 348 temporarily stores both the partner's state data and the local user's state data. The random number buffer 350 temporarily stores random numbers generated by the random number generation program 322.

The action table data 352 is the action table shown in FIG. 6, to which the latest action data is appended at each fixed interval. The state data 354 indicates the user state recognized from the action data for the first predetermined time. It consists of a character string such as "active talk robot".

The Ac/Pa flag 356 indicates the result of Ac/Pa recognition. It is, for example, a 1-bit register: when the flag is turned on (set), the register holds "1", and when it is turned off (cleared), the register holds "0". The Ac/Pa flag 356 is turned on when the user is recognized as being in the active state and off when the user is recognized as being in the passive state.

The Ta/Li flag 358 indicates the result of Ta/Li recognition. It is turned on when the user is recognized as being in the talk state and off when the user is recognized as being in the listen state. It is structured in the same way as the Ac/Pa flag 356, so a detailed description is omitted.

The state counter 360 is used to collect action data for the first predetermined time when recognizing the user state. For example, it starts counting when the PC 14 is powered on and is reset each time action data for the first predetermined time has been acquired. The utterance counter 362 counts the utterance time measured by the utterance time measurement program 320.
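
Resetting the counter each time a full window has been collected yields consecutive, non-overlapping windows. The sketch below shows this behavior. `StateWindow` and its `recognize` callback are hypothetical names. The callback would stand in for the state recognition program 314k.

```python
from typing import Any, Callable, List, Optional


class StateWindow:
    """Tumbling window that mirrors the reset behaviour of state counter 360."""

    def __init__(self, size: int, recognize: Callable[[List[Any]], str]) -> None:
        self.size = size              # e.g. 30 rows = 30 s at a 1 s interval
        self.recognize = recognize    # hypothetical state recognizer
        self.rows: List[Any] = []

    def push(self, row: Any) -> Optional[str]:
        """Add one action-data row; return a state when the window is full."""
        self.rows.append(row)
        if len(self.rows) < self.size:
            return None
        state = self.recognize(self.rows)
        self.rows = []  # reset, like clearing the state counter
        return state
```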

Although not shown, the data storage area 304 also holds template data used for the various determinations and buffers for the results of intermediate calculations. It also contains the other counters and flags needed for the operation of the PC 14.

FIG. 11 shows the programs that make up the situation recognition program 314. Referring to FIG. 11, the situation recognition program 314 consists of the following: an image/audio acquisition program 314a, a posture determination program 314b, an utterance determination program 314c, a head direction determination program 314d, a gaze direction determination program 314e, a face recognition program 314f, a nod determination program 314g, a backchannel determination program 314h, a synchronization program 314j, a state recognition program 314k, an Ac/Pa recognition program 314m, a Ta/Li recognition program 314n and an interest recognition program 314p.

The image/audio acquisition program 314a loads images captured by the monitor camera 22 and the abdominal camera 12, and audio collected by the microphone 20, into the buffers. The posture determination program 314b determines whether the user is leaning forward. The utterance determination program 314c determines whether the user is speaking. The head direction determination program 314d determines the direction of the user's head. The gaze direction determination program 314e determines the user's gaze direction. The face recognition program 314f recognizes face regions in images captured by the monitor camera 22 and the abdominal camera 12. The nod determination program 314g determines whether the user has nodded. The backchannel determination program 314h determines whether the user has given a backchannel response. The synchronization program 314j synchronizes all of these determination results (posture, utterance, head direction, gaze direction, face recognition, nod and backchannel) to form action data.

The state recognition program 314k recognizes the user state once every first predetermined time. The Ac/Pa recognition program 314m recognizes whether the user is in the active or the passive state. The Ta/Li recognition program 314n recognizes whether the user is in the talk or the listen state. The interest recognition program 314p recognizes the user's object of interest.

FIG. 12 shows the programs that make up the robot control program 316. Referring to FIG. 12, the robot control program 316 consists of the following: an overall program 316a, an active talk program 316b, an active listen program 316c, an inactive program 316d, an "other" program 316e, an utterance continuation program 316f, an utterance suppression program 316g, an utterance promotion program 316h, an attention attraction program 316j, a side-participant pseudo-listening program 316k, a first active pseudo-listening program 316m, a user utterance suppression program 316n, a second active pseudo-listening program 316p, an attention guidance program 316q and a user utterance promotion program 316r.

The overall program 316a, also called the main routine, performs the overall processing that controls the robot 10 according to the state data so that the dialogue continues. The active talk program 316b is executed when the state data is "active talk robot". The active listen program 316c is executed when the state data is "active listen robot". The inactive program 316d is executed when the user is in the passive state or when the object of interest is recognized as "robot". The "other" program 316e is executed when the user's object of interest is "other".

The utterance continuation program 316f operates the robot 10 so that the dialogue between the two users continues. The utterance suppression program 316g operates the robot 10 to suppress a user's speech when that user is speaking one-sidedly. The utterance promotion program 316h operates the robot 10 to encourage a user to speak when that user is only listening to the partner without speaking. The attention attraction program 316j operates the robot 10 to attract the attention of a user who has lost interest in the dialogue.

While the two users' dialogue is continuing, the side-participant pseudo-listening program 316k operates the robot 10 as if it were listening to one of them. A "side participant" is a person who takes part in a conversation but is neither the "speaker" nor the "listener".

The first active pseudo-listening program 316m operates the robot 10 as if it were listening to the partner user. The user utterance suppression program 316n operates the robot 10a to suppress the speech of the user beside it, for example user A. The second active pseudo-listening program 316p operates the robot 10a as if it were listening to the user beside it, for example user A. The attention guidance program 316q operates the robot 10 to guide the user's attention. The user utterance promotion program 316r operates the robot 10 to encourage the user beside the robot 10a, for example user A, to speak.

The flowcharts of the processing executed by the PC 14 are described below:

- FIGS. 13 to 25 show the processing of the programs that make up the situation recognition program 314.
- FIG. 26 shows the processing of the utterance time measurement program 320.
- FIGS. 27 to 41 show the processing of the programs that make up the robot control program 316.
- FIG. 42 shows the processing of the camera control program 318.

FIG. 13 is a flowchart showing the processing of the image/audio acquisition program 314a. For example, when the user powers on the PC 14, the processor 64 of the PC 14 runs the following steps:

- Step S1: acquire image data from the abdominal camera 12. The processor extracts image data from the video signal input from the robot 10 and temporarily stores it in the abdominal camera buffer 334.
- Step S3: acquire image data from the monitor camera 22. The processor extracts image data from the video signal input from the monitor camera 22 and temporarily stores it in the monitor camera buffer 332.
- Step S5: acquire audio data. The processor extracts audio data from the sound collected by the microphone 20, temporarily stores it in the audio buffer 336, and then returns to step S1.

Steps S1 to S5 are repeated approximately once per second.
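
The S1 to S5 loop can be sketched as a periodic capture into bounded buffers. In this sketch, the grab functions are hypothetical placeholders for the camera and microphone drivers. The `deque` buffers with a fixed `maxlen` are an assumption about how "temporary storage" might be bounded. The `iterations` and `sleep` parameters exist only so the loop can be stopped and tested.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Dict, Optional


def make_buffers(maxlen: int = 60) -> Dict[str, Deque[Any]]:
    """Bounded buffers standing in for buffers 334, 332 and 336."""
    return {name: deque(maxlen=maxlen) for name in ("abdomen", "monitor", "audio")}


def acquisition_loop(grab_abdomen: Callable[[], Any],
                     grab_monitor: Callable[[], Any],
                     grab_audio: Callable[[], Any],
                     buffers: Dict[str, Deque[Any]],
                     period_s: float = 1.0,
                     iterations: Optional[int] = None,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Repeat steps S1, S3 and S5 roughly once per period."""
    count = 0
    while iterations is None or count < iterations:
        buffers["abdomen"].append(grab_abdomen())  # S1: abdominal camera 12
        buffers["monitor"].append(grab_monitor())  # S3: monitor camera 22
        buffers["audio"].append(grab_audio())      # S5: microphone 20
        count += 1
        sleep(period_s)
```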

FIG. 14 is a flowchart showing the processing of the posture determination program 314b. In step S7, the processor 64 determines whether image data has been acquired, that is, whether new image data has been stored in the monitor camera buffer 332 and the abdominal camera buffer 334. If "NO" in step S7 (no image data has been acquired), step S7 is repeated. If "YES" in step S7 (image data has been acquired), the user's posture is determined in step S9. For example, the processor determines whether the user is leaning forward in the image data stored in each of the monitor camera buffer 332 and the abdominal camera buffer 334. The posture determination result is set to "yes" only when a forward-leaning posture is judged present in both images. The processor 64 executing step S9 functions as posture determination means.
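
The combination rule in step S9 is a logical AND of the two per-camera judgments, as sketched below. The per-camera lean detectors themselves are not described, so their boolean outputs are simply taken as inputs. Using `None` for a camera that is not used is an assumption. It covers the single-camera embodiment mentioned in the text.

```python
from typing import Optional


def combined_forward_lean(monitor_cam: Optional[bool],
                          abdomen_cam: Optional[bool]) -> bool:
    """Step S9 sketch: "yes" only if every available camera sees a forward lean.

    None marks a camera that is not used (single-camera variant, assumed).
    """
    available = [r for r in (monitor_cam, abdomen_cam) if r is not None]
    if not available:
        return False  # Assumption: no usable image means no forward lean.
    return all(available)
```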

Next, in step S11, the current time is acquired from the date and time information stored in the time buffer 330. In step S13, the current time is associated with the posture determination result so that the multiple determination results can later be synchronized. The posture determination result, tagged with the current time, is temporarily stored in the determination result buffer 338.
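
Time-stamping each result (steps S11 and S13) is what lets the synchronization program 314j merge independent determinations into one action-table row per interval. The sketch below makes two assumptions: timestamps are truncated to whole seconds to match the 1-second interval, and results are grouped by exact timestamp. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, List, Tuple

Stamped = Tuple[datetime, Any]


def stamp(result: Any, now: datetime) -> Stamped:
    """S11/S13 sketch: pair a determination result with the current time."""
    return now.replace(microsecond=0), result  # assumed 1-second resolution


def synchronize(stamped: Dict[str, List[Stamped]]) -> Dict[datetime, Dict[str, Any]]:
    """Merge per-determination results into one row per timestamp."""
    rows: Dict[datetime, Dict[str, Any]] = defaultdict(dict)
    for name, entries in stamped.items():
        for ts, value in entries:
            rows[ts][name] = value
    return dict(sorted(rows.items()))
```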

In another embodiment, posture determination may use the image from only one of the monitor camera 22 and the abdominal camera 12. The other determination processes shown in FIGS. 15 to 20 also include a step that acquires the date and time information, as in step S11, and a step that associates it with the result, as in step S13. Since these steps are identical in every case, their detailed description is omitted for the other flowcharts.

FIG. 15 is a flowchart showing the processing of the utterance determination program 314c. In step S15, the processor 64 determines whether audio data has been acquired, that is, whether new audio data has been stored in the audio buffer 336. If "NO" in step S15 (no audio data has been acquired), step S15 is repeated. If "YES", the user's utterance is determined in step S17, for example by checking whether the level of the audio data stored in the audio buffer 336 is at or above a fixed value. In step S19 the current time is acquired. In step S21 the current time is associated with the utterance determination result, which is temporarily stored in the determination result buffer 338. When step S21 is completed, the process returns to step S15. The processor 64 executing step S17 functions as utterance determination means.
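
The check in step S17 compares an audio level against a fixed value. The text does not define "level", so this sketch assumes the RMS of samples normalized to [-1, 1]. The threshold of 0.02 is a hypothetical placeholder that would need tuning for the actual microphone 20.

```python
import math
from typing import Sequence


def speech_level(samples: Sequence[float]) -> float:
    """RMS level of one audio chunk (assumed definition of "level")."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def is_speaking(samples: Sequence[float], threshold: float = 0.02) -> bool:
    """Step S17 sketch: speaking if the level is at or above a fixed value."""
    return speech_level(samples) >= threshold
```

A plain level threshold also triggers on loud non-speech noise. That may be one reason the action data records several other cues alongside it.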

FIG. 16 is a flowchart showing the processing of the head direction determination program 314d. In step S23, the processor 64 determines whether image data has been acquired, that is, whether new image data has been stored in the monitor camera buffer 332 and the abdominal camera buffer 334. If "NO" in step S23 (no new image data has been acquired), step S23 is repeated. If "YES", the user's head direction is determined in step S25. For example, template matching is performed on the image stored in the monitor camera buffer 332 to classify the head direction as "monitor", "robot" or "other". The same determination is made for the image stored in the abdominal camera buffer 334. The processor 64 executing step S25 functions as head direction determination means.

Next, the current time is acquired in step S27, and in step S29 the current time is associated with the head direction determination result. When step S29 ends, the process returns to step S23. In step S29, the head direction determination result for the image captured by the monitor camera 22 and the result for the image captured by the belly camera 12 are each stored in the determination result buffer 338.
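The template-matching classification of step S25 and the per-camera record of step S29 might look like the sketch below. It makes several assumptions. Images are small grayscale grids (lists of rows of floats). Each camera has its own reference templates for “monitor” and “robot”. Matching is a brute-force sum of squared differences (SSD). A head direction is reported as “other” when no template matches within `max_ssd`. All function names and the threshold are hypothetical.

```python
from __future__ import annotations

Image = list[list[float]]  # grayscale image as rows of pixel values


def min_ssd(image: Image, template: Image) -> float:
    """Slide the template over the image and return the lowest sum of squared differences."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = float("inf")
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            ssd = sum(
                (image[y + dy][x + dx] - template[dy][dx]) ** 2
                for dy in range(th)
                for dx in range(tw)
            )
            best = min(best, ssd)
    return best


def classify_head_direction(image: Image, templates: dict[str, Image], max_ssd: float) -> str:
    """Best-matching label, or "other" when no template matches closely enough."""
    scores = {label: min_ssd(image, tpl) for label, tpl in templates.items()}
    label = min(scores, key=scores.get)
    return label if scores[label] <= max_ssd else "other"


def stamp_head_directions(t: float, monitor_img: Image, belly_img: Image,
                          monitor_templates: dict[str, Image],
                          belly_templates: dict[str, Image], max_ssd: float) -> dict:
    """Steps S25-S29: one timestamped record holding both cameras' results."""
    return {
        "time": t,
        "head_monitor_cam": classify_head_direction(monitor_img, monitor_templates, max_ssd),
        "head_belly_cam": classify_head_direction(belly_img, belly_templates, max_ssd),
    }
```

A production system would use an optimized matcher. The nested loops here are only meant to make the matching criterion explicit.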

FIG. 17 is a flowchart showing the processing of the gaze direction determination program 314e. In step S31, the processor 64 determines whether image data has been acquired, that is, whether new images have been stored in the monitor camera buffer 332 and the belly camera buffer 334. If “NO” in step S31, that is, if no new image data has been acquired, step S31 is repeated. If “YES” in step S31, that is, if image data has been acquired, the user's gaze direction is determined in step S33. For example, in step S33 the image data stored in the monitor camera buffer 332 is first input to the gaze server 70, which identifies the user's gaze direction. Then, based on the gaze direction identified by the gaze server 70, the gaze is classified as “monitor”, “robot”, or “other”. The same determination is made for the image stored in the belly camera buffer 334. The processor 64 that executes step S33 functions as a gaze direction determination unit.

Next, the current time is acquired in step S35, and in step S37 the current time is associated with the gaze direction determination result. When step S37 ends, the process returns to step S31. In step S37, as in step S29 (FIG. 16), the gaze direction determination result for the image captured by the monitor camera 22 and the result for the image captured by the belly camera 12 are each stored in the determination result buffer 338.
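The second half of step S33 maps the gaze direction returned by the gaze server 70 onto “monitor”, “robot”, or “other”. It can be sketched as a region lookup. The source does not specify the server's output format. Here the gaze is assumed to arrive as a yaw/pitch pair in degrees relative to the camera, and the angular boxes in `MONITOR_CAM_REGIONS` are hypothetical values for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Angular box (degrees) in which a gaze counts as looking at an object."""
    yaw_min: float
    yaw_max: float
    pitch_min: float
    pitch_max: float

    def contains(self, yaw: float, pitch: float) -> bool:
        return (self.yaw_min <= yaw <= self.yaw_max
                and self.pitch_min <= pitch <= self.pitch_max)


def classify_gaze(yaw: float, pitch: float, regions: dict[str, Region]) -> str:
    """Map a gaze direction to a region label; the first matching region wins."""
    for label, region in regions.items():
        if region.contains(yaw, pitch):
            return label
    return "other"


# Hypothetical regions expressed in the monitor camera 22's frame of reference.
MONITOR_CAM_REGIONS = {
    "monitor": Region(-10.0, 10.0, -8.0, 8.0),
    "robot": Region(20.0, 45.0, -25.0, 0.0),
}
```

The belly camera 12 would need its own region table, because the same physical target appears at different angles from each camera.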

FIG. 18 is a flowchart showing the processing of the face recognition program 314f. In step S39, the processor 64 determines whether image data has been acquired, that is, whether new images have been stored in, and have thus updated, the monitor camera buffer 332 and the belly camera buffer 334. If “NO” in step S39, that is, if no image data has been acquired, step S39 is repeated. If “YES” in step S39, that is, if image data has been acquired, the user's face is recognized in step S41. For example, a predetermined face recognition process is applied to the image data stored in the monitor camera buffer 332 to determine whether a face region can be recognized. If a face region is recognized, the face recognition result is “success”; otherwise it is “failure”. The same determination is made for the image data stored in the belly camera buffer 334. The processor 64 that executes step S41 functions as a face recognition unit.

Next, the current time is acquired in step S43, and in step S45 the current time is associated with the face recognition result. When step S45 ends, the process returns to step S39. In step S45, each recognition result associated with the current time is temporarily stored in the face recognition result buffer 340.

FIG. 19 is a flowchart showing the processing of the nod determination program 314g. In step S47, the processor 64 determines whether image data has been acquired, that is, whether the monitor camera buffer 332 and the belly camera buffer 334 have been updated. If “NO” in step S47, that is, if no image data has been acquired, step S47 is repeated. If “YES” in step S47, that is, if image data has been acquired, whether the user has nodded is determined in step S49. For example, template matching is applied to the image data stored in the monitor camera buffer 332 and in the belly camera buffer 334 to determine, for each, whether the user has nodded. The user's nod is determined to be “yes” only when both determination results indicate a nod. The processor 64 that executes step S49 functions as a nod determination unit.

Next, the current time is acquired in step S51, and in step S53 the current time is associated with the nod determination result. When step S53 ends, the process returns to step S47. The nod determination result associated with the date and time information is stored in the determination result buffer 338.
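The combination rule of step S49 and the timestamped result of steps S51 to S53 are sketched below. The rule is that a nod counts only when both cameras report one. The source detects a nod in each camera image with template matching. As a plainly different, simpler stand-in, `pitch_nod` assumes a per-camera head-pitch series in degrees (downward negative) and calls it a nod when the head drops by at least `min_drop` and then returns to within half of that. Both function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodResult:
    timestamp: float  # time attached in steps S51/S53
    nodded: bool


def pitch_nod(pitches: list[float], min_drop: float) -> bool:
    """Hypothetical per-camera nod test on a head-pitch series (down = negative).

    The head must drop by at least min_drop below its starting pitch and then
    come back to within min_drop / 2 of that starting pitch.
    """
    if len(pitches) < 3:
        return False
    start = pitches[0]
    low = min(range(len(pitches)), key=pitches.__getitem__)  # index of deepest point
    dropped = start - pitches[low] >= min_drop
    recovered = any(start - p <= min_drop / 2 for p in pitches[low + 1:])
    return dropped and recovered


def fuse_nod(monitor_cam_nod: bool, belly_cam_nod: bool, timestamp: float) -> NodResult:
    """Step S49: the user's nod is "yes" only when both cameras report a nod."""
    return NodResult(timestamp=timestamp, nodded=monitor_cam_nod and belly_cam_nod)
```

Requiring agreement between two viewpoints trades some sensitivity for fewer false positives, which appears to be the intent of the AND rule in step S49.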

FIG. 20 is a flowchart showing the processing of the backchannel (aizuchi) determination program 314h. In step S55, the processor 64 determines whether the nod determination has been completed, for example, whether a nod determination result is stored in the determination result buffer 338. If “NO” in step S55, that is, if the nod determination has not been completed, step S55 is repeated. If “YES” in step S55, that is, if the nod determination has been completed, it is determined in step S57 whether the utterance determination has been completed, for example, whether an utterance determination result is stored in the determination result buffer 338. If “NO” in step S57, that is, if the utterance determination has not been completed, the process returns to step S55. If “YES” in step S57, that is, if the utterance determination has been completed, the user's backchannel response is determined in step S59 based on the date and time information. For example, the utterance determination results and the nod determination results are synchronized using the date and time information associated with each. Then, when the nod determination result is “yes”, the voice prosody described above is evaluated for the audio data stored in the audio buffer to determine whether a backchannel response occurred. The processor 64 that executes step S59 functions as a backchannel determination unit.

Next, in step S61, a time is associated with the backchannel determination result, and the process returns to step S55. That is, in step S61, the date and time information associated with the utterance or nod determination result is associated with the backchannel determination result. The backchannel determination result with its associated time is stored in the determination result buffer 338.
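The timestamp alignment of step S59 and the time assignment of step S61 can be illustrated as follows. The sketch assumes that utterance and nod results are paired by nearest timestamp within a tolerance. It also assumes that the prosody evaluation, which is not detailed in this section, is supplied as a hypothetical callback `prosody_ok`. Nods with no aligned utterance result are skipped until one arrives. This mirrors the wait in steps S55 to S57.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Stamped:
    timestamp: float
    value: bool


def nearest(results: list[Stamped], t: float, tolerance: float) -> Optional[Stamped]:
    """Result whose timestamp is closest to t, or None if none lies within tolerance."""
    best = min(results, key=lambda r: abs(r.timestamp - t), default=None)
    if best is None or abs(best.timestamp - t) > tolerance:
        return None
    return best


def judge_backchannel(nods: list[Stamped], utterances: list[Stamped],
                      prosody_ok: Callable[[float], bool],
                      tolerance: float = 0.1) -> list[Stamped]:
    """Steps S59-S61: backchannel = nod "yes" plus a prosody check, stamped with the nod's time."""
    results = []
    for nod in nods:
        if nearest(utterances, nod.timestamp, tolerance) is None:
            continue  # no utterance result aligned with this nod yet
        backchannel = nod.value and prosody_ok(nod.timestamp)
        results.append(Stamped(nod.timestamp, backchannel))
    return results
```

Because `and` short-circuits, the prosody callback runs only when a nod is present. This matches the order of checks described for step S59.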

FIG. 21 is a flowchart showing the processing of the synchronization program 314j. In step S63, the processor 64 determines whether all determinations have been completed. For example, it checks whether the determination result buffer 338 holds the posture, utterance, head direction, gaze direction, nod, and backchannel determination results and the face recognition result buffer 340 holds the face recognition results. If “NO” in step S63, that is, if any determination has not been completed, step S63 is repeated. If “YES” in step S63, that is, if all determinations have been completed, the determination results and the face recognition results are synchronized in step S65 using the time associated with each.

Next, in step S67, the synchronized determination results are recorded in the action table as action data; that is, the determination results and face recognition results are recorded in a new row of the action table shown in FIG. 6. Then, in step S69, the current action data, that is, the action data corresponding to the row newly added to the action table, is transmitted to the server 24. When step S69 ends, the process returns to step S63.
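Steps S63 to S69 amount to a timestamp join across several result buffers. The sketch below illustrates this under stated assumptions. Each buffer is a list of `(timestamp, value)` pairs keyed by determination kind. Results within `tolerance` seconds of the target time are considered simultaneous. Transmission to the server 24 is a caller-supplied `send` callback. The key names and both functions are hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable, Optional

# Determination kinds the synchronization step waits for (step S63).
REQUIRED = ("posture", "utterance", "head", "gaze", "nod", "backchannel", "face")


def sync_row(buffers: dict[str, list[tuple[float, Any]]], t: float,
             tolerance: float = 0.05) -> Optional[dict[str, Any]]:
    """Steps S63-S65: one action-table row for time t, or None if any result is missing."""
    row: dict[str, Any] = {"time": t}
    for kind in REQUIRED:
        matches = [(abs(ts - t), value) for ts, value in buffers.get(kind, [])
                   if abs(ts - t) <= tolerance]
        if not matches:
            return None  # step S63 "NO": keep waiting
        row[kind] = min(matches, key=lambda m: m[0])[1]  # closest result in time
    return row


def record_and_send(buffers: dict[str, list[tuple[float, Any]]], t: float,
                    action_table: list[dict[str, Any]],
                    send: Callable[[dict[str, Any]], None]) -> bool:
    """Steps S67-S69: append the synchronized row, then transmit only that new row."""
    row = sync_row(buffers, t)
    if row is None:
        return False
    action_table.append(row)  # S67: new row of the action table
    send(row)                 # S69: send the newly added row
    return True
```

Sending only the newly added row keeps each transmission small, as the paragraph above describes for step S69.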

In this way, the processes of FIGS. 13 to 21 are executed in parallel at fixed time intervals, so that the user's action data is determined and transmitted to the server 24. Because the action data contains several determination results, the user's state can be recognized accurately.

The processor 64 that executes steps S65 and S67 functions as a determination unit, and the processor 64 that executes step S69 functions as a transmission unit.

FIG. 22 is a flowchart showing the processing of the state recognition program 314k. For example, when the user turns on the PC 14, the processor 64 of the PC 14 determines in step S71 whether a first predetermined time has elapsed, for example, whether the value of the state counter 360 has exceeded a value corresponding to the first predetermined time. If “NO” in step S71, that is, if the first predetermined time has not elapsed, step S71 is repeated. If “YES” in step S71, that is, if the first predetermined time has elapsed, the action data for the first predetermined time is acquired in step S73. That is, the action data from the first predetermined time before the current time up to the current time is read from the action table. When step S73 ends, the state counter 360 is reset.

Next, Ac/Pa (active/passive) recognition is executed in step S75, Ta/Li (talk/listen) recognition in step S77, and target-of-interest recognition in step S79. Steps S75, S77, and S79 are described later with reference to the flowcharts of FIGS. 23, 24, and 25, so their detailed description is omitted here.

Next, in step S81, the recognition results are stored as the state data 354. For example, if the Ac/Pa recognition result is the active state, the Ta/Li recognition result is talk, and the target-of-interest recognition result is “robot”, the state data 354 is stored in the memory 68 as “active talk robot”. Then, in step S83, the state data 354 is transmitted to the server 24, and the process returns to step S71.
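The control flow of steps S71 to S81 can be sketched as a small tick-driven class. The sketch makes these assumptions:

- A tick counter stands in for the state counter 360.
- The action table gains exactly one row per tick, so the last `period_ticks` rows cover the first predetermined time.
- The three recognizers are passed in as callbacks.

Transmission in step S83 is left out. `StateRecognizer` and `compose_state` are hypothetical names.

```python
from __future__ import annotations

from typing import Callable, Optional


def compose_state(active: bool, talking: bool, target: str) -> str:
    """Step S81: combine the three recognition results into one state label."""
    return " ".join((
        "active" if active else "passive",
        "talk" if talking else "listen",
        target,
    ))


class StateRecognizer:
    """Steps S71-S81, with a tick counter standing in for the state counter 360."""

    def __init__(self, period_ticks: int,
                 is_active: Callable[[list], bool],
                 is_talking: Callable[[list], bool],
                 target_of_interest: Callable[[list], str]) -> None:
        self.period_ticks = period_ticks
        self.is_active = is_active
        self.is_talking = is_talking
        self.target_of_interest = target_of_interest
        self.counter = 0

    def tick(self, action_table: list[dict]) -> Optional[str]:
        """Call once per tick; returns a state label when the period has elapsed."""
        self.counter += 1
        if self.counter < self.period_ticks:
            return None                                  # step S71 "NO"
        window = action_table[-self.period_ticks:]       # step S73
        self.counter = 0                                 # reset after S73
        return compose_state(self.is_active(window),     # S75
                             self.is_talking(window),    # S77
                             self.target_of_interest(window))  # S79, then S81
```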

The processor 64 that executes step S75 functions as an activeness recognition unit, the processor 64 that executes step S77 functions as a speaker state recognition unit, and the processor 64 that executes step S79 functions as a target-of-interest recognition unit. The processor 64 that executes steps S75 to S81 functions as a recognition unit.

FIG. 23 is a flowchart showing the processing of the Ac/Pa recognition program 314m. The processor 64 calculates three quantities from the action data acquired in step S73 of the calling routine:

- in step S91, the time WT spent looking at the partner;
- in step S93, the time FT spent in a forward-leaning posture;
- in step S95, the backchannel frequency RF.

In step S97, the AC value is calculated from these results using Equation 1 given above. Then, in step S99, it is determined whether the calculated AC value is greater than a threshold A.

If “YES” in step S99, that is, if the AC value is greater than the threshold A, the active state is set in step S101; that is, the Ac/Pa flag 356 is turned on. If “NO” in step S99, that is, if the AC value is equal to or less than the threshold A, the passive state is set in step S103; that is, the Ac/Pa flag 356 is turned off. When step S101 or S103 ends, the Ac/Pa recognition process ends and control returns to the state recognition process.
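Steps S91 to S103 are sketched below. Equation 1 is not reproduced in this section, so the sketch assumes, purely for illustration, a weighted sum AC = W_WT·WT + W_FT·FT + W_RF·RF with hypothetical unit weights. It also assumes the following about the action-table rows:

- Rows arrive every `row_period` seconds.
- Each row has `"gaze"`, `"posture"`, and `"backchannel"` fields.
- "Looking at the partner" means the gaze result is “monitor”, where the partner is displayed.
- RF is approximated by counting backchannel-flagged rows rather than distinct events.

```python
from __future__ import annotations

# HYPOTHETICAL weights: Equation 1 is not given here, so a plain weighted sum
# AC = W_WT*WT + W_FT*FT + W_RF*RF is assumed for illustration only.
W_WT, W_FT, W_RF = 1.0, 1.0, 1.0


def ac_features(rows: list[dict], row_period: float) -> tuple[float, float, float]:
    """Steps S91-S95 over one window of action-table rows.

    WT: seconds the gaze result is "monitor" (the partner appears on the monitor).
    FT: seconds the posture result is "forward".
    RF: backchannel-flagged rows per second (rows, not separate events).
    """
    if not rows:
        return 0.0, 0.0, 0.0
    wt = sum(r["gaze"] == "monitor" for r in rows) * row_period
    ft = sum(r["posture"] == "forward" for r in rows) * row_period
    rf = sum(bool(r["backchannel"]) for r in rows) / (len(rows) * row_period)
    return wt, ft, rf


def recognize_ac_pa(rows: list[dict], row_period: float, threshold_a: float) -> bool:
    """Steps S97-S103: True = active (flag 356 on), False = passive (flag 356 off)."""
    wt, ft, rf = ac_features(rows, row_period)
    ac = W_WT * wt + W_FT * ft + W_RF * rf
    return ac > threshold_a  # strictly greater, as in step S99
```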

FIG. 24 is a flowchart showing the processing of the Ta/Li recognition program 314n. In step S111, the processor 64 calculates the utterance amount Ta from the utterance determination results for the first predetermined time, that is, from the action data acquired in step S73 of the calling routine. In step S113, it is determined whether the calculated utterance amount Ta is greater than a threshold T. If “YES” in step S113, that is, if Ta is greater than T, the talk state is set in step S115 by turning on the Ta/Li flag 358. If “NO” in step S113, that is, if Ta is equal to or less than T, the listen state is set in step S117 by turning off the Ta/Li flag 358.

When step S115 or S117 ends, the Ta/Li recognition process ends and control returns to the state recognition process.
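Steps S111 to S117 reduce to one sum and one comparison. The sketch below assumes that the utterance amount Ta is measured in seconds of speech, that is, the number of rows whose utterance result is “yes” times the row period. The source does not fix the unit. The function names are hypothetical.

```python
from __future__ import annotations


def utterance_amount(rows: list[dict], row_period: float) -> float:
    """Step S111: seconds of speech in the window (rows whose utterance result is "yes")."""
    return sum(bool(r["utterance"]) for r in rows) * row_period


def recognize_ta_li(rows: list[dict], row_period: float, threshold_t: float) -> bool:
    """Steps S113-S117: True = talk (flag 358 on), False = listen (flag 358 off)."""
    return utterance_amount(rows, row_period) > threshold_t
```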

FIG. 25 is a flowchart showing the processing of the target-of-interest recognition program 314p. In step S121, the processor 64 extracts the action data for a second predetermined time from the action data acquired in step S73 of the calling routine. Next, in step S123, a recognition rate M is calculated from the face recognition results and gaze determination results for the monitor camera 22. In step S125, a recognition rate S is calculated from the face recognition results and gaze determination results for the belly camera 12. Both M and S are calculated from the action data for the second predetermined time acquired in step S121. The processor 64 that executes step S123 functions as a first recognition rate calculation unit, and the processor 64 that executes step S125 functions as a second recognition rate calculation unit.

Next, in step S127, it is determined whether the recognition rates exceed a threshold In. If “NO” in step S127, that is, if both M and S are equal to or less than In, the process proceeds to step S135. If “YES” in step S127, that is, if at least one of M and S is greater than In, it is determined in step S129 whether M is greater than S.

If “YES” in step S129, that is, if M is greater than S, the monitor state is set in step S131. Since the user's target of interest is the monitor 16, data indicating “monitor” is temporarily stored in the target-of-interest recognition result buffer 342. If “NO” in step S129, that is, if M is equal to or less than S, the robot state is set in step S133. Since the user's target of interest is the robot 10, data indicating “robot” is temporarily stored in the target-of-interest recognition result buffer 342. If both M and S are equal to or less than In, the other state is set in step S135. Since the user's target of interest is neither the robot 10 nor the monitor 16, data indicating “other” is temporarily stored in the target-of-interest recognition result buffer 342.

When step S131, S133, or S135 ends, the target-of-interest recognition process ends and control returns to the state recognition process. In this embodiment, the target of interest is set based on the recognition rates M and S of the user's face, so the object the user is looking at can be recognized accurately.
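The decision logic of steps S123 to S135 is sketched below. The source does not spell out how face and gaze results are combined into M and S, so the sketch assumes the following. A row counts toward M when the monitor camera's face recognition succeeded and its gaze result is “monitor”. A row counts toward S when the belly camera's face recognition succeeded and its gaze result is “robot”. The rows passed in are assumed to be already limited to the second predetermined time (step S121). Row keys and function names are hypothetical.

```python
from __future__ import annotations


def recognition_rate(rows: list[dict], face_key: str, gaze_key: str, target: str) -> float:
    """Fraction of rows with a successful face recognition and a gaze on `target`."""
    if not rows:
        return 0.0
    hits = sum(1 for r in rows if r[face_key] == "success" and r[gaze_key] == target)
    return hits / len(rows)


def recognize_target(rows: list[dict], threshold_in: float) -> str:
    """Steps S123-S135 over the second-predetermined-time window."""
    m = recognition_rate(rows, "face_monitor_cam", "gaze_monitor_cam", "monitor")  # S123
    s = recognition_rate(rows, "face_belly_cam", "gaze_belly_cam", "robot")        # S125
    if m <= threshold_in and s <= threshold_in:
        return "other"                      # S127 "NO" -> S135
    return "monitor" if m > s else "robot"  # S129 -> S131 / S133
```

Ties (M equal to S, both above In) resolve to “robot”, matching the “NO” branch of step S129.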

The processor 64 that executes steps S131, S133, and S135 functions as a setting unit.

By executing the processes shown in FIGS. 22 to 25 in this way, the user's state data is stored in the memory 68 and transmitted to the server 24. In another embodiment, the Ac/Pa recognition, Ta/Li recognition, and target-of-interest recognition processes may run in parallel. In that case, only the target-of-interest recognition process acquires action data every second predetermined time to recognize the user's target of interest.

FIG. 26 is a flowchart showing the processing of the utterance time measurement program 320. In step S141, the processor 64 determines whether the audio level is equal to or higher than a predetermined value, for example, whether the level of the sound collected by the microphone 20 reaches a predetermined value at which it can be judged to be human speech. If “NO” in step S141, that is, if the audio level is below the predetermined value, step S141 is repeated. If “YES” in step S141, that is, if the audio level is equal to or higher than the predetermined value, the utterance counter 362 is incremented in step S143 to count the utterance time. The processor 64 that executes steps S141 to S145 functions as a measurement unit.

Next, in step S145, it is determined whether the audio level has fallen below the predetermined value, for example, whether the level of the sound collected by the microphone 20 is now below it. If “NO” in step S145, that is, if the audio level is still equal to or higher than the predetermined value, the process returns to step S143. If “YES” in step S145, that is, if the audio level is below the predetermined value, the user's utterance has ended. The utterance counter 362 that measures the utterance time is therefore initialized in step S147, and the process returns to step S141.
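The loop of steps S141 to S147 can be modelled as a pass over a sequence of audio levels, one level per counter tick. The source only says the counter is initialized when speech ends. The sketch adds one assumption: the counter value is read just before it is reset, so each completed utterance's length in ticks is kept. Speech still in progress at the end of the sequence is not reported. `measure_utterances` is a hypothetical name.

```python
from __future__ import annotations


def measure_utterances(levels: list[float], threshold: float) -> list[int]:
    """Steps S141-S147 over a level sequence; returns each finished utterance's tick count."""
    counter = 0  # stands in for the utterance counter 362
    durations: list[int] = []
    for level in levels:
        if level >= threshold:
            counter += 1                # S141 "YES" / S145 "NO" -> S143
        elif counter > 0:
            durations.append(counter)   # read before reset (assumption)
            counter = 0                 # S145 "YES" -> S147
    return durations
```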

In steps S141 and S145, the determination is based not only on the level of the sound input to the microphone 20 but also on the level of the sound output from the speaker 18. This makes it possible to measure the utterance time of the other user as well.

In another embodiment, the utterance time may be measured from the utterance determination results in the action table instead of with the utterance counter 362; that is, the utterance time can be obtained by counting consecutive “yes” utterance determinations. In this case the utterance time can be measured for user A alone, so each user's utterance time can be measured accurately even when users A and B speak at the same time. Furthermore, if every utterance determination result during the conversation is recorded in the action table, both the total utterance time during the conversation and the duration of each utterance can be calculated.
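The action-table alternative can be sketched as a run-length count over one user's utterance column. The sketch assumes one row per `row_period` seconds and hypothetical per-user keys such as `"utterance_a"` and `"utterance_b"`. Because each user's column is scanned independently, overlapping speech is attributed to both speakers correctly.

```python
from __future__ import annotations


def utterance_durations(table: list[dict], key: str, row_period: float) -> list[float]:
    """Length in seconds of each run of consecutive "yes" results in column `key`."""
    durations: list[float] = []
    run = 0
    for row in table:
        if row[key]:
            run += 1
        elif run:
            durations.append(run * row_period)
            run = 0
    if run:  # the conversation ended mid-utterance
        durations.append(run * row_period)
    return durations
```

The total utterance time for a user is then `sum(utterance_durations(...))`.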

FIG. 27 is a flowchart showing the processing of the overall program 316a included in the robot control program 316. For example, when a call using the videophone function starts, the processor 64 of the PC 14 determines in step S161 whether an end operation, that is, an operation for ending the videophone call, has been performed. If “YES” in step S161, that is, if the end operation has been performed, the overall process ends. If “NO” in step S161, that is, if no end operation has been performed, the state data 354 is referred to in step S163.

Next, in step S165, it is determined whether the state is the monitor state, that is, whether the state data 354 includes “monitor”, indicating that the user's target of interest is the monitor 16. If “NO” in step S165, that is, if the user's target of interest is not the monitor 16, the process proceeds to step S175. If “YES” in step S165, that is, if the user's target of interest is the monitor 16, it is determined in step S167 whether the user is active, that is, whether the state data 354 includes “active”, indicating that the user is in the active state. If “NO” in step S167, that is, if the user is in the passive state, the process proceeds to step S177.

If “YES” in step S167, that is, if the user is in the active state, it is determined in step S169 whether the user is talking, that is, whether the state data 354 includes “talk”, indicating that the user is in the talk state. If “YES” in step S169, that is, if the user is in the talk state, the state data 354 is “active talk monitor”, so the active talk process is executed in step S171 and the process returns to step S161. The active talk process is described later with reference to the flowchart of FIG. 28, so its detailed description is omitted here.

If “NO” in step S169, that is, if the user is in the listen state, the state data 354 is “active listen monitor”, so the active listen process is executed in step S173 and the process returns to step S161. The active listen process is described later with reference to the flowchart of FIG. 29, so its detailed description is omitted here.

If the user's target of interest is not the monitor 16, it is determined in step S175 whether the state is the robot state, that is, whether the state data 354 includes “robot”, indicating that the user's target of interest is the robot 10. If “YES” in step S175, that is, if the user's target of interest is the robot 10, the inactive process is executed in step S177 and the process returns to step S161. The inactive process is described later with reference to the flowchart of FIG. 30, so its detailed description is omitted here. If “NO” in step S175, that is, if the user's target of interest is neither the robot 10 nor the monitor 16, the other process is executed in step S179 and the process returns to step S161. The other process is described later with reference to the flowchart of FIG. 31, so its detailed description is omitted here.
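The branching of steps S161 to S179 can be written as a small dispatcher. The sketch assumes the state data arrives as a space-separated label such as “active talk monitor”. The four robot behaviours are represented by hypothetical handler names, since the processes themselves are described in FIGS. 28 to 31. `next_state` and `end_requested` are caller-supplied stand-ins for reading the state data 354 and checking for the end operation.

```python
from __future__ import annotations

from typing import Callable


def dispatch(state: str) -> str:
    """Steps S165-S179: choose the robot behaviour for a state label.

    Assumes labels of the form "<active|passive> <talk|listen> <monitor|robot|other>".
    """
    activity, mode, target = state.split()
    if target == "monitor":                      # S165 "YES"
        if activity != "active":                 # S167 "NO"
            return "inactive"                    # S177
        return "active_talk" if mode == "talk" else "active_listen"  # S171 / S173
    if target == "robot":                        # S175 "YES"
        return "inactive"                        # S177
    return "other"                               # S179


def run_call(next_state: Callable[[], str],
             handlers: dict[str, Callable[[], None]],
             end_requested: Callable[[], bool]) -> None:
    """Loop until the end operation (S161), dispatching on the latest state (S163)."""
    while not end_requested():
        handlers[dispatch(next_state())]()
```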

In another embodiment, steps S165, S175, and S179 may be omitted, and step S167 may instead determine whether the state is “active monitor”. The processor 64 that executes steps S171, S173, S177, and S179 functions as a motion assignment unit.

FIG. 28 is a flowchart showing the processing of the active talk program 316b. The processing in the flowcharts of FIGS. 28 to 42 is described as being executed by the PC 14a. Accordingly, “user” refers to user A, and “partner” refers to user B, that is, the partner user.

When step S171 is executed, the processor 64 of the PC 14a runs the data communication process in step S191 to establish data communication with the server 24. In step S193, the partner's state data is acquired: user B's state data stored in the server 24 is obtained over this connection. The acquired state data is temporarily stored in the data communication buffer 344.

In step S195, it is determined whether the partner is in the monitor state. That is, based on user B's state data in the data communication buffer 344, it is determined whether user B's object of interest is the monitor 16b.

If “NO” in step S195, that is, if user B's object of interest is not the monitor 16b, the process proceeds to step S207. If “YES” in step S195, step S197 determines from the acquired state data whether user B is in the active state. If “NO” in step S197, that is, if user B is in the passive state, the process proceeds to step S207.

If “YES” in step S197, that is, if user B is in the active state, step S199 determines from user B's state data whether user B is in the talk state. If “YES” in step S199, the utterance continuation process is executed in step S201. When it ends, the active talk process ends and the process returns to the overall process. The utterance continuation process is described later with reference to the flowchart of FIG. 32, so a detailed description is omitted here.

If “NO” in step S199, that is, if user B is in the listening state, step S203 temporarily stores the state data of both users. The state data 354 indicating user A's state and user B's state data from the data communication buffer 344 are placed in the state data buffer 348.

In step S205, it is determined whether user A's utterance time, counted by the utterance counter 362, is shorter than a threshold L.

If “YES” in step S205, that is, if user A's utterance time is shorter than the threshold L, the process proceeds to step S201. If “NO” in step S205, user A's utterance time is equal to or longer than L, which means user A is talking one-sidedly. The utterance suppression process is therefore executed in step S207, after which the active talk process ends and the process returns to the overall process. The utterance suppression process is described later with reference to the flowchart of FIG. 33, so a detailed description is omitted here.
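The decision made by the active talk process (steps S195 to S207) reduces to a short function. This is a hypothetical sketch with the following assumptions:
- The parameter names, the string results, and the use of seconds for utterance time are illustrative.
- This excerpt does not give a value for the threshold L, so it is passed in.
- Network access and buffering (S191, S193, S203) are left out.

```python
def active_talk(partner_interest: str, partner_active: bool,
                partner_talking: bool, user_utterance_time: float,
                threshold_l: float) -> str:
    """Choose between utterance continuation and suppression for user A."""
    # S195 / S197: partner not watching the monitor, or passive -> suppress.
    if partner_interest != "monitor" or not partner_active:
        return "utterance_suppression"           # S207
    # S199 "YES": the partner is talking too, so keep the exchange going.
    if partner_talking:
        return "utterance_continuation"          # S201
    # S205: the partner is listening; check how long user A has been talking.
    if user_utterance_time < threshold_l:
        return "utterance_continuation"          # S201
    return "utterance_suppression"               # S207: user A talks one-sidedly
```

The boundary case matches the text: an utterance time equal to L already counts as one-sided talk.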

In the flowcharts of FIGS. 29 to 31 below, steps that duplicate the flow of FIG. 28 are not described in detail.

FIG. 29 is a flowchart showing the processing of the active listen program 316c. When step S173 is executed, the processor 64 of the PC 14a establishes data communication with the server 24 in step S211 and acquires the partner's state data in step S213. Step S215 then determines whether the partner is in the monitor state. If “NO” in step S215, that is, if user B's object of interest is something other than the monitor 16b, the process proceeds to step S227.

If “YES” in step S215, that is, if user B's object of interest is the monitor 16b, step S217 determines whether the partner is active. If “NO” in step S217 (user B is passive), the process proceeds to step S227.

If “YES” in step S217, step S219 determines whether the partner is in the talk state. If “NO” in step S219 (user B is listening), the process proceeds to step S227. If “YES” in step S219, the state data of the user and the partner are temporarily stored in step S221.

In step S223, it is determined whether user B's utterance time, counted by the utterance counter 362, is shorter than the threshold L.

If “YES” in step S223, the utterance continuation process is executed in step S225. When it finishes, the active listen process ends and the process returns to the overall process.

If “NO” in step S223, user B's utterance time is equal to or longer than L, which means user B (the partner user) is talking one-sidedly. The utterance promotion process is therefore executed in step S227. When step S227 finishes, the active listen process ends and the process returns to the overall process. The utterance promotion process is described later with reference to the flowchart of FIG. 34, so a detailed description is omitted here.
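The active listen process (steps S215 to S227) mirrors the active talk process with the roles reversed. This is again a hypothetical sketch; names, strings and the time unit are assumptions. The three early exits (partner not watching, passive, or also listening) all lead to the same step S227, so they collapse into one condition.

```python
def active_listen(partner_interest: str, partner_active: bool,
                  partner_talking: bool, partner_utterance_time: float,
                  threshold_l: float) -> str:
    """Choose between utterance continuation and promotion for user A."""
    # S215 / S217 / S219: partner not watching, passive, or listening -> promote.
    if partner_interest != "monitor" or not partner_active or not partner_talking:
        return "utterance_promotion"             # S227
    # S223: the partner is talking; compare the partner's utterance time with L.
    if partner_utterance_time < threshold_l:
        return "utterance_continuation"          # S225
    return "utterance_promotion"                 # S227: partner talks one-sidedly
```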

When the utterance suppression process of step S207 or the utterance promotion process of step S227 is executed, the robot 10a acts to regulate user A's utterances. Users A and B then speak in a well-balanced manner, and the dialogue is sustained.

FIG. 30 is a flowchart showing the processing of the inactive program 316d. When step S177 is executed, the processor 64 of the PC 14a executes the utterance promotion process in step S231. When step S231 finishes, the inactive process ends and the process returns to the overall process. For example, encouraging a user who is not actively engaged to speak draws that user into the dialogue and sustains it.

FIG. 31 is a flowchart showing the processing of the other program 316e. When step S179 is executed, the processor 64 of the PC 14a executes the attention attracting process in step S241. When step S241 finishes, the “other” process ends and the process returns to the overall process.

A detailed description of the attention attracting process is omitted here. When it is executed, the robot 10a acts to attract user A's attention. That is, if user A's object of interest is neither the monitor 16 (user B) nor the robot 10a, the robot 10a draws user A's attention so that user A becomes interested in the dialogue. Interesting a user who is not in the active state in this way sustains the dialogue.

FIG. 32 is a flowchart showing the processing of the utterance continuation program 316f. When step S201 or step S225 is executed, the processor 64 of the PC 14a refers in step S261 to user A's behavior data in the behavior table data 352. Step S263 then determines whether the user is speaking, that is, whether the utterance determination result in user A's behavior data is “yes”.

If “YES” in step S263, the side-participant pseudo-listening process is executed in step S265. When it finishes, the utterance continuation process ends and the process returns to the calling routine. The side-participant pseudo-listening process is described later with reference to the flowchart of FIG. 36, so a detailed description is omitted here.

If “NO” in step S263, that is, if user A is not speaking, data communication with the server 24 is established in step S267, as in step S191. In step S269, the partner's (user B's) behavior data is acquired from the server 24. In step S271, this data is temporarily stored in the partner behavior data buffer 346.

In step S273, it is determined whether the partner is speaking. User B's behavior data is read from the partner behavior data buffer 346, and the process checks whether its utterance determination result is “yes”.

If “YES” in step S273, that is, if user B is speaking, the first active pseudo-listening process is executed in step S275. When it finishes, the utterance continuation process ends and the process returns to the calling routine. The first active pseudo-listening process is described later with reference to the flowchart of FIG. 37, so a detailed description is omitted here.

If “NO” in step S273, that is, if user B is not speaking either, the immediately preceding action is continued in step S277. The utterance continuation process then ends and the process returns to the calling routine. For example, if the operation command “look at user A” was issued in the previous pass, step S277 issues the same command again.
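The utterance continuation process (steps S261 to S277) can be sketched as follows. The partner's state is passed in as a callable so that, as in the flowchart, the server is contacted only when user A is silent. The callable, the string results and the `previous_command` parameter are all hypothetical stand-ins for the server query and for the last command the PC 14a issued.

```python
from typing import Callable


def utterance_continuation(user_speaking: bool,
                           fetch_partner_speaking: Callable[[], bool],
                           previous_command: str) -> str:
    """Pick the robot's next action while the dialogue is flowing."""
    # S263: user A is speaking -> pseudo-listen as a side participant.
    if user_speaking:
        return "side_participant_pseudo_listening"   # S265
    # S267-S273: only now query user B's behavior data from the server.
    if fetch_partner_speaking():
        return "first_active_pseudo_listening"       # S275
    # S277: nobody is speaking -> repeat the previous operation command.
    return previous_command
```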

In the side-participant pseudo-listening process and the first active pseudo-listening process, the robot 10 receives an operation command to pseudo-listen to user A or user B. This pseudo-listening makes users A and B feel that they are being listened to, so the dialogue is sustained.

In the flowcharts of FIGS. 33 and 34 below, steps that duplicate the flow of FIG. 32 are not described in detail.

FIG. 33 is a flowchart showing the processing of the utterance suppression program 316g. When step S207 is executed, the processor 64 of the PC 14a acquires the behavior data in step S281. Step S283 then determines whether the user is speaking. If “YES” in step S283, that is, if user A is speaking, the side-participant pseudo-listening process is executed in step S285. When step S285 finishes, the utterance suppression process ends and the process returns to the active talk process.

If “NO” in step S283, that is, if user A is not speaking, the following steps run:
- Step S287 establishes data communication with the server 24.
- Step S289 acquires the partner's behavior data.
- Step S291 temporarily stores that data.
- Step S293 determines whether the partner is speaking.

If “YES” in step S293, that is, if user B is speaking, the first active pseudo-listening process is executed in step S295. When step S295 finishes, the utterance suppression process ends and the process returns to the active talk process.

If “NO” in step S293, that is, if user B is not speaking, the user utterance suppression process is executed in step S297. When step S297 finishes, the utterance suppression process ends and the process returns to the active talk process.

The user utterance suppression process is described in detail later. When it is executed, the robot 10 acts to restrain the user's utterances, so a user A who is talking one-sidedly is held back.
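The utterance suppression process (steps S281 to S297) shares its first two branches with the utterance continuation process and differs only in the final fallback. In this hypothetical sketch, the callable again stands in for the server query of steps S287 to S291.

```python
from typing import Callable


def utterance_suppression(user_speaking: bool,
                          fetch_partner_speaking: Callable[[], bool]) -> str:
    """Pick the robot's action when user A has been talking too much."""
    if user_speaking:                               # S283 "YES"
        return "side_participant_pseudo_listening"  # S285
    if fetch_partner_speaking():                    # S287-S293
        return "first_active_pseudo_listening"      # S295
    return "user_utterance_suppression"             # S297: nobody is speaking
```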

FIG. 34 is a flowchart showing the processing of the utterance promotion program 316h. When step S227 or step S231 is executed, the processor 64 of the PC 14a refers to the behavior data in step S301. Step S303 then determines whether the user is speaking. If “YES” in step S303, that is, if user A is speaking, the side-participant pseudo-listening process is executed in step S305.

If “NO” in step S303, that is, if user A is not speaking, the following steps run:
- Step S307 establishes data communication with the server 24.
- Step S309 acquires the partner's behavior data.
- Step S311 temporarily stores that data.
- Step S313 determines whether the partner is speaking.

If “YES” in step S313, that is, if user B is speaking, the first active pseudo-listening process is executed in step S315.

If “NO” in step S313, that is, if user B is not speaking, step S317 determines whether the user is looking at the robot 10. That is, it checks whether the gaze direction determination result in user A's behavior data is “robot”.

If “YES” in step S317, that is, if user A is looking at the robot 10a, the user utterance promotion process is executed in step S319. When step S319 finishes, the utterance promotion process ends and the process returns to the calling routine.

The user utterance promotion process is described in detail later. When it is executed, the robot 10a acts to get user A to speak. For example, if the calling routine is the inactive process, the robot 10 encourages user A to speak so that user A can join the conversation. A user who had not been actively engaged thus starts to speak and takes part in the dialogue.

The processor 64 that executes steps S207, S227, S297 and S319 functions as an utterance control operation giving means. The processor 64 that executes steps S269, S289 and S309 functions as an acquisition means.

If “NO” in step S317, that is, if user A is not looking at the robot 10a, the attention attracting process is executed in step S321. When step S321 finishes, the utterance promotion process ends and the process returns to the calling routine.
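The full utterance promotion flow (steps S301 to S321) adds a gaze check before the fallback. Whether the robot encourages speech or first draws attention depends on whether user A is already looking at it. This is a hypothetical sketch:
- The gaze labels “robot”, “monitor” and “other” follow the gaze determination results named in the text.
- The function name, the result strings and the callable are illustrative.

```python
from typing import Callable


def utterance_promotion(user_speaking: bool,
                        fetch_partner_speaking: Callable[[], bool],
                        user_gaze: str) -> str:
    """Pick the robot's action when user A should be drawn into speaking."""
    if user_speaking:                               # S303 "YES"
        return "side_participant_pseudo_listening"  # S305
    if fetch_partner_speaking():                    # S307-S313
        return "first_active_pseudo_listening"      # S315
    if user_gaze == "robot":                        # S317 "YES"
        return "user_utterance_promotion"           # S319
    return "attention_attracting"                   # S321
```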

The attention attracting process is described in detail later. When it is executed, the robot 10a acts to attract user A's attention. For example, if the calling routine is the inactive process, the robot 10 draws user A's attention so that user A can join the conversation. A user who had not been actively engaged thus has his or her attention caught and takes part in the dialogue.

The processor 64 that executes steps S177, S231, S319 and S321 functions as a participation operation giving means.

When the calling routine of the utterance promotion process is the active listen process, the robot 10 promotes the user's utterances (step S319) or attracts the user's attention (step S321), so that user A begins to speak. In other words, if the user has been actively listening to the partner for a long time, user A is prompted to speak.

FIG. 35 is a flowchart showing the processing of the attention attracting program 316j. When, for example, step S251 is executed, the processor 64 of the PC 14a runs a random number generation process in step S331. This process generates one pseudo-random number with a small number of digits and stores it in the random number buffer 350.

In step S333, it is determined whether the random number is odd, that is, whether dividing the number in the random number buffer 350 by 2 leaves a remainder of 1. If “YES” in step S333, the operation command “look at the user” is issued in step S335, which directs the robot 10a to look at user A. When step S335 finishes, the attention attracting process ends and the process returns to the calling routine.

If “NO” in step S333, that is, if the random number is even, the random number generation process runs again in step S337. The new number is stored after the number created in step S331 has been erased, so step S337 updates the random number in the random number buffer 350.

In step S339, it is determined whether the random number generated in step S337 is odd. If “YES” in step S339, the operation command “speak to the user” is issued in step S341. For example, this command makes the speaker 56 in the mouth 32 output a synthesized voice such as “Hey”. Hearing it, user A feels as if the robot 10a has spoken to him or her.

If “NO” in step S339, that is, if the random number is even, the operation command “look at the same thing as the user” is issued in step S343. This command drives the head motor 48 and the eyeball motor 50 of the robot 10a according to the gaze determination result in user A's behavior data.

The command depends on user A's gaze determination result:
- “robot”: the head motor 48 and the eyeball motor 50 are driven so that the robot 10a looks at its own torso 28.
- “monitor”: the head 26 is moved so that the robot 10a looks at the monitor 16a.
- “other”: the head 26 is moved so that the robot 10a looks restlessly around.

If there are four or more categories of gaze determination results, the operation commands become correspondingly more varied. If the gaze determination result in the behavior data is expressed as spatial coordinates, the object user A is gazing at can be identified more precisely. This is done by comparing the spatial coordinates of the robot 10a with those of the gaze determination result, and the robot 10a can then look at that object.

In this way, the attention attracting process randomly issues one of three operation commands: look at the user, speak to the user, or look at the same thing as the user. These actions of the robot 10 attract the user's attention.
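The random selection of steps S331 to S343, together with the gaze-following rule, can be sketched as below. This is a minimal illustration under these assumptions:
- The range 0 to 99 for the “small number of digits” random number is an assumption.
- The command names are hypothetical.
- The random source is injected so the branches can be tested deterministically.

With a uniform 0 to 99 range, the three commands come out with probabilities 1/2, 1/4 and 1/4.

```python
import random
from typing import Callable


def small_random() -> int:
    # Assumption: "a pseudo-random number with a small number of digits"
    # is modeled as an integer from 0 to 99.
    return random.randint(0, 99)


def joint_attention(gaze: str) -> str:
    """S343: look at what user A is looking at (mapping from the text)."""
    targets = {
        "robot": "look_at_own_torso",   # robot looks at its torso 28
        "monitor": "look_at_monitor",   # robot looks at the monitor 16a
        "other": "look_around",         # robot looks restlessly around
    }
    return targets[gaze]


def attract_attention(gaze: str,
                      next_random: Callable[[], int] = small_random) -> str:
    if next_random() % 2 == 1:        # S331/S333: first number is odd
        return "look_at_user"         # S335
    if next_random() % 2 == 1:        # S337/S339: fresh number replaces the old one
        return "speak_to_user"        # S341, e.g. a synthesized "Hey"
    return joint_attention(gaze)      # S343
```

For example, passing `iter([2, 4]).__next__` as `next_random` forces both draws to be even, so a user looking at the monitor makes the robot look at the monitor too.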

User A's attention may also be attracted by moving the right arm 30R or the left arm 30L. The processor 64 that executes steps S321, S335, S341 and S343 functions as an attention attracting means.

In the flowcharts of FIGS. 36 to 40 below, steps that duplicate the flow of FIG. 35 are not described in detail.

FIG. 36 is a flowchart showing the processing of the side-participant pseudo-listening program 316k. The processor 64 of the PC 14a refers to the behavior data in step S361. Step S363 then determines whether the user has just started talking, in one of two ways:
- If the utterance counter 362 counts the utterance time, the process checks whether the counter value is smaller than a predetermined value (for example, a value corresponding to 2 seconds).
- If the utterance time is measured from the utterance determination results in the behavior table, the process checks whether no more than a predetermined number of determinations (for example, two) have been made since the result changed from “none” to “yes”.
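Both variants of the “just started talking” check in step S363 can be sketched as below. This is a hypothetical sketch under these assumptions:
- The limits default to the example values in the text (2 seconds, two determinations).
- The history is assumed to be a list of booleans, oldest first (True for “yes”, False for “none”).
- A history with no “none” entry is treated as if the transition had just happened.

This routine is only reached while user A is speaking, so the counter variant does not separately check for silence.

```python
from typing import Sequence


def started_by_counter(utterance_seconds: float,
                       limit_seconds: float = 2.0) -> bool:
    """Counter variant: utterance counter 362 still below the limit."""
    return utterance_seconds < limit_seconds


def started_by_history(utterance_results: Sequence[bool],
                       max_count: int = 2) -> bool:
    """Behavior-table variant: few determinations since "none" -> "yes"."""
    run = 0
    # Count the trailing run of "yes" results, newest first.
    for speaking in reversed(utterance_results):
        if not speaking:
            break
        run += 1
    return 0 < run <= max_count
```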

If “YES” in step S363, that is, if user A has just started talking, the second active pseudo-listening process is executed in step S367. When step S367 finishes, the side-participant pseudo-listening process ends and the process returns to the calling routine. The second active pseudo-listening process is described later with reference to the flowchart of FIG. 39, so a detailed description is omitted here.

If “NO” in step S363, step S365 determines whether user A is looking at the robot 10a, that is, whether the gaze direction determination in user A's behavior data is “robot”. If “YES” in step S365, the process proceeds to step S367.

If “NO” in step S365, a random number is generated in step S369, and step S371 determines whether it is odd. If “NO” in step S371 (the number is even), the process proceeds to step S367. If “YES” in step S371 (the number is odd), the first active pseudo-listening process is executed in step S373. When step S373 finishes, the side-participant pseudo-listening process ends and the process returns to the calling routine.

Thus the side-participant pseudo-listening process selects, based on user A's utterance and gaze, between two processes. The first active pseudo-listening process performs pseudo-listening toward user B, and the second active pseudo-listening process performs pseudo-listening toward user A.
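The selection in steps S363 to S373 can be sketched as follows. This is a hypothetical sketch:
- The start-of-talk result is passed in as a boolean.
- The random range 0 to 99 is assumed.
- The random source is injectable.

No random number is drawn when user A has just started talking or is looking at the robot, matching the order of the checks in FIG. 36.

```python
import random
from typing import Callable


def side_participant_pseudo_listening(
        just_started_talking: bool, user_gaze: str,
        next_random: Callable[[], int] = lambda: random.randint(0, 99)) -> str:
    """Decide whether the robot pseudo-listens toward user A or user B."""
    # S363 / S365: new speech, or user A looking at the robot -> toward user A.
    if just_started_talking or user_gaze == "robot":
        return "second_active_pseudo_listening"  # S367
    # S369 / S371: otherwise choose at random.
    if next_random() % 2 == 1:
        return "first_active_pseudo_listening"   # S373, toward user B
    return "second_active_pseudo_listening"      # S367, toward user A
```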

FIG. 37 is a flowchart showing the processing of the first active pseudo-listening program 316m. When, for example, step S373 is executed, the processor 64 of the PC 14a runs the random number generation process in step S381. Step S383 then determines whether the generated number is odd.

If “YES” in step S383, the operation command “turn the face toward the monitor 16a” is issued in step S385. This command drives the head motor 48 and the eyeball motor 50 so that the head 26 and the gaze of the eyeballs 34 point toward the monitor 16a, which displays user B's image. When step S385 finishes, the first active pseudo-listening process ends and the process returns to the calling routine.

If “NO” in step S383, that is, if the random number is even, a random number is generated again in step S387, and in step S389 it is determined whether it is odd. If “YES” in step S389, that is, if the random number is odd, the operation command “tilt the body toward the monitor 16a” is issued in step S391. That is, the robot 10a is given an operation command that drives the waist motor 52 so that its body (the head 26, the torso 28, and so on) tilts toward the monitor 16a. When step S391 ends, the first active pseudo-listening process ends and the process returns to the calling routine.

If “NO” in step S389, that is, if the random number is even, the operation command “give a backchannel response” is issued in step S393. For example, the robot 10a outputs a synthesized voice such as “Ah” from the speaker 56, as a human listener would. At the same time, an operation command drives the head motor 48 so that the head 26 nods toward the monitor 16a. When step S393 ends, the first active pseudo-listening process ends and the process returns to the calling routine.

In this way, in the first active pseudo-listening process, the robot 10 behaves as if it were listening to the other user. It turns its face (the head 26 and the eyeballs 34) toward the monitor 16 that shows the other user, tilts its body, or gives backchannel responses. Therefore, even if user A is not listening to user B, the robot 10a pseudo-listens to user B, and user B can feel that he or she is being listened to.
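The flow of FIG. 37 is a cascade of parity tests, each on a freshly generated random number. A minimal Python sketch follows. The labels and the injected `draw` callable are illustrative assumptions, and the motors named in the comments follow the text. With an unbiased parity, the three commands are issued in about 1/2, 1/4 and 1/4 of the runs.

```python
import random
from collections import Counter
from typing import Callable

FACE_MONITOR = "face monitor 16a"            # S385: head motor 48 and eyeball motor 50
LEAN_TO_MONITOR = "lean toward monitor 16a"  # S391: waist motor 52
BACKCHANNEL = "nod and say 'Ah'"             # S393: head motor 48 and speaker 56


def first_active_pseudo_listening(draw: Callable[[], int]) -> str:
    """Parity cascade of FIG. 37 (sketch); each draw() is a new random number."""
    if draw() % 2 == 1:       # S381/S383
        return FACE_MONITOR
    if draw() % 2 == 1:       # S387/S389
        return LEAN_TO_MONITOR
    return BACKCHANNEL        # S393


if __name__ == "__main__":
    rng = random.Random(0)
    counts = Counter(first_active_pseudo_listening(lambda: rng.randrange(1 << 16))
                     for _ in range(10_000))
    print(counts)  # shares should be near 1/2, 1/4 and 1/4
```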

The processor 64 that executes steps S275, S295, S315, S373, S385, S391 and S393 functions as other-party listening action giving means.

FIG. 38 is a flowchart showing the processing of the user utterance suppression program 316n. When step S297 is executed, the processor 64 of the PC 14a refers to the action data in step S401. Then, in step S403, it is determined, as in step S365, whether user A is looking at the robot 10a. If “NO” in step S403, that is, if user A is not looking at the robot 10a, an attention attracting process is executed in step S405. When step S405 ends, the user utterance suppression process ends and the process returns to the utterance suppression process. The processor 64 that executes steps S179, S241, S321, S335, S341, S343 and S405 functions as interest action giving means.

If “YES” in step S403, that is, if user A is looking at the robot 10a, a random number is generated in step S407, and in step S409 it is determined whether it is odd. If “YES” in step S409, that is, if the random number is odd, the operation command “look at the user” is issued in step S411. That is, the robot 10a is given an operation command that drives the head motor 48 and the eyeball motor 50 so that the head 26 and the eyeballs 34 face user A. When step S411 ends, the user utterance suppression process ends and the process returns to the utterance suppression process.

If “NO” in step S409, a random number is generated again in step S413, and in step S415 it is determined whether it is odd. If “YES” in step S415, that is, if the random number is odd, the operation command “stop talking” is issued in step S417. That is, in step S417 the robot 10a is given an operation command that outputs from the speaker 56 a synthesized voice asking the user to stop talking, such as “Wait a moment”. When step S417 ends, the user utterance suppression process ends and the process returns to the utterance suppression process.

If “NO” in step S415, that is, if the random number is even, a random number is generated again in step S419, and in step S421 it is determined whether it is odd. If “YES” in step S421, that is, if the generated random number is odd, the operation command “point at the monitor 16a” is issued in step S423. That is, the robot 10a is given an operation command that drives the right arm motor 46R or the left arm motor 46L so that the tip of the right arm 30R or the left arm 30L points toward the monitor 16a. When step S423 ends, the user utterance suppression process ends and the process returns to the utterance suppression process.

If “NO” in step S421, that is, if the generated random number is even, an attention guidance process is executed in step S425. When step S425 ends, the user utterance suppression process ends and the process returns to the utterance suppression process. The attention guidance process is described later with reference to the flowchart of FIG. 40, so a detailed description is omitted here.

In this way, in the user utterance suppression process, the robot 10a suppresses the utterances of user A, who is speaking actively. It does so by looking at user A, asking user A to stop talking, pointing at the monitor 16a, attracting user A's attention, or guiding user A's attention. Redirecting the attention of a user who talks one-sidedly suppresses that user's utterances. Moreover, when the user's attention is guided to the other party, the other party gets a chance to speak, so the dialogue is sustained longer.
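The decision flow of FIG. 38 can be written as a short Python sketch. The labels and the `draw` callable are assumptions made for illustration. Assuming an unbiased parity, a user who is looking at the robot receives the four responses with probabilities 1/2, 1/4, 1/8 and 1/8. A user who is looking elsewhere always triggers attention attraction.

```python
from typing import Callable


def user_utterance_suppression(looking_at_robot: bool, draw: Callable[[], int]) -> str:
    """Decision flow of FIG. 38 (sketch). Each draw() is a fresh random number."""
    if not looking_at_robot:           # S403 NO
        return "attract attention"     # S405
    if draw() % 2 == 1:                # S407/S409
        return "look at user A"        # S411
    if draw() % 2 == 1:                # S413/S415
        return "say 'Wait a moment'"   # S417
    if draw() % 2 == 1:                # S419/S421
        return "point at monitor 16a"  # S423
    return "attention guidance"        # S425, detailed in FIG. 40
```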

The processor 64 that executes steps S297, S405, S411, S417, S423 and S425 functions as utterance suppression action giving means.

FIG. 39 is a flowchart showing the processing of the second active pseudo-listening program 316p. When step S367 is executed, the processor 64 of the PC 14a executes a random number generation process in step S431 and determines in step S433 whether the random number is odd. If “YES” in step S433, that is, if the random number is odd, the operation command “face the user” is issued in step S435. That is, the robot 10a is given an operation command that drives the head motor 48 and the eyeball motor 50 so that its face (the head 26 and the eyeballs 34) turns toward user A. When step S435 ends, the second active pseudo-listening process ends and the process returns to the by-participant pseudo-listening process.

If “NO” in step S433, that is, if the random number is even, a random number is generated in step S437, and in step S439 it is again determined whether it is odd. If “YES” in step S439, that is, if the random number is odd, the operation command “tilt the body toward user A” is issued in step S441. That is, the robot 10a is given an operation command that drives the waist motor 52 so that its body (the head 26 and the torso 28) tilts toward user A.

If “NO” in step S439, that is, if the random number is even, the operation command “give a backchannel response” is issued in step S443. That is, the robot 10a is given an operation command with two parts. It drives the head motor 48 so that the head 26 nods toward user A, and it outputs a synthesized voice resembling a human backchannel from the speaker 56. When step S443 ends, the second active pseudo-listening process ends and the process returns to the by-participant pseudo-listening process.

In this way, the robot 10a behaves as if it were listening to user A's utterances by turning its face toward user A, tilting its body toward user A, or giving backchannel responses.
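FIGS. 37 and 39 share the same shape: a sequence of candidate actions, each accepted on an odd random number, with the last one as the fallback. The sketch below factors that shape into a hypothetical `parity_cascade` helper. It also computes the exact selection probabilities, assuming each draw is odd with probability 1/2 independently of the others. Neither helper appears in the patent.

```python
from fractions import Fraction
from typing import Callable, Dict, Sequence

# Steps S435, S441 and S443 of FIG. 39
SECOND_ACTIVE = ("face user A", "lean toward user A", "backchannel toward user A")


def parity_cascade(actions: Sequence[str], draw: Callable[[], int]) -> str:
    """Test one fresh random number per candidate; odd selects it, the last is the fallback."""
    for action in actions[:-1]:
        if draw() % 2 == 1:
            return action
    return actions[-1]


def cascade_probabilities(actions: Sequence[str]) -> Dict[str, Fraction]:
    """Exact selection probabilities, assuming each draw is odd with probability 1/2."""
    probs: Dict[str, Fraction] = {}
    remaining = Fraction(1)              # probability of reaching the current test
    for action in actions[:-1]:
        probs[action] = remaining / 2    # reached this test and drew an odd number
        remaining /= 2                   # reached this test and drew an even number
    probs[actions[-1]] = remaining       # every test drew an even number
    return probs


if __name__ == "__main__":
    for action, p in cascade_probabilities(SECOND_ACTIVE).items():
        print(f"{action}: {p}")
```

For the three actions of FIG. 39 this prints 1/2, 1/4 and 1/4. The same helper reproduces FIG. 37 if the monitor-directed labels are passed instead.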

The processor 64 that executes steps S171, S173, S201, S225, S265, S285, S305, S367, S435, S441 and S443 functions as listening action giving means.

FIG. 40 is a flowchart showing the processing of the attention guidance program 316q. When step S425 is executed, the processor 64 of the PC 14a executes a random number generation process in step S451 and determines in step S453 whether the random number is odd. If “YES” in step S453, that is, if the random number is odd, the operation command “look at the monitor 16a” is issued in step S455. For example, in step S455 the robot 10a is given an operation command that drives the head motor 48 and the eyeball motor 50 so that the head 26 and the gaze of the eyeballs 34 are directed toward the monitor 16a. When step S455 ends, the attention guidance process ends and the process returns to the user utterance suppression process.

If “NO” in step S453, that is, if the random number is even, a random number is generated again in step S457, and in step S459 it is determined whether it is odd. If “YES” in step S459, that is, if the random number is odd, the process proceeds to step S455. If “NO” in step S459, that is, if the generated random number is even, the operation command “look at user A” is issued in step S461, as in step S411. When step S461 ends, the attention guidance process ends and the process returns to the user utterance suppression process.

When the attention guidance process is executed several times, the robot alternately looks at user A and at the monitor 16a, which guides user A's attention. In this embodiment in particular, the robot 10a is set up to look at the monitor 16a more often than at user A. User A's attention is therefore guided to the monitor 16a, that is, to user B. In other embodiments, the attention guidance process may be repeated several times to guide user A's attention more effectively.
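The bias toward the monitor follows from the two parity tests in FIG. 40. Both an odd first number and an even-then-odd pair lead to step S455. Assuming an unbiased parity, the monitor is therefore chosen with probability 1/2 + 1/2 · 1/2 = 3/4, and user A with probability 1/4. The sketch below encodes the flow and checks the share with a seeded simulation. `monitor_share` and the labels are illustrative assumptions.

```python
import random
from typing import Callable


def attention_guidance(draw: Callable[[], int]) -> str:
    """Decision flow of FIG. 40 (sketch): two parity tests, two possible gaze targets."""
    if draw() % 2 == 1:               # S451/S453 odd
        return "look at monitor 16a"  # S455
    if draw() % 2 == 1:               # S457/S459 odd: also proceeds to S455
        return "look at monitor 16a"
    return "look at user A"           # S461


def monitor_share(trials: int, seed: int = 0) -> float:
    """Fraction of runs that look at the monitor, using a seeded PRNG."""
    rng = random.Random(seed)
    hits = sum(attention_guidance(lambda: rng.randrange(1 << 16)) == "look at monitor 16a"
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(monitor_share(100_000))  # close to the analytic value 0.75
```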

The processor 64 that executes steps S425, S455 and S461 functions as attention guidance means.

FIG. 41 is a flowchart showing the processing of the user utterance promotion program 316r. When step S319 is executed, the processor 64 of the PC 14a issues the operation command “ask the user a question” in step S471. For example, in step S471 the robot 10a is given an operation command that outputs a synthesized voice such as “Why don't you ask about today's meals?” from the speaker 56 provided in the mouth 32. When step S471 ends, the user utterance promotion process ends and the process returns to the utterance promotion process.

In this way, the robot 10a says something that prompts user A to ask user B a question. User B then speaks, and the dialogue is sustained.

The processor 64 that executes steps S227, S319 and S471 functions as utterance promotion action giving means. The processor 64 that executes steps S319 and S471 also functions as utterance prompting means.

FIG. 42 is a flowchart showing the processing of the camera control program 318. When a call is started with the videophone function, the processor 64 of the PC 14a transmits the image of the monitor camera 22a in step S481. That is, the image data used by the videophone function is transmitted to the PC 14b via the network 200. Then, in step S483, the processor refers to the interest object recognition result, that is, the recognition result for the object of interest contained in the state data 354.

Next, in step S485, it is determined whether the state is the “other” state, that is, whether the recognition result for user A's object of interest is “other”. If “YES” in step S485, that is, if user A is looking at neither the robot 10a nor the monitor 16a, the process returns to step S483. If “NO” in step S485, that is, if user A is looking at the robot 10a or the monitor 16a, it is determined in step S487 whether the state is the monitor state, that is, whether user A's object of interest is the monitor 16a.

If “YES” in step S487, that is, if user A's object of interest is the monitor 16a, the image of the monitor camera 22a is transmitted in step S489. In this case, user A is looking at the monitor 16a and the monitor camera 22a captures user A's face from the front, so step S489 sends that camera's image to the PC 14b. If “NO” in step S487, that is, if user A's object of interest is the robot 10a, the image of the abdominal camera 12a is transmitted in step S491. In this case, user A is looking at the robot 10a and the abdominal camera 12a captures user A's face from the front, so step S491 sends that camera's image to the PC 14b.

When step S489 or S491 ends, the process returns to step S483. The processor 64 that executes steps S489 and S491 functions as image transmission means.

As a result, the image sent to the PC 14b is one in which user A's face is recognized, so user A's face always appears in the image displayed on the monitor 16b. Even while user A is talking to the robot 10a, user B therefore feels that user A is talking to him or her, and the dialogue is sustained.
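The camera selection loop of FIG. 42 can be sketched in Python as a function that replays a sequence of interest recognition results. The function name and labels are assumptions. One behavior is also assumed: after an “other” result nothing new is sent, so the previously selected camera stays in use. The flowchart itself only shows a return to step S483.

```python
from typing import Iterable, List

MONITOR_CAMERA = "monitor camera 22a"
ABDOMINAL_CAMERA = "abdominal camera 12a"


def camera_schedule(interests: Iterable[str]) -> List[str]:
    """Camera whose image goes to PC 14b after each pass of step S483 (sketch).

    `interests` holds successive recognition results: "monitor", "robot" or "other".
    """
    current = MONITOR_CAMERA             # S481: the call starts with the monitor camera
    schedule: List[str] = []
    for interest in interests:           # S483: read the latest recognition result
        if interest == "monitor":        # S487 YES
            current = MONITOR_CAMERA     # S489
        elif interest == "robot":        # S487 NO
            current = ABDOMINAL_CAMERA   # S491
        # "other" (S485 YES): nothing new is sent; assumed to keep the last camera
        schedule.append(current)
    return schedule
```

For example, the results "other", "robot", "other", "monitor" yield the monitor camera, the abdominal camera twice, and then the monitor camera again.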

According to this embodiment, the listening dialogue sustaining system 100 includes a PC 14 and a robot 10 provided with an abdominal camera 12. The PC 14 is connected to a monitor 16 that displays the other user, a microphone 20 that picks up the user's voice, and a monitor camera 22 that photographs the user.

The PC 14 determines the user's action from two inputs: the user's images captured by the monitor camera 22 and the abdominal camera 12, and the sound collected by the microphone 20. The determined action is stored in the memory 68. The PC 14 also recognizes the user's state from the action data for the first predetermined time. For example, if the user's state is recognized as “active talk monitor”, the PC 14 gives the robot 10 an operation command to pseudo-listen to the user. If the user's state is recognized as “passive listen monitor”, the robot 10 attracts the user's attention, for example by talking to the user, and draws the user into the dialogue.

In this way, because the robot 10 acts to sustain the dialogue according to the user's state, a dialogue between persons with communication disabilities can be sustained.

In addition to the abdominal camera 12 and the monitor camera 22, a third camera may be installed to capture the user's face from a position other than the monitor 16 and the robot 10. This third camera supplements the determination of whether the user's object of interest is “other”. It makes it possible to distinguish a state in which face recognition of the user has failed from a state in which the user's object of interest really is “other”. For example, action data obtained while face recognition had failed can then be excluded from the interest object recognition process, so that the listening dialogue sustaining system 100 recognizes the user's object of interest more accurately.

Furthermore, a fourth camera that identifies the user's gaze direction may be installed to supplement the determination of whether the robot 10 is within the user's field of view. Based on this determination, the listening dialogue sustaining system 100 may decide whether the robot 10 needs to speak or to attract the user's attention. For example, if the robot 10 is not in the user's field of view, the PC 14 always makes the robot 10 speak. If the robot 10 is in the user's field of view, the PC 14 always makes the robot 10 attract the user's attention.

Installing three or more cameras in this way improves the recognition accuracy for the object of interest and allows the robot 10 to be operated more effectively. A single camera may also perform both of the above determinations in place of the third and fourth cameras. Installing still more cameras may further improve the accuracy of action determination and state recognition, or support more sophisticated decisions about the operation commands given to the robot 10.
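The third- and fourth-camera variants can be sketched in Python as follows. The text says only that failed face recognition should be kept out of interest recognition and that robot visibility decides between speaking and attracting attention. Several details below are hypothetical: the per-sample tuple, the preference for the monitor when two cameras both find a face, and the majority vote over a window. The vote is not the recognition-rate rule of claim 12, which this excerpt does not fully specify.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

# One sample: was a face found by (monitor camera 22, abdominal camera 12, third camera)?
Sample = Tuple[bool, bool, bool]


def classify_sample(sample: Sample) -> Optional[str]:
    """Label one sample; None means face recognition failed on every camera."""
    monitor_face, abdominal_face, third_face = sample
    if monitor_face:
        return "monitor"
    if abdominal_face:
        return "robot"
    if third_face:
        return "other"   # face seen from elsewhere: the user is really looking away
    return None          # no face anywhere: a recognition failure, not "other"


def recognize_interest(samples: Sequence[Sample]) -> Optional[str]:
    """Majority label over a window, ignoring failed samples (hypothetical rule)."""
    labels = [label for label in map(classify_sample, samples) if label is not None]
    if not labels:
        return None      # nothing usable in this window
    return Counter(labels).most_common(1)[0][0]


def engagement_mode(robot_in_view: bool) -> str:
    """Fourth-camera rule from the text: speak when the robot is out of view."""
    return "attract attention" if robot_in_view else "speak"
```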

In another embodiment, the PC 14 may be built into the robot 10, in which case the robot 10 autonomously performs the listening actions and the speech function. The robot 10 may then be an interaction-oriented robot (communication robot). Such a robot can carry out communication behavior with a communication target such as the user, including body movements such as gestures, speech, or both.

The user's image and voice may also be exchanged with the PC 14 via a telephone network or the like, without using the network 200. If the monitor 16 or another device has a telephone function, the videophone function may be realized without the PC 14, using only the monitor 16, the speaker 18, the microphone 20 and the monitor camera 22. These four devices may be integrated in a single housing. In this case, the videophone call is started with the robot 10 connected to the videophone.

Instead of the PC 14, the server 24 may determine and recognize the user's actions and state and give operation commands to the robot 10. In this case, the images captured by the abdominal camera 12 and the monitor camera 22 and the sound collected by the microphone 20 are sent directly to the server 24. The robot 10 then operates according to operation commands from the server 24. Alternatively, only the user state recognition may run on the server 24. That is, the processor 80 of the server 24 executes the processing shown in FIGS. 22 to 25, and the PC 14 obtains the state recognition result every first predetermined time.

In another embodiment of the listening dialogue sustaining system 100, the robot 10a may perform the “pseudo-listening action”, the “utterance control action” and the “attention attracting action” based only on user A's state data, regardless of the state recognition results of users A and B. For example, if user A speaks, the robot 10a performs the pseudo-listening action. If user A's speaking time becomes long, the robot 10a looks at user A to suppress the utterance. If user A is not speaking, or is looking at neither the monitor 16a nor the robot 10a, the robot 10a acts to attract user A's attention.
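The simplified, single-user rule can be sketched in Python as below. The threshold `max_speech_seconds` is hypothetical, because this excerpt gives no value. The priority order is also an assumption: an overly long utterance is suppressed even when the user is looking away.

```python
def simple_action(is_speaking: bool, speech_seconds: float,
                  looking_at_monitor_or_robot: bool,
                  max_speech_seconds: float = 30.0) -> str:
    """Action rule using only user A's state data (sketch with assumed threshold)."""
    if is_speaking and speech_seconds > max_speech_seconds:
        return "look at user A"      # suppress an overly long utterance
    if is_speaking and looking_at_monitor_or_robot:
        return "pseudo-listen"
    return "attract attention"       # not speaking, or looking at neither target
```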

Like the action data, the state data may be stored in a table format such as that shown in FIG. 6. When stored as a table, the state data may be used as log data showing changes in the user's state, and a graph created from the table may be displayed on the monitor 16.

10a, 10b ... robot
12a, 12b ... abdominal camera
14a, 14b ... PC
16a, 16b ... monitor
20a, 20b ... microphone
22a, 22b ... monitor camera
24 ... server
26 ... head
28 ... torso
30R ... right arm
30L ... left arm
32 ... mouth
34 ... eyeball
46R ... right arm motor
46L ... left arm motor
48 ... head motor
50 ... eyeball motor
52 ... waist motor
56 ... speaker
64 ... processor
68 ... memory
70 ... line-of-sight server
72 ... communication LAN board
70 ... wireless communication device
100 ... listening dialogue sustaining system
200 ... network

Claims

A listening dialogue sustaining system comprising a videophone and a robot, and including a first camera and a microphone, the system comprising:
determination means for determining a user's action based on an image captured by the first camera and sound collected by the microphone;
recognition means for recognizing the state of the user from the actions determined by the determination means over a first predetermined time; and action giving means for operating the robot so as to sustain the dialogue based on the state of the user recognized by the recognition means.

The listening dialogue sustaining system according to claim 1, wherein the recognition means includes positiveness recognition means for recognizing whether the user is active or inactive in the dialogue, and
the action giving means includes listening action giving means for causing the robot to pseudo-listen to the user based on a recognition result of the positiveness recognition means.

The listening dialogue sustaining system according to claim 1, wherein the recognition means further includes speaker state recognition means for recognizing whether the user is in a listener-side state or a speaker-side state in the dialogue,
the system further comprises measurement means for measuring the utterance time of the user or of the other party, and
the action giving means further includes utterance control action giving means for operating the robot so as to control the user's utterances based on the utterance time measured by the measurement means.

The listening dialogue sustaining system according to claim 3, wherein the utterance control action giving means includes utterance suppression action giving means for operating the robot so as to suppress the user's utterances under two conditions. First, the user is in an active speaking state, recognized as active by the positiveness recognition means and as being in the speaker-side state by the speaker state recognition means. Second, the utterance time exceeds a threshold.

The listening dialogue sustaining system according to claim 4, wherein the utterance suppression action giving means includes attention guidance means for operating the robot so as to guide the user's attention.

The listening dialogue sustaining system according to any one of claims 3 to 5, wherein the utterance control action giving means further includes utterance promotion action giving means for operating the robot so as to promote the user's utterances when the user is in an active listening state. In that state, the user is recognized as active by the positiveness recognition means and as being in the listener-side state by the speaker state recognition means.

The listening dialogue sustaining system according to any one of claims … to 6, wherein the action giving means further includes participation action giving means for operating the robot so that the user participates in the dialogue when the user is recognized as inactive by the positiveness recognition means.

The listening dialogue sustaining system according to claim 7, wherein the participation action giving means includes attention attracting means for operating the robot so as to attract the user's attention.

The listening dialogue sustaining system according to claim 7 or 8, wherein the participation action giving means further includes utterance prompting means for operating the robot so as to prompt the user to speak.

The listening dialogue sustaining system according to any one of the preceding claims up to claim 9, further comprising:
a network to which the robot is connected;
a server connected to the network;
a transmission means for transmitting the user's action determined by the determination means to the server; and
an acquisition means for acquiring the action of the other party user from the server,
wherein the action giving means further includes a partner listening action giving means that causes the robot to perform pseudo-listening toward the other party user based on the other party user's action acquired by the acquisition means and the user's own action.
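The claim above routes each user's determined action through a server so that the robot can also react to the remote party. A minimal in-memory sketch follows. `ActionServer`, the action labels `"speaking"` and `"listening"`, and the rule in `should_listen_to_partner` are all assumptions: the claim only states that pseudo-listening toward the partner is based on both users' actions, not which combination triggers it.

```python
from typing import Dict, Optional


class ActionServer:
    """In-memory stand-in for the networked server (hypothetical)."""

    def __init__(self) -> None:
        # Latest determined action per user, keyed by a user identifier.
        self._latest: Dict[str, str] = {}

    def post(self, user_id: str, action: str) -> None:
        # Transmission means: store the user's most recent action.
        self._latest[user_id] = action

    def get(self, user_id: str) -> Optional[str]:
        # Acquisition means: None if that user has not reported yet.
        return self._latest.get(user_id)


def should_listen_to_partner(own_action: Optional[str], partner_action: Optional[str]) -> bool:
    # Assumed rule: pseudo-listen toward the partner while the partner
    # is speaking and the local user is not.
    return partner_action == "speaking" and own_action != "speaking"
```

In this sketch, user A's PC would call `post("A", ...)` and `get("B")` each cycle; a real deployment would replace the dictionary with whatever transport the server actually uses.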

11. The listening dialogue sustaining system according to a preceding claim, wherein the recognition means further includes an interest object recognition means for recognizing the user's object of interest, and
the action giving means further includes an interest action giving means that operates the robot so as to make the user interested in the dialogue based on the recognition result of the interest object recognition means.

The listening dialogue sustaining system according to claim 11, wherein the robot includes a second camera, the system further comprising:
a face recognition means for executing face recognition processing on each of the images from the first camera and the second camera;
wherein the interest object recognition means includes a first recognition rate calculation means for calculating a first recognition rate of the first camera from the face recognition results of the face recognition means over a second predetermined time, a second recognition rate calculation means for calculating a second recognition rate of the second camera from the face recognition results of the face recognition means over the second predetermined time, and a setting means for setting the object of interest based on the first recognition rate and the second recognition rate.

13. The listening dialogue sustaining system according to claim 12, further comprising an image transmission means for transmitting either the image from the first camera or the image from the second camera based on the object of interest set by the setting means.

The listening dialogue sustaining system according to a preceding claim, wherein the determination means includes a posture determination means for determining the user's posture, an utterance determination means for determining whether the user is speaking, a head direction determination means for determining the user's head direction, and a gaze direction determination means for determining the user's gaze direction; and
the user's action is determined based on a posture determination result, an utterance determination result, a head direction determination result, a gaze direction determination result, a whisper determination result, and a conflict determination result.