JP2006247780A

JP2006247780A - Communication robot

Info

Publication number: JP2006247780A
Application number: JP2005066734A
Authority: JP
Inventors: Noriaki Mitsunaga; 法明光永; Takayuki Kanda; 崇行神田; Hiroshi Ishiguro; 浩石黒
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2005-03-10
Filing date: 2005-03-10
Publication date: 2006-09-21
Anticipated expiration: 2025-03-10
Also published as: JP5120745B2

Abstract

PROBLEM TO BE SOLVED: To provide a communication robot that achieves natural, human-like interaction. SOLUTION: A communication robot 12 includes a CPU that manages the overall processing of the robot 12. The robot 12 detects the distance between itself and a human 14 (the interpersonal distance) and the direction of the face of the human 14 relative to the robot 12. From these, the robot 12 detects the appropriateness of its interaction parameters (interpersonal distance, gaze time, motion start time, and motion speed), that is, whether the interaction is comfortable or uncomfortable, and updates the interaction parameters so as to optimize that appropriateness. Comfortable interaction adapted to each individual can thus be performed. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a communication robot, and more particularly to a communication robot that performs interaction behavior including at least one of speech and body movement with a human.

Adapting to one's partner is important for natural communication. For example, people need an appropriate amount of personal space to feel comfortable. Non-Patent Document 1 discloses that this personal space differs depending on the content of the communication. As disclosed in Non-Patent Document 2, it is also known that how often people make eye contact varies from person to person. Comfortable personal space and eye-contact frequency differ between individuals, but people adjust to their communication partners so that both remain comfortable. For example, a person who feels that the partner is too close moves away a little and, as disclosed in Non-Patent Document 3, a person who feels stared at too much averts his or her gaze. Furthermore, as disclosed in Non-Patent Documents 4 to 6, research is being conducted on applying the concept of personal space to robots that interact with people.

An example of background art related to this kind of communication robot is disclosed in Patent Document 1. According to Patent Document 1, a behavior pattern generation device is applied to, for example, a robot. The device detects the user's interpersonal distance from the robot, derives the user's degree of intimacy with the robot from that distance, and determines from the degree of intimacy whether the user regards the robot as a communication partner. The device also estimates the user's emotion from the loudness and pitch of the user's voice, as well as from blood pressure, body temperature, and the like. The device then causes the robot to perform behavior according to the interpersonal distance and the user's emotion.

Non-Patent Document 1: E. T. Hall. The Hidden Dimension. Doubleday Publishing, 1966.
Non-Patent Document 2: S. Duncan Jr. and D. W. Fiske. Face-to-Face Interaction: Research, Methods, and Theory. Lawrence Erlbaum Associates, Inc., Publishers, 1977.
Non-Patent Document 3: E. Sundstrom and I. Altman. Interpersonal relationships and personal space: Research review and theoretical model. Human Ecology, 4(1), 1976.
Non-Patent Document 4: Nakajima. A study on the individual distance of humans to a mobile robot. Doctoral dissertation, Kyushu Institute of Design, 1998.
Non-Patent Document 5: Y. Nakauchi and R. Simmons. A social robot that stands in line. Autonomous Robots, 1:313-324, 2002.
Non-Patent Document 6: T. Tasaki, S. Matsumoto, K. Komatani, T. Ogata, H. G. Okuno. Dynamic communication of humanoid robot with multiple people based on interaction distance. In Proc. of International Workshop on Robot and Human Interaction (Ro-Man-2004), pp. 81-86, IEEE, 2004.
Patent Document 1: JP 2004-66367 A

However, in background-art robots the personal space is fixed, and none adapts it to the individual. The behavior pattern generation device disclosed in Patent Document 1 does estimate the user's emotion from the loudness and pitch of the user's voice, blood pressure, body temperature, and the like, so in that respect it can be said to perform communication (interaction) adapted to the individual and the individual's emotions. As for interpersonal distance, however, it merely applies threshold processing to determine whether or not the user is communicating with the robot. In other words, it does not adapt an appropriate personal space to the individual. As a result, a user or other person communicating with the robot may feel uncomfortable during the communication.

Therefore, a primary object of the present invention is to provide a novel communication robot.

Another object of the present invention is to provide a communication robot capable of realizing natural interaction like that between people.

The invention of claim 1 is a communication robot that interacts with a human, comprising: parameter setting means for setting parameters for the interaction; interaction execution means for executing an interaction including at least one of speech and body movement in accordance with the parameters set by the parameter setting means; appropriateness detection means for detecting the appropriateness of the parameters during the interaction; and optimization means for optimizing the appropriateness detected by the appropriateness detection means.

In the invention of claim 1, the communication robot performs interaction behavior including at least one of body movement and speech with a human. The parameter setting means sets parameters for the interaction (interaction behavior). The interaction execution means executes the interaction behavior in accordance with the parameters set by the parameter setting means. The appropriateness detection means detects the appropriateness of the parameters during the interaction. Here, when the human who is the interaction (communication) partner finds the interaction pleasant, the appropriateness of the parameters can be said to be high. Conversely, when the human finds the interaction unpleasant, the appropriateness can be said to be low. Whether the human finds the interaction unpleasant can be learned from, for example, the human's distance from the communication robot (movement distance), the orientation of the human's face relative to the communication robot, whether the human is jiggling a leg, the human's facial expression (smiling (relaxed) or pained (stiff)), and the loudness of the human's footsteps. The optimization means optimizes the appropriateness detected by the appropriateness detection means. In other words, the interaction parameters are adapted to the interaction partner.
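As an illustration only, the four means of claim 1 can be read as a loop: set parameters, run an interaction, measure appropriateness, and adjust. The passage above does not say which optimization method is used, so the greedy coordinate search below is a stand-in chosen for brevity. The names `optimize_parameters`, `run_interaction`, `step`, and `rounds` are hypothetical.

```python
from typing import Callable, Dict

Params = Dict[str, float]


def optimize_parameters(
    initial: Params,
    run_interaction: Callable[[Params], float],
    step: float = 0.05,
    rounds: int = 20,
) -> Params:
    """Adapt interaction parameters by repeated interaction.

    run_interaction stands for "execute one interaction with these
    parameters and return the detected appropriateness" (higher is better).
    The greedy search is an assumption, not the method of the source.
    """
    params = dict(initial)              # parameter setting
    best = run_interaction(params)      # execution + appropriateness detection
    for _ in range(rounds):
        for name in list(params):
            for delta in (step, -step):
                trial = dict(params)
                trial[name] += delta
                score = run_interaction(trial)
                if score > best:        # keep only changes that raise appropriateness
                    params, best = trial, score
                    break
    return params


if __name__ == "__main__":
    # Simulated partner who is most comfortable at 0.8 m (made-up value).
    def simulated_partner(p: Params) -> float:
        return -abs(p["distance_m"] - 0.8)

    print(optimize_parameters({"distance_m": 0.5}, simulated_partner))
```

In a real robot each call to `run_interaction` would be one actual interaction with the person, so the number of trials would need to be kept far smaller than in this simulation.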

According to the invention of claim 1, the interaction parameters are adapted to the interaction partner, so the interaction becomes more comfortable as interactions are repeated. Natural communication like that between people is therefore possible.

The invention of claim 2 depends on claim 1 and further comprises movement distance detection means for detecting the distance the human moves during the interaction, and time detection means for detecting the time during which the human looks at the face of the communication robot itself during the interaction, wherein the appropriateness detection means detects the appropriateness of the parameters based on the detection result of at least one of the movement distance detection means and the time detection means when the interaction is executed with the parameters set by the parameter setting means.

In the invention of claim 2, the communication robot further comprises the movement distance detection means and the time detection means. The movement distance detection means detects the distance the human moves during the interaction. The time detection means detects the time during which the human looks at the robot's own face during the interaction, that is, the gaze time. For example, when the human's movement distance during the interaction is long (large) or the gaze time is short, it can be determined that the human finds the interaction uncomfortable. Conversely, when the movement distance is short (small) or the gaze time is long, it can be determined that the human finds the interaction comfortable. The appropriateness detection means detects the appropriateness of the parameters based on the detection result of at least one of the movement distance detection means and the time detection means when the interaction is executed with the parameters set by the parameter setting means.
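The qualitative rule above (more movement or less gazing means less comfort) can be turned into a single score in many ways. The following is a minimal sketch of one of them, not the formula of the source: the function name `appropriateness`, the normalization constant `movement_scale_m`, and the equal weighting of the two cues are all assumptions.

```python
def appropriateness(
    movement_distance_m: float,
    gaze_time_s: float,
    duration_s: float,
    movement_scale_m: float = 1.0,
) -> float:
    """Return a comfort score in [-1, 1]; higher means more comfortable.

    Assumed scoring: the fraction of the interaction spent gazing at the
    robot's face, minus the movement distance normalized by a hypothetical
    scale and clipped to [0, 1].
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    gaze_ratio = min(max(gaze_time_s / duration_s, 0.0), 1.0)
    movement_penalty = min(max(movement_distance_m / movement_scale_m, 0.0), 1.0)
    return gaze_ratio - movement_penalty


if __name__ == "__main__":
    # Little movement and long gaze versus much movement and short gaze.
    print(appropriateness(0.1, 8.0, 10.0))
    print(appropriateness(0.9, 2.0, 10.0))
```

Because claim 2 allows either detection result to be used alone, a variant could return only `gaze_ratio` or only the negated `movement_penalty`.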

According to the invention of claim 2, the comfort of the interaction can be learned from the human's behavior during the interaction, and the parameters can be optimized so as to increase that comfort.

The invention of claim 3 depends on claim 1 or 2, wherein the parameters include at least one of: the interpersonal distance in the interaction with the human; the length of time for which the robot directs its own face toward the human's face; the delay time from an utterance to the start of a body movement; and the speed of the body movement.

In the invention of claim 3, the parameters include components considered to determine the comfort of the interaction when the robot and the human communicate. Specifically, the parameters include at least one of: the interpersonal distance in the interaction with the human; the length of time for which the robot directs its own face toward the human's face (gaze time); the delay time from an utterance to the start of a body movement; and the speed of the body movement.
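For readers who think in data structures, the four parameter components named above could be grouped as follows. This is a sketch only: the class name, field names, units, and the idea of treating motion speed as a dimensionless factor are assumptions, since the passage names the components but not their representation.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class InteractionParameters:
    """Hypothetical container for the parameter components of claim 3."""

    interpersonal_distance_m: float  # distance kept from the human
    gaze_time_s: float               # how long the robot faces the human's face
    motion_delay_s: float            # delay from utterance to start of body movement
    motion_speed: float              # speed of body movement (assumed unitless factor)

    def as_vector(self) -> List[float]:
        """Flatten the parameters so a generic optimizer can adjust them."""
        return [getattr(self, f.name) for f in fields(self)]


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(InteractionParameters(0.8, 2.0, 0.5, 1.0).as_vector())
```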

According to the invention of claim 3, by updating the components considered to determine the comfort of the interaction, the appropriateness of the parameters can be optimized and comfortable interaction can be realized.

The invention of claim 4 depends on claim 3, wherein the interpersonal distance includes an intimate distance, an individual distance, and a social distance.

In the invention of claim 4, the interpersonal distance includes an intimate distance, an individual distance, and a social distance. This allows the robot to take an interpersonal distance that suits the type of interaction behavior and is adapted to the individual. For example, the social distance is used when performing interaction behavior such as a self-introduction or a greeting.
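One way to picture this is a lookup from the type of interaction behavior to a distance class, followed by a lookup of the value adapted to the individual. In the sketch below only the self-introduction and greeting entries come from the passage above; the handshake and hug entries, the identifiers, and the numeric distances in the usage example are assumptions.

```python
from enum import Enum
from typing import Dict


class DistanceClass(Enum):
    INTIMATE = "intimate"
    INDIVIDUAL = "individual"
    SOCIAL = "social"


# Which distance class each behavior uses.
BEHAVIOR_DISTANCE_CLASS: Dict[str, DistanceClass] = {
    "self_introduction": DistanceClass.SOCIAL,  # stated in the text
    "greeting": DistanceClass.SOCIAL,           # stated in the text
    "handshake": DistanceClass.INDIVIDUAL,      # assumption for illustration
    "hug": DistanceClass.INTIMATE,              # assumption for illustration
}


def target_distance(behavior: str, adapted: Dict[DistanceClass, float]) -> float:
    """Return the per-person distance (in meters) for the given behavior."""
    return adapted[BEHAVIOR_DISTANCE_CLASS[behavior]]


if __name__ == "__main__":
    # Made-up distances adapted to one particular person.
    person = {
        DistanceClass.INTIMATE: 0.4,
        DistanceClass.INDIVIDUAL: 0.8,
        DistanceClass.SOCIAL: 1.5,
    }
    print(target_distance("greeting", person))
```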

According to the invention of claim 4, the interpersonal distance parameter includes an intimate distance, an individual distance, and a social distance, so the robot can take an interpersonal distance that suits both the type of interaction behavior and the individual.

The invention of claim 5 depends on any one of claims 1 to 4, wherein the optimization means includes parameter update means for updating the parameters.

In the invention of claim 5, the parameter update means updates the parameters. Thus, for example, the parameters can be updated at each interaction so that their appropriateness is optimized.

According to the invention of claim 5, the parameters are updated at each interaction to optimize their appropriateness, so the interaction becomes more comfortable as it is repeated.

The invention of claim 6 depends on any one of claims 1 to 5 and further comprises parameter storage means for storing the parameters in association with each human, and human identification means for identifying the human at the start of the interaction, wherein the parameter setting means sets the parameters corresponding to the human identified by the human identification means when those parameters are stored in the parameter storage means, and sets the average of all the parameters stored in the parameter storage means when parameters corresponding to the identified human are not stored.

In the invention of claim 6, the parameter storage means stores the parameters in association with each human. That is, the robot interacts with a human and stores the optimized parameters in association with that human. The human identification means identifies the human at the start of the interaction. When parameters corresponding to the identified human are stored in the parameter storage means, that is, when the robot has interacted with this partner before, the parameter setting means sets those parameters. When parameters corresponding to the identified human are not stored, that is, when the robot has not interacted with this partner before, the parameter setting means sets the average of all the parameters stored in the parameter storage means. In that case, however, the parameters of a person similar to the current partner may be set instead.
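The lookup-with-fallback behavior described for claim 6 can be sketched as a small store keyed by person. The class and method names are hypothetical, and the `defaults` used when nothing has been stored yet are an assumption: the passage does not say what the robot uses before its first interaction.

```python
from statistics import mean
from typing import Dict, List


class ParameterStore:
    """Hypothetical per-person parameter storage with an average fallback."""

    def __init__(self, defaults: List[float]) -> None:
        self._defaults = list(defaults)  # assumption: used only when the store is empty
        self._by_person: Dict[str, List[float]] = {}

    def save(self, person_id: str, params: List[float]) -> None:
        """Store the optimized parameters for an identified person."""
        self._by_person[person_id] = list(params)

    def initial_parameters(self, person_id: str) -> List[float]:
        """Return stored parameters for a known person, else the average of all."""
        if person_id in self._by_person:
            return list(self._by_person[person_id])
        if not self._by_person:
            return list(self._defaults)
        # Component-wise average over every stored person.
        return [mean(column) for column in zip(*self._by_person.values())]


if __name__ == "__main__":
    store = ParameterStore(defaults=[1.0, 3.0])
    store.save("person_a", [0.5, 2.0])
    store.save("person_b", [1.0, 4.0])
    print(store.initial_parameters("person_a"))  # known partner
    print(store.initial_parameters("person_c"))  # unknown partner: average
```

The alternative mentioned in the text, using the parameters of a similar person, would replace the averaging line with a nearest-neighbor lookup over whatever features the identification means provides.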

According to the invention of claim 6, the previously optimized parameters are used for a human with whom the robot has interacted before, so the current interaction can be relatively comfortable from the outset.

According to the present invention, the appropriateness of the interaction parameters is detected based on the human's movement distance and face orientation during the interaction and is then optimized, so the robot can adapt to its interaction partner. In other words, interaction adapted to the individual makes natural communication like that between people possible.

The above object, other objects, features, and advantages of the present invention will become more apparent from the following detailed description of embodiments given with reference to the drawings.

Referring to FIG. 1, a communication robot system (hereinafter simply the “system”) 10 of this embodiment includes a communication robot (hereinafter simply the “robot”) 12. The robot 12 is an interaction-oriented robot whose purpose is to communicate with a communication target (partner) such as a human 14, and it has a function of performing communication (interaction) behavior (hereinafter sometimes called “interaction behavior”) using at least one of body movement (gestures) and speech (voice).

The robot 12 has a human-like body and uses it to generate the complex body movements needed for interaction. Specifically, referring to FIG. 2, the robot 12 includes a carriage 32, and wheels 34 that allow the robot 12 to move autonomously are provided on the underside of the carriage 32. The wheels 34 are driven by wheel motors (indicated by reference numeral “36” in FIG. 3, which shows the internal configuration of the robot 12) and can move the carriage 32, that is, the robot 12, in any direction: forward, backward, left, or right.

Although not shown in FIG. 2, a collision sensor (indicated by reference numeral “38” in FIG. 3) is attached to the front of the carriage 32 and detects contact of a person or other obstacle with the carriage 32. When contact with an obstacle is detected while the robot 12 is moving, driving of the wheels 34 is stopped immediately and the robot 12 is brought to an abrupt halt.

In this embodiment, the robot 12 is about 100 cm tall so that it does not intimidate people, particularly children. This height may, however, be changed as desired.

A sensor mounting panel 40 in the shape of a polygonal prism is provided on the carriage 32, and an ultrasonic distance sensor 42 is attached to each face of the sensor mounting panel 40. The ultrasonic distance sensors 42 measure the distance between the mounting panel 40, that is, the robot 12, and objects around it, mainly people.

The body of the robot 12 is also mounted upright on the carriage 32, with its lower part surrounded by the mounting panel 40 described above. The body consists of a lower body 44 and an upper body 46, which are joined by a connecting portion 48. Although not shown, the connecting portion 48 incorporates a lifting mechanism, and by using it the height of the upper body 46, that is, the height of the robot 12, can be changed. As described later, the lifting mechanism is driven by a waist motor (indicated by reference numeral “50” in FIG. 3). The height of 100 cm given above for the robot 12 is the value when the upper body 46 is at its lowest position. The robot 12 can therefore be made 100 cm tall or taller.

One omnidirectional camera 52 and one microphone 16 are provided at approximately the center of the upper body 46. The omnidirectional camera 52 photographs the surroundings of the robot 12 and is distinct from the eye cameras 54 described later. The microphone 16 picks up ambient sounds, particularly human voices.

Upper arms 58R and 58L are attached to the two shoulders of the upper body 46 by shoulder joints 56R and 56L, respectively. The shoulder joints 56R and 56L each have three degrees of freedom. That is, the right shoulder joint 56R can control the angle of the upper arm 58R about each of the X, Y, and Z axes. The Y axis is parallel to the longitudinal direction (or axis) of the upper arm 58R, and the X and Z axes are each orthogonal to the Y axis from different directions. The left shoulder joint 56L can control the angle of the upper arm 58L about each of the A, B, and C axes. The B axis is parallel to the longitudinal direction (or axis) of the upper arm 58L, and the A and C axes are each orthogonal to the B axis from different directions.

Forearms 62R and 62L are attached to the distal ends of the upper arms 58R and 58L via elbow joints 60R and 60L, respectively. The elbow joints 60R and 60L can control the angles of the forearms 62R and 62L about the W axis and the D axis, respectively.

For the X, Y, Z, and W axes and the A, B, C, and D axes, which control the displacement of the upper arms 58R and 58L and the forearms 62R and 62L (all shown in FIG. 2), “0 degrees” is the home position. At the home position, the upper arms 58R and 58L and the forearms 62R and 62L point downward.

Although not shown in FIG. 2, touch sensors (indicated collectively by reference numeral 64 in FIG. 3) are provided on the shoulder portions of the upper body 46, including the shoulder joints 56R and 56L, and on the arm portions, including the upper arms 58R and 58L and the forearms 62R and 62L. These touch sensors 64 detect whether a person has touched these parts of the robot 12.

Spheres 66R and 66L, which correspond to hands, are fixedly attached to the tips of the forearms 62R and 62L, respectively. When finger functions (gripping, grasping, pinching, and so on) are required, however, “hands” shaped like human hands may be used instead of the spheres 66R and 66L.

A head 70 is attached above the center of the upper body 46 via a neck joint 68. The neck joint 68 has three degrees of freedom, and its angle can be controlled about each of the S, T, and U axes. The S axis points straight up from the neck, and the T and U axes are each orthogonal to the S axis in different directions. A speaker 72 is provided on the head 70 at a position corresponding to a human mouth. The robot 12 uses the speaker 72 to communicate with people around it by sound or voice. The speaker 72 may, however, be provided on another part of the robot 12, for example the body.

The head 70 is also provided with eyeball portions 74R and 74L at positions corresponding to eyes. The eyeball portions 74R and 74L include eye cameras 54R and 54L, respectively. The right eyeball portion 74R and the left eyeball portion 74L may be referred to collectively as the eyeball portion 74, and the right eye camera 54R and the left eye camera 54L as the eye camera 54. The eye camera 54 photographs the face or other parts of a person who approaches the robot 12, or other objects, and captures the resulting video signal.

Both the omnidirectional camera 52 and the eye camera 54 described above may be cameras that use a solid-state image sensor such as a CCD or CMOS sensor.

For example, the eye camera 54 is fixed inside the eyeball portion 74, and the eyeball portion 74 is attached at a predetermined position in the head 70 via an eyeball support (not shown). The eyeball support has two degrees of freedom, and its angle can be controlled about each of the α and β axes. The α and β axes are defined relative to the head 70: the α axis points toward the top of the head 70, and the β axis is orthogonal both to the α axis and to the direction in which the front (face) of the head 70 points. In this embodiment, when the head 70 is at its home position, the α axis is parallel to the S axis and the β axis is parallel to the U axis. When the eyeball support in the head 70 is rotated about the α and β axes, the tip (front) of the eyeball portion 74 or eye camera 54 is displaced and the camera axis, that is, the line-of-sight direction, moves.

For the α and β axes, which control the displacement of the eye camera 54, “0 degrees” is the home position. At the home position, as shown in FIG. 2, the camera axis of the eye camera 54 points in the direction in which the front (face) of the head 70 points, and the gaze is directed straight ahead.
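To make the geometry concrete, the line-of-sight direction can be written as a unit vector in a head-fixed frame. The frame used here (x forward along the face direction, y to the robot's left along the β axis, z up along the α axis), the sign conventions, and the order of rotations are assumptions of this sketch. The source fixes only that 0 degrees on both axes means looking straight ahead.

```python
import math
from typing import Tuple


def gaze_direction(alpha_deg: float, beta_deg: float) -> Tuple[float, float, float]:
    """Unit camera-axis vector in an assumed head frame (x forward, y left, z up).

    alpha_deg: rotation about the upward alpha axis (pan); positive turns left here.
    beta_deg:  rotation about the sideways beta axis (tilt); positive looks up here.
    Both sign conventions are assumptions of this sketch.
    """
    a = math.radians(alpha_deg)
    b = math.radians(beta_deg)
    return (
        math.cos(b) * math.cos(a),
        math.cos(b) * math.sin(a),
        math.sin(b),
    )


if __name__ == "__main__":
    print(gaze_direction(0.0, 0.0))   # home position: straight ahead
    print(gaze_direction(30.0, 10.0))
```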

FIG. 3 is a block diagram showing the internal configuration of the robot 12. As shown in FIG. 3, the robot 12 includes a microcomputer or CPU 76 for overall control, and a memory 80, a motor control board 82, a sensor input/output board 84, and a voice input/output board 86 are connected to the CPU 76 through a bus 78.

Although not shown, the memory 80 includes a ROM, an HDD, a RAM, and the like, and the control program and data for the robot 12 are stored in advance in the ROM or HDD. The CPU 76 executes processing in accordance with this program. Specifically, a plurality of programs for controlling the body movements of the robot 12 (called behavior modules) are stored. Examples of the body movements represented by behavior modules include “handshake”, “hug”, and “pointing”. When the behavior module for “handshake” is executed, the robot 12, for example, holds its right hand out in front of it. When the behavior module for “hug” is executed, the robot 12, for example, holds both arms out in front of it spread wide and closes them when the human comes near. When the behavior module for “pointing” is executed, the robot 12, for example, indicates a desired direction with its right hand (right arm) or left hand (left arm). The RAM is used as temporary storage and can also be used as working memory.
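The idea of named behavior modules that the CPU selects and runs can be sketched as a registry of functions. Everything structural here is an assumption for illustration: the decorator `behavior_module`, the `execute` helper, and the representation of a movement as a list of text steps. Only the three behaviors and what they do come from the passage above.

```python
from typing import Callable, Dict, List

# A behavior module is modeled as a function returning a list of motion steps.
BehaviorModule = Callable[..., List[str]]
BEHAVIOR_MODULES: Dict[str, BehaviorModule] = {}


def behavior_module(name: str) -> Callable[[BehaviorModule], BehaviorModule]:
    """Register a function under the given behavior name."""
    def register(func: BehaviorModule) -> BehaviorModule:
        BEHAVIOR_MODULES[name] = func
        return func
    return register


@behavior_module("handshake")
def handshake() -> List[str]:
    return ["hold right hand out in front"]


@behavior_module("hug")
def hug(human_is_near: bool = False) -> List[str]:
    steps = ["hold both arms out in front, spread wide"]
    if human_is_near:  # the arms close only once the human has come near
        steps.append("close both arms")
    return steps


@behavior_module("pointing")
def pointing(arm: str = "right") -> List[str]:
    return [f"indicate the desired direction with the {arm} arm"]


def execute(name: str, **kwargs) -> List[str]:
    """Look up a behavior module by name and run it."""
    return BEHAVIOR_MODULES[name](**kwargs)


if __name__ == "__main__":
    print(execute("handshake"))
    print(execute("hug", human_is_near=True))
```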

The motor control board 82 is constituted by, for example, a DSP (Digital Signal Processor) and controls the motors that drive body parts such as the right arm, the left arm, the head, and the eyes. That is, the motor control board 82 receives control data from the CPU 76 and adjusts the rotation angles of a total of four motors 88 (collectively shown as the “right arm motor” in FIG. 3): three motors that control the angles of the X, Y, and Z axes of the right shoulder joint 56R and one motor that controls the angle of the W axis of the right elbow joint 60R. Likewise, the motor control board 82 adjusts the rotation angles of a total of four motors 90 (collectively shown as the “left arm motor” in FIG. 3): three motors that control the angles of the A, B, and C axes of the left shoulder joint 56L and one motor that controls the angle of the D axis of the left elbow joint 60L. The motor control board 82 also adjusts the rotation angles of three motors 92 (collectively shown as the “head motor” in FIG. 3) that control the angles of the S, T, and U axes of the neck joint 68. It further controls the waist motor 50 and the two motors 36 that drive the wheels 34 (collectively shown as the “wheel motor” in FIG. 3). Finally, the motor control board 82 adjusts the rotation angles of two motors 94 (collectively shown as the “right eyeball motor” in FIG. 3) that control the angles of the α and β axes of the right eyeball portion 74R, and of two motors 96 (collectively shown as the “left eyeball motor” in FIG. 3) that control the angles of the α and β axes of the left eyeball portion 74L.

Except for the wheel motors 36, the motors of this embodiment described above are stepping motors or pulse motors in order to simplify control; however, they may be DC motors, like the wheel motors 36.

The sensor input/output board 84 is likewise constituted by a DSP; it takes in signals from the sensors and cameras and supplies them to the CPU 76. That is, data on the reflection time from each of the ultrasonic distance sensors 42 is input to the CPU 76 through the sensor input/output board 84. The video signal from the omnidirectional camera 52 is input to the CPU 76 after the sensor input/output board 84 applies predetermined processing as necessary. The video signal from the eye camera 54 is supplied to the CPU 76 in the same manner, and the signal from the touch sensor 64 is supplied to the CPU 76 via the sensor input/output board 84.

Synthesized speech data is supplied from the CPU 76 to the speaker 72 via the voice input/output board 86, and the speaker 72 outputs sound or voice according to that data. Voice input from the microphone 24 is taken into the CPU 76 via the voice input/output board 86.

A communication LAN board 98 is also connected to the CPU 76 through the bus 78. The communication LAN board 98 is likewise constituted by a DSP; it passes transmission data supplied from the CPU 76 to the wireless communication device 100, which transmits the data. The communication LAN board 98 also receives data via the wireless communication device 100 and supplies the received data to the CPU 76.

Further, a database 102 is connected to the CPU 76 through the bus 78. Although not shown, the database 102 stores the interaction parameter Θ, described later, together with the name or identification information (tag information or identification number) of the corresponding person (the human 14 or the like). The person's face image captured by the eye camera 54 of the robot 12 and the height value estimated from the whole-body image are also stored in association with the person's identification information. This is because, as described later, the initial value of the interaction parameter Θ is set according to the interaction partner.

In this embodiment, the database 102 is provided inside the robot 12, but it may instead be provided outside the robot 12 so that the robot 12 can communicate with it.

Returning to FIG. 1, the system 10 includes a motion capture system 20. A known motion capture system can be used as the motion capture system (three-dimensional motion measurement device) 20; for example, an optical motion capture system from VICON (http://www.vicon.com/). Although not shown, the motion capture system 20 includes a computer such as a PC or a WS (workstation), and this computer and the robot 12 are connected to each other by a wired or wireless LAN (not shown).

More specifically, referring to FIG. 4, the motion capture system 20 has a plurality of (at least three) cameras 20a with an infrared irradiation function, arranged in different directions relative to the robot 12 and the human 14 present in the space or environment. A plurality of (four in this embodiment) infrared reflective markers 30 are attached to the robot 12 and the human 14. Specifically, as can be seen from FIG. 4, the infrared reflective markers 30 are attached above the eyes (on the forehead) and on the shoulders of both the robot 12 and the human 14. This is because, in this embodiment, the positions (three-dimensional positions) and the face (gaze) directions of the robot 12 and the human 14 are detected. Infrared reflective markers 30 may additionally be attached to other body parts in order to detect the position and face direction more accurately.

The computer of the motion capture system 20 acquires image data from the cameras 20a at, for example, 60 Hz (60 frames per second) and, by image processing, extracts the two-dimensional position of each marker 30 in all of the image data at the time of measurement. Based on these two-dimensional positions, the computer calculates the three-dimensional position of each marker 30 in real space and also calculates the face directions of the robot 12 and the human 14. The computer then transmits the coordinate data (position data) of the calculated three-dimensional positions and the face direction data to the robot 12 in response to a request from the robot 12 (CPU 76).

The robot 12 acquires the coordinate data and direction data transmitted from the motion capture system 20 and thereby obtains the three-dimensional positions of itself and the human 14. The robot 12 then detects (calculates) the position (distance) of the human 14 in robot coordinates, that is, with the robot itself as the center (origin). Based on the direction data, the robot 12 also determines whether the human 14 is looking at the face of the robot 12.
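The conversion into robot coordinates described above can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the function names, the use of a yaw angle for the robot's heading, and the choice of a horizontal (floor-plane) distance are all assumptions introduced here.

```python
import math

def human_in_robot_frame(robot_pos, human_pos, robot_yaw_rad=0.0):
    """Express the human's world position relative to the robot (robot = origin).

    robot_pos, human_pos: (x, y, z) world coordinates from motion capture.
    robot_yaw_rad: robot heading about the vertical axis (hypothetical input).
    """
    dx = human_pos[0] - robot_pos[0]
    dy = human_pos[1] - robot_pos[1]
    dz = human_pos[2] - robot_pos[2]
    c, s = math.cos(robot_yaw_rad), math.sin(robot_yaw_rad)
    # Rotate the world-frame offset by -yaw so that +x points ahead of the robot.
    return (c * dx + s * dy, -s * dx + c * dy, dz)

def horizontal_distance(rel):
    """Distance on the floor plane between the robot and the human."""
    return math.hypot(rel[0], rel[1])
```

For example, with the robot at (1, 1, 0) facing the +y direction and the human at (1, 3, 0), the human appears 2 units straight ahead of the robot.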

As described above, when the robot 12 configured in this way communicates with the human 14, it performs interaction behaviors using at least one of body motion (gesture) and voice (speech). For example, the robot 12 detects gestures or utterances that the human 14 directs at it and determines its interaction behavior accordingly.

Considering communication between people, adaptation to the partner is important for natural (comfortable) communication. For example, a person needs an appropriate personal space to feel comfortable, and this space varies with the content of the communication. It is also known that how often people make eye contact varies from person to person. Although comfortable personal space and gaze frequency differ between individuals, people adjust to their communication partners and keep each other comfortable. For example, a person who feels that the partner is too close moves away slightly, and a person who feels stared at too much averts their gaze. People make these adaptations unconsciously.

Therefore, when the robot 12 and the human 14 interact (communicate), the robot 12 needs to adapt to its partner by securing an appropriate personal space and matching the frequency of eye contact to the individual.

According to a study that analyzed body movements in human-robot interaction (T. Kanda, H. Ishiguro, M. Imai, and T. Ono. Body Movement Analysis of Human-Robot Interaction. In Int. Joint Conference on Artificial Intelligence (IJCAI 2003), pp. 177-182, 2003), subjects who had a good impression of the robot's behavior tended to turn their faces toward the robot and also tended to move a shorter distance during the interaction. It is also natural to suppose that a person looks elsewhere when the partner's movements are slow and boring, or too fast to follow.

Based on the above, a reward function (see FIG. 5) was designed on the assumption that, in interaction with the robot 12, a person's comfort or discomfort appears unconsciously in the distance moved and in the time spent facing the robot (either one alone may be used). The parameters of the interaction behavior of the robot 12 (the interaction parameter Θ) are three interpersonal distances (intimate distance, personal distance, and social distance), the length of time the camera (eye camera 54) is directed at the person's face, the delay from the utterance to the start of motion playback, and the speed of the motion. The robot 12 learns the parameter Θ by policy gradient reinforcement learning (PGRL) so as to maximize the reward given by the reward function, thereby adapting individually to the interaction partner (here, the human 14). Because the appropriate parameter Θ cannot be obtained directly, a learning method that does not rely on a teacher signal (unsupervised learning) is required; for example, the interpersonal distance (personal space and the like) that feels appropriate differs between individuals. Policy gradient reinforcement learning is used because fast convergence is also important when learning during interaction with a person.

The reward function is implemented in software on the CPU 76 of the robot 12; its functional block diagram is shown in FIG. 5. Referring to FIG. 5, the reward function 200 includes input terminals P1 and P2, to which the position data received from the motion capture system 20 is input as is. As described later, the reward function 200 is evaluated each time one interaction behavior is executed, so the input terminals P1 and P2 receive the time series of position data obtained while that interaction behavior was being executed.

The position data input to the input terminal P1 is denoised by the filter unit 202. The filter unit 202 is, for example, a 5 Hz LPF (low-pass filter) that removes the high-frequency components of the position data, so that small swaying of the body of the human 14 is not counted as movement. The filtered position data is integrated by the integration unit 204; that is, the distance moved by the human 14 is calculated. The output of the integration unit 204 is normalized and weighted by the normalization/weighting unit 206 and input to the adder 208 with its sign inverted. This is because, as described above, movement of the human 14 during the interaction is taken to indicate that the human 14 finds the interaction uncomfortable, and is therefore a negative factor in the reward.
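The P1 branch (filter unit 202 followed by integration unit 204) can be sketched as below. The text specifies only a 5 Hz LPF and a 60 Hz capture rate; the first-order recursive filter, the summing of step lengths as the "integration", and all function names are assumptions made for illustration.

```python
import math

def low_pass(samples, cutoff_hz=5.0, rate_hz=60.0):
    """First-order low-pass filter over a list of (x, y, z) positions.

    The filter type is an assumption; the source only says "5 Hz LPF".
    """
    if not samples:
        return []
    dt = 1.0 / rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor in (0, 1)
    out = [tuple(samples[0])]
    for p in samples[1:]:
        prev = out[-1]
        out.append(tuple(a + alpha * (b - a) for a, b in zip(prev, p)))
    return out

def movement_distance(samples, cutoff_hz=5.0, rate_hz=60.0):
    """Path length of the filtered trajectory (sum of step lengths)."""
    smooth = low_pass(samples, cutoff_hz, rate_hz)
    return sum(math.dist(a, b) for a, b in zip(smooth, smooth[1:]))
```

With this sketch, a person standing still yields a distance of zero, and rapid centimetre-scale jitter contributes much less to the total than it would without filtering.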

Meanwhile, the position data input to the input terminal P2 is supplied to the neck angle calculation unit 210, which calculates the neck angle of the human 14 relative to the robot 12; strictly speaking, it calculates the orientation of the face of the human 14 relative to the face of the robot 12. The threshold processing unit 212 then determines whether the calculated neck angle is at or below a predetermined angle (10° in this embodiment), that is, whether the human 14 is looking at the face of the robot 12. As shown in FIG. 6, when the robot 12 and the human 14 face each other and the angle between the face direction of the human 14 and the straight line (line segment) connecting the robot 12 and the human 14 is 10° or less, the human 14 is judged to be looking at the face of the robot 12. A strict judgment would, however, also require detecting the gaze direction of the human 14. While the neck angle calculated by the neck angle calculation unit 210 is 10° or less, the threshold processing unit 212 accumulates the time; that is, it calculates the total time during the interaction for which the human 14 is looking at the face of the robot 12. The output of the threshold processing unit 212 is normalized and weighted by the normalization/weighting unit 214 and input to the adder 208 without inversion. This is because, as described above, time spent looking at the face of the robot 12 is taken to indicate that the human 14 finds the interaction pleasant, and is therefore a positive factor in the reward. The result of the adder 208 is supplied to the reward acquisition unit 216.
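The P2 branch (neck angle calculation unit 210 and threshold processing unit 212) might look like the following sketch. The 10° threshold and 60 Hz frame rate come from the text; representing the face direction as a vector, counting frames to accumulate time, and the function names are assumptions.

```python
import math

def face_angle_deg(human_pos, face_dir, robot_pos):
    """Angle between the human's face direction and the line to the robot."""
    to_robot = [r - h for r, h in zip(robot_pos, human_pos)]
    norm = math.hypot(*face_dir) * math.hypot(*to_robot)
    if norm == 0.0:
        return 180.0  # degenerate input: treat as "not looking"
    cos_a = sum(f * t for f, t in zip(face_dir, to_robot)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def gaze_time(frames, robot_pos, threshold_deg=10.0, rate_hz=60.0):
    """Total seconds in which the face angle is within the threshold.

    frames: iterable of (human_pos, face_dir) pairs, one per captured frame.
    """
    hits = sum(
        1 for pos, direction in frames
        if face_angle_deg(pos, direction, robot_pos) <= threshold_deg
    )
    return hits / rate_hz
```

Counting frames and dividing by the frame rate is one simple way to accumulate time; a real system would likely also need to handle dropped frames.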

In this embodiment, the weights in the normalization/weighting units 206 and 214 are set to 1:1 for simplicity. However, because the reward, that is, the comfort during the interaction, can be determined from either the interpersonal distance or the time spent facing the robot alone, the weights may instead be set to, for example, 1:0 or 0:1.
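The combination performed by the normalization/weighting units 206 and 214 and the adder 208 can be illustrated as follows. The source does not say how normalization is done, so dividing the movement distance by a reference scale and the gaze time by the interaction duration are assumptions, as are all parameter names.

```python
def reward(move_dist, gaze_sec, duration_sec,
           move_scale=1.0, w_move=1.0, w_gaze=1.0):
    """Reward = weighted gaze term minus weighted movement term.

    move_scale (hypothetical) is the distance treated as one unit of movement.
    w_move:w_gaze corresponds to the 1:1, 1:0, or 0:1 weighting in the text.
    """
    if duration_sec <= 0 or move_scale <= 0:
        raise ValueError("duration_sec and move_scale must be positive")
    move_term = move_dist / move_scale   # negative factor (unit 206, inverted)
    gaze_term = gaze_sec / duration_sec  # positive factor (unit 214)
    return w_gaze * gaze_term - w_move * move_term
```

Setting `w_gaze=0.0` or `w_move=0.0` reproduces the single-cue variants mentioned above.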

In this embodiment, the comfort of the interaction is determined from the interpersonal distance and the time spent facing the robot, but the invention is not limited to this. For example, comfort can also be determined from the loudness of the person's footsteps, from whether the person is jiggling a leg (nervous fidgeting), or from the person's facial expression (smiling (relaxed) or pained (tense)). Footsteps, for instance, are related to irritation: quiet footsteps suggest that the person is not irritated and the interaction is comfortable, whereas loud footsteps suggest that the person is irritated and the interaction is uncomfortable. Footsteps can be detected with a sound level meter. Leg jiggling and facial expression can be detected using image recognition techniques.

The reward function 200 is evaluated each time the robot 12 executes an interaction behavior during an interaction. The interaction parameter Θ of the robot 12 is then determined by reinforcement learning so that the human 14 finds the interaction pleasant.

In reinforcement learning as typified by Q-learning, the widest possible space is searched and every policy is tried in order to learn the optimal behavior (policy). The learning result is therefore globally optimal, but the search takes a long time. In contrast, policy gradient reinforcement learning (reinforcement learning by the policy gradient method) finds a locally optimal solution by repeatedly modifying the current policy in the direction that increases the reward. Because the policy is changed directly from the reward, there is little delay in reward propagation and the learning time is short. This embodiment adopts policy gradient reinforcement learning for an open-loop system, in which the interaction is not altered by sensor input once it has started.

Specifically, the overall processing is executed according to the flowcharts shown in FIGS. 7 and 8, within which reinforcement learning is performed and the interaction parameter Θ is updated. The policy gradient reinforcement learning algorithm of this embodiment is briefly as follows. First, T policies Θ_i (with components θ_ij) are prepared by slightly perturbing the current policy, that is, the interaction parameter Θ. Each policy is generated by adding one of ε_j, 0, or −ε_j, chosen at random, to each component θ_j of the interaction parameter Θ. The perturbation step size ε_j may differ for each parameter θ_j (each component of the interaction parameter Θ).

Next, T interactions are performed, one under each policy Θ_i, and a reward R_i is obtained for each. After interactions have been performed for all T policies, the gradient A of the reward function 200 with respect to the interaction parameter Θ is obtained approximately. For each parameter θ_j, three averages are computed: the average reward when ε_j was added, the average reward when 0 was added, and the average reward when −ε_j was added.

If the average reward when 0 was added is the largest, the gradient component for that parameter θ_j is set to 0. Otherwise, the gradient component for θ_j is the difference between the average reward when ε_j was added and the average reward when −ε_j was added. Once the gradient A has been obtained, it is normalized and multiplied by η, each component is weighted by ε_j, and the result is used to update the interaction parameter Θ. These T interactions followed by the update of the interaction parameter Θ constitute one step. By repeating this step, the interaction parameter Θ is updated toward a value at which the reward reaches a local maximum, that is, a value that allows the robot 12 to execute interaction behaviors that the human 14 finds pleasant.
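The gradient estimate and update just described can be written compactly as follows. The symbols \(\bar{R}^{0}_{j}\), \(\bar{R}^{+}_{j}\), and \(\bar{R}^{-}_{j}\) are notation introduced here for the average rewards over the trials in which 0, \(+\varepsilon_j\), and \(-\varepsilon_j\) respectively were added to \(\theta_j\); the strict inequalities follow the flowchart description of FIG. 10.

```latex
a_j =
\begin{cases}
0 & \text{if } \bar{R}^{0}_{j} > \bar{R}^{+}_{j} \text{ and } \bar{R}^{0}_{j} > \bar{R}^{-}_{j},\\[4pt]
\bar{R}^{+}_{j} - \bar{R}^{-}_{j} & \text{otherwise,}
\end{cases}
\qquad A = (a_1, \dots, a_n),

\theta_j \;\leftarrow\; \theta_j + \eta \,\varepsilon_j\, \frac{a_j}{\lVert A \rVert}
\qquad (j = 1, \dots, n).
```

Because A is normalized before scaling, the size of each update is set by η and ε_j rather than by the magnitude of the reward differences.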

As shown in FIG. 7, when the CPU 76 starts the overall processing, it determines in step S1 whether the interaction partner (for example, the human 14) is a person with whom the robot 12 has interacted before. Although not shown, for example, the human 14 wears a tag and the robot 12 is provided with a tag receiver; the CPU 76 refers to the database 102 and determines whether an interaction parameter Θ is stored for the person corresponding to the tag's identification information (tag information or number). If an interaction parameter Θ is stored for that person, it can be determined that the robot 12 has interacted with the person before; if not, it can be determined that it has not.

If “YES” in step S1, that is, if the robot 12 has interacted with the person before, the interaction parameter Θ is read from the database 102 and assigned to the variable Θ in step S3, and the process proceeds to step S11. If “NO” in step S1, that is, if the robot 12 has not interacted with the person before, it is determined in step S5 whether the robot 12 has interacted with a person similar to the interaction partner.

In this embodiment, whether a person is similar is judged from the person's face (mainly its shape) and height. The face and height are determined (estimated) from images (a face image and a whole-body image) captured by the eye camera 54 provided on the robot 12. As described above, the face image and estimated height of each person with whom the robot 12 has interacted are stored in the database 102 in association with the tag information, so comparing them with the face image and estimated height of the current partner shows whether a similar person exists, that is, whether the robot 12 has interacted with a similar person.

If “YES” in step S5, that is, if the robot 12 has interacted with a similar person, the interaction parameter Θ of the similar person is read from the database 102 and assigned to the variable Θ in step S7, and the process proceeds to step S11. If “NO” in step S5, that is, if the robot 12 has not interacted with a similar person, an average interaction parameter Θ is read from the database 102 and assigned to the variable Θ in step S9, and the process proceeds to step S11. The average interaction parameter Θ is, for example, the average of all the interaction parameters Θ stored in the database 102.

Although not shown, when the overall processing is executed for the very first time, no interaction parameter Θ is stored in the database 102, so the initial value is set (entered) by the user.
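Steps S1 to S9, together with the first-run case, amount to a fallback chain for choosing the initial Θ. The sketch below assumes the database is a plain dictionary from person ID to a parameter list and that similarity matching is supplied as a callback; both are illustrative stand-ins, not the structure of the database 102.

```python
def initial_parameters(person_id, database, find_similar, default):
    """Choose the starting interaction parameters for a partner.

    database: dict mapping person ID -> list of parameter values (assumed layout).
    find_similar: callable returning the ID of a similar known person, or None.
    default: user-supplied values for the very first run (empty database).
    """
    if person_id in database:                      # S1 YES -> S3
        return list(database[person_id])
    similar_id = find_similar(person_id)           # S5
    if similar_id is not None and similar_id in database:
        return list(database[similar_id])          # S7
    if database:                                   # S9: component-wise average
        count = len(database)
        return [sum(column) / count for column in zip(*database.values())]
    return list(default)                           # nothing stored yet
```

Returning copies keeps later learning updates from silently modifying the stored values before they are written back.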

In step S11, the interaction count i is initialized (i = 1). In the following step S13, the interaction parameter Θ_i to be tried this time is determined on the basis of the variable Θ (see FIG. 9); this determination process is described in detail later. Next, in step S15, an interaction behavior is executed. Here, one of a plurality of interaction behaviors prepared in advance is selected and executed, either at random (or by a predetermined rule) or according to the behavior of the human 14.

Subsequently, in step S17, the evaluation of the interaction is calculated and assigned to the variable R_i. The evaluation of the interaction is the reward obtained with the reward function 200 described above (FIG. 5). As shown in FIG. 8, in the next step S19 the interaction count i is incremented by 1 (i = i + 1), and in step S21 it is determined whether i has exceeded a predetermined number T (for example, 10). If “NO” in step S21, that is, if i is T or less, the process returns to step S13 shown in FIG. 7. If “YES” in step S21, that is, if i exceeds T, the interaction parameter Θ is updated in step S23 (see FIG. 10), and it is determined in step S25 whether the interaction has ended, for example, whether an instruction to end the interaction has been input or a fixed time has elapsed.

If “NO” in step S25, that is, if the interaction has not ended, the process returns to step S11 shown in FIG. 7. If “YES” in step S25, that is, if the interaction has ended, the updated variable Θ is registered (updated) in the database 102 as the interaction parameter Θ for the interaction partner, and the overall processing ends.
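The control flow of steps S11 to S25 can be summarized in a short loop. In this sketch the four callbacks stand in for the processing of FIG. 9 (choosing a trial), steps S15 and S17 (running an interaction and scoring it), FIG. 10 (updating Θ), and step S25 (the end check); their names and signatures are assumptions.

```python
def run_session(theta, choose_trial, run_interaction, update, finished,
                trials=10):
    """Repeat learning steps of `trials` interactions each until finished.

    Returns the final parameters, which the caller would store for the partner.
    """
    while True:
        history = []
        i = 1                                    # S11
        while i <= trials:                       # S21: loop until i exceeds T
            trial = choose_trial(theta)          # S13
            reward = run_interaction(trial)      # S15 and S17
            history.append((trial, reward))
            i += 1                               # S19
        theta = update(theta, history)           # S23
        if finished():                           # S25
            return theta
```

Passing the steps in as functions keeps the loop independent of how trials are generated or how rewards are measured.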

FIG. 9 is a flowchart showing the process in step S13 of FIG. 7 that determines the interaction parameter Θ_i to be tried this time. Referring to FIG. 9, when the CPU 76 starts this determination process, it sets the variable j to its initial value in step S41 (j = 1). In the following step S43, one of 0, ε_j, and −ε_j is selected at random and assigned to the variable Δ. In the next step S45, the j-th component θ_ij of the interaction parameter Θ_i is calculated (θ_ij = θ_j + Δ). Then, in step S47, the variable j is incremented (j = j + 1), and in step S49 it is determined whether j exceeds the size n of the interaction parameter Θ (the interaction parameter vector), that is, the total number of components θ_j. If “NO” in step S49, that is, if j is n or less, the process returns to step S43. If “YES” in step S49, that is, if j exceeds n, the interaction parameter Θ_i to be tried this time has been determined, and the determination process returns.
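The loop of steps S41 to S49 is a per-component random perturbation, which can be sketched as below. Returning the chosen signs alongside the trial vector is a convenience added here (it makes the later averaging easier) and is not stated in the source; the function name is hypothetical.

```python
import random

def perturb(theta, eps, rng=random):
    """Build one trial parameter vector by perturbing each component.

    For each component j, adds one of 0, +eps[j], or -eps[j] chosen at random.
    Returns (trial, signs) where signs[j] is 0, 1, or -1.
    """
    signs = [rng.choice((0, 1, -1)) for _ in theta]            # S43
    trial = [t + s * e for t, s, e in zip(theta, signs, eps)]  # S45
    return trial, signs
```

Passing a seeded `random.Random` instance as `rng` makes the sequence of trials reproducible during testing.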

図１０は、図８に示したステップＳ２３におけるインタラクションパラメータΘの更新処理を示すフロー図である。図１０を参照して、ＣＰＵ７６は、インタラクションパラメータΘの更新処理を開始すると、ステップＳ６１で、変数ｊに初期値を設定する（ｊ＝１）。続くステップＳ６３では、今回試したインタラクションパラメータΘ_iについて、θ_ijをθ_jとした場合の平均報酬Ｒ０，θ_ijをθ_j＋ε_jとした場合の平均報酬Ｒ１，θ_ijをθ_j−ε_jとした場合の平均報酬Ｒ２を、それぞれ求める。ただし、θ_jはインタラクションパラメータ（ベクトル）Θの第ｊ成分であり、θ_ijはインタラクションパラメータΘ_iの第ｊ成分であり、ε_jはインタラクションパラメータΘの第ｊ成分を変動させる値である。 FIG. 10 is a flowchart showing the update process for the interaction parameter Θ in step S23 shown in FIG. 8. Referring to FIG. 10, when the CPU 76 starts the update process, it sets the variable j to its initial value in step S61 (j = 1). In the subsequent step S63, for the interaction parameters Θ_i tried this time, three averages are computed: the average reward R0 over the trials in which θ_ij was θ_j, the average reward R1 over the trials in which θ_ij was θ_j + ε_j, and the average reward R2 over the trials in which θ_ij was θ_j − ε_j. Here, θ_j is the j-th component of the interaction parameter (vector) Θ, θ_ij is the j-th component of the interaction parameter Θ_i, and ε_j is the amount by which the j-th component of the interaction parameter Θ is varied.

次にステップＳ６５では、ステップＳ６３で算出したＲ０，Ｒ１，Ｒ２を用いて、Ｒ０＞Ｒ１であり、かつＲ０＞Ｒ２であるかどうかを判断する。ステップＳ６５で“ＹＥＳ”であれば、つまりＲ０＞Ｒ１であり、かつＲ０＞Ｒ２であれば、ステップＳ６７で、勾配Ａの第ｊ成分ａ_jに０を設定（ａ_j＝０）して、ステップＳ７１に進む。一方、ステップＳ６５で“ＮＯ”であれば、つまりＲ０≦Ｒ１およびＲ０≦Ｒ２の少なくとも一方を満たしていれば、ステップＳ６９で、勾配Ａの第ｊ成分ａ_jに平均報酬Ｒ１と平均報酬Ｒ２の差分（ａ_j＝Ｒ１−Ｒ２）を設定して、ステップＳ７１に進む。 Next, in step S65, using R0, R1, and R2 calculated in step S63, it is determined whether R0 > R1 and R0 > R2. If “YES” in step S65, that is, if R0 > R1 and R0 > R2, the j-th component a_j of the gradient A is set to 0 (a_j = 0) in step S67, and the process proceeds to step S71. On the other hand, if “NO” in step S65, that is, if at least one of R0 ≦ R1 and R0 ≦ R2 holds, the j-th component a_j of the gradient A is set in step S69 to the difference between the average reward R1 and the average reward R2 (a_j = R1 − R2), and the process proceeds to step S71.

ステップＳ７１では、変数ｊをインクリメントする。そして、ステップＳ７３では、変数ｊがインタラクションパラメータΘの大きさｎを超えているかどうかを判断する。ステップＳ７３で“ＮＯ”であれば、つまり変数ｊがインタラクションパラメータΘの大きさｎ以下であれば、ステップＳ６３に戻る。一方、ステップＳ７３で“ＹＥＳ”であれば、つまり変数ｊがインタラクションパラメータΘの大きさｎを超えていれば、ステップＳ７５で、勾配Ａを正規化（Ａ＝Ａ／｜Ａ｜）する。続くステップＳ７７では、勾配Ａの第ｊ成分ａ_jを更新（ａ_j＝ａ_j×ε_j×η）する。ただし、ηはスカラーであり、全体としての更新の大きさを決定するパラメータである。そして、ステップＳ７９で、インタラクションパラメータΘを更新（Θ＝Θ＋Ａ）して、インタラクションパラメータΘの更新処理をリターンする。 In step S71, the variable j is incremented. In step S73, it is determined whether the variable j exceeds the size n of the interaction parameter Θ. If “NO” in step S73, that is, if the variable j is equal to or less than n, the process returns to step S63. On the other hand, if “YES” in step S73, that is, if the variable j exceeds n, the gradient A is normalized (A = A / |A|) in step S75. In the subsequent step S77, each component a_j of the gradient A is scaled (a_j = a_j × ε_j × η). Here, η is a scalar parameter that determines the overall magnitude of the update. Then, in step S79, the interaction parameter Θ is updated (Θ = Θ + A), and the update process for the interaction parameter Θ returns.
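The update of FIG. 10 (average rewards R0, R1, R2 per component, gradient A, normalization, scaling by ε_j and η, then Θ = Θ + A) can be sketched as follows. This is a minimal sketch under stated assumptions, not the embodiment's code: the bookkeeping array `signs` (recording for each trial whether component j was left unchanged, raised, or lowered), the helper `mean`, the choice of 0.0 as the average of an empty group, and the early return when |A| = 0 are all assumptions introduced for illustration.

```python
import math

def mean(values):
    # Assumption: an empty group contributes an average reward of 0.0.
    return sum(values) / len(values) if values else 0.0

def update_parameters(theta, epsilon, signs, rewards, eta=1.0):
    """Return the updated parameter vector Theta.

    signs[i][j] is 0, +1 or -1: the perturbation applied to component j
    in trial i; rewards[i] is the reward observed for trial i.
    """
    grad = []
    for j in range(len(theta)):
        r0 = mean([r for s, r in zip(signs, rewards) if s[j] == 0])   # S63
        r1 = mean([r for s, r in zip(signs, rewards) if s[j] == +1])
        r2 = mean([r for s, r in zip(signs, rewards) if s[j] == -1])
        # S65 to S69: zero if the unperturbed value is best, else R1 - R2
        grad.append(0.0 if (r0 > r1 and r0 > r2) else r1 - r2)
    norm = math.sqrt(sum(a * a for a in grad))
    if norm == 0.0:                  # assumption: leave Theta unchanged
        return list(theta)
    # S75 to S79: normalize, scale by eps_j * eta, and add to Theta
    return [t + (a / norm) * e * eta for t, a, e in zip(theta, grad, epsilon)]
```

With a single parameter, `update_parameters([50.0], [15.0], [[1], [-1], [0]], [2.0, 1.0, 0.0])` moves the parameter by one full step in the direction whose average reward was higher.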

このような構成のロボット１２を実際に人間（被験者）との間でインタラクションさせて、インタラクションパラメータΘを更新させるとともに、被験者がロボット１２とのインタラクションから受けた印象（快・不快）等から強化学習によるパラメータΘの適応度を実験により検証した。上述したように、インタラクションパラメータΘは、３種類の対人距離（親密距離、個体距離、社会距離）、人の顔の方向に眼カメラ５４を向ける時間の長さ（注視時間）、発話からモーション再生（インタラクション行動の開始）までの遅れ時間、モーション再生速度である。ロボット１２に用意するすべてのモーション（インタラクション行動）は、対人距離によって分類し（図１２参照）、同じ分類に含まれるインタラクション行動では、同じ距離を用いた。１つのモーションに関係するインタラクションパラメータΘは距離、注視時間、遅れ時間、再生速度の4つの要素（パラメータθ_j）である。適応するパラメータθ_jを多くすると、学習に時間がかかってしまうため、インタラクションに大きな影響があると考えられる。また、パラメータθ_jは可能な限り少ない方が、実装が容易であるため、上述したようなパラメータθ_jを選択することとした。 The robot 12 configured in this way was made to interact with actual humans (subjects) while the interaction parameter Θ was updated, and how well the parameter Θ was adapted by reinforcement learning was verified experimentally, based on, among other things, the impressions (comfort/discomfort) that the subjects received from the interaction with the robot 12. As described above, the interaction parameter Θ consists of three interpersonal distances (intimate distance, individual distance, social distance), the length of time for which the eye camera 54 is directed toward the person's face (gaze time), the delay time from an utterance until motion playback (the start of the interaction behavior), and the motion playback speed. All motions (interaction behaviors) prepared for the robot 12 were classified by interpersonal distance (see FIG. 12), and the same distance was used for all interaction behaviors in the same class. The interaction parameter Θ relevant to a single motion therefore has four elements (parameters θ_j): distance, gaze time, delay time, and playback speed. Because adapting more parameters θ_j lengthens learning, and because fewer parameters θ_j are easier to implement, the parameters θ_j described above, which are considered to have a large influence on the interaction, were selected.

また、人とロボット１２との距離（対人距離）は、それぞれの額間の水平距離とした。ロボット１２は５秒を１周期として、人の顔を見て、他の方向を向く。注視時間は、この人の顔を見る時間の５秒に対する割合とした。ここで、５秒を１周期としたのは、人と人とのインタラクションにおける注視の周期に合わせたためである。遅れ時間は、たとえばロボット１２が「握手してね」と発話してから、手を出すモーションを再生するまでの時間である。再生速度はモーションを作成した際の動きの速さを１としてある。 The distance between the person and the robot 12 (interpersonal distance) was defined as the horizontal distance between their foreheads. The robot 12 looks at the person's face and then looks in another direction, with 5 seconds as one cycle. The gaze time was defined as the fraction of this 5-second cycle spent looking at the person's face. The cycle was set to 5 seconds to match the gaze cycle observed in human-to-human interaction. The delay time is, for example, the time from when the robot 12 says “Please shake hands” until it starts playing the motion of holding out its hand. The playback speed is expressed relative to the speed at which the motion was authored, which is taken as 1.
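The two definitions above (horizontal forehead-to-forehead distance, and gaze time as a fraction of a 5-second cycle) reduce to simple arithmetic. The sketch below is illustrative only; the function names, the (x, y, z) tuple layout, and the assumption that z is the vertical axis are not taken from the source.

```python
import math

CYCLE_S = 5.0  # length of one gaze cycle, as stated in the text

def horizontal_distance(forehead_a, forehead_b):
    """Distance in the horizontal plane between two (x, y, z) positions.

    Assumption: z is the vertical axis, so only x and y are used.
    """
    dx = forehead_a[0] - forehead_b[0]
    dy = forehead_a[1] - forehead_b[1]
    return math.hypot(dx, dy)

def gaze_seconds(gaze_ratio):
    """Seconds per cycle spent looking at the face for a given gaze ratio."""
    return gaze_ratio * CYCLE_S
```

For example, a gaze ratio of 0.7 corresponds to roughly 3.5 seconds of looking at the face in every 5-second cycle.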

実験は、モーションキャプチャシステム２０を有する実験室において、精度良くモーションキャプチャが行える中央の所定範囲（４．５×３．５（ｍ））で行った。図１１に示すように、１２台のカメラ２０ａからなるモーションキャプチャシステム２０が備えられている。ただし、図１１においては、簡単のため、カメラ２０ａ以外のコンピュータ等は省略してある。このような構成で、実験領域内では１（ｍｍ）程度の測定精度がある。上述したように、マーカ３０が被験者（人間１４）とロボット１２の額と肩とに取り付けられ、そのマーカ３０からそれぞれの額の位置および方向を求めた。モーションキャプチャにより求められた位置と方向はロボット１２にＬＡＮのようなネットワークを介して送り、ロボット１２の動作決定と報酬関数２００の計算に用いた。実験中では、通信による時間遅れは０．１秒以内であり、この通信による遅れは無視することができた。 The experiment was performed in a laboratory having the motion capture system 20 within a predetermined range (4.5 × 3.5 (m)) in the center where motion capture can be performed with high accuracy. As shown in FIG. 11, a motion capture system 20 including twelve cameras 20a is provided. However, in FIG. 11, for the sake of simplicity, a computer other than the camera 20a is omitted. With such a configuration, there is a measurement accuracy of about 1 (mm) in the experimental area. As described above, the marker 30 is attached to the subject (human 14) and the forehead and shoulders of the robot 12, and the position and direction of each forehead are obtained from the marker 30. The position and direction obtained by the motion capture were sent to the robot 12 via a network such as a LAN, and used for determining the operation of the robot 12 and calculating the reward function 200. During the experiment, the time delay due to communication was within 0.1 seconds, and the delay due to this communication could be ignored.

図１２には、実験のために用意したロボット１２の振る舞い（インタラクション行動）についての第１テーブルが示される。図１２を参照して分かるように、インタラクション行動としては、抱っこ(Hug)、握手(Shake hands) 、どこから来たの？(Ask where person comes from)、ロボビー（ロボット１２の商品名）ってかわいい？(Ask if robot is cute)、触ってね(Ask person to touch robot)、じゃんけん(Play paper-scissors-stone)、あっちむいてほい(Play pointing game)、運動(Perform arm-swinging exercise)、自己紹介(Hold “thank you” monologue)、相手を見る(just looking)の１０通りである。これらのインタラクションを、親密距離(intimate distance)、個体距離(personal distance)、社会距離(social distance)の３つの対人距離に、予備実験により分類した。なお、分類の予備実験では、８名の被験者を集めて、ロボット１２の位置を固定し、各被験者に、それぞれのインタラクションに適していると考える距離に移動してもらい、その距離を測定した。被験者間で多少の距離の差は見られたが、分散は小さく、分類に影響するほどではなかった。 FIG. 12 shows a first table of the behaviors (interaction behaviors) of the robot 12 prepared for the experiment. As can be seen from FIG. 12, there are ten interaction behaviors: hug (Hug), handshake (Shake hands), “Where did you come from?” (Ask where person comes from), “Is Robovie (the product name of the robot 12) cute?” (Ask if robot is cute), “Please touch me” (Ask person to touch robot), rock-paper-scissors (Play paper-scissors-stone), the “acchi muite hoi” game (Play pointing game), exercise (Perform arm-swinging exercise), self-introduction (Hold “thank you” monologue), and looking at the partner (just looking). These interactions were classified by a preliminary experiment into three interpersonal distances: intimate distance, individual distance (personal distance), and social distance. In this preliminary experiment, eight subjects were recruited, the position of the robot 12 was fixed, and each subject was asked to move to the distance he or she considered suitable for each interaction; that distance was then measured. There were some differences in distance between subjects, but the variance was small and did not affect the classification.

次に実験の手順について説明する。実験開始時に、ロボット１２は、モーションキャプチャシステム２０の測定領域の中央に存在し、被験者はロボット１２の正面に立った状態から、リラックスして自然な気持ちでロボット１２とインタラクションするよう求められた。モーションキャプチャシステム２０の測定範囲内に存在することを要求した以外は、被験者に対してインタラクションについて何も要求していない。 Next, the experimental procedure will be described. At the start of the experiment, the robot 12 was positioned at the center of the measurement area of the motion capture system 20, and the subject, starting from a position standing in front of the robot 12, was asked to interact with the robot 12 in a relaxed and natural manner. Apart from being asked to stay within the measurement range of the motion capture system 20, the subjects were given no instructions about the interaction.

実験においては、ロボット１２と各被験者との間で、約３０分間のインタラクションを行った。この３０分の間に、上述した１０個のインタラクション行動をランダムに実行した。詳細な説明は省略するが、いずれのインタラクション行動を実行する場合にも、ロボット１２は、その腕や頭の動きを伴う。つまり、身体動作を伴うのである。たとえば、抱っこ(hug)では、ロボット１２が「抱っこしてね」と発声し、腕を広げ、これに応じて、人（被験者）がロボット１２の正面の適当な位置（距離）に立つと、その後、腕で当該人に抱きつく。３０分間のインタラクションを行い、上述したような強化学習を行った。また、上述したように、報酬関数２００はロボット１２が１つのインタラクション行動を終了(約１０秒)する毎に計算される。インタラクションパラメータΘの各成分（パラメータ）θ_jを少しずつ変化させ、Ｔ回（この実施例では、１０回）のインタラクションが終了すると、報酬からインタラクションパラメータΘの変動方向(勾配Ａ)を決定し、インタラクションパラメータΘを更新した。図１３には、各パラメータθ_jに対応して、各々の初期値およびステップサイズを示す第２テーブルが示される。具体的には、パラメータ「親密距離」では、初期値が５０（ｃｍ）であり、ステップサイズが１５（ｃｍ）である。パラメータ「個体距離」では、初期値が８０（ｃｍ）であり、ステップサイズが１５（ｃｍ）である。パラメータ「社会距離」では、初期値が１００（ｃｍ）であり、ステップサイズが１５（ｃｍ）である。パラメータ「注視時間」では、初期値が０．７であり、ステップサイズが０．１である。パラメータ「遅れ時間」では、初期値が０．１７（ｓ）であり、ステップサイズが０．３（ｓ）である。パラメータ「再生速度」では、初期値が１．０であり、ステップサイズが０．１である。 In the experiment, the robot 12 interacted with each subject for about 30 minutes. During these 30 minutes, the ten interaction behaviors described above were executed in random order. Although a detailed description is omitted, every interaction behavior involves movement of the robot 12's arms and head; that is, it involves body motion. For example, in the hug, the robot 12 says “Please hug me” and spreads its arms, and when the person (subject) responds by standing at a suitable position (distance) in front of the robot 12, the robot then hugs the person with its arms. Reinforcement learning as described above was carried out over the 30 minutes of interaction. As described above, the reward function 200 is calculated each time the robot 12 finishes one interaction behavior (about 10 seconds). Each component (parameter) θ_j of the interaction parameter Θ was varied slightly, and after T interactions (10 in this embodiment) the direction of change (gradient A) of the interaction parameter Θ was determined from the rewards and the interaction parameter Θ was updated. FIG. 13 shows a second table giving the initial value and step size for each parameter θ_j. Specifically, for the parameter “intimate distance”, the initial value is 50 (cm) and the step size is 15 (cm). For the parameter “individual distance”, the initial value is 80 (cm) and the step size is 15 (cm). For the parameter “social distance”, the initial value is 100 (cm) and the step size is 15 (cm). For the parameter “gaze time”, the initial value is 0.7 and the step size is 0.1. For the parameter “delay time”, the initial value is 0.17 (s) and the step size is 0.3 (s). For the parameter “playback speed”, the initial value is 1.0 and the step size is 0.1.
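The initial values and step sizes listed for FIG. 13 can be collected into a small configuration structure. The values below are those given in the text; the class name `ParamSpec`, the field names, and the identifier strings are assumptions introduced for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParamSpec:
    name: str       # hypothetical identifier for the parameter
    initial: float  # initial value theta_j
    step: float     # step size eps_j
    unit: str       # "" for dimensionless ratios

PARAMS = (
    ParamSpec("intimate_distance", 50.0, 15.0, "cm"),
    ParamSpec("individual_distance", 80.0, 15.0, "cm"),
    ParamSpec("social_distance", 100.0, 15.0, "cm"),
    ParamSpec("gaze_time", 0.7, 0.1, ""),
    ParamSpec("delay_time", 0.17, 0.3, "s"),
    ParamSpec("playback_speed", 1.0, 0.1, ""),
)

TRIALS_PER_UPDATE = 10  # T: interactions between parameter updates

theta0 = [p.initial for p in PARAMS]   # initial parameter vector Theta
epsilon = [p.step for p in PARAMS]     # per-component step sizes
```

Keeping the step sizes next to the initial values makes it straightforward to express later results in units of the step size.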

インタラクション後、被験者にロボット１２の動きとインタラクションについて、ロボット１２の動き、距離、視線の合わせ方の印象と、実験中それらがどのように変化していったかを聞き、個人距離の測定を行った。親密距離、個体距離、社会距離、それぞれについてモーションを行っているロボット１２の正面の適当と感じる位置へ被験者に立ってもらい、モーションキャプチャシステム２０で距離を測定した。ここでは、注視時間０．７５、遅れ時間０．３（ｓ）、再生速度１．０とし、親密距離についてはインタラクション「抱っこ」を用い、個体距離についてはインタラクション「握手」を用い、社会距離についてはインタラクション「ありがとう(Hold “thank you” monologue)」を用いた。また、適当と感じる距離からロボット１２を近づけた場合と、逆にロボット１２を遠ざけた場合とで、被験者が距離を適切でないと感じる位置を測定した。 After the interaction, each subject was asked for impressions of the robot 12's movement, distance, and eye contact, and about how these changed during the experiment, and the subject's own preferred distances were then measured. For each of the intimate, individual, and social distances, the subject was asked to stand at the position in front of the robot 12, which was performing a motion, that felt appropriate, and the distance was measured with the motion capture system 20. Here, the gaze time was set to 0.75, the delay time to 0.3 (s), and the playback speed to 1.0; the “hug” interaction was used for the intimate distance, the “handshake” interaction for the individual distance, and the “thank you” interaction (Hold “thank you” monologue) for the social distance. In addition, the robot 12 was moved closer to and, conversely, farther from the position felt to be appropriate, and the positions at which the subject began to feel the distance was inappropriate were measured.

さらに、１つのパラメータθ_jのみを低、中、高と３通りに変化させ、他のパラメータθ_jを全被験者の平均値に固定した場合のロボット１２のインタラクションモーションを被験者に見せ、適切と感じるものを選択してもらった。注視時間と再生速度の測定には、「ありがとう」のモーションを用い、人との距離は１．０（ｍ）とした。遅れ時間の測定には、「抱っこ」のモーションを用い、距離についてはロボット１２の移動を止めて、被験者に適切と思われる位置に立ってもらった。これは、親密距離は個人差が大きかったためである。被験者の中には、複数の値で適切であると感じた者や中間の値が適切であると感じた者がいた。 Furthermore, each subject was shown the interaction motions of the robot 12 with a single parameter θ_j set to one of three levels (low, medium, high) while the other parameters θ_j were fixed at the averages over all subjects, and was asked to choose the one that felt appropriate. The “thank you” motion was used to measure the gaze time and the playback speed, with the distance to the person set to 1.0 (m). The “hug” motion was used to measure the delay time; for the distance, the robot 12 was kept stationary and the subject was asked to stand at a position that seemed appropriate. This was done because the intimate distance varied greatly between individuals. Some subjects felt that more than one value was appropriate, and some felt that an intermediate value was appropriate.

このような実験を１５名の被験者に対して行った。被験者は、１名を除き、日本人で、全員がロボット１２の発話を聞き取ることが出来た。被験者の年齢は２０才から３５才で、多くは２０才から２５才であった。また、被験者のうち、６名が女性で、残りは男性であった。ただし、被験者の中に、ロボット１２について知っている者が多少いた。 Such an experiment was conducted on 15 subjects. Except for one subject, all the subjects were Japanese, and all of them were able to hear the utterance of the robot 12. Subjects were 20 to 35 years old, and many were 20 to 25 years old. Of the subjects, 6 were female and the rest were male. However, some of the subjects knew about the robot 12.

被験者のうち、３名は我々が期待したようには振舞わなかった。具体的には、ロボット１２のインタラクションが適当なものであっても、そうでなくても、顔の方向を変えたり、立ち位置を変えることなく、感想を言葉でロボット１２或いは実験者に述べたり、顔に表出したりするのみであった。このような被験者は、想定しているインタラクション評価モデルには当てはまらず、システム１０（強化学習の処理）は正しく動作しない。したがって、以下に説明する実験結果においては、これら被験者（３名の被験者）の結果を除いている。それ以外の多くの被験者に対しては、１５分から２０分(約１０回のＰＧＲＬのパラメータ更新)で適切な値にインタラクションパラメータΘが収束した。 Three of the subjects did not behave as we had expected. Specifically, whether or not the robot 12's interaction was appropriate, they did not change the direction of their face or their standing position, and only stated their impressions verbally to the robot 12 or the experimenter or showed them in their facial expression. Such subjects do not fit the assumed interaction evaluation model, and the system 10 (the reinforcement learning process) does not operate correctly for them. Therefore, the results for these three subjects are excluded from the experimental results described below. For most of the other subjects, the interaction parameter Θ converged to appropriate values in 15 to 20 minutes (about 10 PGRL parameter updates).

図１４（Ａ），（Ｂ），（Ｃ）には、１２名の被験者の距離（親密距離、個体距離、社会距離）について、適応の結果得られた値と被験者が適当と判断した値を示す。距離に関しては、全インタラクション最後の１／４の期間(約７分半)の平均を示している。これはＰＧＲＬが常に局所最適値を探索しているためである。図１４（Ａ）〜（Ｃ）において、「＊」印が適応した結果であり、縦棒は被験者を示し、横棒のうち短い棒は許容限度（許容範囲）を示し、横棒のうち長い棒は最適とした値（最適値）を示す。 FIGS. 14(A), (B), and (C) show, for the distances (intimate distance, individual distance, social distance) of the 12 subjects, the values obtained by adaptation and the values the subjects judged appropriate. For the distances, the average over the last quarter of the whole interaction (about seven and a half minutes) is shown, because PGRL is continually searching for a locally optimal value. In FIGS. 14(A) to 14(C), the “*” marks are the adaptation results, each vertical bar corresponds to one subject, the short horizontal bars indicate the acceptable limits (acceptable range), and the long horizontal bar indicates the value judged optimal (optimum value).
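Reporting the average over the last quarter of the interaction, rather than the final value, smooths out the continual perturbation of the parameters. A minimal sketch is shown below; the function name is hypothetical, and it assumes the parameter history is sampled at regular intervals so that the last quarter of the samples corresponds to the last quarter of the interaction time.

```python
def last_quarter_mean(history):
    """Mean of the final quarter of a non-empty sequence of parameter values.

    Assumption: samples are evenly spaced in time; at least one sample
    is always included in the tail.
    """
    if not history:
        raise ValueError("history must not be empty")
    tail_len = max(1, len(history) // 4)
    tail = history[len(history) - tail_len:]
    return sum(tail) / len(tail)
```

For a 30-minute interaction, the tail covers about the last seven and a half minutes, matching the period stated in the text.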

図１５（Ａ），図１５（Ｂ）および図１５（Ｃ）は、注視時間(motion meeting ratio)、遅れ時間(waiting time)、モーション再生速度(motion speed)についての結果を示す。図１５（Ａ）〜図１５（Ｃ）において、「○」印は被験者が適当と判断した値であり、「＊」印は適応結果(全インタラクション最後の１／４の期間の平均)である。ただし、２つの値の中間が適当とした被験者については、中間に「▽」印を記してある。図１４（Ａ）〜図１４（Ｃ）および図１５（Ａ）〜図１５（Ｃ）から、被験者の判断との一致度合はパラメータθ_jによって大きく異なると言える。これは、それぞれのパラメータθ_jのインタラクションへの重要性が異なり、報酬へ寄与の大きいパラメータθ_jから収束し、許容範囲の広いパラメータθ_jの収束は遅くなるためである。 FIGS. 15(A), 15(B), and 15(C) show the results for the gaze time (motion meeting ratio), the delay time (waiting time), and the motion playback speed (motion speed). In FIGS. 15(A) to 15(C), the “○” marks are the values the subjects judged appropriate, and the “*” marks are the adaptation results (the average over the last quarter of the whole interaction). For subjects who judged a value midway between two values to be appropriate, a “▽” mark is placed at the midpoint. From FIGS. 14(A) to 14(C) and FIGS. 15(A) to 15(C), it can be said that the degree of agreement with the subjects' judgment differs greatly depending on the parameter θ_j. This is because the parameters θ_j differ in their importance to the interaction: parameters θ_j that contribute more to the reward converge first, while parameters θ_j with a wide acceptable range converge more slowly.

また、ロボット１２がよく適応できていた被験者の印象には、パラメータθ_jの変化があまり含まれない傾向があった。これは、自然な適応が行われると、パラメータθ_jの適応が認識されなくなる可能性を示唆している。 Also, the impression of the subject who was able to adapt well to the robot 12 tended to not include much change in the parameter θ _j . This suggests that when natural adaptation is performed, the adaptation of the parameter θ _j may not be recognized.

図１６は、各パラメータθ_jの最適とされる値（最適値）からの分散を１２名の被験者について平均した第３テーブルを示す。分散は、全インタラクションについての最後の１／４期間について計算した。これは、最後の１／４期間中におけるパラメータθ_jの変動の影響を分散に含めるためである。また、各パラメータθ_jは、その更新のステップサイズが１になるように正規化している。なお、図１６の第３テーブルでは、参考のため、右端に初期値の分散を示している。第３テーブルからも分かるように、個人距離、社会距離を除いて、ステップサイズの１．１倍以下になっている。許容される範囲は、個人距離についてはステップサイズの３倍であり、社会距離に関しては５倍であった。社会距離に関しては、1人の被験者を除き、適応結果は許容範囲に入った。したがって、ＰＧＲＬに基づいた適応により各パラメータθ_jは適切な値に収束したと言える。より誤差を小さくするには、ステップサイズをより小さくしたり、適応が進むにつれて徐々に小さくしたりする必要があると考えられる。 FIG. 16 shows a third table in which the variance of each parameter θ_j from the value regarded as optimal (the optimum value) is averaged over the 12 subjects. The variance was calculated over the last quarter of the whole interaction, so that the effect of fluctuations of the parameter θ_j during that period is included in the variance. Each parameter θ_j is normalized so that its update step size is 1. For reference, the third table of FIG. 16 also shows the variance of the initial values in the rightmost column. As can be seen from the third table, except for the personal distance and the social distance, the values are at most 1.1 times the step size. The acceptable range was 3 times the step size for the personal distance and 5 times for the social distance. For the social distance, the adaptation results fell within the acceptable range for all but one subject. It can therefore be said that each parameter θ_j converged to an appropriate value through the PGRL-based adaptation. To reduce the error further, it would be necessary to use a smaller step size or to reduce the step size gradually as adaptation progresses.
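One possible reading of the step-size-normalized "variance from the optimum value" is the mean squared deviation of the parameter from the subject's optimum, with deviations first divided by the step size. The sketch below follows that reading; the source does not give the exact formula, so both the function name and the definition are assumptions.

```python
def normalized_deviation(values, optimum, step):
    """Mean squared deviation of values from optimum, in step-size units.

    values: parameter values sampled during the evaluation period
    optimum: the value the subject judged optimal
    step: the update step size eps_j for this parameter
    """
    if not values:
        raise ValueError("values must not be empty")
    return sum(((v - optimum) / step) ** 2 for v in values) / len(values)
```

Under this definition, a parameter that stays exactly one step away from the optimum throughout the period scores 1.0, which makes results for parameters with different units directly comparable.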

また、第３テーブルに示すように、初期値もそれほど最適値から離れているわけではないが、最適値への収束には１０回程度の適応が必要となっている。ただし、図１４（Ａ）〜図１４（Ｃ）および図１５（Ａ）〜図１５（Ｃ）に示したように、被験者によっては、１５回〜２０回の適応でも最適値に収束しないパラメータθ_jがあった。また、報酬関数２００を意識し、現在のインタラクションパラメータΘに応じて、一貫して人が同じ振る舞いをした場合には、４〜５回の適応で収束した。なお、かかる場合には、シミュレーション上では、３〜４回で最適値へ収束することもあった。したがって、収束するまでに必要な適応の回数が多くなってしまう一因としては、人の動きが毎回一定ではないことが考えられる。以下では、適応結果により被験者を4つのグループに分け、更に詳細な実験結果を説明する。 Also, as shown in the third table, the initial values were not very far from the optimum values, yet about 10 adaptations were needed for convergence to the optimum. However, as shown in FIGS. 14(A) to 14(C) and FIGS. 15(A) to 15(C), for some subjects there were parameters θ_j that did not converge to the optimum value even after 15 to 20 adaptations. On the other hand, when a person who was aware of the reward function 200 behaved consistently in the same way in response to the current interaction parameter Θ, convergence was reached in 4 to 5 adaptations. In simulation, such cases sometimes converged to the optimum value in 3 to 4 adaptations. One reason why many adaptations are needed before convergence is therefore thought to be that human behavior is not the same every time. In the following, the subjects are divided into four groups according to the adaptation results, and the experimental results are described in more detail.

まず、最適値への適応が良好であり、被験者の印象も良い場合について説明する。ロボット１２は、３名の被験者（被験者２，１０，１２）に対してスムーズに適応した。各パラメータθ_jの最適値への収束が見られ、この３名の被験者は、「インタラクションについて適当と感じる」と述べた。つまり、ロボット１２がインタラクションしたときのインタラクションパラメータΘが適切であったと言える。この３名の被験者に見られた共通点は、ロボット１２とのインタラクションを楽しみ、ロボット１２であることを意識せず、人に対する場合と同様にロボット１２と接していたことである。 First, the case in which adaptation to the optimum values was good and the subjects' impressions were also good will be described. The robot 12 adapted smoothly to three subjects (subjects 2, 10, and 12). Each parameter θ_j converged to its optimum value, and these three subjects stated that the interaction felt appropriate. In other words, the interaction parameter Θ used when the robot 12 interacted with them can be said to have been appropriate. What these three subjects had in common was that they enjoyed interacting with the robot 12 and treated it as they would a person, without being conscious that it was a robot.

図１７（Ａ）〜（Ｆ）は、被験者１０についての各パラメータθ_jの変化を示す。被験者１０は、「ロボット１２の振る舞い（インタラクション行動）の改善が速かった」と感想を述べている。図１７（Ａ）〜（Ｆ）からも分かるように、個体距離が若干最適値から離れているだけで、各パラメータθ_jは十分に最適値近くに収束しており、被験者１０の感想と一致する結果であると言える。また、被験者は、モーション開始のタイミング（遅れ時間）に関して許容範囲が広く、適応結果も許容範囲内に入っていることが分かる。 FIGS. 17(A) to 17(F) show the changes in each parameter θ_j for subject 10. Subject 10 commented that “the behavior (interaction behavior) of the robot 12 improved quickly”. As can be seen from FIGS. 17(A) to 17(F), only the individual distance is slightly off its optimum value, and each parameter θ_j has converged sufficiently close to its optimum, which agrees with subject 10's comment. It can also be seen that this subject had a wide acceptable range for the motion start timing (delay time), and that the adaptation result falls within that range.

次に、一部のパラメータθ_jが最適値に収束していないが、被験者の印象がよい場合について説明する。２名の被験者（被験者５，８）は、「ロボット１２の動作について印象が良かった」と回答したが、一部のパラメータθ_jは最適値から大きく外れていた。図１８（Ａ）〜図１８（Ｆ）は、被験者５に対する各パラメータθ_jの変化を示す。図１８（Ａ）〜図１８（Ｆ）からも分かるように、３つの個人距離に関しては最適値に収束しており、モーション開始のタイミング（遅れ時間）に関しては許容範囲が広く、適切に学習したと言えるが、他の２つのパラメータθ_j（注視時間，再生速度）は最適値から大きく離れている。しかし、被験者５は、「注視時間、再生速度についても適当であった」と述べた。この原因としては、実験中の条件と最適値を測定した条件の違い、或いは、被験者５のパラメータθ_j（特に、注視時間，再生速度）の許容範囲が実際には広かったと考えられる。また、被験者５は、他の被験者には見られない行動（振る舞い）を行った。具体的には、この被験者５は、社会距離に分類される、ロボット１２が話すインタラクションにおいても、ロボット１２の各部を触っていた。その結果、社会距離が他の被験者と比較してかなり短くなっている。また、このような振る舞いを予期していなかったが、他の被験者と同じ報酬関数２００（図５参照）により、ロボット１２は被験者５が満足する適応が出来たと言える。 Next, the case in which some parameters θ_j did not converge to their optimum values but the subjects' impressions were good will be described. Two subjects (subjects 5 and 8) answered that they had a good impression of the robot 12's behavior, although some parameters θ_j deviated greatly from the optimum values. FIGS. 18(A) to 18(F) show the changes in each parameter θ_j for subject 5. As can be seen from FIGS. 18(A) to 18(F), the three interpersonal distances converged to their optimum values, and the motion start timing (delay time), for which the acceptable range was wide, can be said to have been learned appropriately, but the other two parameters θ_j (gaze time and playback speed) are far from their optimum values. Nevertheless, subject 5 stated that the gaze time and the playback speed were also appropriate. The likely causes are a difference between the conditions during the experiment and the conditions under which the optimum values were measured, or that subject 5's acceptable ranges for these parameters θ_j (in particular, gaze time and playback speed) were in fact wide. Subject 5 also behaved in a way not seen in the other subjects. Specifically, subject 5 touched various parts of the robot 12 even during interactions classified under the social distance, in which the robot 12 only talks. As a result, the social distance for this subject is considerably shorter than for the other subjects. Although such behavior had not been anticipated, it can be said that the robot 12 achieved an adaptation that satisfied subject 5 using the same reward function 200 (see FIG. 5) as for the other subjects.

続いて、最適値へ収束（適応）したが、被験者が一部の適応について不満をもった場合について説明する。図１９（Ａ）〜（Ｆ）は、被験者７に対するロボット１２の各パラメータθ_jの適応を示す。各パラメータθ_jは最適値へ十分に収束しているように見られるが、被験者７は距離に関して近過ぎたと述べた。しかし、図１９（Ａ）〜（Ｃ）から分かるように、社会距離を除き被験者が許容する最も遠い距離近くに収束している。また、被験者７は、初期の印象としては「ためらった感じ」だったが、次第に「活発」になる印象を受けたと述べている。距離に関する印象が良くなかったのは、ロボット１２はＰＧＲＬによる適応により、被験者７の好みに合わせてモーションの再生速度を上げていったが、最適距離の測定はモーションの再生速度を「１」で行ったためと考えられる。この場合、被験者７にとっての最適距離は、より遠かった可能性がある。 Next, the case in which the parameters converged (adapted) to the optimum values but the subject was dissatisfied with part of the adaptation will be described. FIGS. 19(A) to 19(F) show the adaptation of each parameter θ_j of the robot 12 to subject 7. Each parameter θ_j appears to have converged sufficiently to its optimum value, but subject 7 said that the robot was too close. As can be seen from FIGS. 19(A) to 19(C), however, except for the social distance, the distances converged near the farthest distance the subject would accept. Subject 7 also said that the robot initially gave a “hesitant” impression but gradually came to seem “lively”. The poor impression of the distance is thought to arise because the robot 12, adapting through PGRL, raised the motion playback speed to suit subject 7's preference, whereas the optimum distances were measured with the motion playback speed set to “1”. In that case, the optimum distance for subject 7 may actually have been farther.

次に、一部のパラメータθ_jが最適値へ収束せず、被験者も一部の適応に不満を持った場合について説明する。５名の被験者（被験者１，３，４，６，１１）については、一部のパラメータθ_jが最適値へ収束せず、各被験者もそれらのパラメータθ_jの適応結果については不満を述べた。図２０（Ａ）〜（Ｆ）は、被験者１に対するパラメータθ_jの変化の様子を示す。なお、この実験は、トラブルにより他の被験者よりも実験時間が２１分間と短くなっている。 Next, the case in which some parameters θ_j did not converge to their optimum values and the subjects were also dissatisfied with part of the adaptation will be described. For five subjects (subjects 1, 3, 4, 6, and 11), some parameters θ_j did not converge to their optimum values, and each of these subjects expressed dissatisfaction with the adaptation results for those parameters θ_j. FIGS. 20(A) to 20(F) show how the parameters θ_j changed for subject 1. Note that, because of a problem during the session, the experiment time for this subject was 21 minutes, shorter than for the other subjects.

図２０（Ａ）〜（Ｃ）を参照して分かるように、個体距離と社会距離とについては最適値へ収束しているが、親密距離については許容範囲に入っていない。これは、被験者１に対してロボット１２が取った親密距離が許容範囲外にあり、どの距離に対しても被験者１はほぼ同じ振る舞いであったため、最適な距離に近付くようパラメータθ_jを変化することができなかったと考えられる。また、被験者１は親密距離が不適当であったと述べている。図２０（Ｄ）に示すように、視線を合わせる頻度（注視時間）は、適応により約９０％になっている。被験者１は、注視時間については、１００％が最もよく、７５％〜５０％程度でもよいと述べたので、適当な値に収束していると言える。被験者１は、遅れ時間についてはあまり気にならないと述べ、再生速度についてはどの値でもよいと述べたため、親密距離以外はうまく適応したと言える。 As can be seen with reference to FIGS. 20A to 20C, the individual distance and the social distance converge to the optimum values, but the intimate distance is not within the allowable range. This is because the intimate distance taken by the robot 12 with respect to the subject 1 is outside the allowable range, and the subject 1 behaved almost the same for any distance, so the parameter θ _j is changed so as to approach the optimum distance. It is thought that it was not possible. Subject 1 also states that the intimate distance was inappropriate. As shown in FIG. 20D, the frequency (gaze time) for matching the line of sight is about 90% due to adaptation. Since the test subject 1 stated that 100% is the best and about 75% to 50% may be sufficient for the gaze time, it can be said that the subject 1 has converged to an appropriate value. Since the test subject 1 stated that he / she did not care much about the delay time and stated that any value could be used for the playback speed, it can be said that the subject 1 was well adapted except for the intimate distance.

図２１（Ａ）〜（Ｆ）は、被験者３に対するパラメータθ_jの変化の様子を示す。被験者３は、「個体距離が実験の前半で不適当であった」と指摘した。このことは、図２１（Ｂ）の個体距離のグラフと一致している。視線合わせ頻度（注視時間）については、図２１（Ｄ）示すように、適応の結果は７５％程度であり、最適値は１００％であったが、被験者３は「十分に満足できた」と述べた。親密距離の適応結果は、ロボット１２が安全のため人に接触しないように設けた下限の１５（ｃｍ）になっている。この被験者３は、タイミング（遅れ時間）については許容範囲が広かったため、図２１（Ｅ）に示すように、適応の結果は適当と言える。ただし、図２１（Ｆ）に示すように、被験者３では、再生速度の適応結果が最適値から大きく離れている。また、被験者３は、再生速度は不適当だったと述べている。これは、被験者３は、モーションが速すぎると、ロボット１２をじっと見る傾向があり、ロボット１２が再生速度を上げ過ぎて、報酬が誤って大きくなったためと考えられる。 21A to 21F show how the parameter θ _j changes with respect to the subject 3. Subject 3 pointed out that "individual distance was inappropriate in the first half of the experiment". This is consistent with the individual distance graph in FIG. With regard to the line-of-sight frequency (gaze time), as shown in FIG. 21D, the adaptation result was about 75% and the optimum value was 100%, but the subject 3 was “sufficiently satisfied”. Stated. The adaptation result of the intimate distance is a lower limit of 15 (cm) provided so that the robot 12 is not in contact with a person for safety. Since the subject 3 has a wide allowable range with respect to timing (delay time), the adaptation result can be said to be appropriate as shown in FIG. However, as shown in FIG. 21F, in the test subject 3, the adaptation result of the reproduction speed is far from the optimum value. Subject 3 also states that the playback speed was inappropriate. This is probably because the subject 3 has a tendency to stare at the robot 12 when the motion is too fast, and the reward is accidentally increased because the robot 12 increases the reproduction speed too much.

最後に、上手く適応できなかった場合について説明する。図２２（Ａ）〜（Ｆ）は、被験者９に対するパラメータθ_jの変化の様子を示す。図２２（Ａ）〜（Ｆ）から分かるように、個人距離と視線合わせ頻度（注視時間）以外のパラメータθ_jは最適値から大きく離れている。被験者９は、「ロボビー（ロボット１２）に嫌われていて、ロボビーはいやいや普通に振舞うよう努力している印象を受けた」と述べている。実験中の様子からは大きな問題があるとは観察されなかったが、被験者９が素直にロボット１２に対して反応しなかった可能性がある。 Finally, the case in which adaptation did not succeed will be described. FIGS. 22(A) to 22(F) show how the parameters θ_j changed for subject 9. As can be seen from FIGS. 22(A) to 22(F), the parameters θ_j other than the personal distance and the eye-contact frequency (gaze time) are far from their optimum values. Subject 9 stated, “I got the impression that Robovie (robot 12) disliked me and was reluctantly making an effort to behave normally.” No major problem was observed during the experiment, but subject 9 may not have responded candidly to the robot 12.

以上より、ロボット１２の個人適応の実現を確認することができた。また、適応したロボット１２の振る舞いが自然に見えたという感想が被験者から得られている。個人適応は、より自然に人とインタラクションを行えるロボット１２の実現への重要な要素の一つであり、この実施例における手法はその一歩となると言える。 From the above, it was possible to confirm the personal adaptation of the robot 12. Moreover, the test subject has obtained the impression that the behavior of the adapted robot 12 seemed natural. Personal adaptation is one of the important elements for realizing the robot 12 that can interact with people more naturally, and it can be said that the method in this embodiment is one step.

According to this embodiment, policy gradient reinforcement learning allows the behavior of the robot to be matched to the interaction partner, so that natural communication can be carried out, much as humans communicate with each other.

In these embodiments, a motion capture system is used to detect the three-dimensional positions and gaze directions of the robot and the human, but other sensors can be used instead. For example, if a stereo camera (image sensor) or an ultrasonic sensor is mounted on the robot, the distance to the human can be measured from the output of the ultrasonic sensor or from the parallax of the stereo camera. The orientation of the human's face can be detected by pattern matching on the camera image. Note that the ultrasonic distance sensor already mounted on the robot can serve as the ultrasonic sensor.
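The two distance measurements mentioned above can be sketched as follows. This is only an illustration under stated assumptions: a rectified stereo pair with a known focal length and baseline, and an ultrasonic sensor that reports a round-trip echo time. The function names, the constant and the example values are hypothetical and do not come from the embodiment.

```python
# Approximate speed of sound in air at about 20 degrees C (assumed value).
SPEED_OF_SOUND_M_S = 343.0


def distance_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


def distance_from_echo(round_trip_s, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Range from an ultrasonic echo; the pulse travels out and back."""
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return speed_of_sound * round_trip_s / 2.0


if __name__ == "__main__":
    # Hypothetical readings, not values from the embodiment.
    print(distance_from_disparity(700.0, 0.1, 70.0))  # distance in metres
    print(distance_from_echo(0.01))                   # distance in metres
```

Either value could stand in for the interpersonal distance that the motion capture system supplies in the embodiment, although the accuracy would depend on the sensor.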

In this embodiment, the interaction parameters are updated by policy gradient reinforcement learning, but the update is not limited to this method and may use another algorithm. For example, the interaction parameters can be updated by a genetic algorithm.
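As a rough illustration of the genetic-algorithm alternative, the sketch below evolves a small population of parameter vectors. It is not the method of the embodiment: the function `evolve`, the population size, the mutation scale, the bounds and the synthetic reward used in the example are all assumptions made for illustration. On a real robot the reward would come from the observed human reaction and would be noisy.

```python
import random


def evolve(reward, bounds, pop_size=8, generations=20,
           mutation_scale=0.1, rng=None):
    """Search for a parameter vector that maximises `reward` (hypothetical helper).

    bounds is a list of (low, high) pairs, one per interaction parameter.
    """
    rng = rng or random.Random()

    def clip(value, low, high):
        return max(low, min(high, value))

    # Initial population: uniform samples inside the allowed ranges.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=reward, reverse=True)
        parents = ranked[:pop_size // 2]
        children = [list(ranked[0])]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation, clipped to bounds.
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [clip(g + rng.gauss(0.0, mutation_scale * (hi - lo)), lo, hi)
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        population = children
    return max(population, key=reward)


if __name__ == "__main__":
    # Synthetic stand-in for the human's reaction: peak at a hidden preference.
    preferred = [0.45, 0.7]  # hypothetical values
    reward = lambda p: -sum((x - y) ** 2 for x, y in zip(p, preferred))
    # 0.15 m mirrors the safety lower limit for the intimate distance.
    best = evolve(reward, bounds=[(0.15, 1.0), (0.0, 1.0)],
                  rng=random.Random(0))
    print(best)
```

Each reward evaluation corresponds to at least one interaction with the person, so the population size and the number of generations would have to stay small in practice.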

Furthermore, in this embodiment, the process of determining the interaction parameters Θ_i to be tried this time (FIG. 9) uses Δ(0, ε_j, −ε_j) when determining the parameter θ_j for the i-th trial; however, this is not limiting, and random numbers may be used instead. When random numbers are used, the update process for the interaction parameters Θ (FIG. 10) must be changed accordingly.
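The difference between the two ways of generating a trial parameter set can be sketched as below. Δ(0, ε_j, −ε_j) is read here as an offset taken from the three values 0, +ε_j and −ε_j. How the embodiment chooses among them is defined in FIG. 9; the random pick in `perturb_ternary` is an assumption, as are both function names and the example values.

```python
import random


def perturb_ternary(theta, epsilon, rng):
    """Trial parameters: theta_j plus one of 0, +eps_j, -eps_j (assumed random pick)."""
    return [t + rng.choice((0.0, e, -e)) for t, e in zip(theta, epsilon)]


def perturb_uniform(theta, epsilon, rng):
    """Random-number variant: offset drawn uniformly from [-eps_j, +eps_j]."""
    return [t + rng.uniform(-e, e) for t, e in zip(theta, epsilon)]


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [1.0, 2.0]      # hypothetical current parameters
    epsilon = [0.5, 0.25]   # hypothetical step sizes
    print(perturb_ternary(theta, epsilon, rng))
    print(perturb_uniform(theta, epsilon, rng))
```

With three discrete offsets, the trials for each parameter fall into three groups whose average rewards can be compared. With continuous random offsets no such groups exist, which is one reason the update step would need a different estimate, for example the slope of reward against offset.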

Furthermore, in this embodiment, the intimate distance, the individual distance, and the social distance are used as the interpersonal distances, but the invention should not be limited to these. For example, better results may be obtained by assuming and adjusting other distances, such as a "handshake distance" used only for handshakes or a "greeting distance" used for greetings.

FIG. 1 is an illustrative view showing an example of the communication robot system of the present invention.
FIG. 2 is an illustrative view for explaining the appearance of the robot shown in the FIG. 1 embodiment.
FIG. 3 is an illustrative view showing the electrical configuration of the robot shown in FIGS. 1 and 2.
FIG. 4 is an illustrative view showing how the markers detected by the motion capture system are attached to the robot and the human, together with an example camera arrangement.
FIG. 5 is a functional block diagram of the reward function computed by the CPU of the robot shown in FIGS. 1 and 2.
FIG. 6 is an illustrative view for explaining the angle at which, in the reward function shown in FIG. 5, the human is judged to be facing the robot.
FIG. 7 is a flowchart showing part of the reinforcement learning process of the CPU shown in FIG. 3.
FIG. 8 is a flowchart showing another part of the reinforcement learning process of the CPU shown in FIG. 3, continuing from FIG. 7.
FIG. 9 is a flowchart showing the Θ_i determination process of the CPU shown in FIG. 3.
FIG. 10 is a flowchart showing the Θ update process of the CPU shown in FIG. 3.
FIG. 11 is an illustrative view for explaining the experimental environment to which the system shown in FIG. 1 was applied.
FIG. 12 is an illustrative view showing a first table of the robot's behaviors for each interpersonal distance.
FIG. 13 is an illustrative view showing a second table of the initial values and step sizes of the parameters in the experiment.
FIG. 14 is an illustrative view showing, for the distances of the 12 subjects, the values obtained by adaptation and the values the subjects judged appropriate.
FIG. 15 is an illustrative view showing the adaptation results for the gaze time, delay time, and motion playback speed of the 12 subjects.
FIG. 16 is an illustrative view showing a third table of, for each parameter, the variance from the optimum value averaged over the 12 subjects and the variance of the initial value.
FIG. 17 is a graph showing how the parameters changed for subject 10.
FIG. 18 is a graph showing how the parameters changed for subject 5.
FIG. 19 is a graph showing how the parameters changed for subject 7.
FIG. 20 is a graph showing how the parameters changed for subject 1.
FIG. 21 is a graph showing how the parameters changed for subject 3.
FIG. 22 is a graph showing how the parameters changed for subject 9.

Explanation of symbols

10 … Communication robot system
12 … Communication robot
20 … Motion capture system
38 … Collision sensor
42 … Ultrasonic distance sensor
52 … Omnidirectional camera
54 … Eye camera
64 … Touch sensor
76 … CPU
80 … Memory
82 … Motor control board
84 … Sensor input/output board
86 … Voice input/output board
88-96 … Motors
98 … Communication LAN board
100 … Wireless communication device
102 … Database

Claims

1. A communication robot that interacts with a human, comprising:
parameter setting means for setting a parameter for the interaction;
interaction executing means for executing an interaction including at least one of speech and physical movement according to the parameter set by the parameter setting means;
appropriateness detection means for detecting an appropriateness of the parameter during the interaction; and
optimization means for optimizing the appropriateness detected by the appropriateness detection means.

2. The communication robot according to claim 1, further comprising:
moving distance detecting means for detecting a distance moved by the human during the interaction; and
time detecting means for detecting a time during which the human looks at the face of the communication robot during the interaction,
wherein the appropriateness detection means detects the appropriateness of the parameter based on a detection result of at least one of the moving distance detecting means and the time detecting means when the interaction is executed with the parameter set by the parameter setting means.

3. The communication robot according to claim 1 or 2, wherein the parameter includes at least one of an interpersonal distance in the interaction with the human, a length of time for which the robot directs its face toward the human's face, a delay time from an utterance to the start of a physical motion, and a motion speed of the physical motion.

4. The communication robot according to claim 3, wherein the interpersonal distance includes an intimate distance, an individual distance, and a social distance.

5. The communication robot according to claim 1, wherein the optimization means includes parameter update means for updating the parameter.

6. The communication robot according to claim 1, further comprising:
parameter storage means for storing the parameter in association with the human; and
human identification means for identifying the human at the start of an interaction,
wherein the parameter setting means sets the parameter corresponding to the human identified by the human identification means when that parameter is stored in the parameter storage means, and sets an average value of all the parameters stored in the parameter storage means when the parameter corresponding to the identified human is not stored in the parameter storage means.