JP2007220076A

JP2007220076A - Interaction device

Info

Publication number: JP2007220076A
Application number: JP2006194827A
Authority: JP
Inventors: R Movellan Javier; アール．モベランハビエル; Fumihide Tanaka; 文英田中
Original assignee: Sony Corp; University of California San Diego UCSD
Current assignee: Sony Corp; University of California San Diego UCSD
Priority date: 2006-01-18
Filing date: 2006-07-14
Publication date: 2007-08-30

Abstract

PROBLEM TO BE SOLVED: To provide a robot apparatus that determines whether a human being is present in the outside world using only simple input/output sensors. SOLUTION: A social robot 10 comprises an optimal engine 11, a real-time controller 12, and an interaction manager 13, and sets its own controller so as to maximize the expected information defined between a hypothesis about the interaction partner and its own inputs and outputs. COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an interaction device that enables closer interaction between a robot and a user.

The idea of infomax in perception goes back to work by Linsker and others (see, for example, Non-Patent Documents 2 and 12).

This approach was generalized to non-Gaussian networks by Bell and Sejnowski, yielding infomax ICA. Infomax has become an important theoretical tool in computational neuroscience. However, most of this work takes a passive view of infomax: the goal of an infomax processor is simply to pass as much information about its input as possible to the next processing stage. Infomax control, by contrast, concerns active processors that can schedule their actions so as to discover hypotheses of interest as quickly as possible. For example, a neuron may spike not merely to transmit information but also to discover, through feedback connections, the relevant state of the world as quickly as possible. A similarly speculative idea about the active role of neurons was formulated by Klopf in the form of the hedonistic neuron hypothesis.

Learning rules in Terry
The problem of scheduling actions to discover contingencies is formally related to the two-armed bandit problem. In the classic two-armed problem, one must pull either of two levers in order to determine which of them maximizes the rate of return. We modified the problem to include an additional hidden variable H that determines whether the two conditions are equal or not. Thus, whereas the goal in the two-armed problem is to determine which of the two arms is better, the goal in the contingency detection problem is to determine whether there is any difference between the two arms. This subtle difference has important consequences. For example, in the standard two-armed bandit problem it is possible to reach a decision by pulling only one arm several times: if that arm yields an unusually large return, this is already evidence that it is the better arm. In the contingency detection problem, however, no information is gained until both arms have been pulled at least once.
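The last point can be checked numerically. The following is a minimal sketch, not the method of the patent: it assumes Bernoulli returns, a uniform Beta(1, 1) prior on each arm's success rate, and hypothetical function names. Under these assumptions, the posterior probability that the two arms differ stays at its prior value until both arms have been pulled.

```python
from math import lgamma, exp

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(k, n, a=1.0, b=1.0):
    """Log marginal likelihood of a sequence with k successes in n pulls,
    with the unknown success rate integrated out under a Beta(a, b) prior
    (an assumed prior, chosen only for illustration)."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)

def posterior_arms_differ(k1, n1, k2, n2, prior_differ=0.5):
    """Posterior probability that the two arms have different rates.

    "Same" hypothesis: both arms share one rate, so the data are pooled.
    "Differ" hypothesis: each arm has its own independent rate.
    """
    log_same = log_marginal(k1 + k2, n1 + n2)
    log_diff = log_marginal(k1, n1) + log_marginal(k2, n2)
    odds = prior_differ / (1.0 - prior_differ) * exp(log_diff - log_same)
    return odds / (1.0 + odds)

# Only arm 1 pulled: the belief about a difference does not move.
only_one_arm = posterior_arms_differ(k1=9, n1=10, k2=0, n2=0)
# Both arms pulled, with very different outcomes: the belief shifts strongly.
both_arms = posterior_arms_differ(k1=9, n1=10, k2=1, n2=10)
```

With only one arm sampled, the pooled and separate marginal likelihoods coincide, so the Bayes factor is exactly 1; this is the sense in which no information about H is available from a single arm.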

Various infomax approaches to perceptual and neural processes have been proposed in the past (see, for example, Non-Patent Documents 2, 11, 12, and 20).

However, these models are passive in that they are designed to pass information on to the next processing stage. Here, we instead emphasize models that select actions over time so as to maximize the long-term gathering of information.

Infomax control can also be seen as part of the tradition of models of human motor control that explain behavior as the solution to an optimization problem (see, for example, Non-Patent Documents 5, 7, 13, and 27).

However, the approach proposed here is distinctive in that it also applies to social behavior, where the time scales and levels of uncertainty are far larger than in traditional motor control problems.

The idea of information maximization has also been used to explain how people choose questions in concept learning tasks (see, for example, Non-Patent Documents 16, 17, and 18).

Infomax-style control has also been proposed as a way to understand how people move their eyes, and how active cameras can be moved to maximize the gathering of data about events of interest (see, for example, Non-Patent Documents 4, 8, 9, 10, and 21).

However, these models leave several important issues unaddressed.
(1) Previous models, which focused on explaining the ordering of actions, did not address the timing of actions. For example, in [Non-Patent Documents 16, 17, 18], concept learning is treated as a strict turn-taking activity in which the subject asks a question and is given an answer, with no time constraints.
(2) Previous models do not solve the information maximization problem they pose. At best the models are "greedy", and at worst they are non-causal. For example, in [Non-Patent Document 17], questions are chosen to maximize the immediate information return rather than the information returned over the long term.

In [Non-Patent Document 10], the observer is allowed first to make every possible eye movement and then to choose the one that happened to provide the most information. This approach is useful for modeling purposes, but it is non-causal: generating the current saccade requires seeing into the future.

In robotics, it has become common to distinguish between behavioral robot architectures and cognitive robot architectures (see, for example, Non-Patent Document 1).

Behavioral architectures are based on direct mappings between sensors and actuators. They emphasize timing and rapid reaction to changes in the environment. Cognitive architectures typically rely on planning and deliberation processes and on the construction of world representations. Without a proper mathematical foundation, concepts such as representation, deliberation, and knowledge are almost meaningless. For example, the infomax control framework proposed here is reactive in that the controller is simply a causal mapping between sensor information and behavior.

The idea of infomax control is directly related to Bayesian approaches to sequential decision processes, and in particular to Bayesian solutions to the n-armed bandit problem. The author's contribution in this paper is to show how this well-known class of problems can be adapted to the understanding of real-time social interaction, and that mutual information can be used as an effective reinforcement signal.
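To make "mutual information as a reinforcement signal" concrete, the following sketch computes the expected information gain, in bits, that one binary observation provides about a binary hypothesis H. The binary setting, the function names, and the example probabilities are assumptions made for illustration; the source does not specify them.

```python
from math import log2

def entropy(p):
    """Entropy in bits of a binary variable with P(true) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)

def expected_information_gain(prior, p_obs_given_h1, p_obs_given_h0):
    """Mutual information I(H; Y) between a binary hypothesis H and a
    binary observation Y: prior entropy minus expected posterior entropy."""
    p_obs = prior * p_obs_given_h1 + (1.0 - prior) * p_obs_given_h0
    gain = entropy(prior)
    # Loop over the two outcomes: Y observed, Y not observed.
    for p_y, lik_h1 in ((p_obs, p_obs_given_h1), (1.0 - p_obs, 1.0 - p_obs_given_h1)):
        if p_y > 0.0:
            posterior = min(1.0, prior * lik_h1 / p_y)  # clamp rounding error
            gain -= p_y * entropy(posterior)
    return gain

# An action whose outcome depends strongly on H is worth more than one
# whose outcome is the same under both hypotheses.
informative = expected_information_gain(0.5, 0.9, 0.1)
uninformative = expected_information_gain(0.5, 0.3, 0.3)
```

A controller in the spirit of the text would prefer actions with a larger expected gain. Note that this one-step quantity corresponds to the greedy strategy criticized earlier; an optimal controller would maximize the information gathered over the whole horizon.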

Game theory, which can be regarded as a special case of control theory, has a long history of application to human social behavior, particularly in economics and the study of conflict. However, the importance of control for understanding real-time social behavior has appeared in the literature only recently. In [Non-Patent Document 15], the inventors pointed out the potential value of stochastic optimal control, and of the n-armed bandit problem in particular, for understanding the optimality of real-time social interaction. Wolpert, Doya, and Kawato proposed a unified framework for motor control and social interaction. Miyashita and Ishiguro pointed out that simple PID controllers can be used to produce communicative behavior.

Next, a simple social interaction will be described.

1 Contingency detection and social development
John Watson has argued that contingency detection plays a crucial role in the social and emotional development of infants. Contingency is perceived by the human brain in a direct manner, in the same way that the brain perceives other elements such as color and motion. In early infancy in particular, contingency is a fundamental source of information for defining and recognizing caregivers (see, for example, Non-Patent Documents 24 and 25).

This view derives from an experiment in which 2-month-old infants learned to move their heads in order to activate a mobile suspended overhead (see, for example, Non-Patent Document 24). Infants in the experimental group were given a mobile that responded to their head movements. For infants in the control group, the mobile was activated at the same rate as in the experimental group, but in a random, non-contingent manner. After experience with the mobile in four 10-minute sessions a day and an average of about 200 responses, the infants in the experimental group showed a considerably higher response rate than those in the control group. More importantly, at about the same time the infants in the experimental group began to show a number of social responses of the kind typically directed at caregivers. These included vigorous social smiling, cooing, and positive affect toward the mobile. Watson argued that contingency is used by infants as a cue for defining and identifying conspecifics, and that this cue is more important than other perceptual biases such as the visible features of the human face.

Watson formulated a Poisson model of social contingency judgment, in which the background agent and the social agent are modeled as Poisson processes. Watson's original formulation did not address how to schedule actions optimally or how to perform inference under the model. Instead, it proposed a heuristic approach based on comparing the probabilities that no burst of activity occurs within intervals of a fixed length.
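The inference question that the original formulation left open can be illustrated with a simple likelihood-ratio update. The sketch below is not Watson's heuristic and not the controller of this patent. It assumes that the event rates are known, that a responsive agent raises the event rate from `rate_background` to `rate_responsive` only inside the time windows that follow the robot's own actions, and that events outside those windows are equally likely under both hypotheses (so they cancel). All names and numbers are hypothetical.

```python
from math import exp, log

def log_likelihood_ratio(events_in_windows, window_time,
                         rate_background, rate_responsive):
    """Log likelihood ratio of "responsive agent present" versus
    "background only" for Poisson event counts.

    events_in_windows: events observed inside the windows after own actions
    window_time: total duration of those windows, in seconds
    rates: assumed known event rates, in events per second
    """
    return (events_in_windows * log(rate_responsive / rate_background)
            - (rate_responsive - rate_background) * window_time)

def posterior_responsive(prior, llr):
    """Update the belief that a responsive agent is present."""
    odds = prior / (1.0 - prior) * exp(llr)
    return odds / (1.0 + odds)

# Four actions, each followed by a 1-second window containing one event.
llr_hits = log_likelihood_ratio(4, 4.0, rate_background=0.1, rate_responsive=1.0)
belief_after_hits = posterior_responsive(0.5, llr_hits)

# Four actions followed by silence: the belief drops instead.
llr_silence = log_likelihood_ratio(0, 4.0, rate_background=0.1, rate_responsive=1.0)
belief_after_silence = posterior_responsive(0.5, llr_silence)
```

Under these assumed rates, a handful of answered actions is enough to move the belief close to certainty, which is consistent in spirit with the rapid detection described for the infant experiments below.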

In 1986, the inventors conducted an experiment to test how 10-month-old infants use contingency information to detect novel social agents (see, for example, Non-Patent Documents 15 and 19).

The infants were seated in front of a robot that bore little resemblance to a human. Its "head" was a rectangular prism whose sides bore geometric patterns (see FIG. 1(A)). The robot's head could flash lights on its surface, emit sounds, and rotate left and right. Infants were randomly assigned to an experimental group or a control group. In the experimental group, the robot was programmed to respond to the environment in a way that simulated the contingency properties of a human. Each infant in the control group was matched to one infant in the experimental group and experienced the robot's lights, sounds, and rotations with the same temporal distribution as the matched participant. In the control group, however, the robot did not respond to the infant's behavior or to any other event in the room.

Here, FIG. 1(A) is a schematic diagram of the head 50 of the robot used in Non-Patent Document 19. FIG. 1(B) is a photograph of Infant-9. The image of the robot can be seen reflected in a mirror placed behind the infant.

1.1 Forty-three seconds in an infant's day
In that study, evidence was found that infants in the experimental group treated the robot as if it were a social agent. For example, infants in this group produced five times more vocalizations than infants in the control group. Moreover, when the robot rotated, they followed its "line of gaze", showing some evidence of shared attention (see, for example, Non-Patent Document 15). What surprised us most, however, was the intensity of the interactions that developed between some of the infants and the robot, the clear intentionality of the infants' behavior, and the speed with which these interactions unfolded.

Whether or not the robot was responsive, some infants in some trials appeared to actively "decide" the matter within just a few seconds and to act accordingly. Particularly telling were the first 43 seconds of the experiment with one infant in the experimental group, whom we will call Infant-9 (see FIG. 1(B)). The infant was 10 months old on July 14, 1986, when the study was conducted at the Human Development Institute at UC Berkeley. The 43-second video is available at http:/mplab.ucsd.edu. During these 43 seconds, Infant-9 produced seven vocalizations, each of which was followed by a sound and light from the robot. Most people who have watched the video agree that by the third or fourth vocalization (25 seconds into the experiment), Infant-9 had clearly detected that the robot was responding to the infant. Crucially, the video makes it quite apparent that the child is actively querying the robot, testing whether or not it is responding. This raises several interesting questions that are central to this paper.

1) What does it mean to "ask a question" for an organism that has no language?
2) Why did Infant-9 schedule its vocalizations the way it did? For example, why did it not vocalize at a much faster or a much slower rate?
3) Was it reasonable for Infant-9 to decide that the robot was responsive after 3 to 4 responses and within 20 to 30 seconds of the start of the experiment? Why not more or less time, or more or fewer responses?
Infomax control problems have recently become common in the perception and categorization literature, but that literature usually relies on greedy, one-step-ahead infomax rather than on optimal strategies.

The main practical limitation of the current system stems from the simplicity of its current model of social agents. In particular, the current system describes agents as passive responders, not as autonomous initiators of behavior with communicative intent. Extending the model to handle this issue is not complicated. However, rather than hand-crafting improved models of social agents, it would be better to invest time in learning such models from data.

In addition to scheduling its responses in an optimal manner, Infant-9 progressively raised the tone and affective quality of its responses over the 43 seconds that were simulated. It would be possible to model this expressiveness by coupling the tone to changes in the belief about the presence of a social agent. Such a modification would be effective in improving interaction between the robot model and humans, but, unlike turn-taking for example, it does not emerge from the current model in a principled manner.

Infant-9 behaved in an optimal manner with respect to learning whether the novel social agent was responsive, but most of the infants who took part in the experiment did not. The subjective impression gained from watching these infants is that they were initially afraid of the situation.

We have presented a general approach to the organization of behavior, based on the idea that organisms are driven by goals and schedule their behavior in ways that optimize the gathering of information relevant to those goals. Whereas traditional instrumental learning models emphasize the role of external stimuli as reinforcers of behavior (food, water, discomfort, breathing, and mild electric shock being the most typical), in infomax control stimuli and responses have no intrinsic value. Instead, their value lies in the expected information return, given the organism's current state of knowledge. Infomax can be thought of as a self-supervised form of control in which the organism itself dynamically assigns reinforcement value to stimuli and responses. No external reinforcer is required. Instead, the infomax controller modifies its internal state so as to better explain the available data and to produce actions that are expected to yield highly informative data.

Using the detection of social contingency in 10-month-old infants, we illustrated how the idea of an infomax controller can be used to understand simple social interactions. Interestingly, in this situation the optimal infomax controller exhibits turn-taking behavior similar to that seen in infants of that age: the controller produced a response and then followed it with a period of silence, as if waiting for a question to be answered. This "turn-taking" behavior was not built into the system. Rather, it emerged from the requirement to maximize the information obtained, given the time delays and levels of uncertainty typical of social interaction. The results suggest that infants of that age are already asking questions despite lacking language. That is, given the time delays and levels of uncertainty typical of social interaction, infants schedule their behavior in a way that maximizes the expected information return. This is something important that parents know at an intuitive level but that is difficult to prove formally.

The approach presented here works well in practice even when applied to robots that must operate in real time in everyday situations. This lends credibility not only to the idea that contingency is a useful and computationally inexpensive source of information, but also to the idea that the infant brain is likely to use contingency to define and detect conspecifics.

Because infomax control has a mathematical foundation in probability and control theory, it can be extended to other domains in a principled manner. For example, the present analysis could be extended to rats, to neurons, and even to molecules. Current infomax models of neural activity assign neurons the role of passive information relays: the role of a neural response is to transmit as much information as possible about the input it receives. Infomax control provides a framework for examining the intriguing possibility that neurons may "ask questions", that is, that their spikes may be designed not merely to transmit information to other neurons but also to gather information about other neurons. Feedback connections can, of course, be viewed as channels for obtaining answers to those questions.

The inventors have illustrated a general approach to the study of behavior inspired by David Marr, a pioneer of computational neuroscience (see, for example, Non-Patent Documents 6 and 14).

R. C. Arkin. Behavior-based Robotics. MIT Press, Cambridge, MA, 1998.
T. Bell and T. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129-1159, 1995.
C. Breazeal. Designing Sociable Robots. MIT Press, Cambridge, MA, 2002.
Reichle E. D., Rayner K., and A. Pollatsek. The E-Z reader model of eye movement control in reading: comparisons to other models. Behavioral and Brain Sciences, 26:445-526, 2003.
Todorov E. and Jordan J.I. Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 5:1226-1235, 2002.
S. Edleman and L. M. Vaina. David Marr. International Encyclopedia of the Social and Behavioral Sciences, 2001.
Tanaka H., Krakauer W., and Qian N. An optimization principle for determining movement duration. Under Review, 5, 2005.
Denzler J. and Brown C. M. Information theoretic sensor data selection for active object recognition and state estimation. Transactions on Pattern Analysis and Machine Intelligence, 24:145-157, 2002.
Najemnik J. and Geisler W. S. Optimal eye movement strategies in visual search. Nature, 434, 2005.
Renninger L. and Coughlan J., P. Verghese, and J. Malik. An information maximization model of eye movements. In S. A. Solla, T. K. Leen, and K. R. Miller, editors, Advances in Neural Information Processing Systems, volume 17, pages 1121-1128. MIT Press, 2005.
M. S. Lewicki. Efficient coding of natural sounds. Nature Neurosci, 5(4):356-363, 2002.
R. Linsker. Self-organization in a perceptual network. Computer, 21:105-117, 1988.
Harris C. M. and Wolpert D. M. Signal dependent noise determines motor planning. Nature, 394:780-784, 1998.
David Marr. Vision. Freeman, New York, 1982.
J. R. Movellan and J. S. Watson. The development of gaze following as a Bayesian systems identification problem. In Proceedings of the International Conference on Development and Learning (ICDL02). IEEE, 2002.
J. D. Nelson and J. R. Movellan. Active inference in concept induction. In T. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, number 13, pages 45-51. MIT Press, Cambridge, Massachusetts, 2001.
J. D. Nelson, J. B. Tenenbaum, and J. R. Movellan. Active inference in concept learning. In Proceedings of the 23rd Annual Conference of the Cognitive Science Society, pages 692-697. LEA, Edinburgh, Scotland, 2001.
Jonathan Nelson, Gary Cottrell, and Javier R. Movellan. Explaining eye movements during learning as an active sampling process. In Proceedings of the second international conference on development and learning (ICDL04), The Salk Institute, San Diego, October 20, 2004.
Movellan J. R. and J. S. Watson. Perception of directional attention. In Infant Behavior and Development: Abstracts of the 6th International Conference on Infant Studies, NJ, 1987. Ablex.
R. P. N. Rao, B. A. Olshausen, and M. S. Lewicki. Probabilistic Models

In order to connect the problems of real-time learning and real-time social interaction and to explain them systematically, the inventors explored the theory of stochastic optimal control as a formalism for understanding active real-time learning. The term "real-time" is meant to emphasize the time pressure on the behavior that occurs during learning. The term "active" refers to the fact that the behavior of the learning agent serves a purpose, that is, that it makes sense when viewed as the solution to an optimization problem.

Control theory, which includes reinforcement learning as a subfield, has traditionally been applied to the optimization of physical goals (for example, tracking a moving object, balancing a pole, or maintaining speed under a variable load). From this point of view, learning is seen as the process of developing a controller by trial and error. In this paper, the author adopts a different point of view and explores the learning process itself as a control problem. To emphasize its direct relationship to infomax models of perception, this idea is called infomax control (see, for example, Non-Patent Documents 2, 11, 12, and 20).

This is a natural application of control theory (that is, “learning” becomes the goal of the controller), and at the same time it brings to the science of learning a perspective whose potential has not yet been fully recognized in the literature.

The use of the term “reinforcement learning” has had the unfortunate consequence of leading some people to believe that reinforcers must be goal events (e.g., winning a game of backgammon, keeping a helicopter from crashing, obtaining food, avoiding an electric shock). In infomax control, by contrast, the “reinforcement” signal is tied to subjective beliefs. The learning agent does not need to be told explicitly whether it is right or wrong; instead, “reinforcement” is the agent's own ability to build up strong beliefs. This is in fact a general property of Bayesian approaches, which are amenable to a subjective interpretation. In such approaches, solipsism is avoided by grounding the updating of beliefs in the normative constraints of probability theory. It is now common to distinguish control theory from reinforcement learning on the grounds that control theory requires a model of the world and reinforcement learning does not. In a Bayesian approach, however, this distinction is not clear-cut. Indeed, the Bayesian tradition is founded on the claim that lacking a model simply means holding vague prior beliefs about which models are plausible and which are not. In short, the Bayesian approach has a model of “the lack of a model.” This gives the Bayesian approach a distinctive self-supervised character.

An object of the present invention is to use infomax control to understand the development of simple real-time social interaction. “Real-time social interaction” means the rapid exchange of behavioral signals under time pressure in a face-to-face social environment. The communication channel in this case has feedback delays ranging from a fraction of a second to a few seconds. Because social agents are autonomous and hard to predict, there is substantial uncertainty about both the consequences of an action and the timing of those consequences.

Consequently, the domain of the present invention differs from the domain of traditional motor control, in which delays are measured in tenths of a second and uncertainty is negligible, and from other forms of interaction that have long feedback delays and negligible time constraints (e.g., communication by physical letters or e-mail). Although the particulars of the domain of social interaction differ from those of traditional motor control, the underlying mathematical formalism may be the same.

The idea of the present invention arose from an effort to understand the striking behavior of a 10-month-old infant in an experiment that the present inventors conducted at UC Berkeley in 1985.

The purpose of the experiment was to understand how children learn the causal structure of social agents. To this end, children interacted with a robot that was responsive in some cases and unresponsive in others. The child the author focuses on in this paper exemplified several characteristics that the author considers essential for understanding human behavior but that the learning models of the time overlooked.
(1) The children learned to interact appropriately with the robot much faster than we had expected.
(2) Whether or not the robot was responsive, the children were clearly active, as if they were asking questions in a non-verbal way. At the time, the author had no formalism for understanding this learning behavior. Connectionist approaches such as backpropagation were too slow and too passive, and traditional AI approaches did not deal with the uncertainty and real-time constraints of the problem.

The present inventors' interest in control theory arose during research on the development of robots designed to interact with people. Such robots face inherent problems related to timing and to the dynamic handling of the uncertainty that is typical of interaction with biological systems. The inventors became convinced that the theory of stochastic optimal control is an ideal formalism for dealing with these problems, and in the process realized that it offers a good explanation of the behavior that John Watson and the author had observed back in 1985.

Although the present inventors propose the idea of infomax control in the context of the development of social interaction, the approach is general and is potentially applicable to a very wide variety of problems. Of particular interest is the fact that this idea provides a formal definition of what a “question” is, one that can be generalized to non-verbal organisms.

Other objects of the present invention and the specific advantages obtained by it will become more apparent from the following description of the embodiments.

An interaction device according to the present invention is characterized in that it sets its own controller so as to maximize the expected information defined between a hypothesis about an interaction target and its own inputs and outputs.

The interaction device according to the present invention is also characterized by comprising control means that, on the basis of input/output information, outputs an action at the timing at which the expected information gain concerning the device's own presence, as seen by the interaction target, is maximized.

In the present invention, the extraction of contingency, a concept from developmental psychology, is implemented within the framework of Bayesian estimation. As a result, the degree of confidence in the hypothesis that a human interaction partner is or is not present can be obtained from moment to moment in the form of a probability value, and a robot apparatus can be realized that determines whether a human is present in the outside world using only simple input/output sensors.

Furthermore, in the present invention, the response characteristics change from moment to moment over the course of the interaction, and their dynamics resemble those of a human. The device therefore exhibits more natural response characteristics and is particularly effective in applications involving long-term interaction.

Embodiments of the present invention will now be described in detail with reference to the drawings. Needless to say, the present invention is not limited to the following examples and may be modified as desired without departing from the gist of the invention.

2 Toward an architecture for social robots
In this specification, unless otherwise stated, capital letters are used for random variables, lowercase letters for the specific values taken by random variables, and Greek letters for fixed parameters. The properties of the probability space (Ω, F, P) on which the random variables are defined are left implicit. Where the context makes it clear, probability functions are identified by their arguments; for example, p(x, y) is shorthand for the joint probability mass or joint probability density that the random variable X takes the specific value x and the random variable Y takes the specific value y. Subscripted colons are used to denote sequences, e.g., X_{1:t} =^{def} {X_1, ..., X_t}. We work with discrete-time stochastic processes. The parameter Δt ∈ R is the sampling period, i.e., the time in seconds between time steps. Choosing a particular value of Δt is equivalent to asserting that the relevant information about the underlying continuous-time process lies in the frequency band below 0.5/Δt hertz. The symbol ~ denotes the distribution of a random variable; for example, X ~ Poisson(λ) indicates that X has a Poisson distribution with parameter λ. The notation Y ∈ σ{X} means that the random variable Y is measurable with respect to the sigma-algebra induced by the random variable X. Intuitively, this means that X contains all the information needed to determine the value of Y. E is used for expected values and Var for covariance matrices. δ(·,·) is the Kronecker delta function, which takes the value 1 if its two arguments are equal and 0 otherwise. N = {0, 1, 2, ...} denotes the natural numbers and R the real numbers.
[25] A robot architecture is taken to be a mapping between two stochastic processes: a sensory process Y = {Y_1, Y_2, ...} and a motor process U = {U_1, U_2, ...}. At time t the robot has access to the information in Y_{1:t} and U_{1:t−1} and must produce a motor command based on that information, i.e., U_t ∈ σ{Y_{1:t}, U_{1:t−1}}. In practice, such a mapping depends on a functional statistic S_t that maintains the relevant information about the past history.

In deliberative architectures, S_t is called a representation of the world, and a large share of the available resources is devoted to maintaining it. Reactive architectures emphasize the idea that the world is constantly changing and that the past history is therefore of little interest. In its purest form, a reactive architecture is as follows.

Here the problem is investigated from the point of view of a bare-bones robot equipped with a single binary sensor (e.g., a sound detector) and a single binary actuator. There are two players: (1) a social agent, which plays the role of the caregiver, and (2) a robot, which plays the role of the infant. The agent and the robot are in an environment that may have random background activity. The role of the robot is to discover the presence of a responsive social agent as quickly and accurately as possible.

As shown, for example, in FIG. 2, a social robot 10 having only the minimum functionality needed to generate a binary actuator output U_t in response to a binary sensor input Y_t comprises an optimization engine 11, a real-time controller 12, and an intention manager 13.
[27] The activity of the robot's actuator is represented by a binary random process {U_t}. The variable U_t takes the value 1 when the robot's actuator is active and 0 otherwise. The presence or absence of a responsive social agent is represented by the random variable H. {H = 0}, the absence of a responsive agent, is called the “null hypothesis,” and {H = 1}, the presence of a responsive agent, is called the “alternative hypothesis.” The parameter π is the prior probability of the alternative hypothesis, i.e., the robot's initial belief about the presence of a social agent before any sensory information has been gathered.

2.1 Modeling the social agent
Although extremely simplified, the model below has the advantage of being mathematically tractable while retaining two essential properties of interest: (1) different agents have different levels of responsiveness, and (2) social agents respond with substantial delays and with a substantial level of uncertainty in those delays. This model of the social agent is rich enough to illustrate how the ideas of stochastic optimal control can be used to formulate the problem of real-time social interaction.

The behavior of the social agent is assumed to depend on two auxiliary processes: a timer {Z_t} and an indicator {I_t}. The timer makes it possible to model the time delays and temporal uncertainty typical of social interaction. It takes values in {0, ..., τ_2^a}, where τ_2^a ∈ N is a parameter of the model whose meaning is explained below. The timer keeps track of the number of time steps since the last robot action, up to τ_2^a (see FIG. 3), that is:
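The timer can be sketched as a small state update. The following is a minimal illustration under one assumed reading of the text, not the specification's own update rule: the timer is set to 1 on the step after a robot action, counts up to τ_2^a, and then returns to 0 until the next action. The function names `step_timer` and `run_timer`, and the convention that 0 means "no recent action", are assumptions introduced here.

```python
def step_timer(z_prev: int, u_prev: int, tau_a2: int) -> int:
    """Return the timer value at the current step.

    z_prev: timer value at the previous step.
    u_prev: actuator value at the previous step (1 = the robot acted).
    tau_a2: largest timer value (upper end of the agent period).
    """
    if u_prev == 1:
        return 1                # one step has elapsed since the action
    if 1 <= z_prev < tau_a2:
        return z_prev + 1       # keep counting steps since the action
    return 0                    # assumed idle value: no recent action


def run_timer(actions, tau_a2):
    """values[k] is the timer value one step after actions[k]."""
    z, values = 0, []
    for u in actions:
        z = step_timer(z, u, tau_a2)
        values.append(z)
    return values
```

With `tau_a2 = 5`, a single action followed by silence makes the timer count 1 through 5 and then fall back to the idle value.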

The indicator vector I_t = (I_{1,t}, I_{2,t}, I_{3,t})^T consists of three binary variables that indicate whether time t belongs to one of the following categories: (1) the “self period,” indicated by I_{1,t}; (2) the “agent period,” indicated by I_{2,t}; and (3) the “background period,” indicated by I_{3,t}. The meaning of these three periods is explained below.

The social agent's reaction time is bounded by the parameters 0 ≤ τ_1^a ≤ τ_2^a; that is, the agent takes anywhere from τ_1^a to τ_2^a time steps to respond to an action from the robot. The “agent period,” specified by the indicator process {I_{2,t}}, is the period during which a response of the agent to the preceding robot action is possible, if an agent is present. Thus:

During agent periods, the robot's sensor is driven by a Poisson process {D_{2,t}} with rate R_2. The distribution of R_2 depends on whether a responsive agent is present, in the manner specified below.

FIG. 3 is a graphical representation of the dynamics of the timer and the indicator variables for the delay parameters τ_1^s = 1, τ_2^s = 2, τ_1^a = 4, τ_2^a = 5.

2.2 Modeling the self-feedback and background processes
The robot's sensor is allowed to respond to the robot's own actuator (for example, the robot can hear its own vocalizations), and the delay and uncertainty in this self-feedback loop are taken into account. In particular, the distribution of self-feedback reaction times is assumed to be uniform over the range given by the parameters τ_1^s ≤ τ_2^s, with τ_2^s < τ_1^a. The indicator variable for the self-feedback period is thus defined as follows.

During self periods, the sensor's activity is driven by a Poisson process {D_{1,t}} with rate R_1.

The background is modeled as a Poisson process {D_{3,t}} with rate R_3. The background process drives sensor activity that is due neither to self-feedback nor to the social agent's response to the robot's behavior. Note that background activity can include, in particular, the behavior of external social agents that are not responding to the robot (for example, two social agents may talk to each other and thereby activate the robot's sound sensor). The background rate R_3 is given a prior beta distribution with parameters β_{3,1}, β_{3,2}, which reflect the variability of background activity across situations. If β_{3,1} = β_{3,2} = 1, the distribution is uninformative, reflecting the fact that all rates are equally likely a priori.

The background indicator keeps track of the periods during which neither self-feedback nor a response from the social agent can occur, that is:
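The three indicator variables can be derived from the timer value alone. The sketch below assumes closed intervals [τ_1^s, τ_2^s] and [τ_1^a, τ_2^a] for the self and agent periods and treats every other timer value as background. The function name `indicators` and its argument names are illustrative; the defaults are the FIG. 3 delay parameters.

```python
def indicators(z, tau_s1=1, tau_s2=2, tau_a1=4, tau_a2=5):
    """Return (I1, I2, I3): self, agent and background indicators for timer z.

    Assumes tau_s2 < tau_a1, so the self and agent periods never overlap.
    """
    i_self = int(tau_s1 <= z <= tau_s2)
    i_agent = int(tau_a1 <= z <= tau_a2)
    i_background = int(not (i_self or i_agent))
    return (i_self, i_agent, i_background)
```

Exactly one of the three indicators is 1 at any time step, which is what lets the sensor model switch cleanly between the three driving processes.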

2.3 Modeling the robot's sensor
The activity of the sensor is a switched Poisson process: during self-feedback periods it is driven by the Poisson process {D_{1,t}}, during agent periods by {D_{2,t}}, and during background periods by {D_{3,t}}, that is:
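A minimal simulation sketch of the switched sensor process follows. It assumes a discrete-time reading in which, at each step, the process for the active period fires with probability equal to its rate. This Bernoulli reading is an assumption made here, chosen because the rates are given beta priors on [0, 1]. The function name and argument layout are illustrative.

```python
import random


def sample_sensor(indicator, rates, rng=random):
    """Draw one binary sensor reading Y_t.

    indicator: (I1, I2, I3), exactly one entry equal to 1.
    rates: (r1, r2, r3), per-step activation probabilities for the
           self, agent and background periods.
    rng: the random module or a random.Random instance.
    """
    active = indicator.index(1)          # which period is in force
    return int(rng.random() < rates[active])
```

Passing a seeded `random.Random` instance as `rng` makes a simulated run reproducible.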

It remains to specify the distribution of the rate R_2 during agent periods. If an agent is present, i.e., H = 1, then R_2 is independent of R_1 and R_3 and is given a prior beta distribution with parameters β_{2,1}, β_{2,2}, which reflect the variability of background activity across situations. If β_{2,1} = β_{2,2} = 1, the prior is uninformative (a blank-slate approach). If no agent is present, i.e., H = 0, the rate during agent periods is the same as the rate during background periods, i.e., R_2 = R_3. In the author's view, it should also be stated that the prior for R_1 is a beta distribution with parameters β_{1,1}, β_{1,2}; these parameters have no effect on the behavior of the model, so their values are not specified.

2.4 Auxiliary processes
The processes {O_t, Q_t} are used to record the sensor's activity, and lack thereof, up to time t during self, agent, and background periods. In particular, for t = 1, 2, ...:
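The bookkeeping performed by {O_t, Q_t} can be sketched as running counts. This assumes that O_{i,t} counts the steps in period i at which the sensor was active and Q_{i,t} the steps at which it was inactive. The function `update_counts` and the list representation are illustrative, not part of the specification.

```python
def update_counts(o, q, indicator, y):
    """Return updated (o, q) after observing sensor value y.

    o[i] / q[i]: number of active / inactive readings seen so far in
    period i (0 = self, 1 = agent, 2 = background).
    indicator: (I1, I2, I3), exactly one entry equal to 1.
    """
    o, q = list(o), list(q)     # copy so the caller's lists are untouched
    i = indicator.index(1)      # period to which this time step belongs
    if y == 1:
        o[i] += 1
    else:
        q[i] += 1
    return o, q
```

These six counts are the only summary of the history that the inference step needs.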

2.5 Stochastic constraints
Appendix I contains a summary of the parameters, random variables, and stochastic processes that specify the model.

FIG. 4 shows the Markov constraints on the joint distribution of the variables in the model. An arrow from a variable X to a variable Y indicates that X is a “parent” of Y. Given its parent variables, the probability of a random variable is conditionally independent of all the other variables. Dotted elements indicate unobservable variables, and solid elements indicate observable variables.

In the generative model shown in FIG. 4, the controller C_{t+1} maps all the information observed up to time t into the action U_{t+1}. The effect of the action depends on the presence or absence of a responsive agent, H, and on the timing determined by Z_t. An infomax controller maximizes the information return about the value of a hidden variable of interest, e.g., H.

3 Development and learning: inference and control
Here, “development” refers to the problem of discovering the causal structure that underlies social interaction, i.e., the problem of discovering a model of the kind shown in FIG. 4. This is a difficult problem that may require gathering large amounts of data over months or years. “Learning” refers to the problem of discovering contingencies, i.e., the problem of making inferences about the unobservable variables of a given model. This generally requires less data than model development and is a process that may take place within seconds, minutes, or hours.

Development and learning rely on two basic processes: inference and control. Inference refers to the problem of combining prior information with sensor data in a principled manner. Control refers to the problem of scheduling behavior in real time to achieve the goals of the organism.

3.1 Development
In practice, the model developed so far simply states that, when interacting with the world, the robot may encounter two “causal clusters” (see FIGS. 5(A) and 5(B)).

FIGS. 5(A) and 5(B) show the two contingency clusters produced by the model. The variable H indicates which of the two clusters is active in the current situation. FIG. 5(A) shows contingency cluster 1, “responsive agent absent,” and FIG. 5(B) shows contingency cluster 2, “responsive agent present.”

[Cluster 1] Sensor activity tends to change relative to background activity during the period [τ_1^s, τ_2^s] following an action. This is due to the effect of self-feedback.

[Cluster 2] Sensor activity tends to change during [τ_1^s, τ_2^s] and also during the period [τ_1^a, τ_2^a] following an action. The second change in activity is due to the presence of a responsive social agent.

Highly pioneering architectures in social robotics have conveniently relied on theory-of-mind approaches from developmental psychology (see, for example, Non-Patent Documents 3 and 22). These approaches emphasize the idea that infants are born equipped with high-level knowledge modules for dealing with humans and intentional agents. By contrast, the robot architecture proposed here uses no explicit labels or conceptual theories.

In the description above the causal model was developed by hand, but current machine learning methods could also be used to discover these causal clusters. As the theory-of-mind approaches argue, during development these clusters may not correspond to any concept that is easily described in words. For our purposes it is sufficient that the robot discover the existence of causal clusters of the above type and realize that such clusters are useful when operating in the world. This is well within the reach of current machine learning technology.

3.2 Learning: inference
Here it is assumed that the robot has already developed the causal model, and the focus is on how it makes decisions about the presence or absence of a social agent on the basis of a given sequence of sensor activity y_{1:t} and actions u_{1:t}. Let (y_{1:t}, u_{1:t}, o_t, q_t, z_t) be an arbitrary sample from (Y_{1:t}, U_{1:t}, O_t, Q_t, Z_t). Then

holds.

Note that the rate variables R_1, R_2, R_3 are independent under the prior distribution. Moreover, if H = 1, these variables affect the sensor on non-intersecting sets of time steps. It follows that the rate variables are also independent under the posterior distribution. In particular,

holds.

Under the null hypothesis, R_2 = R_3, i.e., sensor activity does not change during “agent” periods. Moreover, the set of times at which the sensor's activity depends on R_2, R_3 does not intersect the set of times at which it depends on R_1. Therefore, R_1 is independent of R_2, R_3 under the posterior distribution. That is,

holds.

For any r such that p(r | y_{1:t}, u_{1:t}, h) > 0,

is obtained.

There therefore appears to be something wrong in the transition from (17) to (18). In (17), if u() is ignored, the ratio q is obtained even when o + q = 0; in (18), it is not.

and

Here we used the fact that, under H = 0, R_2 = R_3 with probability 1. The log-likelihood ratio between the two hypotheses is therefore as follows.

Moreover, the posterior distribution over the hypothesis of interest is as follows.
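Under a Bernoulli reading of the sensor with beta priors on the rates, the log-likelihood ratio and the posterior have a closed form in terms of the counts. The sketch below uses the standard beta-Bernoulli marginal likelihood. It is an assumed reconstruction for illustration, not a transcription of the specification's equations, and the names `log_beta`, `log_likelihood_ratio`, and `posterior_h1` are hypothetical. Self-period counts are left out because the self-feedback rate is treated identically under both hypotheses and cancels from the ratio.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_likelihood_ratio(o2, q2, o3, q3, beta2=(1.0, 1.0), beta3=(1.0, 1.0)):
    """log p(data | H=1) - log p(data | H=0) from agent/background counts.

    o2, q2: active / inactive readings during agent periods.
    o3, q3: active / inactive readings during background periods.
    beta2, beta3: beta prior parameters for R_2 (under H=1) and R_3.
    """
    b21, b22 = beta2
    b31, b32 = beta3
    # H = 1: agent and background rates are independent beta variables.
    log_h1 = (log_beta(b21 + o2, b22 + q2) - log_beta(b21, b22)
              + log_beta(b31 + o3, b32 + q3) - log_beta(b31, b32))
    # H = 0: a single shared rate (R_2 = R_3) with the background prior.
    log_h0 = log_beta(b31 + o2 + o3, b32 + q2 + q3) - log_beta(b31, b32)
    return log_h1 - log_h0


def posterior_h1(prior, llr):
    """P(H=1 | data) from the prior probability pi and the log-likelihood ratio."""
    log_odds = log(prior) - log(1.0 - prior) + llr
    return 1.0 / (1.0 + exp(-log_odds))
```

For example, five active readings in agent periods and five inactive readings in background periods, with uniform priors, give a likelihood ratio of 77 in favor of a responsive agent under these assumptions.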

3.3 Cases of interest
More explicitly:

This posterior distribution contains all the information available to the robot about the responsive agent. Two important properties are that (1) it does not depend on o_{1,t}, q_{1,t}, i.e., self-feedback is uninformative about the hypothesis, and (2) if o_{1,t} + q_{1,t} = 0 or o_{2,t} + q_{2,t} = 0, the log-likelihood ratio is 0. In short, if no data have been gathered in the agent state or in the background state, no information about H has been gained. Therefore, to gain information about H, the robot must use its actuator at least once and must refrain from using it at least once.

If no data have been gathered yet, the likelihood ratio is 1, i.e., the posterior odds equal the prior odds.

If no agent data have been gathered yet, the likelihood ratio is 1, regardless of how much background data has been gathered.

If no background data have been gathered yet and β_2 = β_3, the likelihood ratio is 1, regardless of how much agent data has been gathered.

If no background data have been gathered yet but β_2 ≠ β_3, gathering agent data has no information value; in particular:

Suppose we gather one background bit and it is +1. This value is difficult to explain under either the null hypothesis or the alternative hypothesis, because no information has been gained. Now suppose that we have no background data but gather one bit from the agent period, and that it is +1. The null hypothesis has an easy time explaining this outcome, but the alternative hypothesis does not. Hence no information has been gained. If the bit is 0, the null hypothesis has an easier time explaining the outcome than the alternative hypothesis. Information has been gained.

３．４学習：インフォマックス制御
この節においては、社会的エージェントの存在または不在に関する期待情報リターンを最大化するために、どのようにしてロボットのアクチュエータの挙動を予定するのかということに焦点を合わせる。ｔは現在の時間を、Ｔ＞ｔは若干未来の時間を表すものとする。Ｃ＝｛Ｃ_ｒ：ｒ＝ｔ＋１，・・・，Ｔ｝は閉ループ・コントローラ、すなわち、一連の観察を行動にマッピングする関数の集合を表すものとする、すなわち、以下のようになる。 3.4 Learning: Infomax Control In this section, we focus on how to schedule robot actuator behavior to maximize the expected information return on the presence or absence of social agents. It is assumed that t represents the current time and T> t represents some future time. Let C = {C _r : r = t + 1,..., T} denote a closed-loop controller, that is, a set of functions that map a series of observations to actions, ie:

コントローラＣはベイズ的アプローチと整合するランダム・オブジェクトとして取り扱う。目標は、Ｈに関する不確実性を最小化する必要条件であるＣが取る値を発見することにある。 Consistent with the Bayesian approach, the controller C is treated as a random object. The goal is to find the value of C that, when conditioned on, minimizes the uncertainty about H.

本発明では、Ｃ＝cが必要条件である場合にＨに関する未来の情報リターンが最大化するようなコントローラcを提供する。 The present invention provides a controller c such that conditioning on C = c maximizes the future information return about H.

コントローラｃを使用した場合に期待される情報リターンは、Ｈとその時点において得られることになる観察可能変数との間の相互情報、すなわち、 The expected information return when using controller c is the mutual information between H and the observable variable that will be obtained at that time, ie

により与えられ、ここに、Ｔは相互情報を、Ｈはエントロピを表しており（付録ＩＩＩを参照）、（Ｙ_{ｔ＋１：ｒ}，Ｃ）であると仮定し、ＨはＵ_{ｔ＋１：ｒ}から条件付きで独立しているという事実を使用した。式が教えているのは、観察可能過程Ｙ_{ｔ＋１：ｓ}，Ｕ_{ｔ＋１：ｓ}により提供されるＨに関する情報が、それらの観察可能過程により提供される不確実性の減少に等しいということである。項Ｈ（Ｈ |ｙ_１：ｔ，ｕ_１：ｔ）はコントローラに依存していないので、情報利得を最大化することは、Ｈの未来のエントロピを最小化することと等価である。この事実を使用して、情報ベース効用関数を展開することにする。時間ｒにおける観察可能変数が与えられているものと仮定して、Ｗ_ｒはＨに関する不確実性であるものとする（条件付き期待値の定義については、付録ＩＩＩを参照）。 where T denotes mutual information and H denotes entropy (see Appendix III), and where we used the fact that H is conditionally independent of U_{t+1:r} given (Y_{t+1:r}, C). The equation says that the information about H provided by the observable processes Y_{t+1:s}, U_{t+1:s} equals the reduction in uncertainty about H that those processes provide. Because the term H(H | y_{1:t}, u_{1:t}) does not depend on the controller, maximizing the information gain is equivalent to minimizing the future entropy of H. We use this fact to develop an information-based utility function. Let W_r denote the uncertainty about H given the observable variables at time r (see Appendix III for the definition of conditional expectation).
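The statement that information gain equals the reduction in entropy can be checked numerically in a much simpler setting than the model of this section. The following Python sketch assumes a binary hypothesis and a single binary observation with known likelihoods; the function names and the example likelihoods are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary variable with P(1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def expected_information_gain(prior: float,
                              p_one_given_agent: float,
                              p_one_given_no_agent: float) -> float:
    """Mutual information between the hypothesis and one binary observation.

    Computed as prior entropy minus expected posterior entropy.
    """
    p_one = prior * p_one_given_agent + (1.0 - prior) * p_one_given_no_agent
    gain = binary_entropy(prior)
    outcomes = ((p_one, p_one_given_agent),
                (1.0 - p_one, 1.0 - p_one_given_agent))
    for p_outcome, likelihood_agent in outcomes:
        if p_outcome > 0.0:
            posterior = prior * likelihood_agent / p_outcome
            gain -= p_outcome * binary_entropy(posterior)
    return gain


# An observation that is equally likely under both hypotheses gains nothing.
no_gain = expected_information_gain(0.5, 0.5, 0.5)
# An observation that separates the hypotheses perfectly gains one full bit.
full_gain = expected_information_gain(0.5, 1.0, 0.0)
```

Since the prior entropy is fixed, the observation with the larger gain is the one that leaves the smaller expected posterior entropy, which is the equivalence used in the text.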

従って、以下の式が得られる。 Therefore, the following equation is obtained.

コントローラｃを所与として、観察された数列ｙ_１：ｔ，ｕ_１：ｔについての期待リターンを以下のように、すなわち、 Given the controller c, the expected return for the observed sequence y_{1:t}, u_{1:t} is defined as

のように定義し、ここに、α_ｒ≧０は、未来の異なる時点における情報の相対値を示す固定数である。我々の目標は、期待リターンを最大化するコントローラcを発見することにある。 where the α_r ≥ 0 are fixed numbers indicating the relative value of information at different future times. Our goal is to find a controller c that maximizes the expected return.

最適コントローラが与えられているものと仮定し、数列（ｙ_１：ｔ，ｕ_１：ｔ）についての最適期待リターンはその期待リターンであると定義する。 Assuming that an optimal controller is given, the optimal expected return for the sequence (y _{1: t} , u _{1: t} ) is defined as that expected return.

最適コントローラと最適期待リターンがベルマンの最適性式、すなわち、 The optimal controller and the optimal expected return can easily be shown to satisfy Bellman's optimality equation, namely

ここに、 where

を満足するのを示すことは簡単である。

部分観察可能マルコフ過程については、ベルマンの式を厳密に解くことは一般的には困難である。その原因は、可能な数列の数が時間の関数としてあまりにも速く増加することにある。幸運なことに、我々の事例においては、Ｈに関する情報の損失が全くなしで観察可能数列を要約する再帰的統計値Ｓ_ｔ＝^ｄｅｆ（Ｏ_ｔ，Ｑ_ｔ，Ｚ_ｔ）が存在するために、問題は簡単になる。これにより、標準動的計画再帰アルゴリズムを使用して最適コントローラを使用することが可能になる（付録ＩＩを参照）。 For partially observable Markov processes, it is generally difficult to solve Bellman's formula exactly. The reason is that the number of possible sequences increases too quickly as a function of time. Fortunately, in our case there is a recursive statistic S _t = ^def (O _t , Q _t , Z _t ) that summarizes the observable sequence without any loss of information about H, The problem becomes simple. This allows the optimal controller to be used using standard dynamic programming recursive algorithms (see Appendix II).

４最適コントローラの解析
動的計画法問題は２４の２．５ＧＨｚＰｏｗｅｒＰＣＧ５ＣＰＵのクラスタを使用して解かれた。計算時間はほぼ１２時間程度であった。モデルのパラメータは以下のように設定された。Ｔ＝４０；τ_１ ^ｓ＝０；τ_２ ^ｓ＝０；τ_１ ^ａ＝１；τ_２ ^ａ＝３；π＝０．５。次に、時間１５＜ｔ＜２５についてコントローラの挙動をモデル化するために、ロジスティック回帰を使用したが、その理由は、この時間が、コントローラの関心の窓の開始と終了に近過ぎない時間、すなわち、ｔ∈［１，４０］だからである。ロジスティック回帰は全ての可能な条件について９６．４６％の精度で最適コントローラの行動を予測した。最終的モデルは以下のとおりであった。 4 Analysis of the optimal controller
The dynamic programming problem was solved on a cluster of 24 PowerPC G5 CPUs running at 2.5 GHz. The computation took approximately 12 hours. The model parameters were set as follows: T = 40; τ_1^s = 0; τ_2^s = 0; τ_1^a = 1; τ_2^a = 3; π = 0.5. Logistic regression was then used to model the controller's behavior for times 15 < t < 25, because these times are not too close to the start or end of the controller's window of interest, t ∈ [1, 40]. The logistic regression predicted the action of the optimal controller with 96.46% accuracy over all possible conditions. The final model was as follows:

解釈：最適コントローラの誘導は多少困難ではあったが、最終製品はリアルタイムで簡単に作動できる単純な反応システムとなる。誘導により提供されたものが、この単純なコントローラが目前のタスクにとって最適であることの保証であった。このモデルの下では、これよりも優れた制御手段は存在しない。未来期待リターンを無視する貪欲な一段階コントローラ（例えば、非特許文献１６、１７参照）ではこのタスクに失敗するということに注目されたい。その理由は、反応する際に、次の時間段階が自己フィードバックにより占有され、たまたまそれに情報価値がなく、従って、結局は貪欲なコントローラは絶対に行動しないという判断を下すことになるからである。未来期待リターンを含むことにより、コントローラには、自動的に先を見越ことと、長い目で見れば行動を起こすことが行動を行わない場合よりも良い情報を提供できることを理解することが可能になる。 Interpretation: Although deriving the optimal controller was somewhat difficult, the end product is a simple reactive system that can easily run in real time. What the derivation provides is a guarantee that this simple controller is optimal for the task at hand: under this model, no better controller exists. Note that a greedy one-step controller that ignores future expected returns (see, for example, Non-Patent Documents 16 and 17) fails at this task. The reason is that, after the controller acts, the next time step is occupied by self-feedback, which carries no information, so the greedy controller ends up deciding never to act. By including future expected returns, the controller automatically looks ahead and recognizes that, in the long run, acting yields more information than not acting.

いつ行動すべきかを判断するために、コントローラは統計値 To determine when to act, the controller uses the statistic

を使用する。この統計値は、Ｒ_ｉが能動的にセンサを駆動する期間、すなわち、Ｒ_１の場合は自己フィードバック期間、Ｒ_２の場合はエージェント期間、Ｒ_３の場合は背景期間から新しい観察結果により提供されるＲ_ｉに関する分散の期待される減少である。従って、最適コントローラはＲ_３とＲ_２に関する不確実性を一定の比率の範囲内に保つことを希望しているように見える。エージェント速度であるＲ_２があまりにも不確実な場合は、コントローラは行動することを選択する。背景速度であるＲ_３があまりにも不確実な場合は、コントローラは沈黙を保つことを選択し、それにより、背景活動速度に関する情報を獲得する。エージェント速度Ｒ_２に関する分散が背景速度Ｒ_３に関する分散の少なくとも９倍の大きさである場合に、行動が起こされることに注目することは興味深いことである。この倍率の理由は、情報リターンという観点からは、行動が行動の欠如よりもコストがかかるという事実にあるのかもしれない。ロボットが時間ｔにおいて行動した場合は、自己フィードバック観察結果にはＨに関する情報価値がないので、ロボットは時間［ｔ＋τ_１ ^ｓ，ｔ＋τ_２ ^ｓ］中には情報を獲得しない。さらに、時間［ｔ＋τ_１ ^ａ，ｔ＋τ_２ ^ａ］中には、コントローラはロボットに対して行動しないように命令し、従って、これらの期間中には、ロボットはＲ_２に関する情報だけしか得ることができず、Ｒ_３に関する情報を得ることはできない。対照的に、ロボットが時間ｔにおいて行動しなかった場合は、自己フィードバックより時間が無駄になることはない。これが、行動が起こる前において、なぜエージェント活動速度Ｒ_２に関する不確実性が背景活動速度Ｒ_３に関する不確実性よりも大きい必要があるのかの説明に役立つかもしれない。 This statistic is the expected reduction in the variance of R_i provided by a new observation from the period in which R_i actively drives the sensor, that is, the self-feedback period for R_1, the agent period for R_2, and the background period for R_3. The optimal controller thus appears to keep the uncertainties about R_3 and R_2 within a fixed ratio. If the agent rate R_2 is too uncertain, the controller chooses to act. If the background rate R_3 is too uncertain, the controller chooses to remain silent and thereby gains information about the background activity rate. It is interesting to note that an action is taken when the variance of the agent rate R_2 is at least nine times the variance of the background rate R_3. The reason for this factor may be that, in terms of information return, acting is more costly than not acting. If the robot acts at time t, it gains no information during [t + τ_1^s, t + τ_2^s], because self-feedback observations carry no information about H. Furthermore, during [t + τ_1^a, t + τ_2^a] the controller commands the robot not to act, so during these periods the robot can obtain information only about R_2 and not about R_3. In contrast, if the robot does not act at time t, no time is lost to self-feedback. This may help explain why the uncertainty about the agent activity rate R_2 must be greater than the uncertainty about the background activity rate R_3 before an action is taken.
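The ratio rule described above can be sketched as follows. This is only an approximation under stated assumptions. The statistic used by the controller is an expected reduction in variance, whereas the sketch substitutes the plain posterior variance of each rate under a Beta posterior with a uniform prior. The threshold of 9 is the agent-to-background uncertainty ratio at which the simulated infant is reported to respond. The function names and the count arguments are hypothetical.

```python
def beta_posterior_variance(active_count: int, inactive_count: int) -> float:
    """Variance of a rate under a Beta posterior with a uniform Beta(1, 1) prior.

    The uniform prior is an assumption of this sketch.
    """
    a = active_count + 1.0
    b = inactive_count + 1.0
    return (a * b) / ((a + b) ** 2 * (a + b + 1.0))


def should_act(agent_counts: tuple[int, int],
               background_counts: tuple[int, int],
               ratio_threshold: float = 9.0) -> bool:
    """Act when the agent rate is much more uncertain than the background rate.

    Each argument is (active observations, inactive observations) collected
    during agent periods and background periods respectively.
    """
    agent_variance = beta_posterior_variance(*agent_counts)
    background_variance = beta_posterior_variance(*background_counts)
    return agent_variance >= ratio_threshold * background_variance


# Plenty of background data but no agent data yet: the rule says act.
act_now = should_act(agent_counts=(0, 0), background_counts=(20, 20))
# No data at all: both variances are equal, so the rule says stay silent.
stay_silent = not should_act(agent_counts=(0, 0), background_counts=(0, 0))
```

Because staying silent is the default in this sketch, background data accumulate first and the first action is triggered only once the background rate is known much better than the agent rate.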

５自己管理学習の形態としてのインフォマックス制御
ここで肝心なことは、システムにはその失敗又は成功が、すなわち、Ｈの真の値が決して明示的には知らされないということである。原理的には、エージェントが存在していたのか不在であったのかを決して教えられずに世界と相互作用することにより、システムは最適政策を簡単に学習することができる。これは、外部批判者を利用することができないかもしれない学習のモデルにとっては重要なことである。最近ではこの形態の学習を自己管理学習と呼んでいる人々もでてきている。 5 Infomax Control as a Form of Self-Managed Learning The important thing here is that the system is never explicitly told about its failure or success, ie the true value of H. In principle, the system can easily learn the optimal policy by interacting with the world, never being told if the agent was present or absent. This is important for learning models that may not be able to take advantage of external critics. Recently, some people have called this form of learning self-managed learning.

強化学習は、最適コントローラを発見するためのサンプリング法に依存する最適制御理論の一部門であると見ることができる。そのようなものとして、最適インフォマックス・コントローラを発達させるために、動的計画法の代わりに、強化学習アプローチを使用することもできたのである。動的計画法が与えたものは、コントローラが最適のものであったということの、すなわち、このコントローラよりも優れたコントローラは存在しなかったということの公式の保証である。 Reinforcement learning can be viewed as a division of optimal control theory that relies on sampling methods to find optimal controllers. As such, a reinforcement learning approach could be used instead of dynamic programming to develop an optimal infomax controller. What dynamic programming has given is an official guarantee that the controller was optimal, that is, there was no better controller than this controller.

６幼児の１日の４３秒間の理解
この節においては、１．１節に記載したように、乳児−９との実験セッションの最初の４３秒間についての質的理解を得るために、最適インフォマックス・コントローラを適用する。この時間中に、乳児−９は７回発声し、その発声時点は実験の開始から｛５．５８、９．４４、２０．１２、２５．５６、３２．１、３７．９、４１．７｝秒後であった。これらの発声の後には、ロボットは必ず音声と光を同時に発した。２回の連続した幼児の発声の間の時間間隔（単位：秒）は以下のとおり、すなわち、｛４．２２、１０．３２、５．３２、６．１４、５．４４、３．５６｝であった。３回目または４回目の発声までに、室内に反応するエージェントが存在することに幼児が気付いている、ということを大部分の人々が認めている。 6 Understanding 43 seconds in an infant's day In this section, the optimal infomax controller is applied to gain a qualitative understanding of the first 43 seconds of the experimental session with Infant-9 described in Section 1.1. During this time Infant-9 vocalized seven times, at {5.58, 9.44, 20.12, 25.56, 32.1, 37.9, 41.7} seconds after the start of the experiment. After each of these vocalizations the robot always emitted a sound and a light simultaneously. The time intervals (in seconds) between consecutive vocalizations of the infant were {4.22, 10.32, 5.32, 6.14, 5.44, 3.56}. Most people agree that by the third or fourth vocalization the infant is aware that a responsive agent is present in the room.

３．４節に提示したインフォマックス・コントローラの場合、５つのパラメータを、すなわち、時間打ち切りについてのサンプリング期間、２つの自己遅延パラメータ、２つのエージェント遅延パラメータを設定することが必要である。これらのパラメータについての概算を行うために、試験的研究を行う。エージェント潜在パラメータτ_１ ^ａ、τ_２ ^ａについて、研究の目的を教えずに、４人にコンピュータのアニメ・キャラクタに話しかけるよう求めた。参加者の年齢は４、６、２４、３５歳であった。音声センサの活動を２進化するために、最適エンコーダを使用し、この２進センサの起動の確率を１５０回の試験全体についての時間の関数としてプロットした。各試験はアニメ・キャラクタの発声により始まり、その４秒後に終わった。その結果を示したのが図６である。図６の（Ａ）のグラフは音声センサの活動を１５０回の試験全体にわたってキャラクタの発声の開始からの時間の関数として示している。各水平線は異なる試験である。最初の縦棒はキャラクタからの自己フィードバックによるものである。アニメ・キャラクタの発声の終了から約１２００〜１４００ｍｓｅｃまでには、センサの活動のもう１つのピークが生じるが、これは人間の参加者の発声にその原因がある。図６の（Ｂ）のグラフはセンサの活動の確率を試験全体にわたって縮約された時間の関数として示している。自己フィードバックによる活動の最初のピークと、人間の反応によるセンサの活動の漸進的増減に注目されたい。このグラフに基づき、最適コントローラのシミュレーションを以下のパラメータ、すなわち、△ｔ＝８００ｍｓｅｃ、τ_１ ^ｓ＝τ_２ ^ｓ＝０、τ_１ ^ａ＝１、τ_２ ^ａ＝３により行う。要するに、自己遅延を人間の反応の予想遅延に関しては無視できるものとし、人間の活動は８００〜２４００秒以内に起こるものとする。最悪事例のシナリオをシミュレーションするために、π＝０．０１に設定し、従って、反応するシステムが存在しているとの判断を下すためには、もっと多くのデータが必要である。 For the infomax controller presented in section 3.4, it is necessary to set five parameters: a sampling period for time censoring, two self-delay parameters, and two agent delay parameters. A pilot study is performed to make an approximation for these parameters. With regard to the agent latent parameters τ ₁ ^a and τ ₂ ^a, we asked four people to talk to the computer's animated characters without telling the purpose of the study. Participants were 4, 6, 24, and 35 years old. To binarize voice sensor activity, an optimal encoder was used and the probability of this binary sensor activation was plotted as a function of time for the entire 150 tests. Each test started with the voice of the animated character and ended 4 seconds later. The result is shown in FIG. The graph in FIG. 6A shows voice sensor activity as a function of time from the start of the character's utterance over 150 tests. Each horizontal line is a different test. The first vertical bar is due to self-feedback from the character. 
Another peak of sensor activity occurs about 1200 to 1400 msec after the end of the animated character's vocalization; it is caused by the vocalizations of the human participants. The graph of FIG. 6B shows the probability of sensor activity as a function of time, collapsed across trials. Note the first peak of activity due to self-feedback and the gradual rise and fall of sensor activity due to the human response. Based on this graph, the optimal controller is simulated with the following parameters: Δt = 800 msec, τ_1^s = τ_2^s = 0, τ_1^a = 1, τ_2^a = 3. In short, the self-delay is taken to be negligible relative to the expected delay of the human response, and human responses are assumed to occur within 800 to 2400 msec. To simulate a worst-case scenario, we set π = 0.01, so that more data are needed to conclude that a responsive system is present.

図７の（Ａ），（Ｂ），（Ｃ），（Ｄ）はシミュレーションの結果を示したものである。全てのグラフにおける水平軸は時間（単位：秒）である。図７の（Ａ）のグラフは、乳児−９の役割を演じる最適コントローラの発声を示している。コントローラは４３秒間の期間にわたって６回の発声を行った。発声間の平均時間間隔は、乳児−９の場合が５．８３３秒であるのに対して、５．９２秒であった。標準Ｔ試験（Ｔ（９）＝０．０８、ｐ＝０．９４）を使用した場合、この差は重要なものではない。 (A), (B), (C), and (D) of FIG. 7 show the results of the simulation. The horizontal axis in all graphs is time in seconds. The graph in FIG. 7A shows the vocalizations of the optimal controller playing the role of Infant-9. The controller vocalized six times over the 43-second period. The average interval between vocalizations was 5.92 seconds, compared with 5.833 seconds for Infant-9. The difference is not statistically significant under a standard t-test (T(9) = 0.08, p = 0.94).
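The average interval quoted for Infant-9 can be reproduced from the six intervals listed earlier in this section. The short Python check below takes those intervals to be in seconds, which is the only reading consistent with a 43-second session; the variable names are illustrative.

```python
from statistics import mean

# Intervals between consecutive Infant-9 vocalizations, as listed in the text.
infant_intervals_s = [4.22, 10.32, 5.32, 6.14, 5.44, 3.56]

mean_interval_s = mean(infant_intervals_s)
print(f"{mean_interval_s:.3f}")  # prints 5.833
```

The intervals of the simulated controller are not listed in the text, so the reported t-test cannot be reproduced from the specification alone.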

図７の（Ｂ）のグラフは、反応するエージェントの存在に関するシステムの信念を示している。実験開始３０秒後の４回目の反応までに、この確率は０．５レベルを超える。図７の（Ｃ）のグラフは、４３秒の期間の終了までのエージェント反応速度と背景反応速度に関する事後確率分布を示している。図７の（Ｄ）のグラフはエージェント期間中のセンサ速度に関する不確実性と背景期間中の速度に関する不確実性との比率を示している。この比率が９に達すると、模擬乳児が反応することに注目されたい。 The graph in FIG. 7B shows the system's beliefs regarding the presence of reacting agents. By the fourth reaction 30 seconds after the start of the experiment, this probability exceeds 0.5 level. The graph of FIG. 7C shows the posterior probability distribution regarding the agent reaction rate and the background reaction rate until the end of the period of 43 seconds. The graph of FIG. 7D shows the ratio between the uncertainty related to the sensor speed during the agent period and the uncertainty related to the speed during the background period. Note that when this ratio reaches 9, the simulated infant reacts.

従って、社会的相互作用において一般的に見られる時間遅延と不確実性のレベルを前提とすると、このモデルは、乳児−９が自分の反応を予定し、最適の形で社会的エージェントの反応性について判断を下したことを示している。このモデルは、乳児−９の発声がロボットの反応性に関して戻された情報を最大化するような形で予定されているという意味において、乳児−９がロボットに対して「質問を行って」いたという考えとの整合性もある。関心のもう１つのポイントは、最適コントローラが交代する、すなわち、行動が行われた後に、コントローラは次の発声までにある時間、平均５．９２秒待つということである。発声と発声の間の時間間隔は固定されておらず、エージェントと背景の反応性のレベルに関する相対不確実性により決まることになる。例えば、予想外の背景活動が生じた場合は、背景活動の変化をよりよく「理解する」ために、コントローラは発声間の時間間隔を自動的に延長する。予想外のエージェント活動が生じた場合は、コントローラは反応速度を高め、エージェント期間に関する情報の収集を加速する。 Thus, given the time delays and levels of uncertainty typically found in social interaction, the model suggests that Infant-9 scheduled its responses, and reached its decision about the responsiveness of the social agent, in an optimal manner. The model is also consistent with the idea that Infant-9 was "asking questions" of the robot, in the sense that its vocalizations were scheduled so as to maximize the information returned about the robot's responsiveness. Another point of interest is that the optimal controller exhibits turn-taking: after taking an action, it waits some time, 5.92 seconds on average, before the next vocalization. The interval between vocalizations is not fixed; it depends on the relative uncertainty about the agent and background response rates. For example, if unexpected background activity occurs, the controller automatically lengthens the interval between vocalizations to better "understand" the change in background activity. If unexpected agent activity occurs, the controller raises its response rate and speeds up the collection of information about the agent period.

７リアルタイム・ロボット実装
この問題を研究するために、上記の最適インフォマックス・コントローラを、ＡＴＲの知能ロボット工学研究所で開発された人型ロボットＲｏｂｏｖｉｅＭに実装した。リアルタイム・コントローラを試験するためにはロボットは必ずしも必要ではなかったが、人間と機械との間で展開される相互作用の質を高めるのに大いに役立ち、従って、より現実的なコントローラの試験方法を提供した。ＲｏｂｏｖｉｅＭは自由度２２（肩：自由度１、腰：自由度１、腕：自由度２×４、脚：自由度２×６）を有する。高さは２９ｃｍ、重量は約１．９ｋｇである。対応する２２のサーボの制御はＨ８１６ＭＨｚマイクロコントローラにより行われる。リアルタイム・インフォマックス・コントローラはＪａｖａで実装され、ホスト・コンピュータである、例えば、異なる陰の変数の事後分布のような、コントローラの異なる状態をリアルタイムでグラフィック表示するＭａｃＰｏｗｅｒＢｏｏｋＧ４で実行された。ホスト・コンピュータとコントローラとの間の通信は、ＷｉｒｅｌｅｓｓＣａｂｌｅｓ社製のシリアル・アダプタへのブルー・トゥースを使用して無線で行われた。現バージョンのインフォマックス・コントローラは１ビット・センサと１ビット・アクチュエータとを必要とする。センサについては、５００ｍｓｅｃのウィンドウにわたっての平均音声エネルギを選び、１ビット最適コーダを使用してそれを打ち切った。アクチュエータは、２００ｍｓｅｃのロボット音を発する小型のラウドスピーカであった。音を作り出すコマンドの発令と音声センサからのフィードバックの受信との間の時間遅延を測定することにより、コントローラの自己時間遅延パラメータは選択された。エージェントの遅延パラメータは乳児−９のシミュレーションの場合と同じであった（第６節を参照）。 7 Real-Time Robot Implementation In order to study this problem, the optimal infomax controller described above was implemented in the humanoid robot RobovieM developed at the Intelligent Robotics Laboratory of ATR. Robots are not necessarily required to test real-time controllers, but they greatly help to improve the quality of interactions developed between humans and machines, and thus make more realistic controller test methods possible. Provided. RobovieM has 22 degrees of freedom (shoulder: 1 degree of freedom, waist: 1 degree of freedom, arm: 2 degrees of freedom, leg: 2 degrees of freedom). The height is 29 cm and the weight is about 1.9 kg. The corresponding 22 servos are controlled by an H8 16 MHz microcontroller. The real-time infomax controller was implemented in Java and implemented on a host computer, such as the MacPowerBook G4, which graphically displays different states of the controller in real time, such as the posterior distribution of different shadow variables. Communication between the host computer and the controller was performed wirelessly using Bluetooth to a serial adapter from Wireless Cables. Current versions of the Infomax controller require a 1-bit sensor and a 1-bit actuator. 
For the sensor, the average audio energy over a 500 msec window was computed and thresholded using a 1-bit optimal coder. The actuator was a small loudspeaker that produced a 200 msec robot sound. The controller's self-delay parameters were chosen by measuring the delay between issuing the command to produce a sound and receiving feedback from the audio sensor. The agent delay parameters were the same as in the Infant-9 simulation (see Section 6).
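The 1-bit sensor described above can be sketched in Python as follows. The specification does not say how the optimal 1-bit coder chooses its threshold, so the sketch takes the threshold as a given parameter. The function name, the sample format (floating-point amplitudes), and the threshold value in the example are assumptions.

```python
from typing import Sequence


def binarize_audio_window(samples: Sequence[float], energy_threshold: float) -> int:
    """Reduce one window of audio samples (for example 500 msec) to a single bit.

    Returns 1 when the mean energy of the window exceeds the threshold and
    0 otherwise. An empty window is treated as silence.
    """
    if not samples:
        return 0
    mean_energy = sum(sample * sample for sample in samples) / len(samples)
    return 1 if mean_energy > energy_threshold else 0


# A quiet window and a loud window under a hypothetical threshold of 0.01.
quiet_bit = binarize_audio_window([0.0, 0.01, -0.01, 0.0], 0.01)
loud_bit = binarize_audio_window([0.5, -0.4, 0.6, -0.5], 0.01)
```

Each window yields one bit, so the controller sees a binary sequence sampled once per window regardless of the audio sampling rate.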

ロボットの発声に加えて、反応するエージェントの存在／不在についてのコントローラの信念に基づき、その姿勢が、エージェントが存在するとコントローラが信じた場合の高レベルの注意を示す姿勢と、エージェントが存在しないとコントローラが信じた場合の退屈さを示す姿勢とに変化した。 In addition to the robot's vocalizations, its posture changed according to the controller's belief about the presence or absence of a responsive agent: it took a posture showing a high level of attention when the controller believed an agent was present, and a posture showing boredom when the controller believed no agent was present.

７．１非定常環境
ここに提唱したモデルにおいては、変数ＲとＨにより表されるエージェントと背景の状態はランダムであるが、しかし定常的である。現実的実装のためには、ＲとＨが時間とともに変化できることが必要である。残念なことに、そのような事例においては、最適コントローラの計算が面倒であることを示すことができる。我々は、過去の観察結果が時間の関数として指数的に無関係になると仮定することにより、状況を近似化する。この近似化の下で、我々はＯ_ｔ、Ｑ_ｔの指数平滑化された移動平均をただ単に収集し、標準コントローラをこれらの移動平均に適用する。状況が３０秒を超えて定常的であることを期待すべきではないという考え方を反映して、指数平滑部の時間定数は３０秒であった。 7.1 Unsteady environment In the model proposed here, the agent and background states represented by the variables R and H are random, but stationary. For practical implementation, it is necessary that R and H can change with time. Unfortunately, in such cases, it can be shown that the computation of the optimal controller is cumbersome. We approximate the situation by assuming that past observations become exponentially irrelevant as a function of time. Under this approximation, we simply collect the exponentially smoothed moving averages of O _t , Q _t and apply a standard controller to these moving averages. Reflecting the idea that the situation should not be expected to be steady beyond 30 seconds, the exponential smoothing time constant was 30 seconds.
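The exponential forgetting described above can be sketched in Python as follows. The sketch keeps exponentially decayed counts of active and inactive sensor readings, which is one plausible reading of the smoothed versions of O_t and Q_t. The class name, the update interval of 0.5 seconds (borrowed from the 500 msec sensor window), and the use of decayed counts rather than normalized averages are assumptions.

```python
import math


class DecayedCounts:
    """Exponentially decayed counts of active and inactive sensor readings."""

    def __init__(self, time_constant_s: float = 30.0, step_s: float = 0.5) -> None:
        # Weight applied to the old counts at every step; an observation that
        # is one time constant old has weight exp(-1).
        self.decay = math.exp(-step_s / time_constant_s)
        self.active = 0.0
        self.inactive = 0.0

    def update(self, sensor_bit: int) -> None:
        """Decay the old counts, then add the new one-bit observation."""
        self.active = self.decay * self.active + (1.0 if sensor_bit else 0.0)
        self.inactive = self.decay * self.inactive + (0.0 if sensor_bit else 1.0)


counts = DecayedCounts()
counts.update(1)
counts.update(0)
```

The decayed counts can then be passed to the stationary controller in place of the raw counts, so that observations older than roughly the time constant have little influence on its decisions.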

７．２質的評価
本発明の目的は、偶発事象を信頼できる情報源として使用できるようにすることであった。偶発事象は信頼でき、計算と帯域幅に関する要件も非常に低い。数量的評価を欠いているので、公の集会においてシステムを実演するという我々の経験に基づき、質的評価を提示することにする。騒音のレベルが比較的高い標準的オフィス環境において、コントローラは反応するエージェントが存在しているかどうかの判断を数回の試験の後に下す。特に有効であるのが、エージェントがロボットへの話しかけから誰か他の人への話しかけに移る転移点である。このシステムを４回の科学講演会と２回の会議、すなわち、ＩＣＤＬ０４とＮＩＰＳ０４において実演した。一般的に騒音レベルが比較的低い講演会における実演はうまく行く。ＩＣＤＬ０４においては、ポスター・ルームのような比較的騒がしく、コントローラが信頼できる判断を下すには少し余計に時間がかかった。状況の困難さを考慮に入れると、全体的にパーフォーマンスのレベルは目覚しいものであった。ＮＩＰＳ０４においては、条件は極めて騒がしいものであった。多くの場合の大声での会話も相互理解には十分ではなかった。これらの条件の下でコントローラが信頼できる働きを行うためには、人間は大声で話し、ロボットの近くにいなければならなかった。 7.2 Qualitative assessment The purpose of the present invention was to make it possible to use contingencies as a reliable source of information. The contingency is reliable and the requirements for computation and bandwidth are very low. Because of the lack of quantitative evaluation, we will present a qualitative evaluation based on our experience of demonstrating the system at a public assembly. In a standard office environment where the level of noise is relatively high, the controller makes a decision after several tests to determine whether there is a reacting agent. Particularly effective is the transition point where an agent moves from talking to a robot to talking to someone else. The system was demonstrated at 4 scientific lectures and 2 conferences, namely ICDL04 and NIPS04. Demonstrations at lectures with relatively low noise levels generally work well. In ICDL04, it was relatively noisy like a poster room, and it took a little extra time for the controller to make a reliable decision. Overall, the level of performance was impressive, given the difficulty of the situation. In NIPS04, the conditions were extremely noisy. Loud conversations in many cases were not sufficient for mutual understanding. In order for the controller to perform reliably under these conditions, humans had to speak loudly and be near the robot.

この方法が人型ロボットと音声ｍｈｏＤａｌに適用された場合は、カメラ入力ｍｈｏＤａｌとともに使用することが可能である。 When this method is applied to a humanoid robot and the audio modality, it can also be used together with a camera-input modality.

これをロボットの様々な表現能力と結合することにより、ロボットとユーザーとの間のより緊密な相互作用を可能にする装置を提供する。 Combining this with the various expressive capabilities of the robot provides a device that allows a closer interaction between the robot and the user.

基本的発明部分においては、音声がｍｏｏＤａｌをロボットの入力・出力として取り扱った。 In the basic invention described above, the audio modality was treated as the robot's input and output.

この結果、画像入力または身振り出力によるセンサ入力によるアクチュエータ出力を間接コーナ制御により処理することができる。 In the same way, sensor input from images and actuator output in the form of gestures, produced through joint angle control, can also be handled.

一例に過ぎないが、外部世界におけるカメラ画像入力と光学欠陥計算技術により一定の上記数量を有するための動きが検出されると、センサ入力１が入力する。 As just one example, sensor input 1 is generated when motion exceeding a certain magnitude is detected in the outside world from the camera image input by an optical flow computation.

それに加えて、事前に平和をもたらした一定身振り出力コマンドが出力１のために実行される。 In addition, a predetermined gesture output command is executed for output 1.

結果として、画像入力ｍｈｏＤａｌによる偶発事象探索が有効になる。エンターテイメント・ロボットにおいては、ユーザーを疲れさせない表現の要素能力が重要である。 As a result, contingency detection through the image-input modality becomes possible. In an entertainment robot, expressive capabilities that do not tire the user are important.

８本発明の適用例
この基本的発明の適用例として、以下には２つの実施例が示してある。 8 Application Examples of the Invention As examples of application of the basic invention, two examples are shown below.

適用例１
身振りが観察され、さらに、表現は出力のことを考えるので、その身振りは間接コーナ制御に基づき模倣される。 Application example 1
As the expressive output, consider imitation by observation of gestures, performed through joint angle control.

身振りが観察され、模倣の程度において基本となる発明により計算された後の確率を使用するための方法が可能である。 One possible method is to use the posterior probability calculated by the basic invention to set the degree of imitation.

ロボットの間接アクチュエータが使用され、それは観察され、模倣され、しかも、カメラ画像入力から、多くの知識が得られるように、相互作用の対象である人間の動きを出力することができる。 Using its joint actuators, the robot can observe, through the camera image input, the movements of the human it is interacting with and reproduce them as output.

それが観察され、模倣の程度を数値制御できる場合は、基本となる発明により計算された事後確率にこの数値を反映させることが考え得る。 If the degree of imitation can be controlled numerically, the posterior probability calculated by the basic invention can be reflected in that value.

例示の目的のために、それは観察され、人間の間接コーナが模倣の方法として画像入力から推定され、目標角度に対してロボットの間接コーナを制御するための方法が考えられる。 As an illustration, one method of imitation is to estimate the human's joint angles from the image input and to control the robot's joint angles using them as target angles.

[結合コーナ制御値]＝ｋ１×［事後確率値］×［間接コーナ値］
アクチュエータが上記に従って動かされた場合は、アクチュエータは仕事の後にロボットの中で似た反応を示すことができ、その結果、確率値は、ユーザーとロボットとの相互作用が偶発的なものになるような高い値を示すことになる。 [joint angle control value] = k1 × [posterior probability value] × [joint angle value]
When the actuators are driven in this way, the robot's movements resemble the user's more closely as the posterior probability takes higher values, that is, as the interaction between the user and the robot becomes more contingent.

正常な進歩の過程が観察され、それは模倣よりも多く観察され、しかも、模倣することは動的であり、それを理解することができ、それは依存しており、ユーザーの関心を引き付けることができる。 The process of normal progress is observed, it is observed more than imitation, and imitation is dynamic and can be understood, it depends and can attract the user's interest .

ｋ１（ｋ２も同様）はここではパラメータである。 Here, k1 (also k2) is a parameter.
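The scaling rule of application example 1 can be written as a one-line function. The Python sketch below reads the control value and the input value in the formula as joint angles in degrees, which is an assumption, and the function and argument names are hypothetical.

```python
def joint_angle_command(k1: float, posterior: float, estimated_human_angle: float) -> float:
    """Joint command scaled by the posterior probability of a responsive agent.

    posterior is expected to lie in [0, 1]; values outside that range are
    rejected because they cannot be probabilities.
    """
    if not 0.0 <= posterior <= 1.0:
        raise ValueError("posterior must be a probability in [0, 1]")
    return k1 * posterior * estimated_human_angle


# With k1 = 1 the robot reproduces half of a 40-degree movement when the
# posterior probability is 0.5.
command = joint_angle_command(k1=1.0, posterior=0.5, estimated_human_angle=40.0)
```

With k1 = 1 the command ranges from no movement at a posterior of 0 to full imitation at a posterior of 1, so the degree of imitation tracks the controller's belief.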

適用例２
表情として、ロボットの表情の変化が考えられる。 Application example 2
As an expressive output, consider changes in the robot's facial expression.

表情のメカニズムは様々であるが、しかし、最も簡単な実装例は、パラメータについて目のＬＥＤの輝度を変えるための方法である。 Facial expression mechanisms vary, but the simplest implementation is to vary the brightness of the eye LEDs according to a parameter.

ＬＥＤの輝度は以下のように設定される。 The brightness of the LED is set as follows.

［ＬＥＤ輝度］＝ｋ２×［仕事後の確率値］
結果として、ユーザーは一番前におり、目の輝度は偶発的である相互作用に伴って大きくなる。 [LED brightness] = k2 × [posterior probability value]
As a result, when the user is in front of the robot, the eyes become brighter as the interaction becomes contingent.
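The LED rule can be sketched in the same way. In the Python sketch below, the probability term in the formula is read as the posterior probability of a responsive agent. The rounding to an integer level and the clamping to an 8-bit range of 0 to 255 are assumptions added for illustration, and the function name is hypothetical.

```python
def led_brightness(k2: float, posterior: float, max_level: int = 255) -> int:
    """Eye LED level proportional to the posterior probability.

    The result is rounded to an integer and clamped to [0, max_level].
    """
    level = round(k2 * posterior)
    return max(0, min(max_level, level))


# With k2 = 255 the LED goes from off (posterior 0) to full brightness (posterior 1).
dim = led_brightness(255.0, 0.2)
full = led_brightness(255.0, 1.0)
```

Because the posterior changes gradually as evidence accumulates, the brightness rises gradually as well, which matches the slow recognition effect described for this example.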

ユーザーに対しては、「このロボットは私の存在をゆっくりと認識する」という動的変化過程により、より好ましい印象を与えることができる。 For the user, the dynamic change process of “this robot recognizes my existence slowly” can give a better impression.

もちろん、ＬＥＤ以外による表情も可能である。 Of course, expressions other than LEDs are also possible.

一例に過ぎないが、簡単な運動制御による眉と唇の形を変えるロボットのメカニズムが存在する。 For example, there is a robot mechanism that changes the shape of eyebrows and lips by simple motion control.

運動出力に反映し、確立値が上昇するように、力強い微笑みを示すための制御を類似の原理により簡単な方法で適用することができる。 By the same principle, control that makes the robot show a stronger smile as the probability value rises can be applied simply by reflecting that value in the motor output.

例えば、発明の適用効果が、確信度と動的連続の数量を使用することにより、使用される変化過程の特徴をもたらすことができ、例えば、「発見の喜びについての人間の身振りの出力」のような、ルールベースの技術に比べて、人間が行動の近くにいることができる。それに加えて、人間がユーザーとの相互作用をさらに進めることができる。 An effect of applying the invention is that, by using a continuous, dynamically varying quantity such as the degree of confidence, the robot can exhibit a gradual process of change; compared with rule-based techniques such as "output a gesture of joy when a human is found", its behavior can be closer to that of a human. In addition, the interaction with the user can be carried further.

９本発明を搭載したロボット装置
本発明は、例えば、図８に示したようなロボット装置に搭載することができる。図８の二足歩行ロボット装置３０は、日常生活における生活状態やその他の状況に対する人間の行動を手助けする実用ロボットである。ロボット装置３０は、内面状態（怒り、悲しみ、喜び、楽しみ等）に従って行動できるエンターテイメント・ロボットでもある。 9 Robot Device Mounted with the Present Invention The present invention can be mounted on, for example, a robot device as shown in FIG. The biped robot device 30 in FIG. 8 is a practical robot that assists human behavior in daily life and other situations. The robot device 30 is also an entertainment robot that can act according to the inner state (anger, sadness, joy, fun, etc.).

図８に示したように、ロボット装置３０は、頭部ユニット３２と、右左腕ユニット３３Ｒ／Ｌと、胴ユニット３１の指定位置に結合された右左脚ユニット３４Ｒ／Ｌを含む。これらの参照符号において、文字ＲとＬは、それぞれ右と左を示す接尾辞である。これは以下に記述についても同様である。 As shown in FIG. 8, the robot apparatus 30 includes a head unit 32, a right / left arm unit 33 R / L, and a right / left leg unit 34 R / L coupled to a designated position of the torso unit 31. In these reference signs, the letters R and L are suffixes indicating right and left, respectively. The same applies to the following description.

図９は、ロボット装置３０に提供される関節自由度の構造の概略を示したものである。頭部ユニット１０２を支持している首関節は自由度３、すなわち、首関節横揺れ軸１０１と、首関節縦揺れ軸１０２と、首関節ロール軸１０３を有する。 FIG. 9 schematically shows the joint degree-of-freedom configuration of the robot apparatus 30. The neck joint supporting the head unit 102 has three degrees of freedom: a neck joint yaw axis 101, a neck joint pitch axis 102, and a neck joint roll axis 103.

上肢を構成する腕ユニット３３Ｒ／Ｌの各々は、肩関節縦揺れ軸１０７と、肩関節ロール軸１０８と、上腕横揺れ軸１０９と、肘関節縦揺れ軸１１０と、前腕横揺れ軸１１１と、手関節縦揺れ軸１１２と、手関節ロール軸１１３と、手部分１１４を含む。手部分１１４は実際には複数の指を含む多関節多自由度構造である。しかしながら、手部分１１４の動きはロボット装置１の姿勢・歩行制御にはほとんど影響を与えない。簡素化のために、本明細書は手部分１１４の自由度をゼロと仮定している。従って、各腕ユニットの自由度は７である。 Each of the arm units 33R/L constituting the upper limbs includes a shoulder joint pitch axis 107, a shoulder joint roll axis 108, an upper arm yaw axis 109, an elbow joint pitch axis 110, a forearm yaw axis 111, a wrist joint pitch axis 112, a wrist joint roll axis 113, and a hand portion 114. The hand portion 114 is in fact a multi-joint, multi-degree-of-freedom structure including a plurality of fingers. However, the movement of the hand portion 114 has little effect on the posture control and walking control of the robot apparatus 1. For simplicity, this specification assumes that the hand portion 114 has zero degrees of freedom. Each arm unit therefore has seven degrees of freedom.

胴部分２は自由度３、すなわち、胴縦揺れ軸１０４と、胴ロール軸１０５と、胴横揺れ軸１０６を有する。 The trunk portion 2 has three degrees of freedom: a trunk pitch axis 104, a trunk roll axis 105, and a trunk yaw axis 106.

下肢を構成する脚ユニット３４Ｒ／Ｌの各々は、股関節横揺れ軸１１５と、股関節縦揺れ軸１１６と、股関節ロール軸１１７と、膝関節縦揺れ軸１１８と、足関節縦揺れ軸１１９と、足関節ロール軸１２０と、足部分１２１を含む。本明細書は股関節縦揺れ軸１１６と股関節ロール軸１１７との交差地点をロボット装置３０の股関節位置であると定義する。この足部分１２１に相当する人間の足部分は、多関節多自由度足底を含む構造である。簡素化のために、本明細書はロボット装置３０の足裏の自由度をゼロと仮定している。従って、各脚ユニットの自由度は６である。 Each of the leg units 34R/L constituting the lower limbs includes a hip joint yaw axis 115, a hip joint pitch axis 116, a hip joint roll axis 117, a knee joint pitch axis 118, an ankle joint pitch axis 119, an ankle joint roll axis 120, and a foot portion 121. This specification defines the intersection of the hip joint pitch axis 116 and the hip joint roll axis 117 as the hip joint position of the robot apparatus 30. The human foot corresponding to the foot portion 121 is a structure including a multi-joint, multi-degree-of-freedom sole. For simplicity, this specification assumes that the sole of the robot apparatus 30 has zero degrees of freedom. Each leg unit therefore has six degrees of freedom.

In total, the robot apparatus 30 as a whole has 32 degrees of freedom (3 + 7 × 2 + 3 + 6 × 2). However, the entertainment-oriented robot apparatus 30 is not necessarily limited to 32 degrees of freedom. The number of degrees of freedom, that is, the number of joints, can obviously be increased or decreased according to design and production constraints, required specifications, and the like.
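The total above can be checked with a few lines of Python. This is only an arithmetic sketch: the per-part values come from the text, while attributing the leading 3 in the sum to the head is an assumption, and the names `DOF`, `PAIRED`, and `total_dof` are hypothetical.

```python
# Degrees of freedom per body part, taken from the text.
# Attributing the leading "3" in the sum to the head is an assumption.
DOF = {
    "head (assumed)": 3,
    "arm": 7,    # per arm unit 33R/L
    "trunk": 3,
    "leg": 6,    # per leg unit 34R/L
}
PAIRED = {"arm", "leg"}  # parts that exist as a left/right pair


def total_dof(dof=DOF, paired=PAIRED):
    """Sum the degrees of freedom, counting paired parts twice."""
    return sum(n * (2 if part in paired else 1) for part, n in dof.items())


print(total_dof())
```

Changing an entry of `DOF` shows how the total shifts when joints are added or removed, as the text allows.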

In practice, each of the above degrees of freedom of the robot apparatus 30 is realized by an actuator. To eliminate superfluous bulges so that the outline approximates the natural human body shape, and to provide posture control for an inherently unstable bipedal structure, the actuators are preferably small and lightweight. More preferably, small AC servo actuators are used that are directly coupled to a gear and have a single-chip servo control system built into the motor unit.

FIG. 10 schematically shows the configuration of the control system of the robot apparatus 30. As shown in FIG. 10, the control system includes an inference control module 200 and a motion control module 300. The inference control module 200 controls emotional judgment and emotional expression in dynamic response to user input and the like. The motion control module 300 controls the coordinated whole-body motion of the robot apparatus 30, such as the driving of the actuators 350.

The inference control module 200 includes a CPU (central processing unit) 211 that executes computations related to emotional judgment and emotional expression, a RAM (random access memory) 212, a ROM (read-only memory) 213, and an external storage device (such as a hard disk drive) 214. The inference control module 200 is an independently driven information processing unit that can carry out self-contained processing within the module.

The inference control module 200 is supplied with image data from the image input device 251, audio data from the audio input device 252, and other data. In response to these external stimuli, the inference control module 200 determines the current emotion or intention of the robot apparatus 30. The image input device 251 has, for example, a plurality of CCD (charge-coupled device) cameras. The audio input device 252 has, for example, a plurality of microphones.

The inference control module 200 issues commands to the motion control module 300 to execute a sequence of motions or actions based on its decisions, that is, movements of the limbs.

The motion control module 300 includes a CPU 311 that controls the coordinated whole-body motion of the robot apparatus 30, a RAM 312, a ROM 313, and an external storage device (such as a hard disk drive) 314. The motion control module 300 is an independently driven information processing unit that can carry out self-contained processing within the module. The external storage device 314 can store, for example, walking patterns computed offline, target ZMP trajectories, and other action plans. The ZMP (zero moment point) is the point on the floor surface at which the moment due to the floor reaction force during walking is zero. The ZMP trajectory is the path along which the ZMP moves during the walking motion of the robot apparatus 30. For the concept of the ZMP and its application as a stability criterion for legged robots, see Miomir Vukobratović, "LEGGED LOCOMOTION ROBOTS" (Japanese translation by Ichiro Kato et al., "Walking Robots and Artificial Legs," Nikkan Kogyo Shimbun).

The motion control module 300 is connected to the actuators 350 that realize the degrees of freedom distributed over the whole body of the robot apparatus 30 shown in FIG. 9, a posture sensor 351 that measures the posture or inclination of the trunk unit 31, landing confirmation sensors 352 and 353 that detect whether the left and right soles are off the floor or in contact with it, and a power supply controller 354 that manages a power source such as a battery. These devices are connected to the motion control module 300 through a bus interface (I/F) 301. The posture sensor 351 is, for example, a combination of an acceleration sensor and a gyro sensor. The landing confirmation sensors 352 and 353 are, for example, proximity sensors or microswitches.

The inference control module 200 and the motion control module 300 are built on a common platform and are interconnected through bus interfaces 201 and 301.

The motion control module 300 controls the coordinated whole-body motion through the actuators 350 in order to realize the behavior commanded by the inference control module 200. In response to the commanded behavior, the CPU 311 reads the corresponding motion pattern from the external storage device 314. Alternatively, the CPU 311 generates the motion pattern internally.

In accordance with the specified motion pattern, the CPU 311 sets the foot motion, the ZMP trajectory, the trunk motion, the upper limb motion, the horizontal position and height of the waist, and so on. The CPU 311 then transfers to the actuators 350 command values that specify motions according to these settings.

The CPU 311 uses the output signal from the posture sensor 351 to detect the posture or inclination of the trunk unit 31 of the robot apparatus 30. In addition, the CPU 311 uses the output signals from the landing confirmation sensors 352 and 353 to detect whether each of the leg units 34R/L is currently a free (swing) leg or a supporting (stance) leg. In this way, the CPU 311 can adaptively control the coordinated whole-body motion of the robot apparatus 30.

Furthermore, the CPU 311 controls the posture and motion of the robot apparatus 30 so that the ZMP position is always directed toward the center of the ZMP stable region.

The motion control module 300 reports its processing status to the inference control module 200, that is, the extent to which it has carried out the behavior decided by the inference control module 200.

In this way, the robot apparatus 30 can assess its own state and its surroundings on the basis of the control program and can act autonomously.

In the robot apparatus 30, for example, the ROM 213 of the inference control module 200 stores a program (including data) that implements the image recognition function described above. In this case, the CPU 211 executes the image recognition program.

Because the image recognition function described above is incorporated, the robot apparatus 30 can accurately extract a previously stored model from the image data supplied through the image input device 251. For example, when the robot apparatus 30 walks autonomously, it may need to detect an intended model in the images of its surroundings captured by the CCD cameras of the image input device 251. In such cases the model is often partially hidden by other obstacles, and the viewpoint and brightness may vary. Even then, the image recognition technique described above can extract the model accurately.

In the embodiment described above, the action output of the robot apparatus was determined, in view of the robot apparatus's own degree of belief, so as to maximize the mutual information defined between the outside world and its own output.

10 Other Embodiments of the Present Invention

In contrast, in the embodiment described below, the robot apparatus considers the degree of belief that the interaction target is presumed to hold, as seen from the robot apparatus, and determines its action output so as to maximize the mutual information defined from that belief.

This corresponds to inferring the feelings of the interaction partner (the concept known in psychology as "theory of mind") and acting from this side (the robot apparatus side) so as to maximize the information for the partner, that is, to give the partner as much information as possible. In other words, it takes a stance opposite to that of the previous embodiment while following the same methodology. This is explained below using, as a specific example, a robot apparatus that has a simple audio input/output device.

Here, consider a situation in which the robot apparatus interacts with an interaction target (such as a human or another robot) through a simple exchange of sounds.

The robot apparatus 40 shown in FIG. 11 includes an audio input device 41, which is a microphone. At each unit time, the audio input device 41 inputs the discrete value 1 when the sound level of the outside world is at or above a fixed threshold, and inputs 0 otherwise. The audio output device 42 is a speaker. Based on commands from the controller 43, at each unit time it outputs a predetermined sound (for example, a calling sound such as "piroro") when the control output value is 1 and outputs silence when the control output value is 0.

The controller 43 determines the control output for the current time step from the input/output history up to the previous time step and sends it to the audio output device 42. The following describes how to set up a controller 43 that follows the theory-of-mind methodology mentioned at the outset, that is, one that views its own behavior from the partner's standpoint and acts in the way that is more preferable for the partner (the measure of preference is described later).

As shown in FIG. 12, if the robot apparatus 40 produces the audio output u1 = 1 at some point in time, the partner's sensor input (the estimated partner input) is assumed to be y2 = 1 regardless of the external noise. Likewise, if the partner produces the audio output u2 = 1, then y1 = 1 regardless of the external noise.

Based on its own sensor inputs and outputs, the robot apparatus 40 maintains the following five variables describing the partner's state and updates them at every time step.

The first is the time counter z, which records how many steps have elapsed since the interaction target took the action u2 = 1. Its maximum value is specified by the parameter kappa, and it never exceeds that value. Using the time counter z (in practice, its value one time step earlier, z(t-1)), whether the interaction target is currently in the Agent period or the Background period is determined by the following rules.

if z(t-1) < (kappa-1) then currently in Agent period
if z(t-1) >= (kappa-1) then currently in Background period

The time counter z itself is updated according to the following rules.

if u2(t) = 1 then z(t) = 1
if u2(t) = 0 then z(t) = z(t-1)+1, except that z(t) = kappa if z(t-1) = kappa

The Agent period is the interval in which a contingent interaction can occur: a response that arrives within this interval is treated as contingent. The Background period is everything else, that is, the interval in which the timing has been missed and a response is regarded as not contingent.
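The two rule sets above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names, the `trace` helper, and the choice to start the counter at kappa (no recent partner action) are assumptions made for the example.

```python
def in_agent_period(z_prev, kappa):
    """True if the current step falls in the Agent period, judged from z(t-1)."""
    return z_prev < kappa - 1


def update_counter(z_prev, u2, kappa):
    """Return z(t): reset to 1 when u2(t) = 1, otherwise count up, saturating at kappa."""
    if u2 == 1:
        return 1
    return min(z_prev + 1, kappa)


def trace(u2_seq, kappa, z0=None):
    """Classify each step and update the counter for a sequence of partner actions."""
    z = kappa if z0 is None else z0  # assumed start: no recent partner action
    out = []
    for u2 in u2_seq:
        period = "Agent" if in_agent_period(z, kappa) else "Background"
        z = update_counter(z, u2, kappa)
        out.append((period, z))
    return out


for period, z in trace([1, 0, 0, 0, 1], kappa=3):
    print(period, z)
```

With kappa = 3, only the single step that immediately follows a partner action is classified as an Agent period; a larger kappa widens that window.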

The key idea of the Bayesian estimation based on the four variables defined below is to compare the response histories in these two periods and to formulate the problem, within the framework of Bayesian estimation for the binomial distribution, so that the statistically greater the response in the former (the Agent period), the higher the belief that a human (in this case, the robot) is present in the outside world.

The four variables are defined as follows.

tBG: the total number of Background periods experienced so far
tAG: the total number of Agent periods experienced so far
sBG: of the tBG Background periods, the number in which the sensor recorded 1 (y2 = 1 or noise N = 1)
sAG: of the tAG Agent periods, the number in which the sensor recorded 1 (y2 = 1 or noise N = 1)

In practice, y2 = u1 and u2 = y1 are assumed (the theory-of-mind approach).

By assuming a beta distribution for the noise distribution together with the above set of history variables, the degree of belief of the interaction target can be computed as a probability value p2(t) from the data obtained moment by moment (the five variables above), starting from an arbitrary prior distribution (a prior over the parameters of the beta function). As in the previous embodiment, the technique used in this computation is Bayesian estimation, in particular the use of the properties of natural conjugate distributions.
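The text does not give the formula for p2(t), so the following is only a hedged sketch of one way the conjugate beta-binomial machinery could be used. It assumes two hypotheses that are not spelled out in the source: under H1 the Agent and Background periods have separate response rates, each with a Beta(a, b) prior, and under H0 a single rate explains both. The function names, the prior parameters `a` and `b`, and the prior probability `prior` are all hypothetical.

```python
from math import lgamma, exp, log


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_evidence(s, t, a=1.0, b=1.0):
    """Log marginal likelihood of s ones in t Bernoulli trials under a Beta(a, b) prior."""
    return log_beta(a + s, b + t - s) - log_beta(a, b)


def belief(t_bg, t_ag, s_bg, s_ag, prior=0.5, a=1.0, b=1.0):
    """Posterior probability of H1 (assumed model): the two periods have different rates."""
    log_h1 = log_evidence(s_ag, t_ag, a, b) + log_evidence(s_bg, t_bg, a, b)
    log_h0 = log_evidence(s_ag + s_bg, t_ag + t_bg, a, b)
    log_odds = log(prior / (1.0 - prior)) + log_h1 - log_h0
    # Numerically stable logistic function of the log odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    e = exp(log_odds)
    return e / (1.0 + e)


print(belief(t_bg=20, t_ag=20, s_bg=0, s_ag=20))   # responses only in Agent periods
print(belief(t_bg=20, t_ag=20, s_bg=10, s_ag=10))  # same rate in both periods
```

Because the beta prior is conjugate to the Bernoulli likelihood, each evidence term has a closed form in the four counts, so the belief can be updated at every time step without storing the full history. Note that this simplified H1 is two-sided: it also rises when responses are concentrated in the Background period, which a directional model would not do.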

In this embodiment, when setting up the controller 43 so that it behaves in a way that creates a preferable situation for the interaction target, the following two measures of preference can be considered.

Measure 1: use the probability value p2 as is.

In this case, the higher the probability value, the better. That is, the more strongly the interaction target is convinced that an interaction partner (a human or robot, as seen from the target) is present in front of it, the more preferable the situation is for the target.

Measure 2: use the quantity p2*log(p2)+(1-p2)*log(1-p2), the negative of the entropy defined from the probability value p2.

In this case, as described in the reference below, the measure is equivalent to the information maximization (infomax) criterion. In the present invention it means that the greater the information acquired by the interaction target, the more preferable the situation. The reference below addresses a different problem setting, but it reports that when a controller is set up according to the information maximization criterion, its behavior comes close to that of humans.
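The two measures can be written as small reward functions. This is a sketch under stated assumptions: the function names are hypothetical, and measure 2 is implemented exactly as the expression p2*log(p2)+(1-p2)*log(1-p2) given in the text, which equals the negative of the Shannon entropy of p2 and is therefore largest (zero) when the belief is certain.

```python
from math import log


def reward_probability(p2):
    """Measure 1: the belief p2 itself is the reward."""
    return p2


def reward_negative_entropy(p2):
    """Measure 2: p2*log(p2) + (1-p2)*log(1-p2), taking 0*log(0) as 0."""
    total = 0.0
    for q in (p2, 1.0 - p2):
        if q > 0.0:
            total += q * log(q)
    return total


for p2 in (0.5, 0.9, 1.0):
    print(p2, reward_probability(p2), reward_negative_entropy(p2))
```

Measure 1 rewards only belief that a partner is present, whereas measure 2 rewards certainty in either direction, which is what ties it to the information maximization criterion.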

Reference: Javier R. Movellan, "An Infomax Controller for Real-Time Detection of Social Contingency," Proc. of the 4th IEEE International Conference on Development and Learning, 2005.

If either of the measures given here is regarded as a reward, an existing reinforcement learning method (for example, Q-learning) can be used to obtain a controller that maximizes the expected reward accumulated over the future. This can be done in computer simulation, and copying the resulting optimal controller to the controller 43 of the robot apparatus 40 achieves the intended goal.

To apply reinforcement learning, a state must be defined in addition to the reward. Here the state is simply the set of five variables defined above. If the robot's action value u1 is chosen at random and a large number of computer simulations are run with Q-learning, an optimal controller based on the learned Q-values is obtained.
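The procedure described above can be sketched as tabular Q-learning over the five-variable state. Everything beyond the state definition, the random choice of u1, and the use of Q-learning is an assumption made for illustration: the constants, the simulated partner that emits u2 = 1 with a fixed probability, the order in which counts and counter are updated, and the `belief` function standing in for p2 (a uniform-prior beta-binomial comparison) are not taken from the source.

```python
import random
from collections import defaultdict
from math import lgamma, exp

KAPPA = 3        # assumed maximum of the time counter z
GAMMA = 0.9      # assumed discount factor
ALPHA = 0.1      # assumed learning rate
HORIZON = 20     # assumed episode length in time steps
P_PARTNER = 0.2  # assumed probability that the simulated partner emits u2 = 1


def log_evidence(s, t):
    """Log marginal likelihood of s ones in t trials under a uniform Beta(1, 1) prior."""
    return lgamma(1 + s) + lgamma(1 + t - s) - lgamma(2 + t)


def belief(t_bg, t_ag, s_bg, s_ag):
    """Assumed stand-in for p2: posterior that the two periods have different rates."""
    log_odds = (log_evidence(s_ag, t_ag) + log_evidence(s_bg, t_bg)
                - log_evidence(s_ag + s_bg, t_ag + t_bg))
    return 1.0 / (1.0 + exp(-log_odds))


def step(state, u1, u2):
    """Advance the state (z, tBG, tAG, sBG, sAG) by one time step."""
    z, t_bg, t_ag, s_bg, s_ag = state
    if z < KAPPA - 1:                      # Agent period, judged from z(t-1)
        t_ag, s_ag = t_ag + 1, s_ag + u1   # partner's input y2 is taken to be u1
    else:                                  # Background period
        t_bg, s_bg = t_bg + 1, s_bg + u1
    z = 1 if u2 == 1 else min(z + 1, KAPPA)
    return (z, t_bg, t_ag, s_bg, s_ag)


def train(episodes=2000, seed=0):
    """Run Q-learning with a random behaviour policy and return the Q-table."""
    rng = random.Random(seed)
    q = defaultdict(float)                 # Q[(state, action)], zero-initialised
    for _ in range(episodes):
        state = (KAPPA, 0, 0, 0, 0)
        for _ in range(HORIZON):
            u1 = rng.randint(0, 1)                      # random action selection
            u2 = 1 if rng.random() < P_PARTNER else 0   # simulated partner
            nxt = step(state, u1, u2)
            reward = belief(*nxt[1:])                   # measure 1: p2 itself
            best_next = max(q[(nxt, 0)], q[(nxt, 1)])
            q[(state, u1)] += ALPHA * (reward + GAMMA * best_next - q[(state, u1)])
            state = nxt
    return q


def greedy_action(q, state):
    """Controller derived from the learned Q-values."""
    return 1 if q[(state, 1)] > q[(state, 0)] else 0


q_table = train(episodes=200)
print(len(q_table), greedy_action(q_table, (KAPPA, 0, 0, 0, 0)))
```

Because Q-learning is off-policy, the purely random choice of u1 during simulation does not prevent the greedy controller from being read off the Q-table afterwards. The count variables grow without bound, so a practical version would need a bounded horizon, as here, or some form of state aggregation.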

In this way, a controller 43 for the robot apparatus 40 that behaves favorably toward the interaction target can be realized.

The present invention is not limited to the embodiments described above with reference to the accompanying drawings. Those skilled in the art will understand that various modifications, substitutions, and equivalents can be made without departing from the spirit or scope of the appended claims.

11 Appendix I: Model Summary

Parameters:

Static random variables:

Stochastic processes:
For t = 1, 2, ..., the following processes are defined.

Stochastic constraints:
FIG. 4 shows the Markov constraints on the joint distribution of the variables in the model.

12 Appendix II: Solution of Bellman's Optimality Equation

This appendix assumes the notation and constructs presented in Section 3.4. For every $t' \in [t+1, T]$, let $S_{t'} \stackrel{\text{def}}{=} (O_{t'}, Q_{t'}, Z_{t'})$, and let $(y_{1:t'}, u_{1:t}, o_{t'}, q_{t'}, z_{t'})$ be an arbitrary fixed sample from $(Y_{1:t'}, U_{1:t}, O_{t'}, Q_{t'}, Z_{t'})$. We first show that the following properties hold for $t' > t$.

(1) There exists a function $g_{t'}$ such that $S_{t'} = g_{t'}(S_{t'-1}, Y_{t'}, U_{t'})$.

(2) $p(y_{t'+1} \mid y_{1:t'}, u_{1:t'+1}) = p(y_{t'+1} \mid s_{t'}, u_{1:t'+1})$

(3) $W_{t'} \in \sigma\{S_{t'}\}$

Property (1) simply states that $S_{t'}$ can be computed recursively from $S_{t'-1}$, $Y_{t'}$, and $U_{t'}$, which is clearly true given the definition of $S_{t'}$. Regarding property (2), note the following.

Here, $I_{t'}$ is a three-dimensional binary vector with a single 1 whose position is determined by $Z_{t'}$. Therefore $E[R \cdot I_{t'+1} \mid o_{t'}, q_{t'}, z_{t'}, u_{t'+1}, h]$ is the expected value of the rate of the Poisson process selected by $z_{t'}$. From (15), the following is obtained for i = 1, 2, 3.

Thus, using (86), the following is obtained.

Therefore, the predictive distribution at time $t'+1$ is a function of $S_{t'}$ and $U_{t'+1}$, from which (2) follows. Property (3) follows directly from (24).

Next, these properties are used to derive an algorithm for finding the optimal infomax controller. First, combining properties (1) and (3) gives the following.

Using this fact together with (1) gives the following.

Next, let $t' = T$. Since there are no returns after T, the following is obtained.

Thus, for $t' = T$, it is clear that the value of the sequence $(y_{1:t'}, u_{1:t'})$ can be computed as a function of its relevant statistic $s_{t'}$.

For the case $t' = T-1$, define functions F′, N′, V′, C′ that compute F, N, V, C of the sequence $(y_{1:t'}, u_{1:t'})$ from the statistic $s_{t'}$ of that sequence.

The same logic can be used for the case $t' = T-2$.

Here $s_{t'+1} \stackrel{\text{def}}{=} g_{t'}(s_{t'}, y_{t'+1}, u_{t'+1})$. Steps (78) to (83) can likewise be applied for $t' = T-2, T-3, \ldots, t$, which yields the optimal controller. Note that for each time $t' \in [t, T]$, the controller maps the value of the statistic $s_{t'}$ to the action $U_{t'}$.

13 Appendix III: Definitions

Beta variables:

The following formulas give the parameters of the beta distribution that match a desired mean m and variance s^2.
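The formulas themselves do not survive in this text. As a hedged reconstruction, the standard moment-matching result for a beta distribution is given below; the parameter names $\beta_1$ and $\beta_2$ are assumptions and may differ from the notation of the original equations.

```latex
\beta_1 = m\left(\frac{m(1-m)}{s^2} - 1\right), \qquad
\beta_2 = (1-m)\left(\frac{m(1-m)}{s^2} - 1\right), \qquad
0 < s^2 < m(1-m)
```

This follows from the mean $\beta_1/(\beta_1+\beta_2) = m$ and the variance $m(1-m)/(\beta_1+\beta_2+1) = s^2$, which together give $\beta_1+\beta_2 = m(1-m)/s^2 - 1$.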

Gamma function

The gamma function has the following properties.

Logistic function

Expectation given a sigma-algebra: Let Y be an integrable random variable on a probability space (Ω, F, P), that is, E(|Y|) ∈ R, and let σ ⊂ F be a sigma-algebra. The conditional expectation of Y given σ is the P-almost-surely unique random variable E(Y | σ) with the following properties.

E(Y | σ) is σ-measurable.

For any A ∈ σ, ∫_A E(Y | σ) dP = ∫_A Y dP.

If E(Y^2) ∈ R, then E(Y | σ) is the P-almost-surely unique σ-measurable random variable closest to Y in the least-squares sense, that is:

Let X be a random variable on the same probability space as Y, and let σ(X) be the sigma-algebra induced by X. The expectation of Y given X is the expectation of Y given σ(X), that is:

Entropy:

Conditional entropy:

Mutual information:
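The three definitions are not reproduced in this text. For reference, the standard definitions for discrete random variables X and Y are sketched below; the notation is an assumption and may differ from that of the original equations.

```latex
H(Y) = -\sum_{y} p(y)\,\log p(y)

H(Y \mid X) = -\sum_{x,y} p(x,y)\,\log p(y \mid x)

I(X;Y) = H(Y) - H(Y \mid X)
```

Read this way, maximizing mutual information means choosing actions that are expected to reduce the remaining entropy of the hypothesis of interest as much as possible.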

FIG. 1: (A) is a schematic view of the head of the robot used in Non-Patent Document 19, and (B) is a photograph of Infant-9. The reflection of the robot can be seen in a mirror placed behind the infant.
FIG. 2: A diagram showing the configuration of a social robot with only the minimum necessary functions.
FIG. 3: A graphical representation of the dynamics of the timer and indicator variables.
FIG. 4: A graphical representation of the generative model.
FIG. 5: (A) and (B) are diagrams of two contingency clusters produced by the model.
FIG. 6: (A) is a raster plot of 150 trials, and (B) shows the probability that the audio sensor is active as a function of time.
FIG. 7: (A) shows the responses of the infomax controller simulating the infant, (B) shows the posterior probability that a responsive agent is present as a function of time, (C) shows the posterior distributions of the agent rate and the background rate after 43 seconds, and (D) shows the ratio between the uncertainty about the agent response rate and the uncertainty about the background response rate.
FIG. 8: A perspective view showing the external form of the robot apparatus according to this embodiment.
FIG. 9: A schematic diagram of the degree-of-freedom configuration of the robot apparatus.
FIG. 10: A diagram showing the system configuration of the robot apparatus.
FIG. 11: A diagram showing another system configuration of the robot apparatus according to the present invention.
FIG. 12: A diagram used to explain the operation of the robot apparatus shown in FIG. 11.

Explanation of symbols

30, 40 robot apparatus; 41 audio input device; 42 audio output device; 43 controller; 251 image input device; 252 audio input device; 253 audio output device; 254 communication interface; 211 CPU; 212 RAM; 213 ROM; 214 external storage device; 201 bus interface; 311 CPU; 312 RAM; 313 ROM; 314 external storage device; 301 bus interface

Claims

An interaction device that sets its own controller so as to maximize the expected information defined between a hypothesis about an interaction target and its own inputs and outputs.

The interaction device according to claim 1, wherein the hypothesis is whether or not an interaction target exists.

The interaction device according to claim 1, wherein the interaction target is a user.

The interaction device according to claim 1, wherein the input/output is audio microphone input and loudspeaker output.

The interaction device according to claim 1, wherein the interaction device includes an expression mechanism and outputs an expression together with the posterior probability that an interaction target exists.

The interaction device according to claim 1, wherein the expression mechanism is a mimic motion output mechanism.

An interaction apparatus comprising a control unit that, based on input/output information, outputs an action at the timing at which the expected amount of information acquired about the presence of an interaction target is maximized.

The interaction apparatus according to claim 7, further comprising an audio microphone and a loudspeaker, wherein the input/output information is audio information.