JP2007125630A - Robot device and motion control method

Info

Publication number: JP2007125630A
Application number: JP2005318851A
Authority: JP
Inventors: Chernova Sonia; シェルノーバソニア; Craig Ronald Arkin; クレッグアーキンロナルド; Kazumi Aoyama; 一美青山; Tsutomu Sawada; 務澤田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2005-11-01
Filing date: 2005-11-01
Publication date: 2007-05-24

Abstract

PROBLEM TO BE SOLVED: To provide a robot device and a motion control method that can acquire a repeatedly performed series of motions as routine work and execute it without planning.

SOLUTION: When a series of motions consisting of element motions A, B and C is performed repeatedly, schema B learns schema A as its trigger schema, schema C learns schema B as its trigger schema, and the series of motions is acquired as routine work. Thereafter, when schema A is executed, schema B adds a routine bias RSE to its own action value. When schema A finishes, the action value of schema B, with the routine bias RSE added, is the largest at that time, so schema B is executed after schema A. Similarly, schema C is executed after schema B.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a robot apparatus that can act autonomously in response to external stimuli and its own internal state, and to a behavior control method for the same.

A mechanical device that uses electrical or magnetic action to perform movements resembling those of a human (or other living creature) is called a “robot”. Robots began to spread in Japan at the end of the 1960s, but most of them were industrial robots, such as manipulators and transfer robots, intended to automate production work in factories and make it unmanned.

More recently, practical robots have been under development that support daily life as human partners, that is, that support human activities in the living environment and in various other situations of everyday life. Unlike industrial robots, such practical robots are able to learn by themselves how to adapt to people with different personalities and to various environments in the many aspects of the human living environment. For example, “pet-type” robots that imitate the body mechanisms and movements of quadruped animals such as dogs and cats, and “humanoid” robots designed on the model of the body mechanisms and movements of humans and other creatures that walk upright on two legs, are already being put to practical use.

Because these robot apparatuses can perform a variety of entertainment-oriented actions compared with industrial robots, they are sometimes called entertainment robots. Some of them can act autonomously in response to external stimuli and their own internal state.

For example, the robot apparatus described in Patent Document 1 treats sensor inputs such as vision and hearing as external stimulus information and information obtained from internal state models such as instinct and emotion as internal state information, and selects actions autonomously according to this information.

When, as in Patent Document 1, a robot apparatus judges its internal and external situation by value criteria held inside the apparatus and selects actions autonomously, its behavior becomes emergent and it can express more complex behavior. On the other hand, because the criteria for judging the situation are closed inside the robot apparatus, it can be difficult for the user, as a third party, to understand what plan the robot apparatus is following when it expresses a series of actions.

Patent Document 2 therefore proposes a technique for a robot apparatus in which each behavior description module (schema), which describes one element behavior, calculates an action value representing its execution priority from external stimuli and/or the internal state, and one or more behavior description modules are selected according to the magnitude of their action values to express behavior. The technique forcibly raises the action values of a series of behavior description modules in a predetermined order so that a series of actions is expressed. With the technique of Patent Document 2, the robot apparatus can be made to appear to act according to a plan, that is, according to an intention.

Patent Document 1: JP 2002-210681 A
Patent Document 2: JP 2004-237391 A
Non-Patent Document 1: D. A. Norman and T. Shallice, “Consciousness and Self Regulation: Advances in Theory and Research”, Academic Press, 1986, Vol. 4, ch. Attention to action: Willed and automatic control of behavior, pp. 515-549

Incidentally, the term Contention Scheduling is known in the field of cognitive science (see Non-Patent Document 1). It refers to the process by which a series of actions that was at first carried out while planning each action one by one comes, after many repetitions, to be carried out without planning. For example, a new employee plans how to get to the office, such as at which station to change trains and where to buy a ticket, and carries out this series of actions every day; eventually the employee can get to the office without making a plan. This is one example of Contention Scheduling.

For the autonomous robot apparatus described above as well, becoming able to express a series of actions without planning, after repeatedly expressing that series on the basis of a plan, is valuable because it reduces the computational load of planning.

However, no technique, including that of Patent Document 2, has yet been proposed for acquiring a repeatedly expressed series of actions as routine work and making it executable without planning.

The present invention has been proposed in view of these circumstances, and its object is to provide a robot apparatus that acquires a repeatedly performed series of actions as routine work and can execute it without planning, and a behavior control method for the same.

To achieve the above object, a robot apparatus according to the present invention is a robot apparatus capable of acting autonomously in response to external stimuli and/or an internal state, comprising: a plurality of behavior description modules, each of which describes a predetermined element behavior and calculates, from the external stimuli and/or the internal state, an action value representing the execution priority of its own element behavior; and behavior selection means that selects one or more behavior description modules according to the magnitude of the execution priority of each behavior description module and causes the element behavior described in each selected module to be expressed. When a combination of its own element behavior and a specific element behavior has been learned, each behavior description module adds a first bias value to its own action value when that specific element behavior is expressed.

Likewise, to achieve the above object, a behavior control method for a robot apparatus according to the present invention is a behavior control method for a robot apparatus capable of acting autonomously in response to external stimuli and/or an internal state, comprising: a behavior selection step of selecting one or more behavior description modules, according to the magnitude of their execution priority, from a plurality of behavior description modules, each of which describes a predetermined element behavior and calculates, from the external stimuli and/or the internal state, an action value representing the execution priority of its own element behavior; and a behavior expression step of expressing the element behavior described in each behavior description module selected in the behavior selection step. When a combination of its own element behavior and a specific element behavior has been learned, each behavior description module adds a first bias value to its own action value when that specific element behavior is expressed.

With the robot apparatus and behavior control method according to the present invention, a behavior description module for which a combination of its own element behavior and a specific element behavior has been learned adds the first bias value to its own action value when that specific element behavior is expressed, so that its own element behavior is expressed next after the specific element behavior. Repeating this from module to module allows a series of element behaviors to be expressed without planning.
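The chaining effect described above can be illustrated with a minimal sketch. It is not the implementation of the invention: the class name `Schema`, the constant `ROUTINE_BIAS`, the fixed base values, and the rule that the behavior that has just been expressed is excluded from the next selection are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed magnitude of the first bias value; the text does not give a number.
ROUTINE_BIAS = 50.0


@dataclass
class Schema:
    name: str
    base_value: float               # action value from stimuli and internal state (held constant here)
    trigger: Optional[str] = None   # name of the learned trigger behavior, if any

    def action_value(self, expressed: Optional[str]) -> float:
        # Add the bias only when the learned trigger behavior has been expressed.
        biased = self.trigger is not None and self.trigger == expressed
        return self.base_value + (ROUTINE_BIAS if biased else 0.0)


def select_next(schemas: List[Schema], expressed: str) -> Schema:
    # Assumption: the behavior that was just expressed is not a candidate for the next step.
    candidates = [s for s in schemas if s.name != expressed]
    return max(candidates, key=lambda s: s.action_value(expressed))


def run_routine(schemas: List[Schema], start: str, steps: int) -> List[str]:
    order = [start]
    for _ in range(steps):
        order.append(select_next(schemas, order[-1]).name)
    return order


learned = [
    Schema("A", 10.0),
    Schema("B", 5.0, trigger="A"),
    Schema("C", 3.0, trigger="B"),
    Schema("D", 20.0),              # unrelated behavior with a higher base value
]
```

With these values, `run_routine(learned, "A", 2)` yields A, B, C even though D has the highest unbiased action value, because B and C each receive the bias once their trigger behavior has been expressed.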

Specific embodiments of the present invention are described in detail below with reference to the drawings. In this embodiment, the present invention is applied to a robot apparatus that has an external form modeled on a human and can act autonomously in response to external stimuli and its own internal state. The configuration of the robot apparatus is described first, then its behavior control system, and finally a specific experimental example of its behavior control.

(1) Configuration of the robot apparatus
First, FIG. 1 shows the external configuration of the robot apparatus in this embodiment. As shown in FIG. 1, the robot apparatus 1 is configured by connecting a head unit 3 to a predetermined position on a trunk unit 2, together with left and right arm units 4R/L and left and right leg units 5R/L, where R and L are suffixes denoting right and left, respectively.

FIG. 2 schematically shows the functional configuration of the robot apparatus 1. As shown in FIG. 2, the robot apparatus 1 comprises a control unit 20, which performs overall control of the whole operation and other data processing, an input/output unit 40, a drive unit 50, and a power supply unit 60.

As its input section, the input/output unit 40 is equipped with a CCD (Charge Coupled Device) camera 41, which corresponds to the human “eyes” and captures images of the surroundings; a microphone 42, which corresponds to the human “ears”; touch sensors 44, which are arranged on parts such as the head and back and sense contact by the user by electrically detecting a predetermined pressure; a distance sensor for measuring the distance to an object located ahead; and various other sensors corresponding to the five senses. As its output section, the input/output unit 40 is equipped with a speaker 43, which is provided in the head unit 3 and corresponds to the human “mouth”, and LEDs (Light Emitting Diodes) 45, which are provided at the positions of the human eyes and express states such as the visual recognition state. Through sound, blinking of the LEDs 45 and the like, these output devices can give the user feedback from the robot apparatus 1 in forms other than motion of the arm units 4R/L, the leg units 5R/L and so on.

For example, by providing a plurality of touch sensors 44 at predetermined locations on the top of the head unit 3 and combining the contact detection of the individual touch sensors 44, actions by the user on the head of the robot apparatus 1 such as “stroking”, “hitting” and “tapping lightly” can be detected. Specifically, when several of the touch sensors 44 detect contact one after another at predetermined time intervals, this is judged as “stroked”, and when contact is detected within a short time, this is judged as “hit”. The robot apparatus 1 changes its internal state according to the detection result and can express the change in internal state through the output section described above.
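The distinction between the two kinds of contact can be sketched as follows. This is only an illustration: the function name `classify_touch`, the event format, and the 0.2 second threshold are assumptions, since the text states only that sequential contacts spread over time count as stroking and contacts within a short time count as hitting.

```python
from typing import List, Optional, Tuple


def classify_touch(events: List[Tuple[float, int]],
                   short_window: float = 0.2) -> Optional[str]:
    """Classify touch events given as (timestamp in seconds, sensor id).

    Assumed rule: contacts on two or more sensors spread over more than
    `short_window` seconds count as "stroked"; anything shorter counts as "hit".
    """
    if not events:
        return None
    times = sorted(t for t, _ in events)
    sensors = {sensor for _, sensor in events}
    span = times[-1] - times[0]
    if len(sensors) >= 2 and span > short_window:
        return "stroked"
    return "hit"
```

A real classifier would also need to separate “hit” from “tapped lightly”, for instance by pressure, which this sketch does not attempt.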

The drive unit 50 is a functional block that realizes the body motion of the robot apparatus 1 according to a predetermined motion pattern commanded by the control unit 20, and it is the object controlled by behavior control. The drive unit 50 is a functional module for realizing the degrees of freedom at each joint of the robot apparatus 1 and consists of a plurality of drive units 54_1 to 54_n, one for each axis, such as the roll, pitch and yaw axes, of each joint. Each drive unit 54_1 to 54_n is a combination of a motor 51_1 to 51_n that rotates about a predetermined axis, an encoder 52_1 to 52_n that detects the rotational position of the motor 51_1 to 51_n, and a driver 53_1 to 53_n that adaptively controls the rotational position and rotational speed of the motor 51_1 to 51_n based on the output of the encoder 52_1 to 52_n.

The power supply unit 60 is, as its name suggests, a functional module that supplies power to the electric circuits and other components in the robot apparatus 1. The robot apparatus 1 in this embodiment is an autonomously driven type that runs on a battery, and the power supply unit 60 consists of a rechargeable battery 61 and a charge/discharge control unit 62 that manages the charge/discharge state of the rechargeable battery 61.

The rechargeable battery 61 takes the form of, for example, a “battery pack” in which a plurality of lithium-ion secondary battery cells are packaged as a cartridge.

The charge/discharge control unit 62 determines the remaining capacity of the rechargeable battery 61 by measuring its terminal voltage, charge/discharge current, ambient temperature and the like, and decides when charging should start and end. The start and end times decided by the charge/discharge control unit 62 are reported to the control unit 20 and serve as triggers for the robot apparatus 1 to start and end the charging operation.
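One simple way to turn a remaining-capacity estimate into start and end triggers is a pair of thresholds with hysteresis, sketched below. The class name `ChargeController` and the 20% and 95% thresholds are assumptions; the text does not say how the charge/discharge control unit 62 makes its decision, and the estimation of capacity from voltage, current and temperature is left out.

```python
from typing import Optional


class ChargeController:
    """Emit "start" or "stop" triggers from a remaining-capacity estimate (0.0 to 1.0)."""

    def __init__(self, start_below: float = 0.2, stop_above: float = 0.95) -> None:
        self.start_below = start_below   # assumed threshold for starting a charge
        self.stop_above = stop_above     # assumed threshold for ending a charge
        self.charging = False

    def update(self, remaining: float) -> Optional[str]:
        # The two thresholds differ so that the trigger does not toggle
        # repeatedly when the estimate hovers around a single value.
        if not self.charging and remaining <= self.start_below:
            self.charging = True
            return "start"
        if self.charging and remaining >= self.stop_above:
            self.charging = False
            return "stop"
        return None
```

Each non-`None` return value stands for the notification that would be sent to the control unit 20.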

The control unit 20 corresponds to the human “brain” and is mounted in, for example, the head unit 3 or the trunk unit 2 of the robot apparatus 1.

FIG. 3 shows the internal configuration of the control unit 20. As shown in FIG. 3, the control unit 20 has a CPU (Central Processing Unit) 21, serving as the main controller, connected by a bus to memory, other circuit components and peripheral devices. The bus 28 is a common signal transmission path that includes a data bus, an address bus, a control bus and so on. Each device on the bus 28 is assigned a unique address (memory address or I/O address), and the CPU 21 can communicate with a specific device on the bus 28 by specifying its address.

The RAM (Random Access Memory) 22 is a writable memory made up of volatile memory such as DRAM (Dynamic RAM). It is used to load the program code executed by the CPU 21 and to temporarily store the working data of running programs.

The ROM (Read Only Memory) 23 is a read-only memory that permanently stores programs and data. The program code stored in the ROM 23 includes a self-diagnostic test program executed when the robot apparatus 1 is powered on and motion control programs that define the operation of the robot apparatus 1. The control programs of the robot apparatus 1 include a “sensor input and recognition processing program”, which processes sensor inputs from the CCD camera 41, the microphone 42 and so on and recognizes them as symbols; a “behavior control program”, which controls the behavior of the robot apparatus 1 based on the sensor inputs and a predetermined behavior control model while managing memory operations such as short-term memory and long-term memory (described later); and a “drive control program”, which controls the driving of each joint motor, the audio output of the speaker 43 and so on according to the behavior control model.

The nonvolatile memory 24 consists of electrically erasable and rewritable memory elements such as EEPROM (Electrically Erasable and Programmable ROM) and is used to hold, in a nonvolatile manner, data that must be updated from time to time. Such data include encryption keys and other security information, and device control programs to be installed after shipment.

The interface 25 is a device for interconnecting with equipment outside the control unit 20 and enabling data exchange. The interface 25 inputs and outputs data to and from, for example, the CCD camera 41, the microphone 42 and the speaker 43. It also inputs and outputs data and commands to and from the drivers 53_1 to 53_n in the drive unit 50.

The interface 25 may also include general-purpose interfaces for connecting computer peripherals, such as a serial interface such as RS (Recommended Standard)-232C, a parallel interface such as IEEE (Institute of Electrical and Electronics Engineers) 1284, a USB (Universal Serial Bus) interface, an i-Link (IEEE 1394) interface, a SCSI (Small Computer System Interface) interface, or a memory card interface (card slot) that accepts PC cards and memory cards, so that programs and data can be transferred to and from locally connected external devices.

As another example, the interface 25 may include an infrared communication (IrDA) interface and communicate wirelessly with external devices.

The control unit 20 further includes a wireless communication interface 26, a network interface card (NIC) 27 and the like, and can exchange data with various external host computers via short-range wireless data communication such as Bluetooth (registered trademark), a wireless network such as IEEE 802.11b, or a wide area network such as the Internet.

Through such data communication between the robot apparatus 1 and a host computer, remote computer resources can be used to compute complex motion control for the robot apparatus 1 or to operate it remotely.

(2) Behavior control system of the robot apparatus
Next, the behavior control system of the robot apparatus 1 is described in detail. The robot apparatus 1 is designed to act autonomously according to its own state and the surrounding situation and to instructions and actions from the user. That is, the robot apparatus 1 can express behavior autonomously in response to external stimuli and its internal state. As described in detail later, the robot apparatus 1 has a plurality of behavior description modules (schemas), each of which describes one of its element behaviors. Each schema calculates, every unit time and from the external stimuli and the internal state, an action value (Activation Level; AL) representing the execution priority of its own element behavior, and the robot apparatus 1 selects one or more schemas according to the magnitude of their action values and expresses behavior.
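A minimal sketch of this per-unit-time selection is given below. The schema names, the formulas used for the action value, and the `max_active` parameter are illustrative assumptions; the text states only that each schema computes its AL from the external stimuli and the internal state and that one or more schemas with large AL are selected.

```python
from typing import Callable, Dict, List

# An AL function maps (external stimuli, internal state) to an action value.
ALFunction = Callable[[Dict[str, float], Dict[str, float]], float]


def select_schemas(schemas: Dict[str, ALFunction],
                   stimulus: Dict[str, float],
                   internal: Dict[str, float],
                   max_active: int = 1) -> List[str]:
    """Evaluate every schema once and return the names with the largest AL."""
    scored = sorted(((al(stimulus, internal), name) for name, al in schemas.items()),
                    reverse=True)
    return [name for _, name in scored[:max_active]]


# Hypothetical schemas for illustration only.
example_schemas: Dict[str, ALFunction] = {
    "approach_ball": lambda s, i: 60.0 * s.get("ball_visible", 0.0) + i.get("curiosity", 0.0),
    "rest": lambda s, i: 100.0 - i.get("energy", 100.0),
}
```

Calling `select_schemas` once per unit time with fresh stimuli and internal state reproduces the cycle described above; a full system would also have to check that simultaneously selected schemas do not compete for the same body resources.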

FIG. 4 schematically shows the functional configuration of the behavior control system 10 of the robot apparatus 1. The behavior control system 10 can be implemented using object-oriented programming. In that case, each piece of software is handled in units of modules called “objects”, which combine data with the procedures that process the data. Objects can pass data to one another and invoke one another through inter-object communication based on message passing and shared memory.

To recognize the external environment 70, the behavior control system 10 includes an external stimulus recognition unit 80, a functional module consisting of a visual recognition function unit 81, an auditory recognition function unit 82, a contact recognition function unit 83 and so on.

The visual recognition function unit 81 performs image recognition processing, such as face recognition and color recognition, and feature extraction on captured images input through an image input device such as the CCD camera 41.

The auditory recognition function unit 82 performs speech recognition on audio data input through an audio input device such as the microphone 42, extracting features and recognizing word sets (text).

The contact recognition function unit 83 recognizes sensor signals from the touch sensors 44 built into, for example, the head unit 3 of the body, and recognizes external stimuli such as “stroked” or “hit”.

The internal state management unit 91 has an emotion and instinct model that manages several kinds of emotions, such as instincts and feelings, as mathematical models, and it manages the internal state of the robot apparatus 1, such as its instincts and emotions, according to the external stimuli recognized by the visual recognition function unit 81, the auditory recognition function unit 82 and the contact recognition function unit 83. The emotion model and the instinct model each take recognition results and the behavior history as inputs and manage emotion values and instinct values, respectively. The behavior model can refer to these emotion values and instinct values.

To control behavior according to the recognition results for external stimuli and changes in the internal state, the behavior control system 10 also includes a short-term memory unit 92, which holds short-term memories that are lost as time passes, and a long-term memory unit 93, which holds information for a comparatively long period. The classification of memory mechanisms into short-term memory and long-term memory is based on neuropsychology.

The short-term memory unit 92 is a functional module that holds, for a short period, the targets and events recognized from the external environment by the visual recognition function unit 81, the auditory recognition function unit 82 and the contact recognition function unit 83. For example, it stores input images from the CCD camera 41 shown in FIG. 2 for only a short period of about 15 seconds.
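Time-limited retention of this kind can be sketched with a timestamped queue. The class name `ShortTermMemory` and its methods are hypothetical, and the sketch assumes that items are stored in non-decreasing time order with the current time passed in explicitly; only the retention period of about 15 seconds comes from the text.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class ShortTermMemory:
    """Keep recognized targets and events only for a limited retention period."""

    def __init__(self, retention_s: float = 15.0) -> None:
        self.retention_s = retention_s
        self._items: Deque[Tuple[float, Any]] = deque()   # (timestamp, item), oldest first

    def store(self, item: Any, now: float) -> None:
        self._items.append((now, item))
        self._expire(now)

    def recall(self, now: float) -> List[Any]:
        self._expire(now)
        return [item for _, item in self._items]

    def _expire(self, now: float) -> None:
        # Drop entries from the old end until the oldest one is within the retention period.
        while self._items and now - self._items[0][0] > self.retention_s:
            self._items.popleft()
```

Passing `now` explicitly keeps the sketch deterministic; a running system would more likely read a monotonic clock.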

The long-term memory unit 93 is used to hold, for a long period, information obtained through learning, such as the names of objects. For example, the long-term memory unit 93 can associatively store, for a given schema, the change in internal state that follows from an external stimulus.

The behavior control of the robot apparatus 1 is broadly divided into “reflex behavior”, realized by the reflex behavior unit 103; “situation-dependent behavior”, realized by the situation-dependent behavior layer 102; and “deliberative behavior”, realized by the deliberative behavior layer 101.

The reflex behavior unit 103 is a functional module that realizes reflexive body motions in response to the external stimuli recognized by the visual recognition function unit 81, the auditory recognition function unit 82 and the contact recognition function unit 83. Reflex behavior is, in essence, behavior in which the recognition results for external information input from the sensors are received directly, classified, and used to determine the output behavior directly. For example, behaviors such as tracking a human face or nodding are preferably implemented as reflex behaviors.

The situation-dependent behavior layer 102 controls behavior that responds promptly to the situation in which the robot apparatus 1 currently finds itself, based on the contents of the short-term memory unit 92 and the long-term memory unit 93 and on the internal state of the robot apparatus 1 managed by the internal state management unit 91.

The situation-dependent behavior layer 102 provides a state machine for each element behavior, classifies the recognition results for external information input from the sensors in a way that depends on earlier behavior and circumstances, and expresses behavior on the body. The situation-dependent behavior layer 102 also realizes homeostatic behavior for keeping the internal state within a certain range: when the internal state goes outside the specified range, it activates the behaviors that return the internal state to that range so that they are more likely to appear (in practice, behavior is selected with both the internal state and the external environment taken into account). Situation-dependent behavior has a longer reaction time than reflex behavior.
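The homeostatic activation can be pictured as a term that is zero while an internal variable stays inside its range and grows with the distance outside it. The function name `homeostasis_boost`, the linear form and the `gain` parameter are assumptions; the text says only that restoring behaviors become more likely to appear.

```python
def homeostasis_boost(value: float, low: float, high: float, gain: float = 1.0) -> float:
    """Extra activation for a behavior that restores an internal variable.

    Returns 0.0 inside [low, high]; outside, the boost grows linearly with
    the distance from the nearest bound (the linear form is an assumption).
    """
    if value < low:
        return gain * (low - value)
    if value > high:
        return gain * (value - high)
    return 0.0
```

Under this assumption the boost would be added to the action value that the restoring behavior computes from external stimuli, so that the final selection still reflects both the internal state and the external environment.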

熟考行動階層１０１は、上述の短期記憶部９２及び長期記憶部９３の記憶内容に基づいて、ロボット装置１の比較的長期に亘る行動計画等を行う。熟考行動とは、与えられた状況或いは人間からの命令により、推論やそれを実現するための計画を立てて行われる行動のことである。例えば、ロボット装置１の現在位置と目標位置とから経路を探索することは熟考行動に相当する。このような推論や計画は、ロボット装置１がインタラクションを保つための反応時間よりも処理時間や計算負荷を要する（すなわち処理時間がかかる）可能性があるため、上述の反射行動部１０３や状況依存行動階層１０２がリアルタイムで反応を返しながら、熟考行動階層１０１は推論や計画を行う。 The contemplation action hierarchy 101 performs relatively long-term action planning and the like for the robot apparatus 1 based on the contents of the short-term storage unit 92 and the long-term storage unit 93 described above. A contemplation action is an action carried out by making inferences, and a plan for realizing them, in accordance with a given situation or a command from a human. For example, searching for a route from the current position of the robot apparatus 1 to a target position corresponds to a contemplation action. Because such inference and planning may require more processing time and computational load than the reaction time within which the robot apparatus 1 must respond to maintain interaction (that is, they take time), the contemplation action hierarchy 101 performs inference and planning while the reflex action unit 103 and the situation-dependent action hierarchy 102 return responses in real time.

熟考行動階層１０１、状況依存行動階層１０２、及び反射行動部１０３は、ロボット装置１のハードウェア構成に非依存の上位のアプリケーション・プログラムとして記述することができる。これに対し、ハードウェア依存行動制御部１０４は、これら上位アプリケーションからの命令に応じて、関節アクチュエータの駆動等の機体のハードウェア（外部環境）を直接操作する。このような構成により、ロボット装置１は、制御プログラムに基づいて自己及び周囲の状況を判断し、ユーザからの指示及び働きかけに応じて自律的に行動できる。 The contemplation action hierarchy 101, the situation dependent action hierarchy 102, and the reflex action section 103 can be described as higher-level application programs that are independent of the hardware configuration of the robot apparatus 1. On the other hand, the hardware-dependent behavior control unit 104 directly operates the hardware (external environment) of the machine body such as driving of the joint actuator in accordance with commands from these higher-level applications. With such a configuration, the robot apparatus 1 can determine its own and surrounding conditions based on the control program, and can act autonomously according to instructions and actions from the user.

以下、行動制御システム１０についてさらに説明する。図５は、行動制御システム１０のオブジェクト構成を示す模式図である。 Hereinafter, the behavior control system 10 will be further described. FIG. 5 is a schematic diagram illustrating an object configuration of the behavior control system 10.

図５に示すように、視覚認識機能部８１は、Face Detector１１１、Multi Color Tracker１１２、Face Identify１１３という３つのオブジェクトで構成される。Face Detector１１１は、画像フレーム中から顔領域を検出するオブジェクトであり、検出結果をFace Identify１１３に出力する。Multi Color Tracker１１２は、色認識を行うオブジェクトであり、認識結果をFace Identify１１３及びShort Term Memory９２に出力する。また、Face Identify１１３は、検出された顔画像を手持ちの人物辞書で検索する等して人物の識別を行い、顔画像領域の位置、大きさ情報と共に人物のＩＤ情報をShort Term Memory９２に出力する。 As shown in FIG. 5, the visual recognition function unit 81 includes three objects, Face Detector 111, Multi Color Tracker 112, and Face Identify 113. The Face Detector 111 is an object that detects a face area from an image frame, and outputs the detection result to the Face Identify 113. The Multi Color Tracker 112 is an object that performs color recognition, and outputs the recognition result to the Face Identify 113 and the Short Term Memory 92. The Face Identify 113 identifies a person by searching for a detected face image in a personal dictionary, and outputs the person ID information together with the position and size information of the face image area to the Short Term Memory 92.

聴覚認識機能部８２は、Audio Recog１１４とSpeech Recog１１５という２つのオブジェクトで構成される。Audio Recog１１４は、マイクロフォン４２等の音声入力装置からの音声データを受け取って、特徴抽出及び音声区間検出を行うオブジェクトであり、音声区間の音声データの特徴量及び音源方向をSpeech Recog１１５やShort Term Memory９２に出力する。Speech Recog１１５は、Audio Recog１１４から受け取った音声特徴量と音声辞書及び構文辞書とを使って音声認識を行うオブジェクトであり、認識された単語セットをShort Term Memory９２に出力する。 The auditory recognition function unit 82 includes two objects, Audio Recog 114 and Speech Recog 115. The Audio Recog 114 is an object that receives voice data from a voice input device such as the microphone 42 and performs feature extraction and voice section detection; it outputs the feature amount and sound source direction of the voice data in the voice section to the Speech Recog 115 and the Short Term Memory 92. The Speech Recog 115 is an object that performs voice recognition using the voice feature amount received from the Audio Recog 114 together with a voice dictionary and a syntax dictionary, and outputs the recognized word set to the Short Term Memory 92.

触覚認識記憶部８３は、タッチセンサ４４からのセンサ入力を認識するTactile Sensor１１６というオブジェクトで構成され、認識結果はShort Term Memory９２や内部状態を管理するオブジェクトであるInternal Status Manager９１に出力する。 The tactile sensation recognition storage unit 83 includes an object called a Tactile Sensor 116 that recognizes a sensor input from the touch sensor 44, and outputs the recognition result to the Short Term Memory 92 and the Internal Status Manager 91 that is an object for managing the internal state.

Internal Status Manager９１は、内部状態管理部を構成するオブジェクトであり、本能や感情といった数種類の情動を数式モデル化して管理しており、上述の認識系の各オブジェクトによって認識された外部刺激に応じてロボット装置１の本能や感情といった内部状態を管理する。 The Internal Status Manager 91 is an object that constitutes the internal state management unit. It manages several kinds of emotions, such as instincts and feelings, as mathematical models, and manages the internal state of the robot apparatus 1, such as its instincts and feelings, in accordance with the external stimuli recognized by each object of the recognition system described above.

Short Term Memory９２は、短期記憶部を構成するオブジェクトであり、上述の認識系の各オブジェクトによって外部環境から認識されたターゲットやイベントを短期間保持（例えばＣＣＤカメラ４１からの入力画像を約１５秒程度の短い期間だけ記憶）する機能モジュールであり、Short Term Memory９２のクライアント（ＳＴＭクライアント）であるNormalＳＢＬ（Situated Behavior Layer）１０２に対して外部刺激の通知（Notify）を定期的に行う。 The short term memory 92 is an object constituting the short term memory unit, and holds targets and events recognized from the external environment by each object of the recognition system described above (for example, an input image from the CCD camera 41 is about 15 seconds). This is a functional module that stores only for a short period of time), and periodically sends notification of external stimuli (Notify) to Normal SBL (Situated Behavior Layer) 102 which is a client (STM client) of Short Term Memory 92.

Long Term Memory９３は、長期記憶部を構成するオブジェクトであり、物の名前など学習により得られた情報を長期間保持するために使用される。Long Term Memory９３は、例えばあるスキーマにおいて外部刺激から内部状態の変化を連想記憶することができる。 The Long Term Memory 93 is an object that constitutes a long-term storage unit, and is used to hold information obtained by learning such as the name of an object for a long period. The Long Term Memory 93 can associatively store changes in the internal state from an external stimulus in a certain schema, for example.

NormalＳＢＬ１０２は、状況依存行動階層を構成するオブジェクトである。NormalＳＢＬ１０２は、ＳＴＭクライアントとなるオブジェクトであり、Short Term Memory９２から定期的に外部刺激（ターゲットやイベント）に関する情報の通知を受け取ると、実行すべきスキーマを決定する（後述）。 Normal SBL 102 is an object that constitutes a situation-dependent action hierarchy. The Normal SBL 102 is an object serving as an STM client, and determines a schema to be executed upon receiving a notification of information related to an external stimulus (target or event) periodically from the Short Term Memory 92 (described later).

ReflexiveＳＢＬ１０３は、反射行動部を構成するオブジェクトであり、上述した認識系の各オブジェクトによって認識された外部刺激に応じて反射的・直接的な機体動作を実行する。例えば、人間の顔を追いかける、頷く、障害物の検出により咄嗟に避けるといった振る舞いを行う。 The Reflexive SBL 103 is an object that constitutes the reflex action unit, and executes reflexive, direct body motions in response to external stimuli recognized by each object of the recognition system described above. For example, it performs behaviors such as following a human face, nodding, and immediately avoiding a detected obstacle.

NormalＳＢＬ１０２は、外部刺激や内部状態の変化等の状況に応じた動作を選択する。これに対し、ReflexiveＳＢＬ１０３は、外部刺激に応じて反射的な動作を選択する。これら２つのオブジェクトによる行動選択は独立して行われるため、互いに選択されたスキーマを機体上で実行する場合に、ロボット装置１のハードウェア・リソースが競合して実現不可能なこともある。Resource Manager１２１というオブジェクトは、NormalＳＢＬ１０２とReflexiveＳＢＬ１０３とによる行動選択時のハードウェアの競合を調停する。そして、調停結果に基づいて機体動作を実現する各オブジェクトに通知することにより機体が駆動する。 The Normal SBL 102 selects an action according to the situation, such as an external stimulus or a change in the internal state. In contrast, the Reflexive SBL 103 selects a reflexive action in response to an external stimulus. Because these two objects select actions independently, the hardware resources of the robot apparatus 1 may conflict when the schemas each of them has selected are executed on the robot body, making execution impossible. An object called the Resource Manager 121 arbitrates hardware conflicts arising from action selection by the Normal SBL 102 and the Reflexive SBL 103. The robot body is then driven by notifying each object that realizes body motion of the arbitration result.

Sound Performer１２２、Motion Controller１２３、ＬＥＤ Controller１２４は、機体動作を実現するオブジェクトである。Sound Performer１２２は、音声出力を行うためのオブジェクトであり、Resource Manager１２１経由でNormalＳＢＬ１０２から与えられたテキスト・コマンドに応じて音声合成を行い、ロボット装置１の機体上のスピーカ４３から音声出力を行う。また、Motion Controller１２３は、機体上の各関節アクチュエータの動作を行うためのオブジェクトであり、Resource Manager１２１経由でNormalＳＢＬ１０２から手や脚等を動かすコマンドを受けたことに応答して、該当する関節角を計算する。また、ＬＥＤ Controller１２４は、ＬＥＤ４４の点滅動作を行うためのオブジェクトであり、Resource Manager１２１経由でNormalＳＢＬ１０２からコマンドを受けたことに応答してＬＥＤ４４の点滅駆動を行う。 The Sound Performer 122, the Motion Controller 123, and the LED Controller 124 are objects that realize body motion. The Sound Performer 122 is an object for outputting sound; it performs speech synthesis in accordance with a text command given from the Normal SBL 102 via the Resource Manager 121, and outputs the sound from the speaker 43 on the body of the robot apparatus 1. The Motion Controller 123 is an object for operating each joint actuator on the robot body; in response to a command to move a hand, a leg, or the like received from the Normal SBL 102 via the Resource Manager 121, it calculates the corresponding joint angles. The LED Controller 124 is an object for blinking the LED 44, and drives the LED 44 to blink in response to a command received from the Normal SBL 102 via the Resource Manager 121.

以上、行動制御システム１０の機能構成及びオブジェクト構成について説明したが、以下では、先ず現在置かれている状況に即応した行動を行う状況依存行動について説明し、次にある計画に基づいた行動を行う熟考行動と、熟考行動により繰り返し発現した一連の行動をルーチンワークとして獲得する方法とについて説明する。 The functional configuration and the object configuration of the behavior control system 10 have been described above. The following first describes situation-dependent behavior, in which the robot acts in prompt response to its current situation, and then describes contemplation behavior, in which the robot acts according to a plan, together with a method of acquiring as routine work a series of actions that has been expressed repeatedly through contemplation behavior.

（２−１）状況依存行動
状況依存行動は、上述のように状況依存行動階層１０２によって制御される。状況依存行動階層１０２による状況依存行動制御の形態を図６に模式的に示す。 (2-1) Situation Dependent Behavior Situation dependent behavior is controlled by the situation dependent behavior hierarchy 102 as described above. The form of the situation dependent action control by the situation dependent action hierarchy 102 is schematically shown in FIG.

視覚認識機能部８１、聴覚認識機能部８２、及び接触認識機能部８３からなる外部刺激認識部８０における外部環境７０の認識結果（センサ情報）１３１は、外部刺激１３２として状況依存行動階層（NormalＳＢＬ）１０２に与えられる。また、外部刺激認識部８０による外部環境７０の認識結果に応じた内部状態の変化１３３も状況依存行動階層１０２に与えられる。そして、状況依存行動階層１０２では、外部刺激１３２や内部状態の変化１３３に応じて状況を判断して、行動選択を実現することができる。状況依存行動階層１０２では、外部刺激１３２や内部状態の変化１３３によって、各要素行動が記述されたスキーマの行動価値を算出し、行動価値の大きさに基づいて選択されたスキーマの要素行動を実行する。行動価値の算出には、例えばライブラリを利用することにより、全てのスキーマについて統一的な計算処理を行うことができる。 The recognition result (sensor information) 131 of the external environment 70 obtained by the external stimulus recognition unit 80, which consists of the visual recognition function unit 81, the auditory recognition function unit 82, and the contact recognition function unit 83, is given to the situation-dependent action hierarchy (Normal SBL) 102 as an external stimulus 132. A change 133 in the internal state corresponding to the recognition result of the external environment 70 by the external stimulus recognition unit 80 is also given to the situation-dependent action hierarchy 102. The situation-dependent action hierarchy 102 can then judge the situation from the external stimulus 132 and the internal state change 133 and select an action. Specifically, it calculates from the external stimulus 132 and the internal state change 133 the action value of each schema in which an element action is described, and executes the element action of the schema selected according to the magnitude of the action value. By using a library, for example, the action value can be calculated in a uniform manner for all schemas.

（２−１−１）スキーマ
図７には、状況依存行動階層１０２が複数のスキーマ（要素行動）１４１によって構成されている様子を模式的に示している。状況依存行動階層１０２は、要素行動として行動記述モジュールを有し、行動記述モジュール毎にステートマシンを用意しており、それ以前の行動（動作）や状況に依存して、センサ入力された外部情報の認識結果を分類し、動作を機体上で発現する。要素行動となる行動記述モジュールは、外部刺激や内部状態に応じた状況判断を行うＭｏｎｉｔｏｒ機能と、行動実行に伴う状態遷移（ステートマシン）を実現するＡｃｔｉｏｎ機能とを備えたスキーマとして記述される。 (2-1-1) Schema FIG. 7 schematically shows the situation-dependent action hierarchy 102 composed of a plurality of schemas (element actions) 141. The situation-dependent action hierarchy 102 has action description modules as element actions and prepares a state machine for each action description module; depending on the preceding actions (motions) and situation, it classifies the recognition results of external information input from the sensors and expresses the motion on the robot body. An action description module serving as an element action is described as a schema that has a Monitor function, which judges the situation according to the external stimulus and the internal state, and an Action function, which realizes the state transitions (state machine) that accompany execution of the action.

状況依存行動階層１０２（より厳密には、状況依存行動階層１０２のうち、通常の状況依存行動を制御する階層）は、複数のスキーマ１４１が階層的に連結されたツリー構造として構成され、外部刺激や内部状態の変化に応じてより最適なスキーマ１４１を統合的に判断して行動制御を行うようになっている。このツリー１４２は、例えば動物行動学的（Ethological）な状況依存行動を数式化した行動モデルや、感情表現を実行するためのサブツリー等、複数のサブツリー（又は枝）を含んでいる。 The situation-dependent action hierarchy 102 (more strictly speaking, a hierarchy that controls a normal situation-dependent action among the situation-dependent action hierarchy 102) is configured as a tree structure in which a plurality of schemas 141 are hierarchically connected, and external stimuli. In addition, more optimal schema 141 is integratedly determined according to changes in the internal state and behavior control is performed. The tree 142 includes a plurality of subtrees (or branches) such as an action model obtained by formulating an animal behavioral situation-dependent action and a subtree for executing emotion expression.

状況依存行動階層１０２におけるスキーマのツリー構造の一例を図８に模式的に示す。図８に示すように、状況依存行動階層１０２は、短期記憶部９２から外部刺激の通知（Notify）を受けるルートスキーマ１５１_１、１５２_１、１５３_１を先頭に、抽象的な行動カテゴリから具体的な行動カテゴリに向かうように、各階層毎にスキーマが配設されている。例えば、ルートスキーマの直近下位の階層には、「探索する（Investigate）」、「食べる（Ingestive）」、「遊ぶ（Play）」というスキーマ１５１_２、１５２_２、１５３_２が配設されている。そして、スキーマ１５１_２「探索する（Investigate）」の下位には、「InvestigativeLocomotion」、「HeadinAirSniffing」、「InvestigativeSniffing」等の、より具体的な探索行動を記述した複数のスキーマ１５１_３が配設されている。同様に、スキーマ１５２_２「食べる（Ingestive）」の下位には、「Eat」、「Drink」等の、より具体的な飲食行動を記述した複数のスキーマ１５２_３が配設され、スキーマ１５３_２「遊ぶ（Play）」の下位には、「PlayBowing」、「PlayGreeting」、「PlayPawing」等の、より具体的な遊ぶ行動を記述した複数のスキーマ１５３_３が配設されている。 An example of the schema tree structure in the situation-dependent action hierarchy 102 is schematically shown in FIG. 8. As shown in FIG. 8, in the situation-dependent action hierarchy 102, schemas are arranged level by level, headed by the root schemas 151₁, 152₁, and 153₁ that receive notifications (Notify) of external stimuli from the short-term storage unit 92, and proceeding from abstract action categories toward concrete ones. For example, the level immediately below the root schemas contains the schemas 151₂, 152₂, and 153₂, named "Investigate", "Ingestive", and "Play". Below the schema 151₂ "Investigate" are arranged a plurality of schemas 151₃ describing more specific investigative behaviors, such as "InvestigativeLocomotion", "HeadinAirSniffing", and "InvestigativeSniffing". Similarly, below the schema 152₂ "Ingestive" are arranged a plurality of schemas 152₃ describing more specific eating and drinking behaviors, such as "Eat" and "Drink", and below the schema 153₂ "Play" are arranged a plurality of schemas 153₃ describing more specific play behaviors, such as "PlayBowing", "PlayGreeting", and "PlayPawing".

図示の通り、各スキーマは外部刺激１３２と内部状態（の変化）１３３を入力としている。また、各スキーマは、少なくともＭｏｎｉｔｏｒ関数とＡｃｔｉｏｎ関数とを備えている。 As shown, each schema takes an external stimulus 132 and an internal state (change) 133 as inputs. Each schema includes at least a Monitor function and an Action function.

Ｍｏｎｉｔｏｒ関数とは、外部刺激と内部状態とに応じて当該スキーマの行動価値を算出する関数であり、各スキーマは、このような行動価値算出手段としてのＭｏｎｉｔｏｒ機能を有する。図８に示すようなツリー構造を構成する場合、上位（親）のスキーマは外部刺激及び内部状態を引数として下位（子供）のスキーマのＭｏｎｉｔｏｒ関数をコールすることができ、子スキーマは行動価値を返値とする。また、スキーマは自分の行動価値を算出するために、さらに下位のスキーマのＭｏｎｉｔｏｒ関数をコールすることができる。そして、最上位のルートスキーマには各サブツリーからの行動価値が返されるので、外部刺激及び内部状態の変化に応じた最適なスキーマ、すなわち行動を統合的に判断することができる。この際、ルートスキーマは、行動価値が最も高いスキーマを選択してもよく、行動価値が所定の閾値を超えた２以上のスキーマを選択して並列的に実行させるようにしてもよい。但し、並列実行させる場合には各スキーマ同士でハードウェア・リソースの競合がないことを前提とする。 The Monitor function is a function for calculating the behavior value of the schema in accordance with the external stimulus and the internal state, and each schema has a Monitor function as such behavior value calculation means. When the tree structure as shown in FIG. 8 is configured, the upper (parent) schema can call the Monitor function of the lower (child) schema with the external stimulus and the internal state as arguments, and the child schema has an action value. Return value. In addition, the schema can call the Monitor function of a lower-level schema in order to calculate its own action value. Since the action value from each sub-tree is returned to the topmost root schema, the optimum schema corresponding to the external stimulus and the change in the internal state, that is, the action can be determined in an integrated manner. At this time, a schema having the highest action value may be selected as the root schema, or two or more schemas having action values exceeding a predetermined threshold value may be selected and executed in parallel. However, when executing in parallel, it is assumed that there is no competition of hardware resources between schemas.
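
The recursive Monitor calls and the two selection policies described above (highest action value, or every schema above a threshold) can be illustrated with a minimal Python sketch. The class `Schema`, the function `select`, and the evaluator lambdas in the usage note are hypothetical names introduced only for illustration. As a simplification, a parent here merely gathers the values returned by its children rather than computing an action value of its own.

```python
class Schema:
    """Hypothetical schema node; leaves carry an action-value evaluator."""

    def __init__(self, name, evaluate=None, children=()):
        self.name = name
        self.evaluate = evaluate          # assumed leaf evaluator: f(stimulus, state) -> float
        self.children = list(children)

    def monitor(self, stimulus, state):
        """Return a list of (action_value, schema_name) for the leaves of this subtree."""
        if not self.children:
            return [(self.evaluate(stimulus, state), self.name)]
        results = []
        for child in self.children:
            # The parent calls each child's Monitor with the same arguments.
            results.extend(child.monitor(stimulus, state))
        return results


def select(root, stimulus, state, threshold=None):
    """Pick the single best schema, or all schemas whose value exceeds the threshold."""
    values = root.monitor(stimulus, state)
    if threshold is None:
        return [max(values)[1]]
    return [name for value, name in values if value > threshold]
```

For example, with leaves "Eat", "Drink", and "PlayGreeting" whose evaluators return 80, 10, and 50, `select(root, stimulus, state)` yields `["Eat"]`, while `select(root, stimulus, state, threshold=40.0)` yields `["Eat", "PlayGreeting"]`. The second policy presupposes, as the text notes, that the selected schemas do not compete for hardware resources.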

一方、Ａｃｔｉｏｎ関数は、スキーマ自身が持つ行動を記述したステートマシンを備えている。図８に示すようなツリー構造を構成する場合、親スキーマは、Ａｃｔｉｏｎ関数をコールして、子スキーマの実行を開始したり中断させたりすることができる。但し、ＡｃｔｉｏｎのステートマシンはＲｅａｄｙにならないと初期化されない。言い換えれば、中断しても状態はリセットされず、スキーマが実行中の作業データを保存することから、中断再実行が可能である。 On the other hand, the Action function includes a state machine that describes the behavior of the schema itself. When the tree structure shown in FIG. 8 is configured, the parent schema can call the Action function to start or interrupt the execution of the child schema. However, the action state machine is not initialized unless it becomes Ready. In other words, even if it is interrupted, the state is not reset, and the work data being executed by the schema is saved, so that it can be interrupted and reexecuted.
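
The point that an interrupted Action keeps its working data and is re-initialized only from Ready can be sketched as follows. The class `Action`, the state names `ACTIVE` and `SLEEP`, and the list-of-steps representation are assumptions made for illustration; this passage names only the Ready state.

```python
class Action:
    """Hypothetical Action state machine whose progress survives interruption."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.index = 0            # working data kept across interruptions
        self.state = "READY"

    def start(self):
        """Start or resume; progress is reset only when starting from READY."""
        if self.state == "READY":
            self.index = 0
        self.state = "ACTIVE"

    def interrupt(self):
        """Suspend execution without discarding the current progress."""
        if self.state == "ACTIVE":
            self.state = "SLEEP"

    def step(self):
        """Run one step; return its name, or None when not active."""
        if self.state != "ACTIVE":
            return None
        current = self.steps[self.index]
        self.index += 1
        if self.index == len(self.steps):
            self.state = "READY"  # finished: the next start re-initializes
        return current
```

Calling `start()`, `step()`, `interrupt()`, and `start()` again resumes at the second step rather than the first, which is the interrupt-and-resume behavior the paragraph describes.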

（２−１−２）行動価値の算出
上述したように、スキーマ毎に算出される行動価値とは、その要素行動をロボット装置１がどの程度実行したいか（実行優先度）を表すものであり、ロボット装置１は、この行動価値に基づいて一又は複数の要素行動を選択することにより、行動を発現する。 (2-1-2) Calculation of Action Value As described above, the action value calculated for each schema represents how much the robot apparatus 1 wants to execute the element action (execution priority). The robot apparatus 1 expresses an action by selecting one or a plurality of element actions based on the action value.

この際、各スキーマは、自身に対応付けられた外部刺激及び内部状態に基づいて行動価値を算出するが、この行動価値は、
（ａ）モチベーション値（Motivation value；Mot）
（ｂ）リリーシング値（Releasing value；Rel）
（ｃ）行動価値バイアス（Self Excitation value；SE）
（ｄ）デフォルト行動価値（Rest Level；RL）
（ｅ）ランダムノイズ(Random noise；Noise)
の各要素の重み付け和によって算出される。 At this time, each schema calculates an action value based on an external stimulus and an internal state associated with the schema.
(A) Motivation value (Mot)
(B) Releasing value (Rel)
(C) Behavior value bias (Self Excitation value; SE)
(D) Default action value (Rest Level; RL)
(E) Random noise (Noise)
It is calculated by the weighted sum of each element.

以下では、ある「種類」、「大きさ」の対象物が存在するとき、スキーマ「食べる（Ingestive）」の行動価値を算出する場合を例として、上記（ａ）〜（ｅ）の各要素について説明すると共に、（ｆ）最終的な行動価値、についても説明する。 In the following, when there is an object of a certain “type” and “size”, the case where the behavior value of the schema “Ingestive” is calculated is taken as an example for each of the elements (a) to (e) above. Along with the explanation, (f) the final action value will also be explained.

（ａ）モチベーション値
モチベーション値Motは、各スキーマの要素行動に対する欲求を示す欲求値Ins[i]に基づいて算出され、この欲求値Ins[i]は、各スキーマに対応付けられた内部状態値Int[i]に基づいて算出される。例えば、スキーマ「食べる（Ingestive）」には、内部状態値Int[NOURISHMENT（栄養状態）]が対応付けられており、この内部状態値Int[NOURISHMENT]から欲求値Ins[NOURISHMENT（食欲）]が算出される。 (A) Motivation value The motivation value Mot is calculated based on a desire value Ins [i] indicating a desire for element behavior of each schema, and the desire value Ins [i] is an internal state value associated with each schema. Calculated based on Int [i]. For example, an internal state value Int [NOURISHMENT (nutrition state)] is associated with the schema “Ingestive”, and a desire value Ins [NOURISHMENT] is calculated from the internal state value Int [NOURISHMENT]. Is done.

欲求値Ins[i]の算出には、内部状態値Int[i]と欲求値Ins[i]との関係を表す関数を用いることができる。具体的には、図９に示すような関数が挙げられる。図９では、内部状態値Int[NOURISHMENT]の大きさを０乃至１００とし、そのときの欲求値Ins[NOURISHMENT]の大きさが−１乃至１となるような関数を示している。例えば内部状態値が８割満たされているときに欲求値が０となるような内部状態値−欲求値曲線Ｌ１を設定することで、ロボット装置１は、常に内部状態値が８割の状態を維持するように行動を選択するようになる。これにより、例えば、空腹であれば食欲が増大し、腹八分目以上では食欲がなくなるという状態を反映した行動を発現させることができる。 The desire value Ins[i] can be calculated using a function that represents the relationship between the internal state value Int[i] and the desire value Ins[i]. A specific example is the function shown in FIG. 9, in which the internal state value Int[NOURISHMENT] ranges from 0 to 100 and the corresponding desire value Ins[NOURISHMENT] ranges from −1 to 1. For example, by setting an internal state value versus desire value curve L1 on which the desire value becomes 0 when the internal state is 80% satisfied, the robot apparatus 1 comes to select actions that always keep the internal state value at 80%. This makes it possible to express behavior reflecting a state in which, for example, appetite increases when the robot is hungry and disappears once it is more than 80% full.
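
As an illustration of such a curve, the sketch below uses a piecewise-linear stand-in for L1 that passes through (0, 1), (80, 0), and (100, −1). The actual shape of the curve in FIG. 9 is not given in the text, so the linear segments, the function name `desire`, and its parameters are assumptions.

```python
def desire(internal_value, set_point=80.0, max_value=100.0):
    """Map an internal state value in [0, max_value] to a desire value in [-1, 1].

    Assumed piecewise-linear stand-in for curve L1: the desire is 1 at 0,
    falls to 0 at the set point, and reaches -1 at max_value.
    """
    x = min(max(internal_value, 0.0), max_value)   # clamp to the valid range
    if x <= set_point:
        return 1.0 - x / set_point
    return -(x - set_point) / (max_value - set_point)
```

With the defaults, `desire(0)` is 1.0 (strong appetite), `desire(80)` is 0.0, and `desire(100)` is -1.0, so actions that raise NOURISHMENT are favored below the set point and disfavored above it.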

なお、上述した具体例では、内部状態値が０乃至１００の範囲において欲求値が−１乃至１の範囲で変化するものとしたが、内部状態値が０乃至１００の範囲において欲求値が１乃至０に変化するようにしてもよい。また、内部状態毎に異なる内部状態値−欲求値関数を用意してもよい。 In the specific example described above, the desire value is changed in the range of −1 to 1 when the internal state value is in the range of 0 to 100. However, the desire value is 1 to 1 in the range of the internal state value of 0 to 100. It may be changed to 0. Further, a different internal state value-desired value function may be prepared for each internal state.

モチベーション値Motは、以上のようにして求められた欲求値Ins[i]に基づいて、以下の式（１）のように求められる。ここで、Ｗ_Ｍｏｔ［ｉ］は重み係数である。 The motivation value Mot is obtained by the following equation (1) based on the desire value Ins[i] obtained as described above. Here, W_Mot[i] is a weighting coefficient.
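
The image of equation (1) is not reproduced in this text. One form consistent with the description, in which Mot is built from the desire values Ins[i] with weights W_Mot[i], would be a weighted sum over the internal states associated with the schema. This is an assumed reconstruction, not the equation as published:

```latex
% Assumed form of equation (1): weighted sum of desire values
\mathrm{Mot} = \sum_{i} W_{\mathrm{Mot}}[i] \cdot \mathrm{Ins}[i]
```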

（ｂ）リリーシング値
リリーシング値Relは、要素行動を発現することによって現在の満足度Sat[i]がどの程度変化するかを表す予想満足度変化値dSat[i]と、変化後の予想満足度ESat[i]とから算出される。 (B) Release value Releasing value Rel is an expected satisfaction change value dSat [i] that indicates how much the current satisfaction level Sat [i] changes due to the expression of elemental behavior, and the expected value after the change. It is calculated from the satisfaction degree ESat [i].

ここで、ロボット装置１の内部状態値と満足度とは互いに関連しているため、予想満足度変化値dSat[i]は、要素行動を発現することによって現在の内部状態値Int[i]がどの程度変化するかを表す予想内部状態変化値dInt[i]に基づいて算出することができる。 Here, since the internal state value and the satisfaction degree of the robot apparatus 1 are related to each other, the expected satisfaction change value dSat [i] is expressed as the current internal state value Int [i] by expressing the element behavior. It can be calculated based on an expected internal state change value dInt [i] representing how much it changes.

この予想内部状態変化値dInt[i]は、行動価値算出データベースの行動価値算出データを参照して求めることができる。行動価値算出データは、外部刺激と予想内部状態変化値dInt[i]との対応が記述されたものであり、この行動価値算出データベースを参照することで、入力された外部刺激に応じた予想内部状態変化値dInt[i]を取得することができる。 The expected internal state change value dInt [i] can be obtained by referring to the behavior value calculation data in the behavior value calculation database. The action value calculation data describes the correspondence between the external stimulus and the expected internal state change value dInt [i]. By referring to this action value calculation database, the expected internal value corresponding to the input external stimulus is described. The state change value dInt [i] can be acquired.

具体的に、行動価値算出データとしては、図１０に示すものが挙げられる。図１０に示すように、内部状態値Int[NOURISHMENT]は、要素行動である「食べる」を発現した結果、対象物の大きさ（OBJECT_SIZE）が大きいほど、また対象物の種類（OBJECT_ID）がOBJECT_ID＝０に対応する対象物Ｍ１より、OBJECT_ID＝１に対応する対象物Ｍ２が、また、OBJECT_ID＝１に対応する対象物Ｍ２より、OBJECT_ID＝２に対応する対象物Ｍ３の方が満たされる量が大きいであろうと予想されている。 A specific example of the action value calculation data is shown in FIG. 10. As shown in FIG. 10, the amount by which the internal state value Int[NOURISHMENT] is expected to be satisfied as a result of expressing the element action "eat" is larger the larger the object size (OBJECT_SIZE) is. With respect to the object type (OBJECT_ID), it is expected to be larger for the object M2 corresponding to OBJECT_ID = 1 than for the object M1 corresponding to OBJECT_ID = 0, and larger for the object M3 corresponding to OBJECT_ID = 2 than for the object M2 corresponding to OBJECT_ID = 1.
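
A lookup of this kind can be sketched as a small table keyed by object type and size. The table layout, the size classes, and every number below are hypothetical; they only reproduce the ordering stated above (larger objects and higher OBJECT_ID values give a larger expected change) and are not taken from FIG. 10.

```python
# Hypothetical table: (OBJECT_ID, size class) -> expected change dInt[NOURISHMENT].
# The numbers are illustrative only; they grow with size and with OBJECT_ID.
D_INT_NOURISHMENT = {
    (0, "small"): 5.0,  (0, "large"): 10.0,
    (1, "small"): 10.0, (1, "large"): 20.0,
    (2, "small"): 15.0, (2, "large"): 30.0,
}


def expected_internal_change(object_id, size_class):
    """Look up dInt for a recognized object; unknown objects give no change."""
    return D_INT_NOURISHMENT.get((object_id, size_class), 0.0)
```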

上述の予想満足度変化値dSat[i]及び予想満足度ESat[i]の算出には、内部状態値Int[i]と満足度Sat[i]との関係を表す関数を用いることができる。具体的には、図１１に示すような関数が挙げられる。図１１では、内部状態値Int[NOURISHMENT]の大きさを０乃至１００とし、内部状態値Int[NOURISHMENT]が０から８０近傍までは満足度Sat[NOURISHMENT]が０から増加し、それ以降は減少して内部状態値Int[NOURISHMENT]が１００で再び満足度Sat[NOURISHMENT]が０になるような曲線Ｌ２を示している。 In calculating the expected satisfaction change value dSat [i] and the expected satisfaction ESat [i], a function representing the relationship between the internal state value Int [i] and the satisfaction Sat [i] can be used. Specifically, there is a function as shown in FIG. In FIG. 11, the magnitude of the internal state value Int [NOURISHMENT] is 0 to 100, and the satisfaction level Sat [NOURISHMENT] increases from 0 until the internal state value Int [NOURISHMENT] ranges from 0 to 80, and thereafter decreases. Then, a curve L2 is shown such that the internal state value Int [NOURISHMENT] is 100 and the satisfaction level Sat [NOURISHMENT] is 0 again.
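
The relationship between Int[i], Sat[i], dSat[i], and ESat[i] can be sketched as follows. The curve L2 is approximated here by a triangle that rises from 0 at Int = 0 to a peak at Int = 80 and falls back to 0 at Int = 100; the peak height of 1.0, the linear segments, and the function names are assumptions. ESat is taken as the satisfaction at the predicted internal state Int + dInt, and dSat as its difference from the current satisfaction.

```python
def satisfaction(internal_value, peak=80.0, max_value=100.0):
    """Assumed triangular stand-in for curve L2 (0 at both ends, 1.0 at the peak)."""
    x = min(max(internal_value, 0.0), max_value)   # clamp to the valid range
    if x <= peak:
        return x / peak
    return (max_value - x) / (max_value - peak)


def expected_satisfaction(internal_now, d_int):
    """Return (dSat, ESat) for a predicted internal state change d_int."""
    e_sat = satisfaction(internal_now + d_int)     # satisfaction after the action
    d_sat = e_sat - satisfaction(internal_now)     # change caused by the action
    return d_sat, e_sat
```

For instance, `expected_satisfaction(40.0, 20.0)` gives `(0.25, 0.75)`: eating raises satisfaction. Starting from 90 with the same change overshoots the peak, so dSat comes out negative.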

リリーシング値Relは、以上のようにして求められた予想満足度変化値dSat[i]及び予想満足度ESat[i]に基づいて、以下の式（２）のように求められる。ここで、Ｗ_Ｒｅｌ［ｉ］、Ｗ_ｄＳａｔは重み係数である。 The releasing value Rel is obtained by the following equation (2) based on the expected satisfaction change value dSat[i] and the expected satisfaction ESat[i] obtained as described above. Here, W_Rel[i] and W_dSat are weighting coefficients.
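
The image of equation (2) is likewise not reproduced in this text. One form consistent with the description, in which W_dSat balances the expected satisfaction change against the expected satisfaction and W_Rel[i] weights each internal state, would be the following. It is an assumed reconstruction, not the equation as published:

```latex
% Assumed form of equation (2): W_dSat trades off dSat against ESat
\mathrm{Rel} = \sum_{i} W_{\mathrm{Rel}}[i] \cdot
  \bigl( W_{\mathrm{dSat}} \cdot \mathrm{dSat}[i]
       + (1 - W_{\mathrm{dSat}}) \cdot \mathrm{ESat}[i] \bigr)
```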

（ｃ）行動価値バイアス
行動価値バイアスSEは、行動価値にバイアスをかける、すなわち行動価値を底上げするための要素であり、以下の式（３）のように、ステータスバイアス（Status Self Excitation value；SSE）とルーチンバイアス（Routine Self Excitation value；RSE）との和として表される。 (C) Behavior Value Bias The behavior value bias SE is an element for biasing the behavior value, that is, for raising the behavior value. ) And routine bias (Routine Self Excitation value; RSE).

ステータスバイアスSSEは、あるスキーマが実行されているときに、そのスキーマの行動価値を底上げし、行動が容易に切り替わらないようにするものである。例えば、図１２に示すように、実行中のスキーマＡが時刻ｔ１で終了したとき、その時刻ではスキーマＢの行動価値が最も高いため、時刻ｔ２から時刻ｔ３まではスキーマＢが実行されることになる。このスキーマＢの実行中には、スキーマＢの行動価値にステータスバイアスSSEが加えられる。これにより、スキーマＢの要素行動が他のスキーマの要素行動によって妨げられるのを防止することができる。 Status bias SSE raises the behavioral value of a schema when it is being executed so that behaviors do not switch easily. For example, as shown in FIG. 12, when the executing schema A ends at time t1, the behavior value of schema B is the highest at that time, so that schema B is executed from time t2 to time t3. Become. During execution of this schema B, a status bias SSE is added to the action value of schema B. Thereby, it is possible to prevent the element behavior of the schema B from being hindered by the element behavior of another schema.

一方、ルーチンバイアスRSEは、後述のように一連の行動をルーチンワークとして獲得した後、自身の直前の要素行動（トリガスキーマ）が実行された場合に、自身の行動価値を底上げするものである。このルーチンバイアスRSEについての詳細は後述する。 On the other hand, the routine bias RSE raises its own action value when an element action (trigger schema) immediately before itself is executed after acquiring a series of actions as routine work as described later. Details of the routine bias RSE will be described later.
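
The description of the action value bias as the sum of the two biases, SE = SSE + RSE, can be written as a short sketch. The function name, the boolean flags, and the default magnitudes are hypothetical; the text does not state when or how the bias values themselves are set.

```python
def self_excitation(is_running, trigger_executed, sse=10.0, rse=20.0):
    """Return SE = SSE + RSE for one schema.

    is_running:       this schema is currently being executed (status bias applies)
    trigger_executed: this schema's trigger schema has been executed, after the
                      sequence was acquired as routine work (routine bias applies)
    sse, rse:         assumed bias magnitudes, for illustration only
    """
    status_bias = sse if is_running else 0.0
    routine_bias = rse if trigger_executed else 0.0
    return status_bias + routine_bias
```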

（ｄ）デフォルト行動価値
デフォルト行動価値RLは、各スキーマについてのデフォルトの行動価値を表した要素である。この行動価値をスキーマ毎に異ならせることにより、各要素行動についての生まれつきの優先順位を表現することができる。また、ロボット装置毎にその優先順位を異ならせることにより、ロボット装置の個性を表現することができる。 (D) Default action value The default action value RL is an element representing the default action value for each schema. By making this action value different for each schema, it is possible to express the priority of each element action. Further, the individuality of the robot apparatus can be expressed by changing the priority order of each robot apparatus.

ここで、あるスキーマの実行中に、上述のモチベーション値Motやリリーシング値Relが急激に低下したとき、そのスキーマの行動価値はデフォルト行動価値RLまで低下するが、この際、行動価値を急激に低下させるのではなく、所定の減衰パラメータに従って徐々に減少させることが好ましい。例えば、図１３に示すように、実行中のスキーマＡについて、時刻ｔ１にモチベーション値Motやリリーシング値Relが急激に低下したとき、所定の減衰パラメータに従って行動価値を徐々に減少させ、その行動価値がスキーマＢの行動価値よりも低くなりスキーマＢが実行されて初めて、デフォルト行動価値RLまで急激に低下させることが好ましい。 When the motivation value Mot or the releasing value Rel described above drops sharply while a schema is being executed, the action value of that schema falls to the default action value RL. In this case it is preferable not to lower the action value abruptly but to decrease it gradually according to a predetermined attenuation parameter. For example, as shown in FIG. 13, when the motivation value Mot or the releasing value Rel of the running schema A drops sharply at time t1, it is preferable to decrease its action value gradually according to the predetermined attenuation parameter, and to drop it abruptly to the default action value RL only once it has fallen below the action value of schema B and schema B has started executing.
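
The attenuation described here can be sketched as one update per time step. An exponential decay toward RL is assumed, since the text only says "a predetermined attenuation parameter"; the function name and parameters are hypothetical. The switch to schema B is modeled simply as the moment the decayed value falls below B's value.

```python
def next_action_value(current, rest_level, rival_value, decay=0.8):
    """Advance the running schema's action value by one time step.

    The value decays geometrically toward rest_level (assumed decay law).
    Once it falls below the rival schema's value, the rival takes over and
    the value drops straight to rest_level.
    """
    candidate = rest_level + (current - rest_level) * decay
    if candidate < rival_value:
        return rest_level
    return candidate
```

Starting from 100 with `rest_level=10`, `rival_value=50`, and `decay=0.5`, the value goes to 55.0 on the first step (still above the rival, so schema A keeps running) and to 10.0 on the second.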

このように、行動価値を徐々に減少させていくことによって、例えば次のような行動を実現することができる。ロボット装置１がボールを蹴る行動を実行していたときに、その行動を引き起こす内部状態である運動欲と、外部刺激であるボールが突然なくなったとする。このとき、上述の減衰メカニズムによって、ボールを蹴る行動の行動価値は徐々に減少するが、ロボット装置１は、他の行動の行動価値がボールを蹴る行動の行動価値よりも高くなるまで、ボールを捜し続けるなど、ボールを蹴る行動に関する一連の動作を行う。この間にボールが見つかれば、ボールを蹴る行動の行動価値は再び増加するため、その行動を続けることが可能になる。つまり、行動が突然切り替わるのではなく、行動を続けてみて、それでも駄目ならば諦める、といったことが実現可能になる。 In this way, by gradually decreasing the action value, for example, the following action can be realized. Assume that when the robot apparatus 1 is performing an action of kicking a ball, the desire for exercise, which is an internal state that causes the action, and the ball, which is an external stimulus, suddenly disappear. At this time, the action value of the action of kicking the ball gradually decreases by the above-described attenuation mechanism, but the robot apparatus 1 does not play the ball until the action value of the other action becomes higher than the action value of the action of kicking the ball. Perform a series of actions related to kicking the ball, such as continuing to search. If a ball is found during this time, the action value of the action of kicking the ball increases again, so that the action can be continued. In other words, it is possible to realize that the behavior is not suddenly switched, but the behavior is continued, and if it still fails, it is given up.

（ｅ）ランダムノイズ
ランダムノイズNoiseは、行動価値にランダムな値を付加するための要素である。この要素を導入することにより、行動価値にバリエーションを持たせることができる。例えば、図１４に示すように、スキーマＡ、Ｂ、Ｃについての行動価値は、要素行動を実行しているときも実行していないときも常に変動している。 (E) Random noise Random noise Noise is an element for adding a random value to the action value. By introducing this element, it is possible to have variations in action value. For example, as shown in FIG. 14, the action values for the schemas A, B, and C always fluctuate both when the element action is executed and when it is not executed.

なお、ランダムノイズの変動幅は任意に設定でき、例えば行動価値の大きさに比例させることができる。 The fluctuation range of the random noise can be set arbitrarily; for example, it can be made proportional to the magnitude of the action value.

（ｆ）最終的な行動価値
上述したように、最終的な行動価値は、モチベーション値Mot、リリーシング値Rel、行動価値バイアスSE、デフォルト行動価値RL、ランダムノイズNoiseの各要素の重み付け和によって算出される。 (F) Final Action Value As described above, the final action value is calculated by the weighted sum of each element of motivation value Mot, releasing value Rel, action value bias SE, default action value RL, and random noise Noise. Is done.

Before the final action value is calculated, the motivation/releasing value MR is calculated according to equation (4), where W_M is a weighting coefficient.

The final action value AL can then be calculated from this motivation/releasing value MR according to equation (5), where W_SE is a weighting coefficient.

各スキーマは、この行動価値ALに基づいて選択されるため、例えば同じ外部刺激が入力された場合であっても、そのときの内部状態の値によって異なる要素行動が選択され、出力される。 Since each schema is selected based on this action value AL, for example, even when the same external stimulus is input, different element actions are selected and output depending on the value of the internal state at that time.

(2-2) Contemplation action and acquisition of routine work
Contemplation action is controlled by the contemplation action hierarchy 101, as described above. A contemplation action is an action carried out by reasoning about a given situation or a command from a human and making a plan to realize it.

For example, to execute the action of kicking a ball, the robot apparatus 1 must make a plan to find the ball, approach the ball, and kick the ball in that order, and then execute each element action sequentially according to the plan. In the situation-dependent action hierarchy 102, however, an action value is calculated for each schema based on the external stimulus and the internal state, and one or more schemas are selected based on those action values, as described above. The schemas are therefore not necessarily selected in the planned order.

Therefore, when the contemplation action hierarchy 101 wants desired element actions to be executed according to a specific plan, it gives an intentional bias (Intentional Bias; IB) to one or more desired schemas and forcibly raises the action values of those schemas. As shown in equation (6), a schema given the intentional bias IB takes the sum of the action value AL it calculated itself and the intentional bias IB as its own action value AL_TOTAL, and returns this action value AL_TOTAL to its upper (parent) schema.
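Equation (6) itself is not reproduced in this text. From the description above, which defines AL_TOTAL as the sum of AL and IB, it can be written as follows:

```latex
AL_{TOTAL} = AL + IB \tag{6}
```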

FIG. 15 shows an example of raising action values by the intentional bias IB. In FIG. 15, the action values of schemas A to C are raised in that order. That is, the intentional bias IB is given to schema A at time t1 so that schema A is executed, and it continues to be given until schema A ends at time t2. When schema A ends, the intentional bias IB is given to schema B at time t3 so that schema B is executed. When schema B ends at time t4, the intentional bias IB is given to schema C at time t5 so that schema C is executed, and it continues to be given until schema C ends at time t6.

Note that the contemplation action hierarchy 101 merely gives the intentional bias IB to a desired schema and thereby raises that schema's original action value. If the action value of another schema is larger than the action value of the desired schema after the intentional bias IB has been added, the intentional bias IB may therefore have no effect.
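The following sketch illustrates this point. It assumes, for simplicity, that exactly one schema with the largest total action value is selected; the function names and the numeric values are hypothetical and do not come from the source.

```python
def total_action_value(al: float, ib: float = 0.0) -> float:
    """Sum of a schema's own action value AL and the intentional bias IB."""
    return al + ib


def select_schema(action_values: dict, biases: dict) -> str:
    """Pick the schema with the largest total action value (assumed selection rule)."""
    totals = {
        name: total_action_value(al, biases.get(name, 0.0))
        for name, al in action_values.items()
    }
    return max(totals, key=totals.get)


# Hypothetical action values calculated by schemas A, B, and C themselves.
action_values = {"A": 20.0, "B": 35.0, "C": 90.0}

print(select_schema(action_values, {"A": 100.0}))  # bias is large enough for A
print(select_schema(action_values, {"A": 50.0}))   # C still wins; bias has no effect
```

In the second call the biased schema reaches only 70, which stays below the unbiased value 90 of schema C, so the bias does not change the selection.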

When the same series of actions is executed repeatedly, the contemplation action hierarchy 101 would plan the series of actions every time and give the intentional bias IB to the series of schemas according to the plan. From the viewpoint of reducing the calculation load, however, it is preferable that the series of actions be executed as a result of the schemas being selected in order based on the action values that each schema of the situation-dependent action hierarchy 102 calculates, rather than the contemplation action hierarchy 101 planning every time and giving the intentional bias IB to the series of schemas.

Therefore, in the present embodiment, the routine bias RSE described above is introduced into the calculation of the action value, so that a repeatedly executed series of actions (routine work) can be executed by the situation-dependent action hierarchy 102 alone, without the contemplation action hierarchy 101 making a plan. That is, the routine bias RSE takes the place of the intentional bias IB. When the combination of a schema and a specific schema (the trigger schema described later) has been learned, the schema adds the routine bias RSE to its own action value whenever that trigger schema is executed. As a result, the schema is executed next after its trigger schema.

FIG. 16 shows an overview of the process of acquiring routine work. Each step of FIG. 16 is described below.

先ずステップＳ１では、熟考行動階層１０１により一連の行動を計画し、実行する。すなわち、上述のように計画を立て、その計画に従って一連のスキーマに対してインテンショナルバイアスIBを与えることにより、一連の行動を実行する。 First, in step S1, a series of actions are planned and executed by the contemplation action hierarchy 101. That is, a series of actions are executed by making a plan as described above and giving an intentional bias IB to the series of schemas according to the plan.

次にステップＳ２では、状況依存行動階層１０２がインテンショナルバイアスIBの与えられたスキーマを監視し、インテンションスキーマ履歴（Intentional Schema History；ＩＳＨ）を作成する。このインテンションスキーマ履歴ISHは、インテンショナルバイアスIBの与えられたスキーマの順序と、そのときのインテンショナルバイアスIBの値とを含む。 Next, in step S2, the situation-dependent action hierarchy 102 monitors the schema given the intentional bias IB, and creates an intention schema history (ISH). This intention schema history ISH includes the order of the schema given the intentional bias IB and the value of the intentional bias IB at that time.

Subsequently, in step S3, routine work is acquired based on the intention schema history ISH. Specifically, pairs consisting of an executed schema and the schema executed immediately before it are extracted from the intention schema history ISH. The schema executed immediately before is called the trigger schema. Each schema creates a candidate acquisition routine list (Candidate Captured Routine List; CCRL) consisting of its trigger schemas and the intentional bias IB given to itself. Trigger schemas in this candidate acquisition routine list CCRL are then moved, based on a predetermined condition, to an acquisition routine list (Captured Routine List; CRL) used for giving the routine bias RSE, thereby creating the acquisition routine list CRL.
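The pair extraction in step S3 can be sketched as follows. The data layout (a list of `(schema, IB)` tuples for the history and a nested dictionary for the candidate lists) and the name `record_run` are assumptions for illustration; the source does not describe concrete data structures.

```python
from collections import defaultdict


def record_run(ccrl, ish):
    """Add one executed plan to the candidate lists.

    ish:  list of (schema_name, ib_value) in the order IB was given.
    ccrl: {schema: {trigger_schema: [ib_value, ...]}}, updated in place.
    """
    # Pair every schema with the schema executed immediately before it.
    for (trigger, _), (schema, ib) in zip(ish, ish[1:]):
        ccrl[schema][trigger].append(ib)


ccrl = defaultdict(lambda: defaultdict(list))
plan = [("GoToClass", 100.0), ("FindBell", 100.0), ("RingBell", 100.0), ("Sing", 100.0)]
for _ in range(20):
    record_run(ccrl, plan)

print(dict(ccrl["FindBell"]))  # the trigger "GoToClass" with 20 recorded IB values
```

The first schema of a run never appears as a key, which matches the statement that the first schema has no trigger schema.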

Here, the condition for moving a trigger schema from the candidate acquisition routine list CCRL to the acquisition routine list CRL can be, for example, either of the following:
(a) the combination of the schema and the trigger schema has occurred a predetermined number of times or more;
(b) the combination of the schema and the trigger schema has occurred a predetermined number of times or more, and its occurrence probability is a predetermined value or more.
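Conditions (a) and (b) can be sketched as a single filter. The function name `promote`, the thresholds, and the definition of the occurrence probability as "count for this trigger divided by the total count over all triggers of the schema" are assumptions made for this illustration.

```python
def promote(candidates, min_count, min_prob=0.0):
    """Move qualifying triggers of one schema from its CCRL to its CRL.

    candidates: {trigger_schema: [ib_value, ...]} for a single schema.
    Returns {trigger_schema: (ib_average, occurrence_probability)}.
    With min_prob=0.0 this is condition (a); otherwise condition (b).
    """
    total = sum(len(ibs) for ibs in candidates.values())
    crl = {}
    for trigger, ibs in candidates.items():
        prob = len(ibs) / total
        if len(ibs) >= min_count and prob >= min_prob:
            crl[trigger] = (sum(ibs) / len(ibs), prob)
    return crl


# Hypothetical candidate list of one schema with two observed triggers.
candidates = {"GoToClass": [100.0] * 15, "Sing": [100.0] * 5}

print(promote(candidates, min_count=5))                 # condition (a): both qualify
print(promote(candidates, min_count=5, min_prob=0.5))   # condition (b): only GoToClass
```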

The routine bias RSE can be added, for example, in any of the following ways:
(c) when the trigger schema is executed, the average value IB_ave of the intentional bias IB given in the past is added;
(d) when the trigger schema is executed, a value obtained by scaling the average value IB_ave of the intentional bias IB given in the past by the occurrence probability of the combination of the element action of the schema and the element action of the trigger schema (the expected value of the intentional bias IB) is added;
(e) when the trigger schema is executed, the average value IB_ave of the intentional bias IB given in the past is added stochastically according to the occurrence probability of the combination of the schema and the trigger schema.
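The three options can be compared in a small sketch. The function name `routine_bias` and the `mode` strings are hypothetical, and option (e) is modeled here as an independent draw per schema, which is one possible reading of "stochastically" rather than a confirmed detail of the source.

```python
import random


def routine_bias(ib_ave: float, prob: float, mode: str, rng: random.Random) -> float:
    """Return the routine bias RSE to add when the trigger schema is executed."""
    if mode == "average":      # option (c): add IB_ave as it is
        return ib_ave
    if mode == "expected":     # option (d): IB_ave scaled by the occurrence probability
        return ib_ave * prob
    if mode == "stochastic":   # option (e): add IB_ave with probability `prob`
        return ib_ave if rng.random() < prob else 0.0
    raise ValueError(f"unknown mode: {mode}")


rng = random.Random(0)
print(routine_bias(100.0, 0.75, "average", rng))     # 100.0
print(routine_bias(100.0, 0.75, "expected", rng))    # 75.0
print(routine_bias(100.0, 0.75, "stochastic", rng))  # either 100.0 or 0.0
```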

最後にステップＳ４では、一連の行動をルーチンワークとして実行する。すなわち、各スキーマは、自身のトリガスキーマが実行されると、自身の行動価値にルーチンバイアスRSEを加え、実行準備する。このようにルーチンバイアスRSEを加えることにより、当該スキーマの行動価値は他のスキーマよりも大きくなるため、トリガスキーマが終了すると、次はルーチンバイアスRSEを加えたスキーマが実行されることになる。 Finally, in step S4, a series of actions are executed as routine work. That is, each schema prepares for execution by adding a routine bias RSE to its own action value when its own trigger schema is executed. By adding the routine bias RSE in this way, the action value of the schema becomes larger than that of other schemas. Therefore, when the trigger schema ends, the schema to which the routine bias RSE is added is executed next.

Note that no trigger schema exists for the first schema, but it suffices to give the intentional bias IB to this first schema for just a moment, thereby raising its action value momentarily. Once the schema has been selected and executed through this momentary increase, the status bias SSE is added to its action value and the action value is raised, so the first schema can execute its element action without being interrupted by other schemas.

FIG. 17 shows an example of raising action values by the routine bias RSE. In FIG. 17, the element actions of schemas A, B, and C have been acquired in this order as routine work. That is, the trigger schema of schema B is schema A, and the trigger schema of schema C is schema B. In FIG. 17, when the intentional bias IB is given to schema A for a moment at time t1 and schema A is executed, schema B, whose trigger schema is schema A, adds the routine bias RSE to its action value at time t2. When schema A ends at time t3, the action value of schema B, to which the routine bias RSE has been added, is the largest at that point, so schema B is executed at time t4 and the status bias SSE is added to the action value of schema B. Similarly, when schema B is executed at time t4, schema C, whose trigger schema is schema B, adds the routine bias RSE to its action value at time t5. When schema B ends at time t6, the action value of schema C, to which the routine bias RSE has been added, is the largest at that point, so schema C is executed at time t7, and the status bias SSE is added to the action value of schema C until schema C ends at time t8.

なお、各スキーマは、トリガスキーマが実行されたときに自身の行動価値にルーチンバイアスRSEを加え、自身の行動価値を単純に引き上げているだけであるため、自身のデフォルト行動価値RLが低い場合には、ルーチンバイアスRSEを加えたとしても、他のスキーマの行動価値より低くなることもあり得る。 Each schema simply adds the routine bias RSE to its own action value when the trigger schema is executed, and simply raises its own action value, so when its own default action value RL is low Even if routine bias RSE is added, it can be lower than the behavioral value of other schemas.
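The chaining of schemas by the routine bias, and the case in which the bias is too small to win, can be sketched as a simplified simulation. It is not the patent's implementation: the status bias, decay, and noise are omitted, action values are held constant, and the names `run_routine`, `base_value`, and the schema "Sleep" used as a competitor are illustrative assumptions.

```python
def run_routine(first, triggers, base_value, rse, max_steps=10):
    """Return the order in which schemas run, starting from `first`.

    triggers:   {schema: its learned trigger schema}
    base_value: {schema: action value before any routine bias}
    rse:        {schema: routine bias added while its trigger is running}
    """
    order = [first]
    current = first
    for _ in range(max_steps):
        # Schemas whose trigger is currently running add their routine bias.
        values = {
            s: v + (rse[s] if triggers.get(s) == current else 0.0)
            for s, v in base_value.items()
            if s != current
        }
        nxt = max(values, key=values.get)
        if triggers.get(nxt) != current:
            break  # an unrelated schema wins, so the routine stops here
        order.append(nxt)
        current = nxt
    return order


triggers = {"B": "A", "C": "B"}
base_value = {"A": 10.0, "B": 10.0, "C": 10.0, "Sleep": 30.0}

print(run_routine("A", triggers, base_value, {"B": 100.0, "C": 100.0}))  # ['A', 'B', 'C']
print(run_routine("A", triggers, base_value, {"B": 10.0, "C": 100.0}))   # ['A']
```

In the second call the biased value of schema B is 20, which is below the value 30 of "Sleep", so the routine is not continued.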

Here, the contemplation action hierarchy 101 does not know that the situation-dependent action hierarchy 102 has acquired the routine work, so when it wants the same series of actions to be executed, it may give the intentional bias IB to the series of schemas anyway. Therefore, when a schema adds the routine bias RSE to its own action value, it notifies the contemplation action hierarchy 101 of that routine bias RSE. On receiving this notification, the contemplation action hierarchy 101 stops giving the intentional bias IB to the series of schemas.

(3) Specific experimental examples of behavior control
Finally, specific experimental examples of behavior control of the robot apparatus 1 are described. In these examples, a schema tree structure such as that shown in FIG. 18 is assumed to be configured in the situation-dependent action hierarchy. As shown in FIG. 18, the schemas “find the bell (FindBell)”, “ring the bell (RingBell)”, “sleep (Sleep)”, “go to the classroom (GoToClass)”, “play soccer (Soccer)”, and “sing (Sing)” are arranged in the layer below the root schema “Root”.

The first to fourth experiments described below are examples in which a series of actions consisting of the element actions “FindBell”, “RingBell”, “GoToClass”, and “Sing” is acquired as routine work.

In the first to fourth experiments, each schema moved a trigger schema from the candidate acquisition routine list CCRL to the acquisition routine list CRL when the combination of itself and that trigger schema had occurred a predetermined number of times or more and with a predetermined probability or more.

(3-1) First experiment
First, the first experiment is described. FIG. 19 shows the plan executed in the first experiment and its ratio, together with the acquisition routine list CRL created by each schema. In the first experiment, only the plan “GoToClass”, “FindBell”, “RingBell”, “Sing” was executed, at a ratio of 100%. The contemplation action hierarchy 101 gave the intentional bias IB to each schema according to this plan.

The acquisition routine list CRL of each schema records the trigger schema, the average value IB_ave of the intentional bias IB given to the schema itself, and the occurrence ratio of the trigger schema. For example, in the acquisition routine list CRL of the schema “FindBell”, the schema “GoToClass” is the trigger schema. The number “100” in parentheses is the average value IB_ave of the intentional bias IB given to the schema “FindBell”. The figure “20/20” indicates that, of the 20 times the plan was executed, the schema “GoToClass” was the trigger schema 20 times. The schema “GoToClass” is always executed first, so it has no trigger schema.

FIG. 20(A) shows the transition of the action values when, as in the conventional approach, the contemplation action hierarchy 101 gives the intentional bias IB to have the plan executed. As shown in FIG. 20(A), the robot apparatus 1 was executing the schema “Sleep” but is interrupted by the contemplation action hierarchy 101. The contemplation action hierarchy 101 executes the plan by giving the intentional bias IB to the series of schemas in order. The graph of the action value transition has the same shape as FIG. 15.

FIG. 20(B), on the other hand, shows the transition of the action values in the first experiment. In the first experiment, the expected value of the intentional bias IB given in the past was used as the routine bias RSE. As shown in FIG. 20(B), the robot apparatus 1 was executing the schema “Sleep” but is interrupted by the contemplation action hierarchy 101. The contemplation action hierarchy 101 starts the plan by giving the intentional bias IB to the schema “GoToClass” for just a moment. When the schema “GoToClass” is executed, the routine bias RSE is added to the action value of the next schema, “FindBell”, which then becomes ready for execution. The rest of the series of actions is executed in the same way. The graph of the action value transition has the same shape as FIG. 17.
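As a worked example, the values recorded for the schema “FindBell” in the first experiment (average intentional bias 100, occurrence ratio 20/20) give the following expected value as its routine bias. The symbol p for the occurrence ratio is introduced only for this illustration.

```latex
RSE_{FindBell} = IB_{ave} \times p = 100 \times \frac{20}{20} = 100
```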

(3-2) Second experiment
Next, the second experiment is described. FIG. 21 shows the plans executed in the second experiment and their ratios, together with the acquisition routine list CRL created by each schema. In the second experiment, the plan “GoToClass”, “FindBell”, “RingBell”, “Sing” was executed at a ratio of 50%, and the plan “GoToClass”, “Sing”, “FindBell” was executed at a ratio of 50%. The contemplation action hierarchy 101 gave the intentional bias IB to each schema according to these plans.

この実験では２種類の計画を実行したため、スキーマ「ベルを見つける（FindBell）」及びスキーマ「歌う（Sing）」は、２種類のトリガスキーマを有している。一方、２種類の計画の何れにおいても、スキーマ「ベルを鳴らす（RingBell）」の直前にはスキーマ「ベルを見つける（FindBell）」が実行されるため、スキーマ「ベルを鳴らす（RingBell）」のトリガスキーマはスキーマ「ベルを見つける（FindBell）」のみである。 Since this experiment performed two types of plans, the schema “FindBell” and the schema “Sing” have two types of trigger schemas. On the other hand, in either of the two types of plans, the schema “FindBell” is executed immediately before the schema “RingBell”, so the trigger of the schema “RingBell” is triggered. The only schema is the schema “FindBell”.

FIG. 22 shows the transition of the action values in the second experiment. In the second experiment, the expected value of the intentional bias IB given in the past was used as the routine bias RSE. FIG. 22(A) shows the transition of the action values when the default action values RL of the schemas “FindBell” and “Sing” are substantially the same, and FIG. 22(B) shows the transition when the default action value RL of the schema “FindBell” is smaller than that of the schema “Sing”.

When the default action values RL of the schemas “FindBell” and “Sing” are substantially the same, as shown in FIG. 22(A), execution of the schema “GoToClass” causes the routine bias RSE to be added to the action values of both “FindBell” and “Sing”, which become ready for execution. However, because their default action values RL are substantially the same and their occurrence ratios are both 50%, the routine biases RSE given to them are also substantially the same, so which action value is larger is determined by the random noise Noise. Which schema is executed after “GoToClass” depends on the action values at the moment “GoToClass” ends. In the case of FIG. 22(A), the action value of “FindBell” was larger than that of “Sing” when “GoToClass” ended, so “FindBell” was executed next. The schemas “RingBell” and “Sing” were then executed in order, and the plan ended.

On the other hand, when the default action value RL of the schema “FindBell” is smaller than that of the schema “Sing”, as shown in FIG. 22(B), the routine biases RSE given to the two are substantially the same but their default action values RL differ. The action value of “Sing” therefore becomes larger than that of “FindBell”, and “Sing” is executed after “GoToClass”. The schemas “FindBell” and “RingBell” are then executed in order, and the plan ends.

(3-3) Third experiment
Next, the third experiment is described. FIG. 23 shows the plans executed in the third experiment and their ratios, together with the acquisition routine list CRL created by each schema. In the third experiment, the plan “GoToClass”, “FindBell”, “RingBell”, “Sing” was executed at a ratio of 75%, and the plan “GoToClass”, “Sing”, “FindBell” was executed at a ratio of 25%. The contemplation action hierarchy 101 gave the intentional bias IB to each schema according to these plans.

この実験でも２種類の計画を実行したため、スキーマ「ベルを見つける（FindBell）」及びスキーマ「歌う（Sing）」は、２種類のトリガスキーマを有している。一方、２種類の計画の何れにおいても、スキーマ「ベルを鳴らす（RingBell）」の直前にはスキーマ「ベルを見つける（FindBell）」が実行されるため、スキーマ「ベルを鳴らす（RingBell）」のトリガスキーマはスキーマ「ベルを見つける（FindBell）」のみである。 Since two types of plans were executed in this experiment, the schema “FindBell” and the schema “Sing” have two types of trigger schemas. On the other hand, in either of the two types of plans, the schema “FindBell” is executed immediately before the schema “RingBell”, so the trigger of the schema “RingBell” is triggered. The only schema is the schema “FindBell”.

FIG. 24 shows the transition of the action values in the third experiment. In the third experiment, the expected value of the intentional bias IB given in the past was used as the routine bias RSE. FIG. 24(A) shows the transition of the action values when the default action values RL of the schemas “FindBell” and “Sing” are substantially the same, and FIG. 24(B) shows the transition when the default action value RL of the schema “FindBell” is smaller than that of the schema “Sing”.

When the default action values RL of the schemas “FindBell” and “Sing” are substantially the same, as shown in FIG. 24(A), execution of the schema “GoToClass” causes the routine bias RSE to be added to the action values of both “FindBell” and “Sing”, which become ready for execution. The routine bias RSE given to each is the expected value of the intentional bias IB given in the past. In the third experiment, “FindBell” was executed after “GoToClass” at a ratio of 75% and “Sing” at a ratio of 25%, so the routine biases RSE given to the two reflect those ratios. As a result, the action value of “FindBell” becomes larger than that of “Sing”, and “FindBell” is executed after “GoToClass”. The schemas “RingBell” and “Sing” are then executed in order, and the plan ends.

On the other hand, when the default action value RL of the schema “FindBell” is smaller than that of the schema “Sing”, as shown in FIG. 24(B), the routine bias RSE given to “FindBell” is larger than that given to “Sing”, but their default action values RL differ. The action value of “Sing” therefore becomes larger than that of “FindBell”, and “Sing” is executed after “GoToClass”. The schemas “FindBell” and “RingBell” are then executed in order, and the plan ends.
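The two outcomes in FIG. 24 can be related by a short calculation. This is a sketch under simplifying assumptions that are not stated in the source: both schemas share the same average intentional bias $IB_{ave}$, the action value before the bias is approximated by the default action value $RL$, and the random noise and the other elements are ignored.

```latex
\begin{aligned}
RSE_{FindBell} &= 0.75 \cdot IB_{ave}, \qquad RSE_{Sing} = 0.25 \cdot IB_{ave} \\
\text{FindBell is selected when}\quad & RL_{FindBell} + 0.75 \cdot IB_{ave} > RL_{Sing} + 0.25 \cdot IB_{ave} \\
\Longleftrightarrow\quad & RL_{Sing} - RL_{FindBell} < 0.5 \cdot IB_{ave}
\end{aligned}
```

Under these assumptions, FIG. 24(A) would correspond to a small difference between the default action values, and FIG. 24(B) to a difference larger than half of $IB_{ave}$.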

(3-4) Fourth experiment
Finally, the fourth experiment is described. The plans executed in the fourth experiment and their ratios, and the acquisition routine list CRL created by each schema, are the same as in the third experiment described above.

FIG. 25 shows the transition of the action values in the fourth experiment. In the fourth experiment, the schema to which the routine bias RSE is added was determined stochastically. As shown in FIG. 25, when the schema “GoToClass” is executed, the routine bias RSE is added to the action value of either “FindBell” or “Sing”, which becomes ready for execution. Which schema receives the routine bias RSE is determined stochastically. As in the third experiment, “FindBell” was executed after “GoToClass” at a ratio of 75% and “Sing” at a ratio of 25%, so the routine bias RSE is added to “FindBell” with a probability of 75% and to “Sing” with a probability of 25%. Because the action value of each schema is calculated every unit time, the schema to which the routine bias RSE is added is also determined every unit time. In the case of FIG. 25, the action value of “FindBell” was larger than that of “Sing” when “GoToClass” ended, so “FindBell” was executed after “GoToClass”. The schemas “RingBell” and “Sing” were then executed in order, and the plan ended.
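The per-unit-time stochastic choice can be sketched as a weighted draw among the schemas that share a trigger. The function name `pick_biased_schema` and the use of a single weighted draw per time step are assumptions for illustration; the source states only that the choice follows the 75%/25% ratios and is repeated every unit time.

```python
import random


def pick_biased_schema(successors: dict, rng: random.Random) -> str:
    """Choose which schema receives the routine bias in this unit of time.

    successors: {schema: occurrence probability}; the values should sum to 1.
    """
    names = list(successors)
    weights = [successors[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)
successors = {"FindBell": 0.75, "Sing": 0.25}

# Repeat the draw for many unit times and count how often each schema is biased.
counts = {"FindBell": 0, "Sing": 0}
for _ in range(1000):
    counts[pick_biased_schema(successors, rng)] += 1

print(counts)  # roughly three quarters of the draws go to FindBell
```

Because the draw is repeated every unit time, the schema that happens to hold the bias at the moment the trigger schema ends is the one most likely to run next.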

以上、本発明を実施するための最良の形態について説明したが、本発明は上述した実施の形態のみに限定されるものではなく、本発明の要旨を逸脱しない範囲において種々の変更が可能であることは勿論である。 Although the best mode for carrying out the present invention has been described above, the present invention is not limited to the above-described embodiment, and various modifications can be made without departing from the scope of the present invention. Of course.

FIG. 1 is a diagram showing the external configuration of the robot apparatus according to the present embodiment.
FIG. 2 is a diagram showing the functional configuration of the robot apparatus.
FIG. 3 is a diagram showing the configuration of the control unit of the robot apparatus.
FIG. 4 is a diagram showing the functional configuration of the action control system of the robot apparatus.
FIG. 5 is a diagram showing the object configuration of the action control system.
FIG. 6 is a diagram showing the form of situation-dependent action control by the situation-dependent action hierarchy of the action control system.
FIG. 7 is a diagram showing how the situation-dependent action hierarchy is composed of a plurality of schemas.
FIG. 8 is a diagram showing an example of the tree structure of schemas in the situation-dependent action hierarchy.
FIG. 9 is a diagram showing an example of the relationship between the internal state value and the desire value.
FIG. 10 is a diagram showing an example of action value calculation data.
FIG. 11 is a diagram showing an example of the relationship between the internal state value and the satisfaction level.
FIG. 12 is a diagram explaining the status bias, which is an element for calculating the action value.
FIG. 13 is a diagram explaining the default action value, which is an element for calculating the action value, and the attenuation parameter.
FIG. 14 is a diagram explaining the random noise, which is an element for calculating the action value.
FIG. 15 is a diagram showing an example of raising action values by the intentional bias.
FIG. 16 is a flowchart showing an overview of the process of acquiring routine work.
FIG. 17 is a diagram showing an example of raising action values by the routine bias.
FIG. 18 is a diagram showing the tree structure of schemas in the first to fourth experiments.
FIG. 19 is a diagram showing the plan executed in the first experiment and the acquisition routine list of each schema.
FIG. 20 is a diagram showing the transition of the action values when the intentional bias is given to have the plan executed, and the transition of the action values in the first experiment.
FIG. 21 is a diagram showing the plans executed in the second experiment and the acquisition routine list of each schema.
FIG. 22 is a diagram showing the transition of the action values in the second experiment.
FIG. 23 is a diagram showing the plans executed in the third experiment and the acquisition routine list of each schema.
FIG. 24 is a diagram showing the transition of the action values in the third experiment.
FIG. 25 is a diagram showing the transition of the action values in the fourth experiment.

Explanation of symbols

1 robot apparatus, 10 behavior control system, 20 control unit, 40 input/output unit, 50 drive unit, 80 external stimulus recognition unit, 91 internal state management unit, 92 short-term memory unit, 93 long-term memory unit, 101 contemplation behavior hierarchy, 102 situation-dependent behavior hierarchy

Claims

A robot apparatus capable of acting autonomously in response to an external stimulus and/or an internal state, comprising:
a plurality of behavior description modules, each describing a predetermined element behavior and calculating an action value representing the execution priority of its own element behavior according to the external stimulus and/or the internal state; and
behavior selection means for selecting one or more behavior description modules based on the magnitude of the execution priority of each behavior description module, and expressing the element behavior described in the selected behavior description module,
wherein each of the behavior description modules, when it has learned a combination of its own element behavior and a specific element behavior, adds a first bias value to its own action value when the specific element behavior is expressed.
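The selection mechanism recited above can be illustrated with a minimal sketch. It is not the claimed implementation: the class `BehaviorModule`, the function `select`, the field names, and all numeric values are hypothetical, the action value from stimulus and internal state is assumed to be given as a single number, and only one module is selected although the claim allows one or more.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorModule:
    """Hypothetical stand-in for one behavior description module."""
    name: str
    base_value: float  # action value from stimulus/internal state (assumed given)
    learned_triggers: set = field(default_factory=set)  # learned preceding behaviors
    first_bias: float = 0.0  # bias added when a learned trigger is expressed

    def action_value(self, expressed):
        """Add the first bias only while a learned trigger behavior is expressed."""
        bonus = self.first_bias if expressed in self.learned_triggers else 0.0
        return self.base_value + bonus


def select(modules, expressed):
    """Return the module whose action value is currently the largest."""
    return max(modules, key=lambda m: m.action_value(expressed))


a = BehaviorModule("A", 0.5)
b = BehaviorModule("B", 0.3, {"A"}, 0.4)  # B has learned A as its trigger
c = BehaviorModule("C", 0.6)

print(select([a, b, c], None).name)  # no trigger expressed
print(select([a, b, c], "A").name)   # A is being expressed, so B receives its bias
```

In this sketch, module B overtakes module C only while its learned trigger A is expressed, which is the effect the first bias value is meant to have.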

The robot apparatus according to claim 1, further comprising behavior control means for sequentially adding a second bias value to the action value of each behavior description module included in a series of behavior description modules based on a predetermined plan,
wherein each behavior description module included in the series of behavior description modules creates a history of the element behavior expressed immediately before its own element behavior, and learns a combination of its own element behavior and the immediately preceding element behavior based on the history.

The robot apparatus, wherein each behavior description module included in the series of behavior description modules learns a combination of its own element behavior and a specific element behavior when the combination has occurred a predetermined number of times or more.

The robot apparatus according to claim 2, wherein each behavior description module included in the series of behavior description modules learns a combination of its own element behavior and a specific element behavior when the combination has occurred a predetermined number of times and the probability that the immediately preceding element behavior is the specific element behavior is greater than or equal to a predetermined value.
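The learning condition described above (an occurrence count together with a probability threshold over the history of immediately preceding behaviors) might be sketched as follows. The class name `PredecessorHistory`, the threshold defaults, and the choice to estimate the probability as a simple relative frequency are all assumptions made for illustration; the claims do not specify how the probability is computed.

```python
from collections import Counter


class PredecessorHistory:
    """Hypothetical history of the element behaviors that preceded one module."""

    def __init__(self, min_count=3, min_prob=0.8):
        self.min_count = min_count  # "predetermined number of times" (assumed value)
        self.min_prob = min_prob    # "predetermined value" for the probability (assumed)
        self.counts = Counter()

    def record(self, previous):
        """Record which element behavior was expressed immediately before this one."""
        self.counts[previous] += 1

    def probability(self, previous):
        """Relative frequency with which `previous` was the preceding behavior."""
        total = sum(self.counts.values())
        return self.counts[previous] / total if total else 0.0

    def learned(self):
        """Predecessors that satisfy both the count and the probability condition."""
        return {p for p, n in self.counts.items()
                if n >= self.min_count and self.probability(p) >= self.min_prob}


history = PredecessorHistory(min_count=3, min_prob=0.75)
for prev in ["A", "A", "A", "C"]:
    history.record(prev)
print(history.learned())  # A occurred 3 times out of 4
```

Dropping the probability condition (setting `min_prob` to 0) leaves only the count-based condition.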

The robot apparatus, wherein each behavior description module included in the series of behavior description modules uses the average of the second bias values added to its own action value as the first bias value.

The robot apparatus according to claim 5, wherein each behavior description module included in the series of behavior description modules, when it has learned a combination of its own element behavior and a specific element behavior, adds to its own action value, as the first bias value, a value obtained by scaling the average of the second bias values added to its own action value by the probability that the element behavior immediately preceding its own is the specific element behavior.

The robot apparatus according to claim 5, wherein each behavior description module included in the series of behavior description modules, when it has learned a combination of its own element behavior and a specific element behavior, sets the average of the second bias values added to its own action value as the first bias value, and adds the first bias value to its own action value according to the probability that the immediately preceding element behavior is the specific element behavior.
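The two ways of deriving the first bias value from the second bias values can be compared in a short sketch. The function names and sample numbers are hypothetical, and reading "according to the probability" as a random draw that adds the full average with that probability is an assumption, not something the claim text confirms.

```python
import random
from statistics import mean


def scaled_first_bias(second_biases, prob):
    """Average of the second bias values, scaled by the predecessor probability."""
    return mean(second_biases) * prob


def stochastic_first_bias(second_biases, prob, rng=random):
    """Full average added with probability `prob`, otherwise nothing (assumed reading)."""
    return mean(second_biases) if rng.random() < prob else 0.0


biases = [40, 60]  # second bias values previously added to this module (made-up numbers)
print(scaled_first_bias(biases, 0.5))
print(stochastic_first_bias(biases, 1.0))
```

Under this reading both variants give the same bias on average; the scaled variant is deterministic, while the stochastic variant either applies the full average or none of it on any given occasion.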

The robot apparatus according to claim 2, wherein each behavior description module included in the series of behavior description modules adds the first bias value to its own action value and notifies the behavior control means of the first bias value, and
the behavior control means then stops adding the second bias value to the action value of each behavior description module included in the series of behavior description modules.

A behavior control method for a robot apparatus capable of acting autonomously in response to an external stimulus and/or an internal state, the method comprising:
a behavior selection step of selecting one or more behavior description modules, based on the magnitude of execution priority, from a plurality of behavior description modules each describing a predetermined element behavior and calculating an action value representing the execution priority of its own element behavior according to the external stimulus and/or the internal state; and
a behavior expression step of expressing the element behavior described in the behavior description module selected in the behavior selection step,
wherein each of the behavior description modules, when it has learned a combination of its own element behavior and a specific element behavior, adds a first bias value to its own action value when the specific element behavior is expressed.