JP2002264057A - Robot device, action control method for robot device, program and recording medium

Info

Publication number: JP2002264057A
Application number: JP2001069137A
Authority: JP
Inventors: Hiroaki Ogawa; 浩明小川
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2001-03-12
Filing date: 2001-03-12
Publication date: 2002-09-18

Abstract

PROBLEM TO BE SOLVED: To enable a robot device to learn an action and to link an internal state to the learned action. SOLUTION: The robot device includes a learning unit 1 that learns new actions and an associating unit 2 that associates an internal state, such as an emotion, with an action learned by the learning unit 1. When the robot device enters a certain emotional state, it can exhibit the associated learned action; conversely, when it performs a learned action, it can change the associated emotional state according to that action.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field] The present invention relates to a robot device that acts autonomously, an action control method for such a robot device, a program for controlling the actions of such a robot device, and a recording medium on which such a program is recorded.

[0002]

[Related Art] Autonomous entertainment robot devices, such as pet-type robot devices, are one application of autonomous robot devices. Such a device has an emotion model, an instinct model, and the like that represent its internal state, and by reflecting the outputs of these models in its motion it can move in a more lifelike manner. Here, an emotion model is, for example, a model of an emotional state expressed as numerical values or as a program, and an instinct model is a model of an instinctive state expressed in the same way.

[0003] The emotion model and the like are predetermined programs. The robot device changes the parameters of the emotion model and the like in response to the execution of predefined actions, elapsed time, sensor inputs (including user inputs), and so on, and exhibits a predetermined action when a changed parameter exceeds a predetermined threshold.
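The parameter-and-threshold mechanism described above can be sketched as follows. This is only an illustration: the class name `EmotionModel`, the emotion names, the 0 to 100 range, and the threshold of 50 are assumptions and do not appear in the publication.

```python
from dataclasses import dataclass, field


@dataclass
class EmotionModel:
    """Toy emotion model: named parameters clamped to [0, 100] (assumed range)."""
    levels: dict = field(default_factory=lambda: {"joy": 0.0, "anger": 0.0})
    threshold: float = 50.0  # assumed value, not taken from the publication

    def update(self, emotion: str, delta: float) -> None:
        """Apply a change caused by an action, elapsed time, or a sensor input."""
        new_level = self.levels[emotion] + delta
        self.levels[emotion] = max(0.0, min(100.0, new_level))

    def triggered(self) -> list:
        """Return the emotions whose parameter currently exceeds the threshold."""
        return [name for name, level in self.levels.items()
                if level > self.threshold]


model = EmotionModel()
model.update("joy", 30.0)    # e.g. a user input raises "joy"
first = model.triggered()    # still below the threshold
model.update("joy", 30.0)
second = model.triggered()   # "joy" now exceeds the threshold
```

In such a scheme, each entry returned by `triggered()` would select the predetermined action to play back.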

[0004] The robot device also holds, or has registered in it, a plurality of items of action pattern data (or action plan data). It automatically selects one item of action pattern data according to the emotion model described above and plays back the selected data, thereby exhibiting a variety of autonomous actions.

[0005] On the other hand, robot devices have also been proposed that act according to the surrounding environment and situation rather than playing back predetermined action pattern data. That is, instead of holding action pattern data and acting on it, such a robot device produces actions on the spot in response to the external environment and the like.

[0006] Specifically, Japanese Patent Application Laid-Open No. 2000-122992 proposes a technique for a robot device that acts autonomously, selecting its range of action according to the external environment and the like by using a reward as the criterion for its motivation to act.

[0007] Japanese Patent Application Laid-Open No. H11-126198 proposes a technique for learning actions using a recurrent neural network (hereinafter, RNN). With this technique, learning actions with RNNs makes it possible to acquire a series of actions in segmented form, and further to acquire hierarchically a higher-level structure that segments the sequences of those segmented motions, and a still higher-level structure above that. According to this technique, the robot device segments various motions according to its individual learning situation, for example "go straight toward the exit", "leave the room", "turn right in the corridor", or "go straight along the corridor", and acts by combining those motions.

[0008] A robot device that can learn actions in segmented form in this way can acquire a variety of actions through learning, depending on its user. That is, because the learning environment differs, the robot device can acquire different motions according to the environment in which it operates. In other words, because the environment in which a robot device is taught differs from user to user (for example, from owner to owner), the robot device can acquire different motions according to that environment.

[0009] Compared with a robot device that can act only by playing back action pattern data as described above, such a robot device acts in accordance with its own environment, so the user can appreciate it as behaving autonomously in a more natural way.

[0010]

[Problems to Be Solved by the Invention] As described above, however, a robot device that can act only by playing back action pattern data cannot be said to have individuality. For example, when it is happy, the only action it can exhibit is raising both arms in a cheer (banzai).

[0011] Moreover, when action patterns are predetermined, a robot device that can change its emotion model in response to its own actions can change the emotion model only through the actions prepared in advance.

[0012] Furthermore, the internal state of the robot device (for example, its emotional state or instinctive state) has had no direct relationship with its learning state. Consequently, even when an action was learned, that action was not linked to an internal state such as an emotion.

[0013] The present invention has been made in view of the circumstances described above, and its object is to provide a robot device, an action control method for a robot device, a program, and a recording medium that make it possible to learn an action and to link an internal state to the learned action.

[0014]

[Means for Solving the Problems] The robot device according to the present invention changes its internal state in response to input information and acts on the basis of that internal state. To solve the problems described above, the robot device includes internal state changing means for changing the internal state in response to input information, learning means for learning a new action, associating means for associating an internal state with an action learned by the learning means, and action control means for controlling actions in association with the internal state.

[0015] In a robot device with this configuration, the associating means associates an internal state with an action learned by the learning means, and the action control means controls actions in association with the internal state. The internal state so associated is changed by the internal state changing means.

[0016] The robot device can thereby learn an action and associate an emotion with that action. For example, it exhibits the learned action when the emotion, changed by the internal state changing means, reaches a predetermined state, or it has the internal state changing means change the associated emotion when the learned action is performed.

[0017] The action control method for a robot device according to the present invention controls the actions of a robot device that changes its internal state in response to input information and acts on the basis of that internal state. To solve the problems described above, the method includes a learning step in which the robot device learns a new action, an associating step in which the robot device associates an internal state with the action learned in the learning step, and an action control step in which the robot device controls actions in association with the internal state.

[0018] With this action control method, the robot device can learn an action and associate an emotion with that action. For example, it exhibits the learned action when the emotion, changed by the internal state changing means, reaches a predetermined state, or it has the internal state changing means change the associated emotion when the learned action is performed.

[0019] The program according to the present invention controls the actions of a robot device that changes its internal state in response to input information and acts on the basis of that internal state. To solve the problems described above, the program executes a learning step in which the robot device learns a new action, an associating step in which the robot device associates an internal state with the action learned in the learning step, and an action control step in which the robot device controls actions in association with the internal state.

[0020] A robot device whose actions are controlled by this program can learn an action and associate an emotion with that action. For example, it exhibits the learned action when the emotion, changed by the internal state changing means, reaches a predetermined state, or it has the internal state changing means change the associated emotion when the learned action is performed.

[0021] The recording medium according to the present invention has recorded on it a program for controlling the actions of a robot device that changes its internal state in response to input information and acts on the basis of that internal state. To solve the problems described above, the recorded program executes a learning step in which the robot device learns a new action, an associating step in which the robot device associates an internal state with the action learned in the learning step, and an action control step in which the robot device controls actions in association with the internal state.

[0022] A robot device whose actions are controlled by the program recorded on this recording medium can learn an action and associate an emotion with that action. For example, it exhibits the learned action when the emotion, changed by the internal state changing means, reaches a predetermined state, or it has the internal state changing means change the associated emotion when the learned action is performed.

[0023]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. In this embodiment, the present invention is applied to a robot device that acts autonomously.

[0024] The robot device to which the present invention is applied is an autonomous robot device that acts autonomously according to its surrounding environment and its internal state. By applying the present invention, the robot device can learn new actions and can link an internal state, such as an emotional state or an instinctive state, to a learned action.

[0025] The description of the embodiment first covers action learning by the robot device to which the present invention is applied, and then the specific configuration of the robot device.

[0026] (1) Learning an action and linking an internal state to the learned action

As shown in FIG. 1, action learning is realized by a learning unit 1 and an associating unit 2. The learning unit 1 is the learning means that learns new actions, and the associating unit 2 is the associating means that associates an internal state, such as an emotion, with an action learned by the learning unit 1. The learning unit 1 and the associating unit 2 are implemented, for example, as objects or modules of a software program.

[0027] The learning unit 1 is built from a learning model such as a recurrent neural network (hereinafter, RNN). The RNN here is a neural network in which information on the action to be learned is passed through an input layer, an intermediate layer, and an output layer. The processing the RNN performs during action learning is described in detail later.

[0028] The learning unit 1 produces an output corresponding to each input and learns with the input as the learning target. The input is given as time-series data. Inputs for learning an action include, for example, the sensor inputs and motor outputs obtained as the robot device acts; a specific example of a sensor input is an image signal. The information input to the learning unit 1 as a learning target is not limited to these and may be, for example, action information generated internally in correspondence with an action.

[0029] The learning unit 1 thus produces an output for the input to be learned. Here, the output of the learning unit 1 can be understood as a so-called teaching signal. After learning, this output takes similar values as long as the same action is being exhibited.

[0030] The associating unit 2 associates an emotion with (or links an emotion to) the output from the learning unit 1. The emotion is associated with the output obtained after learning. In this embodiment, an emotion is associated with an action by associating the emotion with the output from the learning unit 1, but the invention is not limited to this; any approach that links an emotion to a learned action may be used. Likewise, although this embodiment describes associating an emotion with the output, the associated target may instead be an instinct, or any other internal state of the robot device that is affected by actions.

[0031] Various choices are possible for the emotion associated with a learned action. If learned actions can be distinguished by attributes seen from the standpoint of emotion, the most suitable emotion can be associated with each learned action. For example, making a large movement for the first time is accompanied by fear, so the emotion "fear" is associated with such an action. As further examples, the emotion "joy" may be associated with the action of approaching a person, and an emotion tied to curiosity may be associated with the action of approaching an unfamiliar person. A randomly selected emotion can also be associated with a learned action, in which case the learned actions allow a richer range of emotions to be expressed.

[0032] With the learning unit 1 and the associating unit 2, the robot device can learn a new action and can also associate a predetermined emotion with the learned action. As a result, when the robot device enters a certain emotional state, it can exhibit the learned action associated with that state. Thus, for example, even when several robot devices are in the same emotional state, each of them exhibits a different action.

[0033] The robot device can also change the state of the associated emotion when it exhibits a learned action. For example, by associating an amount of emotional change with a learned action in advance, the state of the associated emotion can be changed by that amount when the action is performed.

[0034] Because the environment in which a robot device acts differs from user to user, the actions it learns also differ. Each robot device therefore learns different actions and associates emotions with them, so each user can appreciate actions that are unique to his or her own robot device. That is, because the environment in which actions are learned differs from user to user, robot devices belonging to different users exhibit different actions even when, for example, their "anger" emotion is at the same level.

[0035] As described above, the robot device learns actions and links internal states such as emotions to those actions, and thereby achieves more lifelike expression.

[0036] (2) Specific configuration for learning actions

As described above, actions can be learned with emotions associated with them. One technique for learning actions is disclosed in Japanese Patent Application Laid-Open No. H11-126198. The robot device to which the present invention is applied learns actions by adopting, for example, this technique. An outline of the technique is given here.

[0037] FIG. 2 shows an example configuration of a data processing unit. The configuration in FIG. 2 is a concrete implementation of the learning unit 1 shown in FIG. 1. As described in detail later, the robot device has sensors that detect obstacles and motors that are driven to move the robot, and information from them is input to this data processing unit as the learning target.

[0038] An input x_t corresponding to the states of the sensors and motors is supplied to each of the n RNNs 1-1 to 1-n. The RNN 1-1 is configured as shown in FIG. 3. Although not illustrated, the other RNNs 1-2 to 1-n are configured in the same way as the RNN 1-1 shown in FIG. 3.

[0039] As shown in FIG. 3, the RNN 1-1 has a predetermined number of input-layer neurons 31, which receive an input s_t corresponding to the sensor state and an input m_t corresponding to the motor state. The outputs of the neurons 31 are supplied through intermediate-layer neurons 32 to output-layer neurons 33. The output-layer neurons 33 output s_{t+1}, corresponding to the sensor state, and m_{t+1}, corresponding to the motor state. Part of the output is fed back to the input-layer neurons 31 as a context C_t.
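The forward pass of one such network can be sketched as follows. This is an illustration under stated assumptions: the layer sizes, the tanh activation, the random weight initialization, and the names `ContextRNN` and `step` are all hypothetical. The publication specifies only the three layers and the feedback of part of the output as context.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (the initialization scheme is an assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class ContextRNN:
    """Input, intermediate, and output layers with context units fed back."""

    def __init__(self, n_sensor, n_motor, n_context, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_sensor, self.n_motor = n_sensor, n_motor
        n_io = n_sensor + n_motor + n_context   # same width for input and output
        self.w_in = make_matrix(n_hidden, n_io, rng)
        self.w_out = make_matrix(n_io, n_hidden, rng)
        self.context = [0.0] * n_context        # C_t, initially zero

    def step(self, s_t, m_t):
        """Map (s_t, m_t, C_t) to (s_{t+1}, m_{t+1}) and store the new context."""
        x = list(s_t) + list(m_t) + self.context
        hidden = [math.tanh(a) for a in matvec(self.w_in, x)]
        out = [math.tanh(a) for a in matvec(self.w_out, hidden)]
        split = self.n_sensor + self.n_motor
        self.context = out[split:]              # part of the output is fed back
        return out[:self.n_sensor], out[self.n_sensor:split]


rnn = ContextRNN(n_sensor=3, n_motor=2, n_context=4, n_hidden=6)
s_next, m_next = rnn.step([0.1, 0.2, 0.3], [0.5, -0.5])
```

Because the context is carried between calls, the prediction for a given sensor and motor input depends on the preceding inputs, which is what lets the network represent time-series patterns.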

[0040] The outputs of the RNNs 1-1 to 1-n are input to a combining circuit 3 through the corresponding gates 2-1 to 2-n, where they are combined to produce the predicted output y_{t+1}.

[0041] During learning, the error between the target value y*_{t+1}, which serves as the teacher signal, and the output of each of the RNNs 1-1 to 1-n controls the state of the corresponding gate 2-1 to 2-n.

[0042] A configuration similar to the lower-level RNNs 1-1 to 1-n, gates 2-1 to 2-n, and combining circuit 3 is also formed at a higher level of the hierarchy. That is, the higher level is provided with RNNs 11-1 to 11-n, gates 12-1 to 12-n, and a combining circuit 13. The RNNs 11-1 to 11-n receive as input a sequence (gate sequence) G_T corresponding to the conduction states (degrees of opening) of the lower-level gates 2-1 to 2-n. The RNNs 11-1 to 11-n output G^1_{T+1} to G^n_{T+1}, respectively, and the combining circuit 13 outputs the predicted output G_{T+1}. During learning, a target value G*_{T+1} is input as the teacher signal. Although FIG. 2 shows only two levels, still higher levels can be provided as necessary.

[0043] FIG. 4 shows the configuration of the first RNN 11-1 in the higher level. The other RNNs 11-2 to 11-n have the same configuration as the RNN 11-1 shown in FIG. 4.

[0044] As shown in FIG. 4, the higher-level RNN 11-1 is configured basically in the same way as the lower-level RNN 1-1 shown in FIG. 3: a plurality of neurons 41 are arranged in the input layer, a plurality of neurons 42 in the intermediate layer, and a plurality of neurons 43 in the output layer. The input layer receives signals g^1_T to g^n_T corresponding to the conduction states of the gates 2-1 to 2-n, together with the period (time) I_T during which a gate is conducting (open). In response to these inputs, the output layer outputs g^1_{T+1} to g^n_{T+1} and I_{T+1}. Part of the output of the output layer is fed back to the input layer as a context C_T.

[0045] The algorithm of the RNNs 1-1 to 1-n is described next. The conduction state of each gate is expressed by equation (1), using a soft-max activation function.

[0046]

(Equation 1)

[0047] Here, g^i denotes the gate coefficient corresponding to the conduction state of the i-th gate, and s^i denotes the value corresponding to the internal state of the i-th gate. Accordingly, the output y_{t+1} of the combining circuit 3 is given by equation (2).

[0048]

(Equation 2)
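Equations (1) and (2) survive in this text only as placeholders. As a minimal sketch, assuming the standard soft-max form g^i = exp(s^i) / Σ_j exp(s^j) for equation (1) and a gate-weighted sum y_{t+1} = Σ_i g^i y^i_{t+1} for equation (2), the gating and combining steps could look like this. The function names are illustrative, and the exact forms in the original drawings may differ.

```python
import math


def gate_coefficients(s):
    """Soft-max over the gate internal states s^i (assumed form of equation (1))."""
    peak = max(s)                           # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in s]
    total = sum(exps)
    return [e / total for e in exps]


def combine(gates, expert_outputs):
    """Gate-weighted sum of the RNN outputs (assumed form of equation (2))."""
    dim = len(expert_outputs[0])
    return [sum(g * y[k] for g, y in zip(gates, expert_outputs))
            for k in range(dim)]


# Three experts; the third gate has internal state ln 2, so it gets twice the weight.
g = gate_coefficients([0.0, 0.0, math.log(2.0)])
y = combine(g, [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
```

With these inputs the gate coefficients come out close to 0.25, 0.25, and 0.5, so the combined output is dominated by the third expert.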

[0049] Here, a likelihood function given by equation (3) is defined, which takes its maximum value in prediction learning.

[0050]

(Equation 3)

[0051] Here, σ denotes a scaling parameter.
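Equation (3) is not reproduced in this text. For orientation only, a likelihood of the following form would be consistent with the soft-max gates g^i, the scaling parameter σ, and the squared prediction error described in the surrounding paragraphs. It is the usual mixture-of-experts log-likelihood and is an assumption, not a transcription of the original equation.

```latex
\ln L \;=\; \ln \sum_{i=1}^{n} g^{i}\,
\exp\!\left( -\frac{1}{2\sigma^{2}}
\left\lVert y^{*}_{t+1} - y^{i}_{t+1} \right\rVert^{2} \right)
```

Under this form, the likelihood grows when a gate with a large coefficient g^i belongs to an RNN whose output y^i_{t+1} is close to the target y*_{t+1}.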

[0052] During learning, the weight coefficients of the RNNs 1-1 to 1-n and the gate coefficients g are updated simultaneously so that the likelihood function is maximized. During recognition, only the gate coefficients are updated.

[0053] To establish the rules for updating these weight coefficients and gate coefficients, the gradient of the exponential function of the likelihood function with respect to the internal variable s^i, and its gradient with respect to the output y^i of the i-th RNN, are obtained as in equations (4) and (5).

[0054]

(Equation 4)

[0055]

(Equation 5)

[0056] Here, g(i | x_t, y*_{t+1}) denotes the posterior probability that the i-th RNN generates the target output y*_{t+1} when the input is x_t, and is given by equation (6).

[0057]

(Equation 6)

[0058] Here, ||y*_{t+1} - y^j_{t+1}||^2 denotes the squared error of the current prediction.
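Equations (4) to (6) are likewise missing from this text. The sketch below assumes the usual mixture-of-experts forms: the posterior is each gate coefficient times a Gaussian term of the squared error, normalized over all RNNs; the gradient with respect to s^i is the posterior minus the gate coefficient; and the gradient with respect to y^i is the posterior-weighted error divided by σ². All function names are hypothetical, and the original equations may differ in detail.

```python
import math


def squared_error(target, output):
    """||y* - y^i||^2 for one RNN."""
    return sum((t - o) ** 2 for t, o in zip(target, output))


def posterior(gates, target, expert_outputs, sigma=1.0):
    """Posterior g(i | x_t, y*_{t+1}) for every RNN (assumed form of equation (6))."""
    scores = [g * math.exp(-squared_error(target, y) / (2.0 * sigma ** 2))
              for g, y in zip(gates, expert_outputs)]
    total = sum(scores)
    return [s / total for s in scores]


def gate_gradient(gates, post):
    """Gradient with respect to s^i (assumed form of equation (4))."""
    return [p - g for p, g in zip(post, gates)]


def output_gradient(post_i, target, output_i, sigma=1.0):
    """Gradient with respect to y^i (assumed form of equation (5))."""
    return [post_i * (t - o) / sigma ** 2 for t, o in zip(target, output_i)]


gates = [0.5, 0.5]
target = [1.0, 0.0]
outputs = [[1.0, 0.0], [0.0, 1.0]]   # RNN 0 predicts the target exactly
post = posterior(gates, target, outputs)
ds = gate_gradient(gates, post)      # opens gate 0 further, closes gate 1
dy1 = output_gradient(post[1], target, outputs[1])
```

In this example the RNN that already predicts well receives the larger posterior, so its gate is pushed open while the other RNN receives only a small, posterior-weighted correction. This is the mechanism by which each RNN comes to specialize in one pattern.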

[0059] Equation (4) indicates the direction in which s^i is updated. As equation (5) shows, the gradient of the exponential function of the likelihood function with respect to y^i_{t+1} contains the error term y*_{t+1} - y^i_{t+1}. This error term is weighted by the posterior probability of the i-th RNN.

[0060] In this way, the weight coefficients of the RNNs 1-1 to 1-n are adjusted so as to correct the error between the output of the i-th RNN and the target value, in proportion only to the posterior probability. As a result, of the n RNNs, only one expert RNN comes to learn a given training pattern (learning pattern) exclusively. The error of each RNN is given by equation (7).

[0061]

(Equation 7)

[0062] The actual learning of the RNNs 1-1 to 1-n is carried out by the back-propagation method on the basis of the error obtained from the above equation.

[0063] Learning thus proceeds so that each of the RNNs 1-1 to 1-n becomes an expert capable of identifying, within the input x_t, a particular time-series pattern different from those of the others.

[0064] The same applies to the RNNs 11-1 to 11-n at the higher level. In that case, however, the input is the gate sequence G_T and the output is G^i_{T+1}.

[0065] With this configuration, the RNNs 1-1 to 1-n can each learn an individual motion. The expression of each motion learned by the RNNs 1-1 to 1-n is managed by the gates 2-1 to 2-n, and the RNNs 11-1 to 11-n learn the various operating sequences of these gates (that is, the various orderings in which motions are combined). This learning method thus makes it possible to learn actions in segmented form.

【００６６】[0066] By having the learning unit 1 composed of such a plurality of RNNs, the robot device can move through rooms forming a passage as shown in FIG. 5 and learn behavior during the movement. For example, it learns behavior based on a distance sensor. Specifically, the robot device moves through the rooms and, in the meantime, learns behavior by self-organizing the layers constituting the learning unit.

【００６７】[0067] The learning technique outlined above is disclosed in Japanese Patent Application Laid-Open No. H11-126198, and the learning unit 1 of the robot device according to the embodiment of the present invention can be constructed by adopting such a learning technique. With such a learning unit 1, the robot device according to the embodiment of the present invention can learn new behaviors. Accordingly, after the k-th behavior (a segmented behavior) has been learned as a new behavior, when that k-th behavior is selected, the behavior learned by the k-th RNN (RNN 11-k) can be expressed. Here, the selection for expressing one behavior from among the expressible behaviors, including the learned behaviors, is determined by conditions such as the emotion and the current posture, and is made, for example, by an action switching unit. The action switching unit is described in detail later. The group of behaviors subject to selection consists of behaviors registered in advance and behaviors obtained by learning as described above.

【００６８】[0068] (3) Configuration of the Robot Device According to the Present Embodiment
Next, a specific configuration of the robot device that learns behavior as described above will be described.

【００６９】[0069] As shown in FIG. 6, the robot device is a so-called pet robot shaped to imitate a “dog”, in which leg units 103A, 103B, 103C, and 103D are connected to the front, rear, left, and right of a body unit 102, and a head unit 104 and a tail unit 105 are connected to the front end and the rear end of the body unit 102, respectively.

【００７０】[0070] As shown in FIG. 7, the body unit 102 houses a control unit 116, formed by connecting a CPU (Central Processing Unit) 110, a DRAM (Dynamic Random Access Memory) 111, a flash ROM (Read Only Memory) 112, a PC (Personal Computer) card interface circuit 113, and a signal processing circuit 114 to one another via an internal bus 115, and a battery 117 serving as the power source of the robot device 100. The body unit 102 also houses an angular velocity sensor 118 and an acceleration sensor 119 for detecting the orientation and the acceleration of movement of the robot device 100.

【００７１】[0071] The head unit 104 is provided, at predetermined positions, with a CCD (Charge Coupled Device) camera 120 for imaging the external situation, a touch sensor 121 for detecting pressure applied by physical actions from the user such as “stroking” and “hitting”, a distance sensor 122 for measuring the distance to an object located ahead, a microphone 123 for collecting external sound, a speaker 124 for outputting sound such as a cry, and an LED (Light Emitting Diode) (not shown) corresponding to the “eyes” of the robot device 100.

【００７２】[0072] Further, actuators 125_1 to 125_n and potentiometers 126_1 to 126_n, corresponding in number to the degrees of freedom, are provided at the joints of the leg units 103A to 103D, the connecting portions between the leg units 103A to 103D and the body unit 102, the connecting portion between the head unit 104 and the body unit 102, the connecting portion of the tail 105A of the tail unit 105, and the like. For example, the actuators 125_1 to 125_n each include a servomotor. By driving the servomotors, the leg units 103A to 103D are controlled so as to transition to a target posture or motion.

【００７３】[0073] The various sensors, such as the angular velocity sensor 118, the acceleration sensor 119, the touch sensor 121, the distance sensor 122, the microphone 123, the speaker 124, and the potentiometers 126_1 to 126_n, as well as the LED and the actuators 125_1 to 125_n, are connected to the signal processing circuit 114 of the control unit 116 via the corresponding hubs 127_1 to 127_n, respectively, while the CCD camera 120 and the battery 117 are each connected directly to the signal processing circuit 114.

【００７４】[0074] The signal processing circuit 114 sequentially takes in the sensor data, image data, and audio data supplied from the sensors described above and stores them in sequence at predetermined locations in the DRAM 111 via the internal bus 115. The signal processing circuit 114 also sequentially takes in remaining-battery data, indicating the remaining battery level, supplied from the battery 117 and stores it at a predetermined location in the DRAM 111.

【００７５】[0075] The sensor data, image data, audio data, and remaining-battery data stored in the DRAM 111 in this way are subsequently used when the CPU 110 controls the operation of the robot device 100.

【００７６】[0076] In practice, at the initial stage when the robot device 100 is powered on, the CPU 110 reads the control program stored in the memory card 128 loaded in a PC card slot (not shown) of the body unit 102, or in the flash ROM 112, either via the PC card interface circuit 113 or directly, and stores it in the DRAM 111.

【００７７】[0077] Thereafter, the CPU 110 determines its own state and the surrounding situation, as well as the presence or absence of instructions and actions from the user, based on the sensor data, image data, audio data, and remaining-battery data sequentially stored in the DRAM 111 by the signal processing circuit 114 as described above.

【００７８】[0078] Further, the CPU 110 determines the subsequent action based on this determination result and the control program stored in the DRAM 111, and drives the necessary actuators 125_1 to 125_n based on that decision, thereby causing the robot device to perform actions such as swinging the head unit 104 up, down, left, and right, moving the tail 105A of the tail unit 105, and driving the leg units 103A to 103D to walk.

【００７９】[0079] At this time, the CPU 110 also generates audio data as necessary and supplies it to the speaker 124 as an audio signal via the signal processing circuit 114, thereby outputting sound based on the audio signal to the outside, or turns the above-mentioned LED on or off or makes it blink.

【００８０】[0080] In this way, the robot device 100 is able to act autonomously in accordance with its own state and the surrounding situation, and with instructions and actions from the user.

【００８１】[0081] (2) Software Configuration of the Control Program
The software configuration of the above-described control program in the robot device 100 is as shown in FIG. 8. In FIG. 8, the device driver layer 130 is located at the lowest layer of the control program and consists of a device driver set 131 made up of a plurality of device drivers. Each device driver is an object permitted to directly access hardware used in an ordinary computer, such as the CCD camera 120 (FIG. 7) or a timer, and performs processing in response to an interrupt from the corresponding hardware.

【００８２】[0082] The robotic server object 132 is located at the lowest layer of the device driver layer 130 and consists of: a virtual robot 133, a software group that provides an interface for accessing hardware such as the various sensors and the actuators 125_1 to 125_n described above; a power manager 134, a software group that manages power switching and the like; a device driver manager 135, a software group that manages various other device drivers; and a designed robot 136, a software group that manages the mechanism of the robot device 100.

【００８３】[0083] The manager object 137 consists of an object manager 138 and a service manager 139. The object manager 138 is a software group that manages the startup and termination of the software groups included in the robotic server object 132, the middleware layer 140, and the application layer 141. The service manager 139 is a software group that manages the connections between objects based on the inter-object connection information described in a connection file stored in the memory card 128 (FIG. 7).

【００８４】[0084] The middleware layer 140 is located above the robotic server object 132 and consists of a software group that provides the basic functions of the robot device 100, such as image processing and audio processing. The application layer 141 is located above the middleware layer 140 and consists of a software group that determines the behavior of the robot device 100 based on the results processed by the software groups constituting the middleware layer 140.

【００８５】[0085] FIG. 9 shows the specific software configurations of the middleware layer 140 and the application layer 141.

【００８６】[0086] As shown in FIG. 9, the middleware layer 140 consists of a recognition system 160 and an output system 169. The recognition system 160 has signal processing modules 150 to 158 for noise detection, temperature detection, brightness detection, scale recognition, distance detection, posture detection, the touch sensor, motion detection, and color recognition, together with an input semantics converter module 159 and the like. The output system 169 has an output semantics converter module 168 together with signal processing modules 161 to 167 for posture management, tracking, motion playback, walking, fall recovery, LED lighting, and sound playback, and the like.

【００８７】[0087] Each of the signal processing modules 150 to 158 of the recognition system 160 takes in the corresponding data from among the sensor data, image data, and audio data read from the DRAM 111 (FIG. 7) by the virtual robot 133 of the robotic server object 132, performs predetermined processing based on that data, and supplies the processing result to the input semantics converter module 159. Here, for example, the virtual robot 133 is configured as a part that exchanges or converts signals in accordance with a predetermined communication protocol.

【００８８】[0088] Based on the processing results supplied from the signal processing modules 150 to 158, the input semantics converter module 159 recognizes its own state and the surrounding situation, as well as commands and actions from the user, such as “noisy”, “hot”, “bright”, “a ball was detected”, “a fall was detected”, “stroked”, “hit”, “the do-mi-so scale was heard”, “a moving object was detected”, or “an obstacle was detected”, and outputs the recognition result to the application layer 141 (FIG. 7).

【００８９】[0089] As shown in FIG. 10, the application layer 141 consists of five modules: a behavior model library 170, an action switching module 171, a learning module 172, an emotion model 173, and an instinct model 174.

【００９０】[0090] As shown in FIG. 11, the behavior model library 170 is provided with mutually independent behavior models 170_1 to 170_n, each corresponding to one of several preselected condition items such as “when the remaining battery level is low”, “when recovering from a fall”, “when avoiding an obstacle”, “when expressing an emotion”, and “when a ball is detected”.

【００９１】[0091] Each of these behavior models 170_1 to 170_n determines the subsequent action when, for example, a recognition result is supplied from the input semantics converter module 159 or a fixed time has elapsed since the last recognition result was supplied. In doing so it refers, as necessary, to the corresponding emotion parameter value held in the emotion model 173 and the corresponding desire parameter value held in the instinct model 174, as described later, and outputs the decision to the action switching module 171.

【００９２】[0092] In this embodiment, each of the behavior models 170_1 to 170_n uses, as the method for determining the next action, an algorithm called a finite probability automaton. As shown in FIG. 12, this algorithm probabilistically determines to which of the nodes NODE_0 to NODE_n a transition is made from one node (state) NODE_0 to NODE_n, based on the transition probabilities P_1 to P_n set respectively for the arcs ARC_1 to ARC_n1 connecting the nodes NODE_0 to NODE_n.
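The probabilistic choice of the next node can be sketched in Python as below. This is a minimal illustration, not the patent's implementation: the function name `next_node`, the dictionary layout, and the example node names and probabilities are all hypothetical.

```python
import random

def next_node(transitions, rng=random):
    """Pick the next node according to the transition probabilities.

    `transitions` maps a destination node name to its probability; the
    probabilities are expected to sum to 1. `rng` is any object with a
    random() method returning a float in [0, 1).
    """
    r = rng.random()
    cumulative = 0.0
    for node, probability in transitions.items():
        cumulative += probability
        if r < cumulative:
            return node
    return node  # guard against floating-point round-off

# Hypothetical arcs leaving one node, with their transition probabilities.
ARCS_FROM_NODE_0 = {"NODE_1": 0.3, "NODE_2": 0.5, "NODE_0": 0.2}
```

Calling `next_node(ARCS_FROM_NODE_0)` repeatedly would return `NODE_1` about 30% of the time, `NODE_2` about 50% of the time, and stay at `NODE_0` otherwise.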

【００９３】[0093] Specifically, each of the behavior models 170_1 to 170_n has a state transition table 180, as shown in FIG. 13, for each of the nodes NODE_0 to NODE_n that form that behavior model.

【００９４】[0094] In this state transition table 180, the input events (recognition results) that serve as transition conditions at the node NODE_0 to NODE_n are listed in priority order in the “input event name” row, and further conditions on each transition condition are described in the corresponding columns of the “data name” and “data range” rows.

【００９５】[0095] Accordingly, at the node NODE_100 represented by the state transition table 180 of FIG. 13, the condition for transitioning to another node is either that, when the recognition result “ball detected (BALL)” is given, the “size (SIZE)” of the ball given together with that recognition result is in the range “0 to 1000”, or that, when the recognition result “obstacle detected (OBSTACLE)” is given, the “distance (DISTANCE)” to the obstacle given together with that recognition result is in the range “0 to 100”.

【００９６】[0096] Also, at this node NODE_100, a transition to another node can be made even when no recognition result is input. This happens when, among the emotion and desire parameter values held in the emotion model 173 and the instinct model 174, which the behavior models 170_1 to 170_n refer to periodically, the parameter value of any of “joy (JOY)”, “surprise (SURPRISE)”, or “sadness (SUDNESS)” held in the emotion model 173 is in the range “50 to 100”.

【００９７】[0097] In the state transition table 180, the names of the nodes to which a transition can be made from the node NODE_0 to NODE_n are listed in the “transition destination node” column of the “transition probability to other nodes” section. The probability of transitioning to each of the other nodes NODE_0 to NODE_n when all the conditions described in the “input event name”, “data value”, and “data range” rows are satisfied is described in the corresponding location in the “transition probability to other nodes” section, and the action to be output on transitioning to that node NODE_0 to NODE_n is described in the “output action” row of the same section. The probabilities in each row of the “transition probability to other nodes” section sum to 100 [%].

【００９８】[0098] Accordingly, at the node NODE_100 represented by the state transition table 180 of FIG. 13, when, for example, the recognition result given is that a ball is detected (BALL) and the “SIZE” of the ball is in the range “0 to 1000”, a transition to “node NODE_120 (node 120)” can be made with a probability of “30 [%]”, and the action “ACTION1” is output at that time.
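To make the NODE_100 example concrete, the Python sketch below encodes one row of a state transition table and evaluates it for a recognition result. Only the BALL condition, the SIZE range of 0 to 1000, and the 30% transition to NODE_120 with ACTION1 come from the text; the data layout, the function name `step`, and the remaining 70% self-transition are placeholders assumed for illustration.

```python
import random

# Hypothetical encoding of part of the state transition table for NODE_100.
# The 70% self-transition is a placeholder; the text gives only the 30% arc.
NODE_100 = [
    {
        "event": "BALL",
        "data": "SIZE",
        "range": (0, 1000),
        "transitions": [("NODE_120", 0.30, "ACTION1"),
                        ("NODE_100", 0.70, None)],
    },
]

def step(table, event, value, rng=random):
    """Return (next_node, action) for a recognition result.

    Returns None when no row of the table matches the event and data range.
    """
    for row in table:
        low, high = row["range"]
        if row["event"] == event and low <= value <= high:
            r = rng.random()
            cumulative = 0.0
            for node, probability, action in row["transitions"]:
                cumulative += probability
                if r < cumulative:
                    return node, action
            return node, action  # guard against floating-point round-off
    return None
```

A call such as `step(NODE_100, "BALL", 500)` would then yield `("NODE_120", "ACTION1")` in roughly 30% of calls.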

【００９９】[0099] As described earlier, applying the present invention allows the robot device 100 to learn an action and associate an emotion with the learned action. Learning an action and associating an emotion with it become possible either by describing such a learned action (Action), together with the corresponding emotion value, in another node serving as a transition destination in the state transition table 180, or by placing an action described in advance (whose corresponding emotion value has already been determined) in an executable state (that is, unlocking it for use).

【０１００】[0100] Each of the behavior models 170_1 to 170_n is constructed by linking together a number of nodes NODE_0 to NODE_n, each described by such a state transition table 180. When, for example, a recognition result is supplied from the input semantics converter module 159, the behavior model probabilistically determines the next action using the state transition table of the corresponding node NODE_0 to NODE_n and outputs the decision to the action switching module 171.

【０１０１】[0101] The action switching module 171 shown in FIG. 10 selects, from among the actions output by the behavior models 170_1 to 170_n of the behavior model library 170, the action output by the behavior model with the highest predetermined priority, and sends a command to execute that action (hereinafter referred to as an action command) to the output semantics converter module 168 of the middleware layer 140. In this embodiment, the behavior models 170_1 to 170_n shown lower in FIG. 11 are given higher priority.
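The priority-based selection can be sketched in Python as follows. The function name `select_action`, the model names, and their ordering are hypothetical; the actual priority order is the one defined by FIG. 11, which is not reproduced in this text.

```python
def select_action(outputs, priority):
    """Return the action proposed by the highest-priority behavior model.

    `outputs` maps a behavior model name to the action it proposes (or None
    if it proposes nothing); `priority` lists the model names from highest
    to lowest priority.
    """
    for model in priority:
        action = outputs.get(model)
        if action is not None:
            return action
    return None

# Hypothetical priority order and model outputs, for illustration only.
PRIORITY = ["low_battery", "fall_recovery", "obstacle", "emotion", "ball"]
```

For example, if both the `low_battery` and `emotion` models propose an action, the one from `low_battery` is selected because it appears earlier in `PRIORITY`.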

【０１０２】[0102] When reproducing a learned action, the action switching module 171 selects the designated desired action and sends a command to execute that action to the output semantics converter module 168. This command from the action switching module 171 enables the robot device 100 to output the learned action.

【０１０３】[0103] Further, based on the action completion information supplied from the output semantics converter module 168 after an action is completed, the action switching module 171 notifies the learning module 172, the emotion model 173, and the instinct model 174 that the action has been completed.

【０１０４】[0104] Meanwhile, the learning module 172 receives, from among the recognition results supplied from the input semantics converter module 159, the recognition results of teaching received as actions from the user, such as “hit” and “stroked”.

【０１０５】[0105] Based on this recognition result and the notification from the action switching module 171, the learning module 172 changes the corresponding transition probabilities of the corresponding behavior models 170_1 to 170_n in the behavior model library 170 so as to lower the probability of expressing an action when the robot device is “hit (scolded)” and to raise it when the robot device is “stroked (praised)”.
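One way to picture this adjustment is the Python sketch below, which nudges the probability of the arc that produced the action and then renormalizes so the probabilities still sum to 1. The step size, the renormalization scheme, and the name `reinforce` are assumptions for illustration; the text does not specify how the probabilities are changed.

```python
def reinforce(transitions, chosen, feedback, step=0.05):
    """Raise or lower the probability of `chosen`, then renormalize.

    `transitions` maps destination nodes to probabilities. `feedback` is
    "stroked" (praised) or "hit" (scolded). The step size and the
    renormalization are illustrative assumptions.
    """
    if feedback == "stroked":
        delta = step
    elif feedback == "hit":
        delta = -step
    else:
        raise ValueError("feedback must be 'stroked' or 'hit'")
    updated = dict(transitions)
    updated[chosen] = min(1.0, max(0.0, updated[chosen] + delta))
    total = sum(updated.values())
    if total == 0.0:
        return dict(transitions)  # nothing left to normalize; keep as is
    return {node: p / total for node, p in updated.items()}
```

With `{"NODE_120": 0.3, "NODE_100": 0.7}`, praising the action that led to `NODE_120` raises its probability above 0.3, and scolding lowers it below 0.3.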

【０１０６】[0106] For example, in the actual robot device 1, the learning unit 1 described above is configured and realized in such a learning module 172.

【０１０７】[0107] Meanwhile, the emotion model 173 holds, for each of a total of six emotions, “joy”, “sadness”, “anger”, “surprise”, “disgust”, and “fear”, a parameter representing the intensity of that emotion. The emotion model 173 periodically updates the parameter values of these emotions based on specific recognition results such as “hit” and “stroked” supplied from the input semantics converter module 159, the elapsed time, notifications from the action switching module 171, and the like.

【０１０８】[0108] Specifically, let ΔE[t] be the amount of variation of an emotion at that time, calculated by a predetermined formula based on the recognition result supplied from the input semantics converter module 159, the behavior of the robot device 100 at that time, the time elapsed since the last update, and the like; let E[t] be the current parameter value of the emotion; and let k_e be a coefficient representing the sensitivity of the emotion. The emotion model 173 calculates the parameter value E[t+1] of the emotion in the next cycle by equation (8) and updates the parameter value of the emotion by replacing the current parameter value E[t] with it. The emotion model 173 updates the parameter values of all the emotions in the same way.

【０１０９】[0109]

【数８】 (Equation 8)
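The image for equation (8) is not preserved in this text. A form consistent with the symbol definitions in paragraph [0108] (current value E[t], variation ΔE[t], sensitivity coefficient k_e) would be the additive update below; it should be read as a reconstruction under that assumption, not as the original typesetting.

```latex
\[
  E[t+1] = E[t] + k_e \times \Delta E[t] \qquad (8)
\]
```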

【０１１０】[0110] By applying the present invention, the robot device 100 is able to associate an emotion with a learned action. As the processing corresponding to this, the result of a learned action performed by the robot device 100 is fed back to update the emotion parameter values. For example, the feedback of the action result is provided by the output of the action switching module 171 (an action to which an emotion has been added). The feedback of the action result must change the emotion parameter value by a predetermined amount; for example, a suitably chosen amount of change is associated in advance with the learned action. In this way, when the robot device 100 performs an action corresponding to an action learned by the RNN, the emotion parameter associated with that action can be changed.
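The association between a learned action and a predetermined change in emotion can be sketched in Python as a lookup table applied when the action completes. The table contents, the action name `learned_action_k`, and the function name are hypothetical; the text states only that a suitably chosen amount of change is associated with the learned action in advance.

```python
# Hypothetical table associating learned actions with emotion changes.
LEARNED_ACTION_EFFECTS = {
    "learned_action_k": {"joy": 5.0},
}

def apply_action_feedback(emotions, action, effects=LEARNED_ACTION_EFFECTS):
    """Return emotion parameters updated after `action` has been performed.

    Actions with no entry in `effects` leave the emotions unchanged.
    """
    updated = dict(emotions)
    for name, delta in effects.get(action, {}).items():
        updated[name] = updated.get(name, 0.0) + delta
    return updated
```

The same table-driven approach would apply to the desire parameters of the instinct model if each learned action were also given a predetermined change in a desire.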

【０１１１】[0111] The degree to which each recognition result and each notification from the output semantics converter module 168 affects the variation ΔE[t] of the parameter value of each emotion is determined in advance. For example, a recognition result such as “hit” has a large effect on the variation ΔE[t] of the parameter value of the “anger” emotion, and a recognition result such as “stroked” has a large effect on the variation ΔE[t] of the parameter value of the “joy” emotion.
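The predetermined influence of each recognition result can be pictured with the Python sketch below. It assumes an additive update of the form E[t+1] = E[t] + k_e × ΔE[t], which matches the symbol definitions in the text but is not confirmed by it, and the influence values and names are hypothetical.

```python
# Hypothetical influence of each recognition result on each emotion's
# variation; "hit" mainly affects anger and "stroked" mainly affects joy.
INFLUENCE = {
    "hit": {"anger": 10.0},
    "stroked": {"joy": 10.0},
}

def update_emotions(emotions, sensitivity, event):
    """Apply the assumed update E[t+1] = E[t] + k_e * dE[t] for one event.

    `emotions` maps emotion names to current values E[t]; `sensitivity`
    maps emotion names to k_e (defaulting to 1.0 when absent).
    """
    deltas = INFLUENCE.get(event, {})
    return {name: value + sensitivity.get(name, 1.0) * deltas.get(name, 0.0)
            for name, value in emotions.items()}
```

Here a “hit” event changes only the anger value, scaled by the sensitivity coefficient for anger, while the other emotions keep their current values.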

【０１１２】[0112] Here, the notification from the output semantics converter module 168 is so-called action feedback information (action completion information), that is, information on the result of expressing an action, and the emotion model 173 also changes the emotions according to such information. For example, an action such as “barking” lowers the anger level. The notification from the output semantics converter module 168 is also input to the learning module 172 described above, and the learning module 172 changes the corresponding transition probabilities of the behavior models 170_1 to 170_n based on that notification.

【０１１３】[0113] Meanwhile, the instinct model 174 holds, for each of four mutually independent desires, “exercise desire (exercise)”, “affection desire (affection)”, “appetite (appetite)”, and “curiosity (curiosity)”, a parameter representing the strength of that desire. The instinct model 174 periodically updates these desire parameter values based on the recognition results supplied from the input semantics converter module 159, the elapsed time, notifications from the action switching module 171, and the like.

【０１１４】[0114] Specifically, for “exercise desire”, “affection desire”, and “curiosity”, let ΔI[k] be the variation of the desire at that time, calculated by a predetermined formula based on the recognition result, the elapsed time, the notification from the output semantics converter module 168, and the like; let I[k] be the current parameter value of the desire; and let k_i be a coefficient representing the sensitivity of the desire. At a predetermined cycle, the instinct model 174 calculates the parameter value I[k+1] of the desire in the next cycle using equation (9) and updates the parameter value of the desire by replacing the current parameter value I[k] with the result. The instinct model 174 updates the parameter value of each desire other than “appetite” in the same way.

【０１１５】[0115]

【数９】 (Equation 9)
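
The equation image itself is not reproduced in this text. Based on the definitions given in paragraph [0114] (variation ΔI[k], current value I[k], sensitivity coefficient k_i, next value I[k+1]), equation (9) presumably has the following form. This is a reconstruction from those definitions, not a copy of the original figure.

```latex
I[k+1] = I[k] + k_i \times \Delta I[k] \qquad (9)
```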

[0116] By applying the present invention, the robot apparatus 100 can associate an instinct with a learned behavior, as in the case of emotions. Here, as the corresponding processing, the result of a behavior learned by the robot apparatus 100 is fed back to update the desire parameter values. For example, the behavior result is fed back via the output of the action switching module 171 (a behavior to which an emotion has been added). The feedback of the behavior result must change the desire parameter value by a predetermined amount; for example, a suitably determined amount of change is associated with the learned behavior in advance. In this way, when the robot apparatus performs a behavior corresponding to one learned by the RNN, it can change the desire parameter associated with that behavior.
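
The association between a learned behavior and a predetermined change in desire parameters can be sketched as a lookup table consulted when a behavior-completion notification arrives. The following is only an illustrative sketch: the behavior names, the amounts, and the function `apply_behavior_feedback` are hypothetical and do not appear in the source.

```python
# Hypothetical table: learned behavior -> predetermined change per desire.
# The behavior names and amounts are illustrative, not taken from the source.
LEARNED_BEHAVIOR_EFFECTS = {
    "chase_ball": {"exercise": -15.0, "curiosity": -5.0},
    "approach_owner": {"affection": -20.0},
}


def apply_behavior_feedback(desires, completed_behavior):
    """Return updated desire values after a behavior-completion notification.

    Behaviors without an associated entry leave the desires unchanged.
    The input dictionary is not modified.
    """
    updated = dict(desires)
    effects = LEARNED_BEHAVIOR_EFFECTS.get(completed_behavior, {})
    for desire, delta in effects.items():
        updated[desire] = updated[desire] + delta
    return updated


desires = {"exercise": 60.0, "affection": 40.0, "appetite": 30.0, "curiosity": 50.0}
after = apply_behavior_feedback(desires, "chase_ball")
```

Keeping the amounts in a table separate from the update logic mirrors the text's statement that the change amount is associated with the learned behavior in advance.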

[0117] The degree to which the recognition results, notifications from the output semantics converter module 168, and the like affect the variation ΔI[k] of each desire's parameter value is determined in advance. For example, notifications from the output semantics converter module 168 have a large effect on the variation ΔI[k] of the "fatigue" parameter value.

[0118] In the present embodiment, the parameter values of each emotion and each desire (instinct) are restricted to vary within the range of 0 to 100, and the values of the coefficients k_e and k_i are also set individually for each emotion and each desire.
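
The periodic update with a per-desire sensitivity coefficient and the 0 to 100 restriction can be sketched as follows. This assumes the update is additive, as the definitions of ΔI[k], I[k], and k_i suggest; the coefficient values and the names `update_parameter`, `update_instinct`, and `SENSITIVITY` are hypothetical.

```python
def update_parameter(current, delta, sensitivity, low=0.0, high=100.0):
    """One periodic update: next = current + sensitivity * delta, clamped.

    The additive form is an assumption based on the surrounding definitions.
    """
    nxt = current + sensitivity * delta
    return max(low, min(high, nxt))


# Hypothetical per-desire sensitivity coefficients k_i.
SENSITIVITY = {"exercise": 0.5, "affection": 1.0, "curiosity": 2.0}


def update_instinct(values, deltas):
    """Update every desire in `values`; a missing delta counts as 0."""
    return {
        name: update_parameter(values[name], deltas.get(name, 0.0), SENSITIVITY[name])
        for name in values
    }


values = {"exercise": 90.0, "affection": 10.0, "curiosity": 50.0}
deltas = {"exercise": 40.0, "affection": -30.0, "curiosity": 5.0}
updated = update_instinct(values, deltas)
```

In this example "exercise" would overshoot and is held at the upper bound, "affection" is held at the lower bound, and "curiosity" moves freely inside the range.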

[0119] On the other hand, as shown in FIG. 9, the output semantics converter module 168 of the middleware layer 40 passes abstract action commands such as "move forward", "be pleased", "bark", or "tracking (chase the ball)", given from the action switching module 171 of the application layer 141 as described above, to the corresponding signal processing modules 161 to 167 of the output system 169.

[0120] When given an action command, these signal processing modules 161 to 167 generate, based on that command, the servo command values to be given to the corresponding actuators 125_1 to 125_n (FIG. 7) to perform the action, the audio data of the sound to be output from the speaker 124 (FIG. 7), and/or the drive data to be given to the LEDs of the "eyes". They then send these data, in order, to the corresponding actuators 125_1 to 125_n, the speaker 124, or the LEDs via the virtual robot 133 of the robotic server object 132 and the signal processing circuit 114 (FIG. 7).
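
The routing of abstract action commands to the modules that produce device-level data can be sketched as a simple dispatch table. This is an illustrative sketch only: the function names, the command strings, and the payload descriptions are hypothetical stand-ins, and real servo values or audio data are replaced by descriptive strings.

```python
def make_forward_output():
    # Stand-in for a signal processing module that drives the leg actuators.
    return [("actuator", "servo command values for walking forward")]


def make_bark_output():
    # Stand-in for a module that produces sound data for the speaker.
    return [("speaker", "audio data for a bark")]


def make_pleased_output():
    # One command may yield data for more than one device.
    return [
        ("actuator", "servo command values for a pleased motion"),
        ("led", "drive data for the eye LEDs"),
    ]


# Hypothetical mapping from abstract command to signal processing module.
SIGNAL_PROCESSING_MODULES = {
    "forward": make_forward_output,
    "bark": make_bark_output,
    "pleased": make_pleased_output,
}


def convert_output_semantics(command):
    """Route an abstract command to its module and return (device, data) pairs."""
    try:
        module = SIGNAL_PROCESSING_MODULES[command]
    except KeyError:
        raise ValueError(f"no signal processing module for command {command!r}") from None
    return module()


outputs = convert_output_semantics("pleased")
```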

[0121] In this way, based on the control program, the robot apparatus 100 can act autonomously in response to its own (internal) state, its surrounding (external) conditions, and instructions and approaches from the user.

[0122] The above is the specific configuration of the robot apparatus 1. In the embodiment described above, an emotion is associated with a learned behavior, and the learned behavior and the emotion change in conjunction with each other; however, the invention is not limited to this. For example, an actually performed behavior can be compared with the learned behaviors, and based on the comparison result a predetermined emotion can be expressed in the behavior or the emotion can be changed. For example, behaviors in a new environment can be acquired through learning; by comparison with the result of such learning, a predetermined emotion is produced when the robot apparatus acts in an environment it has not learned. For example, the emotion of "surprise" or "fear" is changed as the predetermined emotion. This is because, for example, going into an unknown environment or receiving an unknown input is generally accompanied by "surprise" or "fear" in animals.

[0123] For example, when an RNN is used as the learning method, if a new behavior input (specifically, a sensor input) to the trained RNN is one that has not been learned, the output takes an unknown value. That is, the sensor state output by the RNN differs greatly from the actual sensor state. From the output at that time, it can therefore be determined that the behavior or environment has not been learned. Accordingly, when the output indicates that the behavior or environment differs from the learned ones, a predetermined emotion such as "surprise" can be expressed in the behavior, or the level of that emotion can be changed.
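
The idea of judging a situation as unlearned from the gap between the RNN's predicted sensor state and the actual sensor state can be sketched as a thresholded prediction error. The RNN itself is not modeled here; its prediction is simply passed in as a list of numbers. The mean squared error measure, the threshold, the increment, and the function names are all assumptions made for illustration, since the source does not specify how the difference is evaluated.

```python
def prediction_error(predicted, actual):
    """Mean squared difference between predicted and actual sensor states."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("sensor vectors must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def react_to_sensor_state(predicted, actual, surprise_level,
                          threshold=0.1, increment=20.0):
    """Raise the 'surprise' level when the prediction misses badly.

    A large error is taken to mean the behavior or environment was not
    learned. The level is kept within the 0 to 100 range.
    """
    if prediction_error(predicted, actual) > threshold:
        return min(100.0, surprise_level + increment)
    return surprise_level


familiar = react_to_sensor_state([0.5, 0.5], [0.5, 0.52], surprise_level=10.0)
unfamiliar = react_to_sensor_state([0.0, 1.0], [1.0, 0.0], surprise_level=10.0)
```

With these inputs, the first call leaves the level unchanged because the prediction nearly matches, while the second raises it because every predicted value is far from the observed one.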

【０１２４】[0124]

[Effects of the Invention] The robot apparatus according to the present invention comprises internal state changing means for changing an internal state according to input information, learning means for learning a new behavior, associating means for associating the internal state with the behavior learned by the learning means, and action control means for controlling actions in association with the internal state. It can thereby learn a behavior and associate an emotion with that behavior. For example, when the internal state changing means changes an emotion into a predetermined state, the learned behavior can be made to appear, and a learned behavior can cause the internal state changing means to change the corresponding emotion.

[0125] The behavior control method for a robot apparatus according to the present invention has a learning step in which the robot apparatus learns a new behavior, an associating step in which the robot apparatus associates an internal state with the behavior learned in the learning step, and an action control step in which the robot apparatus controls actions in association with the internal state. A robot apparatus controlled by this method can thereby learn a behavior and associate an emotion with that behavior. For example, when the internal state changing means changes an emotion into a predetermined state, the learned behavior can be made to appear, and a learned behavior can cause the internal state changing means to change the corresponding emotion.

[0126] The program according to the present invention causes a robot apparatus to execute a learning step in which the robot apparatus learns a new behavior, an associating step in which the robot apparatus associates an internal state with the behavior learned in the learning step, and an action control step in which the robot apparatus controls actions in association with the internal state. The robot apparatus can thereby learn a behavior and associate an emotion with that behavior. For example, when the internal state changing means changes an emotion into a predetermined state, the learned behavior can be made to appear, and a learned behavior can cause the internal state changing means to change the corresponding emotion.

[0127] The recording medium according to the present invention has recorded on it a program that causes a robot apparatus to execute a learning step in which the robot apparatus learns a new behavior, an associating step in which the robot apparatus associates an internal state with the behavior learned in the learning step, and an action control step in which the robot apparatus controls actions in association with the internal state. A robot apparatus whose actions are controlled by the program recorded on such a recording medium can learn a behavior and associate an emotion with that behavior. For example, when the internal state changing means changes an emotion into a predetermined state, the learned behavior can be made to appear, and a learned behavior can cause the internal state changing means to change the corresponding emotion.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the main parts that realize the invention in the robot apparatus according to the embodiment.

FIG. 2 is a diagram showing a specific configuration of the learning unit, which is configured hierarchically from a plurality of RNNs.

FIG. 3 is a diagram showing the configuration of the lower-layer RNN of the hierarchically structured learning unit.

FIG. 4 is a diagram showing the configuration of the upper-layer RNN of the hierarchically structured learning unit.

FIG. 5 is a diagram used to explain behavior learning by the learning unit.

FIG. 6 is a perspective view showing the external configuration of the robot apparatus according to the embodiment.

FIG. 7 is a block diagram showing the circuit configuration of the robot apparatus.

FIG. 8 is a block diagram showing the software configuration of the robot apparatus.

FIG. 9 is a block diagram showing the configuration of the middleware layer in the software configuration of the robot apparatus.

FIG. 10 is a block diagram showing the configuration of the application layer in the software configuration of the robot apparatus.

FIG. 11 is a block diagram showing the configuration of the action model library of the application layer.

FIG. 12 is a diagram used to explain the finite probability automaton that serves as information for determining the actions of the robot apparatus.

FIG. 13 is a diagram showing the state transition table prepared for each node of the finite probability automaton.

[Explanation of symbols]

1 learning unit, 2 association unit, 100 robot apparatus

Claims

[Claims]

1. A robot apparatus that changes an internal state according to input information and acts based on the internal state, comprising: internal state changing means for changing the internal state according to the input information; learning means for learning a new behavior; associating means for associating the internal state with the behavior learned by the learning means; and action control means for controlling actions in association with the internal state.

2. The robot apparatus according to claim 1, wherein the input information is at least one of external, internal, and action result information.

3. The robot apparatus according to claim 1, further comprising an emotion model in which the internal state is modeled using an emotional state as a parameter, wherein the internal state changing means changes the parameter of the emotion model according to the input information.

4. The robot apparatus according to claim 1, further comprising an instinct model in which the internal state is modeled using an instinct state as a parameter, wherein the internal state changing means changes the parameter of the instinct model according to the input information.

5. The robot apparatus according to claim 1, further comprising storage means in which behavior data is registered, wherein the learning means learns the new behavior in order to make behavior data that is registered in advance in the storage means and has not yet appeared appear.

6. The robot apparatus according to claim 1, wherein the action control means causes a corresponding action to appear based on the change in the internal state.

7. The robot apparatus according to claim 1, wherein the learning means learns behavior in a new environment, and the internal state changing means changes the internal state when a behavior is performed in an environment not learned by the learning means.

8. The robot apparatus according to claim 1, wherein the learning means learns by means of a neural network having an input layer to which information on the behavior to be learned is input, an intermediate layer, and an output layer.

9. A behavior control method for a robot apparatus, for controlling the actions of a robot apparatus that changes an internal state according to input information and acts based on the internal state, the method comprising: a learning step in which the robot apparatus learns a new behavior; an associating step in which the robot apparatus associates the internal state with the behavior learned in the learning step; and an action control step in which the robot apparatus controls actions in association with the internal state.

10. A program for controlling the actions of a robot apparatus that changes an internal state according to input information and acts based on the internal state, the program causing the robot apparatus to execute: a learning step in which the robot apparatus learns a new behavior; an associating step in which the robot apparatus associates the internal state with the behavior learned in the learning step; and an action control step in which the robot apparatus controls actions in association with the internal state.

11. A recording medium on which is recorded a program for controlling the actions of a robot apparatus that changes an internal state according to input information and acts based on the internal state, the program causing the robot apparatus to execute: a learning step in which the robot apparatus learns a new behavior; an associating step in which the robot apparatus associates the internal state with the behavior learned in the learning step; and an action control step in which the robot apparatus controls actions in association with the internal state.