JP2003340760A

JP2003340760A - Robot device and robot control method, recording medium and program

Info

Publication number: JP2003340760A
Application number: JP2002145335A
Authority: JP
Inventors: Tsutomu Sawada; Masahiro Fujita; Osamu Hanagata; Takeshi Takagi
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2002-05-20
Filing date: 2002-05-20
Publication date: 2003-12-02

Abstract

PROBLEM TO BE SOLVED: To enable a robot to act in a way that does not make the user feel bored.
SOLUTION: An action management unit 72 calculates action values and determines an action a to be performed based on those values. When the action management unit 72 performs the action a, a reward r is given by an environment/user 111. The action management unit 72 updates the action value based on the reward r acquired from the environment/user 111 and a preset learning rate, and changes the learning rate based on input information. The present invention is applicable to a reinforcement learning system for a robot.
COPYRIGHT: (C)2004,JPO

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a robot apparatus, a robot control method, a recording medium, and a program. In particular, it relates to a robot apparatus, robot control method, recording medium, and program that allow a robot to behave as a human would while keeping the user from getting bored.

[0002]

2. Description of the Related Art
When interacting with a robot apparatus that simulates a living organism, the user expects the robot apparatus to behave "like a human".

[0003] To make a robot apparatus behave in this way, the robot apparatus can be made to perform reinforcement learning for action acquisition. In reinforcement learning, the robot apparatus learns actions based on their action values.

[0004]

Problems to Be Solved by the Invention
However, in such conventional reinforcement learning, the learning rate (update rate) of the action value is constant regardless of the external or internal state of the robot apparatus, so the action value is updated in the same way whenever a similar reward is given.

[0005] Consequently, it has been difficult to make a robot apparatus behave like a human while keeping the user from getting bored.

[0006] The present invention has been made in view of these circumstances. Its object is to allow a robot apparatus to behave like a human while acting in a way that keeps the user from getting bored.

[0007]

Means for Solving the Problems
A robot apparatus according to the present invention is characterized by comprising action management means for dynamically changing the action learning ability.

[0008] The action learning ability can be determined by the learning rate.

[0009] The learning rate can be changed according to input information.

[0010] The robot apparatus may further comprise measuring means for measuring time, in which case the learning rate can be changed according to the time.

[0011] A robot control method according to the present invention is characterized by including an action management step of dynamically changing the action learning ability.

[0012] The program recorded on a recording medium according to the present invention is characterized by including an action management step of dynamically changing the action learning ability.

[0013] A program according to the present invention is characterized by causing a computer to execute an action management step of dynamically changing the action learning ability.

[0014] In the robot apparatus, robot control method, recording medium, and program of the present invention, the action learning ability is dynamically changed based on input information.

[0015]

BEST MODE FOR CARRYING OUT THE INVENTION
Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a perspective view showing an example of a pet robot 1 to which the present invention is applied.

[0016] The pet robot 1 has, for example, the shape of a four-legged bear cub. Leg units 3A, 3B, 3C, and 3D are connected to the front and rear on the left and right sides of a body unit 2. A head unit 4 is connected to the front end of the body unit 2, and a tail unit 5 is connected to its rear end.

[0017] FIG. 2 is a block diagram showing an example of the internal configuration of the pet robot 1 of FIG. 1. The body unit 2 houses a controller 10 that controls the entire pet robot 1, a battery 11 that supplies power to each part of the pet robot 1, and an internal sensor 14 consisting of a battery sensor 12 and a heat sensor 13. The controller 10 is provided with a CPU (Central Processing Unit) 10A, a memory 10B storing the program with which the CPU 10A controls each unit, and a clock 10C that keeps time and measures the current date and time, the elapsed time since startup, and the like.

[0018] The CPU 10A is also connected to a communication unit 63 that communicates data via a network typified by the Internet, and to a storage unit 62, made of semiconductor memory or the like, that stores various data such as programs. A drive 60 that reads and writes data from and to a recording medium such as a removable memory 61 is also connected as necessary.

[0019] A robot control program that causes the pet robot 1 to operate as a robot apparatus to which the present invention is applied is supplied to the pet robot 1 stored in the removable memory 61, read by the drive 60, and installed on a hard disk drive built into the storage unit 62. The robot control program installed in the storage unit 62 is loaded from the storage unit 62 into the memory 10B and executed in response to an instruction from the CPU 10A corresponding to a command input by the user.

[0020] The head unit 4 is provided, at predetermined positions, with sensors that sense external stimuli: a microphone 15, corresponding to an "auditory organ like an ear", that senses sound; a video camera 16, composed of components such as a CCD (Charge Coupled Device), a CMOS (Complementary Metal Oxide Semiconductor) device, and an image sensor, corresponding to a "visual organ like an eye", that acquires external image signals; and a touch sensor 17, corresponding to a "tactile organ like skin", that senses pressure and the like caused by the user's touch. The head unit 4 is also provided, at predetermined positions, with a position detection sensor 18 that measures the distance to an object and a speaker 19, corresponding to a "vocal organ like a mouth" of the pet robot 1, that outputs predetermined tones.

[0021] Actuators are installed at the joints of each of the leg units 3A to 3D, at the connections between each of the leg units 3A to 3D and the body unit 2, at the connection between the head unit 4 and the body unit 2, and at the connection between the tail unit 5 and the body unit 2. The actuators move each part based on instructions from the controller 10.

[0022] In the example of FIG. 2, the leg unit 3A is provided with actuators 3AA₁ to 3AA_K, the leg unit 3B with actuators 3BA₁ to 3BA_K, the leg unit 3C with actuators 3CA₁ to 3CA_K, and the leg unit 3D with actuators 3DA₁ to 3DA_K. The head unit 4 is provided with actuators 4A₁ to 4A_L, and the tail unit 5 with actuators 5A₁ and 5A₂.

[0023] Hereinafter, when there is no need to distinguish individually among the actuators 3AA₁ to 3DA_K provided in the leg units 3A to 3D, the actuators 4A₁ to 4A_L provided in the head unit 4, and the actuators 5A₁ and 5A₂ provided in the tail unit 5, they are collectively referred to as the actuators 3AA₁ to 5A₂.

[0024] In addition to the actuators, the leg units 3A to 3D are provided with switches 3AB to 3DB at positions corresponding to the soles of the feet of the pet robot 1. When the pet robot 1 walks, the switches 3AB to 3DB are pressed, and signals indicating this are input to the controller 10.

[0025] The microphone 15 in the head unit 4 collects surrounding sound, including the user's speech, and outputs the resulting audio signal to the controller 10. The video camera 16 captures images of the surroundings and outputs the resulting image signal to the controller 10. The touch sensor 17 is provided, for example, on top of the head unit 4. It detects the pressure received from physical contact by the user, such as "stroking" or "striking", and outputs the detection result to the controller 10 as a pressure detection signal. The position detection sensor 18, for example, emits infrared light and outputs to the controller 10 a detection result based on the timing at which the reflected light is received.

[0026] Based on the audio signal, image signal, pressure detection signal, and other signals supplied from the microphone 15, the video camera 16, the touch sensor 17, and the position detection sensor 18, the controller 10 determines the surrounding situation, whether there is a command from the user, whether the user has acted on the robot, and so on. Based on this determination, it decides the next action for the pet robot 1 to perform. The controller 10 then drives the necessary actuators according to that decision, making the pet robot 1 swing the head unit 4 up, down, left, and right, move the tail unit 5, or drive each of the leg units 3A to 3D to walk.

[0027] The controller 10 also performs processing such as turning on, turning off, or blinking LEDs (Light Emitting Diodes), not shown, provided on the head unit 4 or elsewhere on the pet robot 1.

[0028] FIG. 3 is a block diagram showing an example of the functional configuration of the controller 10 of FIG. 2. The functions shown in FIG. 3 are realized by the CPU 10A executing the control program stored in the memory 10B.

[0029] The controller 10 consists of two parts. A sensor input processing unit 31 detects the various signals from the sensors that detect external stimuli (the microphone 15 through the position detection sensor 18, and the switches 3AB to 3DB). An information processing unit 32 operates the pet robot 1 based on the information detected by the sensor input processing unit 31 and the like.

[0030] An angle detection unit 41, part of the sensor input processing unit 31, detects the angles of the actuators 3AA₁ to 5A₂ when the motors provided in them are driven, based on information reported by each of the actuators 3AA₁ to 5A₂. The angle information detected by the angle detection unit 41 is output to the action management unit 72 and the sound data generation unit 75 of the information processing unit 32.

[0031] A volume detection unit 42 detects the sound volume based on the signal supplied from the microphone 15 and outputs the detected volume information to the action management unit 72 and the sound data generation unit 75.

[0032] A speech recognition unit 43 performs speech recognition on the audio signal supplied from the microphone 15. It reports the recognition results, such as the commands "let's talk", "walk", "lie down", and "chase the ball", as speech recognition information to the instinct/emotion management unit 71, the action management unit 72, and the sound data generation unit 75.

[0033] An image recognition unit 44 performs image recognition using the image signal supplied from the video camera 16. When it detects, for example, "a red round object", "a plane perpendicular to the ground and at least a predetermined height", "a wide open place", "family members are present", or "a friend of a child in the family is present", it reports image recognition results such as "there is a ball", "there is a wall", "this is a field", "this is a house", or "this is a school" as image recognition information to the instinct/emotion management unit 71, the action management unit 72, and the sound data generation unit 75.

[0034] A pressure detection unit 45 processes the pressure detection signal supplied from the touch sensor 17. For example, when it detects a brief pressure at or above a predetermined threshold, it recognizes that the robot has been "struck (scolded)". When it detects a prolonged pressure below the threshold, it recognizes that the robot has been "stroked (praised)". It reports the recognition result as state recognition information to the instinct/emotion management unit 71, the action management unit 72, and the sound data generation unit 75.
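The threshold-and-duration rule of the pressure detection unit 45 can be sketched as follows. This is a minimal illustration, not the patent's implementation. The threshold and the "brief" and "prolonged" durations are hypothetical numbers because the text gives none. Returning None for a touch that matches neither rule is also an assumption.

```python
from typing import Optional

# Hypothetical values: the source only speaks of "a predetermined threshold"
# and of brief and prolonged pressure without giving numbers.
PRESSURE_THRESHOLD = 50.0  # arbitrary pressure units
SHORT_MAX_S = 0.3          # presses up to this long count as "brief"
LONG_MIN_S = 1.0           # presses at least this long count as "prolonged"


def classify_touch(peak_pressure: float, duration_s: float) -> Optional[str]:
    """Return "struck" (scolded), "stroked" (praised), or None.

    None means the touch matched neither rule; the source does not say
    how such touches are handled.
    """
    if peak_pressure >= PRESSURE_THRESHOLD and duration_s <= SHORT_MAX_S:
        return "struck"
    if peak_pressure < PRESSURE_THRESHOLD and duration_s >= LONG_MIN_S:
        return "stroked"
    return None


print(classify_touch(80.0, 0.1))  # struck
print(classify_touch(20.0, 2.0))  # stroked
print(classify_touch(80.0, 2.0))  # None
```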

[0035] A position detection unit 46 measures the distance to a given object based on the signal supplied from the position detection sensor 18. It reports this distance information to the action management unit 72 and the sound data generation unit 75. For example, when the user holds out a hand or the like in front of the robot's eyes, the position detection unit 46 detects the distance to the hand. It also detects the distance to a ball recognized by the image recognition unit 44.

[0036] A switch input detection unit 47 receives signals from the switches 3AB to 3DB provided at the parts corresponding to the soles of the feet of the pet robot 1. Based on these signals, it notifies the instinct/emotion management unit 71 and the action management unit 72 of, for example, the walking timing while the pet robot 1 is walking, or the fact that the user has touched a sole.

[0037] The instinct/emotion management unit 71, part of the information processing unit 32, manages the instincts and emotions of the pet robot 1. At predetermined timings, it outputs parameters representing the instincts of the pet robot 1 and parameters representing its emotions to the action management unit 72 and the sound data generation unit 75.

[0038] The parameters representing the instincts and emotions of the pet robot 1 are described with reference to FIG. 4. As shown in FIG. 4, the instinct/emotion management unit 71 stores and manages an emotion model 101 that expresses the emotions of the pet robot 1 and an instinct model 102 that expresses its instincts.

[0039] The emotion model 101 represents the state (degree) of each emotion by an emotion parameter in a predetermined range (for example, 0 to 100). The emotions include "joy", "sadness", "anger", "fun", "fear", and "disgust". The model changes these values based on the outputs of the speech recognition unit 43, the image recognition unit 44, and the pressure detection unit 45 of the sensor input processing unit 31, on the passage of time, and on similar factors.
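The sketch below shows one way to hold the emotion parameters of the emotion model 101. It assumes the 0 to 100 range given above as an example. The neutral start value of 50 and the simple additive, clamped update are illustrative assumptions; the text does not say how much each recognition result moves a parameter.

```python
from dataclasses import dataclass, field
from typing import Dict

# Emotion units 101A to 101F of the emotion model 101.
EMOTIONS = ("joy", "sadness", "anger", "fun", "fear", "disgust")
PARAM_MIN, PARAM_MAX = 0.0, 100.0  # the example range given in the text


def clamp(value: float) -> float:
    """Keep a parameter inside [PARAM_MIN, PARAM_MAX]."""
    return max(PARAM_MIN, min(PARAM_MAX, value))


@dataclass
class EmotionModel:
    # 50.0 is an arbitrary neutral starting value (an assumption).
    params: Dict[str, float] = field(
        default_factory=lambda: {name: 50.0 for name in EMOTIONS})

    def apply(self, deltas: Dict[str, float]) -> None:
        """Add recognition-driven changes, clamping each result to the range."""
        for name, delta in deltas.items():
            self.params[name] = clamp(self.params[name] + delta)


model = EmotionModel()
model.apply({"joy": 70.0, "fear": -80.0})
print(model.params["joy"], model.params["fear"])  # 100.0 0.0
```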

[0040] In this example, the emotion model 101 consists of six emotion units: 101A representing "joy", 101B representing "sadness", 101C representing "anger", 101D representing "fun", 101E representing "fear", and 101F representing "disgust".

[0041] The instinct model 102 represents the state (degree) of each instinctive desire by an instinct parameter in a predetermined range (for example, 0 to 100). The desires include "desire for exercise", "desire for affection", "appetite", "curiosity", and "desire for sleep". The model changes these values based on the outputs of the speech recognition unit 43, the image recognition unit 44, the pressure detection unit 45, and so on, and on the passage of time and similar factors. The instinct model 102 also raises the parameter representing "desire for exercise" based on the action history, and raises the parameter representing "appetite" based on the remaining charge (voltage) of the battery 11.
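The battery-driven part of the instinct model could look like the sketch below. The voltage limits V_EMPTY and V_FULL and the linear mapping are hypothetical; the text only says that "appetite" is raised according to the remaining charge. raise_appetite() only ever increases the parameter, mirroring the word "raises". Lowering it, for example after charging, is left to other update paths not shown here.

```python
# Hypothetical battery voltages; the source gives no numbers.
V_EMPTY = 6.4
V_FULL = 8.4


def appetite_target(voltage: float) -> float:
    """Appetite level (0..100) implied by the battery voltage.

    Linear in the remaining charge: full battery -> 0, empty battery -> 100.
    The linear shape is an illustrative assumption.
    """
    remaining = (voltage - V_EMPTY) / (V_FULL - V_EMPTY)
    remaining = max(0.0, min(1.0, remaining))  # ignore out-of-range readings
    return 100.0 * (1.0 - remaining)


def raise_appetite(current: float, voltage: float) -> float:
    """Raise (never lower) the appetite parameter as the battery drains."""
    return max(current, appetite_target(voltage))


print(raise_appetite(10.0, 6.4))  # 100.0 (empty battery)
print(raise_appetite(90.0, 8.4))  # 90.0 (full battery leaves it unchanged)
```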

[0042] In this example, the instinct model 102 consists of five units: instinct unit 102A representing "desire for exercise", 102B representing "desire for affection", 102C representing "appetite", 102D representing "curiosity", and unit 102E representing "desire for sleep".

[0043] The parameters of the emotion units 101A to 101F and the instinct units 102A to 102E change in response to external input. They also change through mutual influence among the units, as indicated by the arrows in the figure.

[0044] For example, the emotion unit 101A expressing "joy" and the emotion unit 101B expressing "sadness" are coupled in a mutually inhibitory manner. When the user praises the robot, the instinct/emotion management unit 71 therefore raises the parameter of the emotion unit 101A ("joy") and lowers the parameter of the emotion unit 101B ("sadness"), changing the emotion the robot expresses.
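The mutually inhibitory coupling between "joy" and "sadness" might be modeled as below. The sketch assumes that a change in one unit pushes its partner in the opposite direction by a fixed fraction. INHIBITION_GAIN and the starting values are hypothetical.

```python
from typing import Dict

# Pairs of emotion units wired to inhibit each other. Only joy/sadness is
# named in the text; the coupling strength is a hypothetical value.
INHIBITS = {"joy": "sadness", "sadness": "joy"}
INHIBITION_GAIN = 0.5


def clamp(value: float) -> float:
    return max(0.0, min(100.0, value))


def excite(params: Dict[str, float], unit: str, delta: float) -> None:
    """Change one unit and push its inhibitory partner the opposite way."""
    params[unit] = clamp(params[unit] + delta)
    partner = INHIBITS.get(unit)
    if partner is not None:
        params[partner] = clamp(params[partner] - INHIBITION_GAIN * delta)


emotions = {"joy": 40.0, "sadness": 60.0}
excite(emotions, "joy", 20.0)  # the user praised the robot
print(emotions)                # {'joy': 60.0, 'sadness': 50.0}
```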

[0045] Parameters also change across the two models, not only among the units within the emotion model 101 or among the units within the instinct model 102.

[0046] For example, as shown in FIG. 4, changes in the parameters of two instinct units in the instinct model 102 affect two emotion units in the emotion model 101. The instinct units are 102B, representing "desire for affection", and 102C, representing "appetite". The affected emotion units are 101B, expressing "sadness", and 101C, expressing "anger".

[0047] Specifically, when the parameter of the instinct unit 102B ("desire for affection") or of the instinct unit 102C ("appetite") increases, the parameters of the emotion unit 101B ("sadness") and the emotion unit 101C ("anger") in the emotion model 101 decrease.
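A sketch of this cross-model coupling follows. It assumes a fixed, hypothetical gain COUPLING_GAIN: an increase in "desire for affection" or "appetite" lowers "sadness" and "anger". Only increases are propagated, because that is the only case the text describes.

```python
from typing import Dict

# From FIG. 4 as described: instinct units 102B ("desire for affection") and
# 102C ("appetite") both suppress emotion units 101B ("sadness") and 101C ("anger").
SUPPRESSES = {
    "affection": ("sadness", "anger"),
    "appetite": ("sadness", "anger"),
}
COUPLING_GAIN = 0.5  # hypothetical strength of the cross-model coupling


def clamp(value: float) -> float:
    return max(0.0, min(100.0, value))


def propagate_instinct_rise(emotions: Dict[str, float], instinct: str,
                            increase: float) -> None:
    """Lower the suppressed emotion parameters when an instinct parameter rises."""
    if increase <= 0.0:
        return  # only rises are described in the text
    for emotion in SUPPRESSES.get(instinct, ()):
        emotions[emotion] = clamp(emotions[emotion] - COUPLING_GAIN * increase)


emotions = {"sadness": 50.0, "anger": 30.0}
propagate_instinct_rise(emotions, "affection", 20.0)
print(emotions)  # {'sadness': 40.0, 'anger': 20.0}
```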

[0048] The emotion and instinct parameters managed by the instinct/emotion management unit 71 are measured at a predetermined cycle. They are then output to the action management unit 72 and the sound data generation unit 75.

[0049] The instinct/emotion management unit 71 receives recognition information from the speech recognition unit 43, the image recognition unit 44, the pressure detection unit 45, and so on. It also receives action information from the action management unit 72. The action information describes the current or past actions of the pet robot 1, for example an action such as "walked for a long time". Even when given the same recognition information, the instinct/emotion management unit 71 generates different internal information depending on the action of the pet robot 1 indicated by the action information.

[0050] For example, suppose the pet robot 1 greets the user and the user then strokes its head. The instinct/emotion management unit 71 receives action information indicating that the robot greeted the user, together with recognition information indicating that its head was stroked. The instinct/emotion management unit 71 then increases the value of the emotion unit 101A representing "joy".
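The context dependence described in [0049] and [0050] can be illustrated with a hypothetical rule. The same "stroked" recognition yields a larger increase in "joy" when the last action was a greeting. The gain values are invented for illustration; the text only states that "joy" increases.

```python
from typing import Optional

# Hypothetical gains; the text only says that "joy" increases when the robot
# is stroked after greeting the user.
JOY_GAIN_AFTER_GREETING = 15.0
JOY_GAIN_OTHERWISE = 5.0


def joy_change(recognition: str, last_action: Optional[str]) -> float:
    """Change to the "joy" parameter for one recognition result in context."""
    if recognition != "stroked":
        return 0.0
    if last_action == "greet":
        return JOY_GAIN_AFTER_GREETING
    return JOY_GAIN_OTHERWISE


print(joy_change("stroked", "greet"))  # 15.0
print(joy_change("stroked", "walk"))   # 5.0
```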

[0051] Returning to FIG. 3, the action management unit 72 decides the next action. It bases this decision on the information supplied from the speech recognition unit 43, the image recognition unit 44, and so on, on the parameters supplied from the instinct/emotion management unit 71, and on the passage of time and similar factors. It then outputs a command instructing execution of the decided action to the posture transition management unit 73. The posture transition management unit 73 determines the posture transition based on the action instructed by the action management unit 72 and outputs it to the control unit 74. Based on the output of the posture transition management unit 73, the control unit 74 controls the actuators 3AA₁ to 5A₂ to perform the action decided by the action management unit 72.

[0052] The sound data generation unit 75 generates sound data based on the information supplied from the speech recognition unit 43, the image recognition unit 44, and so on, on the parameters supplied from the instinct/emotion management unit 71, and on the passage of time and similar factors. When the pet robot 1 is to speak, or when the speaker 19 is to output a sound corresponding to a predetermined motion, the action management unit 72 outputs a command instructing sound output to the speech synthesis unit 76. The speech synthesis unit 76 then causes the speaker 19 to output sound based on the sound data output from the sound data generation unit 75.

[0053] FIG. 5 shows the basic configuration of a reinforcement learning system to which the present invention is applied. In step S1, the action management unit 72 makes the pet robot 1 execute an action a. In step S2, the environment/user 111 gives the pet robot 1 a reward r for the action a. The reward r is an act performed by the environment/user 111. For example, the environment/user 111 "strokes the head" when the action a was correct, and "strikes the head" when the action a was wrong (an action the user did not expect).

[0054] Based on the acquired reward r, the action management unit 72 updates the action value Q(a) for this action a to a new action value Q₁(a) that reflects the reward r, according to the following equation (1).

$$Q_1(a) = (1 - \alpha)\,Q(a) + \alpha\,r \qquad (1)$$

[0055] In equation (1), the reward r is a value determined according to detection signals from the sensor input processing unit 31, such as "head stroked", "struck", "achieved a given task", or "failed a given task". α is a coefficient between 0 and 1. It is a parameter (the learning rate) that determines how much the acquired reward is reflected in the action value.

[0056] As is clear from equation (1), a larger learning rate α reflects the reward r more strongly (higher learning ability). A smaller learning rate α reflects the reward r less (lower learning ability).
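Equation (1) translates directly into code. The sketch below assumes numeric rewards of +1 and -1 for the detection events listed in [0055]; those values and the dictionary keys are hypothetical. It shows that, starting from the same Q(a), a larger α moves the action value further toward the reward.

```python
from typing import Dict

# Hypothetical reward values for the detection events named in [0055].
REWARDS: Dict[str, float] = {
    "head_stroked": 1.0,
    "struck": -1.0,
    "task_achieved": 1.0,
    "task_failed": -1.0,
}


def update_action_value(q: float, reward: float, alpha: float) -> float:
    """Equation (1): Q1(a) = (1 - alpha) * Q(a) + alpha * r."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("learning rate alpha must lie between 0 and 1")
    return (1.0 - alpha) * q + alpha * reward


r = REWARDS["head_stroked"]
print(update_action_value(0.0, r, 0.2))  # 0.2 (low learning rate: small step)
print(update_action_value(0.0, r, 0.8))  # 0.8 (high learning rate: large step)
```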

[0057] The action value Q(a) obtained in this way is updated every time the action a is executed and a reward r is obtained. Suppose a new reward r₂ is obtained. The updated action value Q₂(a) can then be expressed using the previous action value Q₁(a) and the one before it, Q₀(a):

$$Q_2(a) = (1 - \alpha)\,Q_1(a) + \alpha\,r_2 = (1 - \alpha)^2\,Q_0(a) + (1 - \alpha)\,\alpha\,r_1 + \alpha\,r_2 \qquad (2)$$

The learning rate α lies between 0 and 1. The coefficient α of the newly acquired reward r₂ is therefore always larger than the coefficient (1−α)α of the previous reward r₁. The action value Q₂(a) thus weights the newly received reward r₂ more heavily than the past reward r₁. In other words, by equation (2), the action value Q(a) reflects recently received rewards more than rewards from the distant past.
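Equation (2) can be checked numerically. The sketch below applies equation (1) repeatedly and compares the result with the unrolled closed form, in which reward r_k in a sequence of n rewards carries the weight α(1−α)^(n−k). That general weight is an extrapolation of equation (2) from two updates to n updates. The values of α and the rewards are arbitrary.

```python
from typing import List


def update(q: float, reward: float, alpha: float) -> float:
    """Equation (1): new action value = (1 - alpha) * old value + alpha * reward."""
    return (1.0 - alpha) * q + alpha * reward


def reward_weights(n: int, alpha: float) -> List[float]:
    """Weight of reward r_k (k = 1..n) in Q_n(a) after n updates.

    For n = 2 this gives [(1 - alpha) * alpha, alpha], the coefficients of
    r_1 and r_2 in equation (2).
    """
    return [alpha * (1.0 - alpha) ** (n - k) for k in range(1, n + 1)]


alpha, q0, rewards = 0.3, 0.0, [1.0, -1.0, 1.0]

q = q0
for r in rewards:  # apply equation (1) once per received reward
    q = update(q, r, alpha)

weights = reward_weights(len(rewards), alpha)
closed_form = (1.0 - alpha) ** len(rewards) * q0 + sum(
    w * r for w, r in zip(weights, rewards))

print(weights)                       # the newest reward has the largest weight
print(abs(q - closed_form) < 1e-12)  # True
```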

[0058] In the present invention, the action learning ability is changed dynamically. To this end, the action management unit 72 adjusts the learning rate α according to the emotion parameters output by the instinct/emotion management unit 71. The processing in this case is described with reference to FIG. 6.

[0059] In step S11, the action management unit 72 reads the emotion parameters from the instinct/emotion management unit 71. In step S12, the action management unit 72 calculates the learning rate α from the read emotion parameters. It uses an equation or table stored in the memory 10B; in the example of FIG. 7, this equation or table represents the relationship between "fun" and the learning rate α.

[0060] FIG. 7 shows an example of the learning rate α calculated from the parameter representing "fun". As shown in FIG. 7, while the value of the "fun" parameter ranges from approximately 0 up to m1 (the "very boring" region), the learning rate α gradually increases and the learning ability improves. While the value ranges from m1 to m2 (the "fun" region), the learning rate α stays roughly constant and the learning ability does not change. When the value exceeds m2 (the "very fun" region), the learning rate α again increases gradually and the learning ability improves. That is, as with a human, when the learning ability of the pet robot 1 improves, performing the action becomes enjoyable, and as the enjoyment grows further (becomes very enjoyable), the learning ability improves further.
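The three regions of FIG. 7 can be approximated by a closed-form equation. The sketch below is a piecewise-linear stand-in, assuming hypothetical values for m1, m2, the rates in each region, and an upper cap that keeps α below 1; none of these numbers comes from the patent.

```python
def fun_to_learning_rate(fun: float, m1: float = 30.0, m2: float = 70.0,
                         alpha_low: float = 0.1, alpha_mid: float = 0.4,
                         slope_high: float = 0.01, alpha_cap: float = 0.9) -> float:
    """Piecewise-linear stand-in for the FIG. 7 curve (all numbers hypothetical)."""
    fun = max(0.0, fun)
    if fun <= m1:
        # "very boring" region: alpha rises from alpha_low toward alpha_mid
        return alpha_low + (alpha_mid - alpha_low) * fun / m1
    if fun <= m2:
        # "fun" region: alpha stays roughly constant
        return alpha_mid
    # "very fun" region: alpha rises again, capped so it stays below 1
    return min(alpha_cap, alpha_mid + slope_high * (fun - m2))


if __name__ == "__main__":
    for value in (0.0, 15.0, 50.0, 80.0, 120.0):
        print(value, round(fun_to_learning_rate(value), 3))
```

The curve is continuous at m1 and m2, so a small change in the "fun" parameter never causes a jump in α.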

[0061] The learning rate α may also be changed according to the instinct parameters output from the instinct/emotion management unit 71. This processing is described with reference to FIG. 8.

[0062] In step S21, the behavior management unit 72 reads the instinct parameters from the instinct/emotion management unit 71. In step S22, the behavior management unit 72 calculates the learning rate α from the read instinct parameters, using an equation or table stored in the memory 10B (in the example of FIG. 9, an equation or table expressing the relationship between "curiosity" and the learning rate α).

[0063] FIG. 9 shows an example of the learning rate α calculated from the parameter representing "curiosity". In the example of FIG. 9, the learning rate α increases in proportion to the value of the "curiosity" parameter, and the learning ability improves accordingly. That is, like a human, the more curious the pet robot 1 is, the more strongly it can reflect an experience with fewer learning trials.
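The claim that a larger α lets the robot reflect an experience "with fewer learning trials" can be made concrete. The sketch below assumes α = gain × curiosity as in FIG. 9, with a hypothetical gain and an added clamp (not in the patent) that keeps α a valid rate, and counts how many updates of the assumed form Q ← (1 − α)Q + αr are needed for Q to reach 90% of a constant reward.

```python
def curiosity_to_learning_rate(curiosity: float, gain: float = 0.008,
                               alpha_cap: float = 0.95) -> float:
    """alpha proportional to curiosity (FIG. 9); gain and cap are hypothetical."""
    # The clamp is an added assumption that keeps alpha in [0, alpha_cap].
    return max(0.0, min(alpha_cap, gain * curiosity))


def trials_to_reach(alpha: float, fraction: float = 0.9, reward: float = 1.0,
                    max_trials: int = 10_000) -> int:
    """Count updates Q <- (1 - alpha) Q + alpha r, starting from Q = 0,
    until Q reaches `fraction` of a constant reward r."""
    q, n = 0.0, 0
    while q < fraction * reward and n < max_trials:
        q = (1.0 - alpha) * q + alpha * reward
        n += 1
    return n


if __name__ == "__main__":
    for curiosity in (25.0, 75.0):
        alpha = curiosity_to_learning_rate(curiosity)
        print(f"curiosity={curiosity}: alpha={alpha:.2f}, trials={trials_to_reach(alpha)}")
```

Starting from Q = 0, the value after n trials is r(1 − (1 − α)^n), so the number of trials needed falls sharply as α grows.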

[0064] Furthermore, the behavior management unit 72 can change the learning rate α according to information about the location supplied by the image recognition unit 44. This processing is described with reference to FIG. 10.

[0065] In step S31, the behavior management unit 72 detects the place where the pet robot 1 is located from the output of the image recognition unit 44. In step S32, the behavior management unit 72 determines whether that place is a school. If it is determined in step S32 that the pet robot 1 is at a school, the behavior management unit 72 advances the processing to step S33 and reads the learning rate α for the school from the memory 10B.

[0066] If it is determined in step S32 that the place where the pet robot 1 is located is not a school, the behavior management unit 72 advances the processing to step S34 and determines whether the place is home. If it is determined in step S34 that the pet robot 1 is at home, the behavior management unit 72 advances the processing to step S35 and reads the learning rate α for home from the memory 10B.

[0067] If it is determined in step S34 that the place where the pet robot 1 is located is not home, the behavior management unit 72 advances the processing to step S36 and determines whether the place is a field. If it is determined in step S36 that the pet robot 1 is in a field, the behavior management unit 72 advances the processing to step S37 and reads the learning rate α for the field from the memory 10B.

[0068] After step S33, S35, or S37, the behavior management unit 72 sets the read learning rate α as the new learning rate α in step S38, and the processing ends.

[0069] If it is determined in step S36 that the place where the pet robot 1 is located is not a field, the behavior management unit 72 advances the processing to step S39, performs error processing, and ends the processing.
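Steps S31 to S39 of FIG. 10 amount to a chain of location checks followed by a table read. The sketch below assumes the image recognition unit 44 yields a place label as a string; the per-location rates are hypothetical (only the ordering school > home > field follows FIG. 11), and since the patent does not say what the error processing of step S39 does, raising an exception and keeping the previous α is an added assumption.

```python
# Hypothetical per-location rates "stored in memory 10B". Only the ordering
# school > home > field follows FIG. 11; the numbers themselves are made up.
LOCATION_LEARNING_RATES = {"school": 0.8, "home": 0.5, "field": 0.2}


class UnknownLocationError(Exception):
    """Signals the error branch of step S39 (its actual handling is unspecified)."""


def learning_rate_for_location(place: str) -> float:
    """Follow steps S32-S38 of FIG. 10 for a place label from image recognition."""
    # Steps S32, S34, S36: test school, home and field in that order.
    for known in ("school", "home", "field"):
        if place == known:
            # Steps S33, S35, S37 read the stored rate; step S38 adopts it as alpha.
            return LOCATION_LEARNING_RATES[known]
    # Step S39: error processing for an unrecognized place.
    raise UnknownLocationError(f"unrecognized location: {place!r}")


if __name__ == "__main__":
    alpha = 0.5  # current learning rate
    for place in ("school", "field", "park"):
        try:
            alpha = learning_rate_for_location(place)
        except UnknownLocationError as err:
            print("error processing:", err)  # assumption: keep the previous alpha
        print(place, alpha)
```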

[0070] FIG. 11 shows an example of the value of the learning rate α for each location. When the pet robot 1 is at school, the learning rate α takes its largest value and the learning ability is high. That is, like a human, the pet robot 1 can reflect an experience more strongly with fewer learning trials while at school.

[0071] When the pet robot 1 is at home, the learning rate α takes an average value, and the pet robot 1 has average learning ability. When the pet robot 1 is in a field, the learning rate α takes its smallest value and the learning ability is low. That is, like a human, the pet robot 1 learns less effectively when it is in an open place such as a field.

[0072] The learning rate α can also be changed according to the time of day. This processing is described with reference to FIG. 12.

[0073] In step S51, the behavior management unit 72 reads the current time from the clock 10C. In step S52, the behavior management unit 72 calculates the learning rate α from the read time, using an equation or table stored in the memory 10B (in the example of FIG. 13, an equation or table expressing the relationship between the time within the course of a day and the learning rate α).

[0074] FIG. 13 shows an example in which the behavior management unit 72 changes the learning rate α based on the time of day. At time t1, just after waking up in the morning, the learning rate α is at its smallest value and the pet robot 1 has low learning ability. The learning rate α then rises gradually over time and reaches its maximum at time t2, after breakfast. Therefore, in the period after breakfast, the pet robot 1 has improved learning ability and can reflect an experience more strongly with fewer learning trials.

[0075] After breakfast, the learning rate α gradually decreases until time t3, when the robot becomes sleepy. Therefore, in the sleepy period centered on time t3, the learning ability of the pet robot 1 decreases (although it remains higher than in the period centered on time t1). After that, the learning rate α gradually rises again until time t4, in the evening to night, and the learning ability of the pet robot 1 improves again (to a level between those of the periods around t2 and t3).

[0076] Then, as time t5 before going to bed approaches, the learning rate α decreases, and the learning ability of the pet robot 1 gradually declines. The pet robot 1 then goes to sleep. In this way, the pet robot 1 behaves like a human over the course of a day.
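Steps S51 and S52, together with the curve of FIG. 13, can be sketched as a breakpoint curve over the hours of the day. The clock times chosen for t1 to t5 and the α values below are hypothetical; they only reproduce the orderings stated in [0074] to [0076] (minimum at t1, maximum at t2, t3 above t1, t4 between the t3 and t2 levels, falling toward t5). Clamping outside t1 to t5 is an added assumption, and `datetime.now()` stands in for the clock 10C.

```python
from datetime import datetime
from typing import Optional

# Hypothetical (hour of day, alpha) breakpoints shaped like FIG. 13:
# t1 = waking (minimum), t2 = after breakfast (maximum), t3 = sleepy period,
# t4 = evening (between the t3 and t2 levels), t5 = just before bed.
DAY_CURVE = [(7.0, 0.1), (9.0, 0.8), (14.0, 0.3), (19.0, 0.55), (23.0, 0.15)]


def learning_rate_at(hour: float) -> float:
    """Linearly interpolate alpha for a clock time given in hours (0-24)."""
    if hour <= DAY_CURVE[0][0]:
        return DAY_CURVE[0][1]
    if hour >= DAY_CURVE[-1][0]:
        return DAY_CURVE[-1][1]
    for (h0, a0), (h1, a1) in zip(DAY_CURVE, DAY_CURVE[1:]):
        if h0 <= hour <= h1:
            return a0 + (a1 - a0) * (hour - h0) / (h1 - h0)
    raise ValueError(f"hour out of range: {hour}")  # not reached for sorted breakpoints


def learning_rate_now(now: Optional[datetime] = None) -> float:
    """Steps S51-S52: read the current time (stand-in for clock 10C) and compute alpha."""
    now = now or datetime.now()
    return learning_rate_at(now.hour + now.minute / 60.0)


if __name__ == "__main__":
    for hour in (7.0, 9.0, 14.0, 19.0, 23.0):
        print(hour, round(learning_rate_at(hour), 3))
```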

[0077] The learning rate α may also be changed according to the time elapsed since activation (the growth time of the pet robot 1). This processing is described with reference to FIG. 14.

[0078] In step S61, the behavior management unit 72 reads from the clock 10C the time elapsed since the pet robot 1 was born (activated), that is, the growth time. In step S62, the behavior management unit 72 calculates the learning rate α from the read growth time, using an equation or table stored in the memory 10B (in the example of FIG. 15, an equation or table expressing the relationship between the growth time and the learning rate α).

[0079] The behavior management unit 72 changes the learning rate α based on the growth time t, for example according to equation (3) below.

[Equation 3]

[0080] In equation (3), τ denotes a reference time and β the rate of change around the reference time. α_min and α_max denote the minimum and maximum learning rates, respectively.

[0081] FIG. 15 shows an example of the change in the learning rate α calculated with equation (3). As shown in FIG. 15, when the pet robot 1 has just been born (just after start-up, t = 0), the learning rate α takes the maximum value α_max, so an experience can be reflected strongly with few learning trials. As time (age) passes, the learning rate α gradually decreases and finally converges to the minimum value α_min, and the learning ability of the pet robot 1 declines. That is, as with a human, learning ability declines with age and it becomes harder to master what has been learned. At a preset reference time (reference age) τ, the learning rate α takes an intermediate value between the maximum and minimum values.
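Equation (3) itself is not reproduced in this text. One curve with the properties described in [0080] and [0081] (near α_max at t = 0, converging to α_min, the midpoint value at t = τ, and steepness governed by β) is a logistic decay. The sketch below uses that form purely as an assumption, not as the patent's equation; all default parameter values and the time unit are hypothetical. Note that a logistic only approaches α_max at t = 0 when βτ is large.

```python
import math


def growth_learning_rate(t: float, tau: float = 100.0, beta: float = 0.1,
                         alpha_min: float = 0.05, alpha_max: float = 0.9) -> float:
    """Assumed logistic stand-in for equation (3):
    alpha(t) = alpha_min + (alpha_max - alpha_min) / (1 + exp(beta * (t - tau))).
    """
    z = beta * (t - tau)
    if z > 700.0:  # math.exp overflows just above 709; alpha has converged anyway
        return alpha_min
    return alpha_min + (alpha_max - alpha_min) / (1.0 + math.exp(z))


if __name__ == "__main__":
    for t in (0.0, 50.0, 100.0, 150.0, 300.0):  # hypothetical growth-time units
        print(t, round(growth_learning_rate(t), 4))
```

In this form the slope at t = τ is −β(α_max − α_min)/4, which is one way to read "β is the rate of change near the reference time".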

[0082] The series of processes described above may be executed not only by the animal-type pet robot shown in FIG. 1 but also by, for example, a humanoid robot capable of bipedal walking or a virtual robot operating inside a computer. In this specification, the term "robot" also includes artificial agents.

[0083] The series of processes described above can be executed by hardware, but can also be executed by software. When the series of processes is executed by software, the program constituting the software is installed from a network or a recording medium into a robot apparatus incorporated in dedicated hardware, or into, for example, a general-purpose robot apparatus that can execute various functions when various programs are installed.

[0084] As shown in FIG. 2, this recording medium is constituted not only by package media distributed separately from the apparatus body to provide the program to users, such as the removable memory 61 on which the program is recorded, but also by a hard disk or the like included in the memory 10B, on which the program is recorded and which is provided to the user already built into the apparatus body.

[0085] In this specification, the steps describing the program recorded on the recording medium include not only processing performed in time series in the described order but also processing that is executed in parallel or individually rather than in time series.

[0086]

[Effect of the Invention] As described above, according to the present invention, a robot apparatus can be made to act. Moreover, through its behavior, the robot apparatus can give the user a more lifelike impression. It is therefore possible to realize a robot apparatus that the user does not tire of.

[Brief description of drawings]

FIG. 1 is a perspective view showing an example of the external configuration of a pet robot to which the present invention is applied.

FIG. 2 is a block diagram showing an example of the internal configuration of the pet robot of FIG. 1.

FIG. 3 is a diagram showing an example configuration of the functional modules of the pet robot.

FIG. 4 is a diagram schematically showing an example of the functions of the instinct/emotion management unit of FIG. 3.

FIG. 5 is a diagram showing the configuration of a reinforcement learning system to which the present invention is applied.

FIG. 6 is a flowchart illustrating the process of calculating the learning rate based on emotion parameters.

FIG. 7 is a diagram showing an example of changes in the learning rate based on emotion parameters.

FIG. 8 is a flowchart illustrating the process of calculating the learning rate based on instinct parameters.

FIG. 9 is a diagram showing an example of changes in the learning rate based on instinct parameters.

FIG. 10 is a flowchart illustrating the process of determining the learning rate based on location.

FIG. 11 is a diagram showing an example of changes in the learning rate based on location.

FIG. 12 is a flowchart illustrating the process of calculating the learning rate based on the time of day.

FIG. 13 is a diagram showing an example of changes in the learning rate based on the time of day.

FIG. 14 is a flowchart illustrating the process of calculating the learning rate based on growth time.

FIG. 15 is a diagram showing an example of changes in the learning rate based on growth time.

[Explanation of symbols]

31 sensor input processing unit, 32 information processing unit, 41 angle detection unit, 42 volume detection unit, 43 voice recognition unit, 44 image recognition unit, 45 pressure detection unit, 46 position detection unit, 47 switch input detection unit, 71 instinct/emotion management unit, 72 behavior management unit, 73 posture transition management unit, 74 control unit, 75 sound data generation unit, 76 speech synthesis unit

Continued from the front page
(72) Inventor: Osamu Hanagata, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Takeshi Takagi, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
F-terms (reference): 2C150 CA02 DA05 DA24 DA25 DA26 DA27 DA28 DF03 DF04 DF06 DF33 ED10 ED39 ED42 ED47 ED52 EF07 EF09 EF16 EF17 EF22 EF23 EF29 EF33 EF36 3C007 AS36 CS08 KS23 KS24 KS31 KS36 KS39 KT01 LW12 MT14 WA04 WA14 WB16 WC00 5H004 GA15 GA26 GB16 JB09 KD31 KD54 KD56 KD63

Claims


1. A robot apparatus that behaves in accordance with supplied input information, comprising a behavior management unit that dynamically changes a behavior learning ability.

2. The robot apparatus according to claim 1, wherein the behavior learning ability is determined by a learning rate.

3. The robot apparatus according to claim 2, wherein the learning rate is changed according to the input information.

4. The robot apparatus according to claim 2, further comprising a measuring unit that measures time, wherein the learning rate is changed according to the time.

5. A robot control method for a robot apparatus that behaves in accordance with supplied input information, comprising a behavior management step of dynamically changing a behavior learning ability.

6. A recording medium on which a computer-readable program is recorded, the program being for a robot apparatus that behaves in accordance with supplied input information and including a behavior management step of dynamically changing a behavior learning ability.

7. A program for causing a computer that controls a robot apparatus that behaves in accordance with supplied input information to execute a behavior management step of dynamically changing a behavior learning ability.