JP2002059384A

JP2002059384A - Learning system and learning method for robot

Info

Publication number: JP2002059384A
Application number: JP2000251483A
Authority: JP
Inventors: Takeshi Ohashi; 武史大橋; Kotaro Sabe; 浩太郎佐部; Masato Ito; 真人伊藤; Jun Yokono; 順横野
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2000-08-22
Filing date: 2000-08-22
Publication date: 2002-02-26

Abstract

PROBLEM TO BE SOLVED: To enable an autonomous robot that moves by acquiring three-dimensional information to learn how to move an object by sensing the influence of its own motion on the object in a work environment. SOLUTION: The robot is provided with perception sensors, including a camera, and a recurrent neural network as a learning mechanism. The robot moves a movable object in the external world with a controllable part of its body, senses the environment of the object and the movement of the object with the perception sensors, and learns the correlation between how it moves each of its revolute joints and how the object moves. By predicting the movement of the object, the robot learns motions for moving the object through novelty rewarding.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a learning system and a learning method for a legged robot having at least limbs and a trunk, and more particularly to a learning system and a learning method for a legged robot that executes various motion patterns using its limbs and/or trunk.

[0002] More specifically, the present invention relates to a learning system and a learning method for a legged robot that realize time-series learning and teaching using a recurrent neural network, and in particular to a learning system and a learning method for a legged robot that predicts the movement of an object with a recurrent neural network and, through novelty rewarding, self-learns diverse motions for moving a given object.

[0003]

[Prior Art] A mechanical device that uses electrical or magnetic action to perform movements resembling those of a human is called a "robot". The word "robot" is said to derive from the Slavic word ROBOTA (slave machine). In Japan, robots began to come into widespread use in the late 1960s, but most of them were industrial robots, such as manipulators and transfer robots, intended to automate and unman production work in factories.

[0004] A stationary robot that is installed and used at a specific location, such as an arm-type robot, operates only in a fixed, local work space, for example for assembling and sorting parts. In contrast, a mobile robot has an unrestricted work space: it can move freely along a predetermined route or without a route, perform predetermined or arbitrary human tasks on a person's behalf, and provide a wide variety of services in place of a human, a dog, or another living creature. Among mobile robots, legged robots are less stable than crawler-type or wheeled robots, which makes posture control and walking control more difficult, but they excel in that they can climb up and down stairs and ladders, step over obstacles, and walk and run flexibly on both level and uneven ground.

[0005] Recently, research and development on legged mobile robots has progressed, and expectations for their practical use are rising. Examples include pet-type robots that imitate the body mechanisms and movements of four-legged animals such as dogs and cats, and "humanoid" robots modeled on the body mechanisms and movements of animals that walk upright on two legs, such as humans.

[0006] Instructing a robot in a predetermined motion is called "teaching". Motion teaching methods include, for example, one in which an operator or user teaches the robot hands-on at the work site, and one in which motion patterns are entered, created, and edited in an editor outside the robot, such as on a computer.

[0007] With conventional robots, however, the user has to understand and master the operating environment to a considerable degree in order to teach motions, which places an excessive burden on the user.

[0008] Moreover, in conventional approaches to robot motion teaching and learning, motions are generally given to the robot in advance as motion data and then played back according to the external situation. While this approach can be expected to reproduce motions stably, it makes it difficult to create new motions and cannot cope with unexpected situations.

[0009] In addition, because the motions are generated offline, there is no guarantee that a motion given in advance is optimal for the shape of the robot or for its current work environment.

[0010] In recent years, cases of applying neural networks to robot control have been reported.

[0011] A neural network is a simplified model of the neural circuitry of the human brain: a network in which neurons are connected through synapses that pass signals in one direction only. Signals are transmitted between neurons through the synapses, and a variety of information processing becomes possible by suitably adjusting the synaptic resistance, that is, the weight. Each neuron receives the outputs of one or more other neurons, weighted by the synapses, applies a nonlinear response function to the sum of these inputs, and passes the result on to other neurons.
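The neuron computation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `neuron_output`, the choice of `tanh` as the nonlinear response function, and the sample values are assumptions, since the specification does not name a particular function.

```python
import math

def neuron_output(inputs, weights, response=math.tanh):
    """Weight each input by its synapse, sum, and apply a nonlinear response."""
    if len(inputs) != len(weights):
        raise ValueError("one weight is needed per input")
    total = sum(w * x for w, x in zip(weights, inputs))
    return response(total)

# Two upstream neurons feed one downstream neuron (illustrative values).
y = neuron_output([1.0, -0.5], [0.8, 0.4])
```

Adjusting the entries of `weights` is what "learning" amounts to in this picture; the response function itself stays fixed.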

[0012] Control based on a neural network can handle nonlinear problems such as friction and viscosity directly and, because it has a learning function, eliminates the need to change parameter settings.

[0013] However, there are few cases in which a neural network has been applied so that a robot generates motions that cope with unexpected situations.

[0014]

[Problems to Be Solved by the Invention] An object of the present invention is to provide an excellent learning system and learning method for a legged robot that realize time-series learning and teaching using a recurrent neural network.

[0015] A further object of the present invention is to provide an excellent learning system and learning method for a legged robot that can predict the movement of an object with a recurrent neural network and self-learn diverse motions for moving a given object through novelty rewarding.

[0016] A further object of the present invention is to provide a learning system and a learning method for a legged robot that can use a recurrent neural network to create new motions suited to the current environment, cope with unexpected situations, and enable diverse expression.

[0017]

[Means for Solving the Problems] The present invention has been made in view of the above problems, and is a learning system or learning method for a robot composed of a plurality of joints, comprising: control means or a control step for controlling the motion of the robot by driving each joint so as to move an object in the work space; perception means or a perception step for detecting events that occur in the work environment; and learning means or a learning step for training a recurrent neural network on the robot motion produced by the control means or step, the way the object moves during that motion, and the events perceived by the perception means or step.

[0018] Here, the learning means or step may include a prediction unit or sub-step that uses the recurrent neural network and a learning unit or sub-step that uses the recurrent neural network. The prediction unit or sub-step predicts the event at the next time step from the motion produced by the control means or step and the event perceived by the perception means or step. The learning unit or sub-step learns the robot's motion and the perceived event when the predicted event differs from the event actually perceived by the perception means or step at the next time step.
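The predict-then-compare arrangement described above might look like the following sketch. It is not the patent's implementation: the class `TinyRecurrentPredictor`, the fixed weights, the Euclidean error measure, and the `tolerance` threshold are all assumptions chosen to keep the example small and deterministic. The specification does not state how the weights are updated, so the update itself is left as a comment.

```python
import math

class TinyRecurrentPredictor:
    """Elman-style recurrent cell: the hidden state carries context across time steps."""

    def __init__(self, n_in, n_hidden, n_out):
        # Fixed small weights keep the sketch deterministic; a real system would train them.
        self.w_in = [[0.1] * n_in for _ in range(n_hidden)]
        self.w_rec = [[0.05] * n_hidden for _ in range(n_hidden)]
        self.w_out = [[0.2] * n_hidden for _ in range(n_out)]
        self.hidden = [0.0] * n_hidden

    def predict(self, action, percept):
        """Predict the next percept from the current action and percept."""
        x = list(action) + list(percept)
        self.hidden = [
            math.tanh(sum(w * v for w, v in zip(self.w_in[j], x))
                      + sum(w * h for w, h in zip(self.w_rec[j], self.hidden)))
            for j in range(len(self.hidden))
        ]
        return [sum(w * h for w, h in zip(row, self.hidden)) for row in self.w_out]


def prediction_error(predicted, observed):
    """Euclidean distance between predicted and observed percept vectors."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)))


def needs_learning(predicted, observed, tolerance=0.1):
    """Learning is triggered only when prediction and observation disagree."""
    return prediction_error(predicted, observed) > tolerance


rnn = TinyRecurrentPredictor(n_in=3, n_hidden=4, n_out=1)
predicted = rnn.predict(action=[0.5, -0.2], percept=[0.1])
observed = [0.9]  # what the perception sensors reported at the next time step
if needs_learning(predicted, observed):
    pass  # a weight update on (action, percept, observed) would go here
```

Gating the update on the mismatch means the network spends its training effort only on situations it cannot yet predict.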

[0019] The learning means or step may also have a learning phase, in which the robot has no experience at all of how to move the object, and a novelty search phase, in which the robot has already learned one or more relationships between its own motion and the movement of the object.

[0020] In the learning phase, the same robot motion may be applied to the object a predetermined number of times, and the robot's initial position and motion and the way the object moves may be learned when the robot's motion and the object's movement are confirmed to be reproducible. If the object's movement cannot be confirmed to be reproducible even after the robot's motion has been tried the predetermined number of times, the combination of the robot's initial position and motion may be changed and the learning retried.
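The learning-phase loop described above can be sketched as follows. The helper names (`is_reproducible`, `learning_phase`), the callbacks `try_motion` and `learn`, the representation of an object movement as a coordinate tuple, and the `tolerance` value are hypothetical; the specification only says that a motion is repeated a predetermined number of times and learned if the result is reproducible.

```python
import math

def is_reproducible(movements, tolerance=0.05):
    """True when every observed movement lies within `tolerance` of the mean movement."""
    n = len(movements)
    dims = len(movements[0])
    mean = [sum(m[d] for m in movements) / n for d in range(dims)]
    return all(math.dist(m, mean) <= tolerance for m in movements)


def learning_phase(candidates, try_motion, learn, trials=3):
    """Try each (initial_position, motion) pair `trials` times; learn the first reproducible one."""
    for initial_position, motion in candidates:
        observed = [try_motion(initial_position, motion) for _ in range(trials)]
        if is_reproducible(observed):
            learn(initial_position, motion, observed)
            return initial_position, motion
        # Not reproducible: fall through to the next combination and retry.
    return None  # no reproducible combination found
```

Here `try_motion` stands for executing the motion on the robot and returning the perceived object movement, and `learn` stands for training the recurrent neural network on the result.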

[0021] In the novelty search phase, the way the object will move is predicted for the robot's initial position and motion, and when the predicted movement differs from the movement perceived by the perception means or step, the movement may be regarded as novel and the robot's initial position and motion and the way the object moves may be learned.

[0022] When novelty is recognized, the same robot motion may be applied to the object a predetermined number of times, and the robot's initial position and motion and the way the object moves may be learned if the robot's motion and the object's movement are confirmed to be reproducible.
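One step of the novelty search phase, combining the prediction check with the follow-up reproducibility check, might be sketched like this. The function name `novelty_search_step`, the callbacks `predict`, `try_motion`, and `learn`, and the `tolerance` threshold are assumptions for illustration and do not come from the specification.

```python
import math

def novelty_search_step(predict, try_motion, learn, initial_position, motion,
                        trials=3, tolerance=0.05):
    """Return True when a novel, reproducible movement was found and learned."""
    predicted = predict(initial_position, motion)
    observed = try_motion(initial_position, motion)
    if math.dist(predicted, observed) <= tolerance:
        return False  # movement matched the prediction: nothing new
    # Novelty recognized: repeat the same motion to check reproducibility.
    repeats = [try_motion(initial_position, motion) for _ in range(trials)]
    if all(math.dist(r, observed) <= tolerance for r in repeats):
        learn(initial_position, motion, observed)
        return True
    return False  # novel but not reproducible, so it is not learned
```

Requiring reproducibility before learning keeps one-off accidents, such as the object being bumped by something else, out of the learned model.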

[0023]

[Operation] The robot according to the present invention includes perception sensors, including a camera, and a recurrent neural network as a learning mechanism.

[0024] According to the present invention, the robot moves a movable object in the external world with a controllable part of its own body, perceives with its perception sensors the environment in which the object is placed and the movement of the object, and can thereby learn the relationship between how it moves each of its joints and how the object moves.

[0025] In addition, by predicting the movement of the object, the robot can self-learn motions for moving the object through novelty rewarding.

[0026] With novelty rewarding, a higher reward can be given for unpredictable movements, which opens up unlimited possibilities for diversity in the robot's motions. As a result, the user can enjoy the same robot for a long time.

[0027] A robot equipped with the learning mechanism according to the present invention is also capable of motion emergence suited to its environment. It is therefore no longer necessary to input motions to the robot in advance. Moreover, because motions suited to the environment can be generated, it is possible to provide a robot that behaves differently in each user's environment.

[0028] Still other objects, features, and advantages of the present invention will become apparent from the more detailed description based on the embodiments of the present invention described below and the accompanying drawings.

[0029]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings.

[0030] FIG. 1 shows the external configuration of a walking robot 1 that walks on four legs and is used to practice the present invention. As shown, the robot 1 is an articulated mobile robot modeled on the shape and structure of a four-limbed animal. In particular, the walking robot 1 of this embodiment is a pet-type robot designed to imitate the shape and structure of a dog, a typical pet animal. It can, for example, coexist with people in a human living environment and express motions in response to user operations.

[0031] The walking robot 1 consists of a body unit 2, a head unit 3, a tail 4, and four limbs, namely leg units 6A to 6D.

[0032] The head unit 3 is mounted at approximately the upper front end of the body unit 2 via a neck joint 7 that has degrees of freedom about the roll, pitch, and yaw axes (shown in the figure). The head unit 3 carries a CCD (Charge Coupled Device) camera 15 corresponding to the dog's "eyes", a microphone 16 corresponding to its "ears", a speaker 17 corresponding to its "mouth", and a touch sensor 18 corresponding to its sense of touch. In addition to these, sensors that make up the five senses of a living body may be included.

[0033] The tail 4 is attached at approximately the upper rear end of the body unit 2, so that it can bend or swing freely, via a tail joint 8 that has degrees of freedom about the roll and pitch axes.

[0034] Leg units 6A and 6B form the front legs, and leg units 6C and 6D form the hind legs. Each of the leg units 6A to 6D is a combination of a thigh unit 9A to 9D and a shin unit 10A to 10D, and is attached at one of the front, rear, left, and right corners of the underside of the body unit 2. The thigh units 9A to 9D are connected to their respective positions on the body unit 2 by hip joints 11A to 11D, which have degrees of freedom about the roll, pitch, and yaw axes. The thigh units 9A to 9D and the shin units 10A to 10D are connected by knee joints 12A to 12D, which have degrees of freedom about the roll and pitch axes.

[0035] The joint degrees of freedom of the walking robot 1 are in practice provided by the rotation of joint actuators (not shown) arranged on each axis. The number and arrangement of the joint degrees of freedom of the walking robot 1 are arbitrary and do not limit the gist of the present invention.

[0036] FIG. 2 schematically shows the configuration of the electrical and control system of the walking robot 1. As shown in the figure, the walking robot 1 consists of a control unit 20, which performs overall control of the robot's operation and other data processing, an input/output unit 40, a drive unit 50, and a power supply unit 60. Each unit is described below.

[0037] As input devices, the input/output unit 40 includes various sensors corresponding to the five senses, such as the CCD camera 15 corresponding to the eyes of the mobile robot 1, the microphone 16 corresponding to the ears, and the touch sensor 18 corresponding to the sense of touch. As output devices, it is equipped with the speaker 17 corresponding to the mouth, among others. These output devices can give the user symbolic feedback from the walking robot 1 in forms other than mechanical motion patterns of the legs and other parts.

[0038] Because it includes the camera 15, the walking robot 1 can recognize the shape and color of any object in the work space. In addition to visual means such as a camera, the walking robot 1 may further include a receiver for transmitted waves such as infrared rays, sound waves, ultrasonic waves, or radio waves. In that case, the position and direction relative to the source can be measured from the outputs of the sensors that detect each transmitted wave.

[0039] The drive unit 50 is a functional block that realizes the mechanical motion of the walking robot 1 according to a predetermined motion pattern commanded by the control unit 20. It consists of drive units provided for each axis, such as roll, pitch, and yaw, of each joint, including the neck joint 7, the tail joint 8, the hip joints 11A to 11D, and the knee joints 12A to 12D. In the illustrated example, the walking robot 1 has n joint degrees of freedom, so the drive unit 50 consists of n drive units. Each drive unit is a combination of a motor (joint actuator) 51 that rotates about a given axis, an encoder (joint angle sensor) 52 that detects the rotational position of the motor 51, and a driver 53 that adaptively controls the rotational position and rotational speed of the motor 51 based on the control command value from the control unit and the output of the encoder 52.
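To make the motor, encoder, and driver relationship concrete, here is a toy model of one drive unit closing a position loop on the encoder reading. The specification does not describe the driver's control law, so the proportional controller, the `gain` and `max_step` values, and the class name `DriveUnit` are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class DriveUnit:
    """One joint axis: motor 51, encoder 52 and driver 53 folded into a toy model."""
    angle: float = 0.0      # encoder reading, in degrees
    gain: float = 0.5       # proportional gain (assumed value)
    max_step: float = 5.0   # largest change per control cycle, a stand-in for a speed limit

    def step(self, command: float) -> float:
        """Move the joint one control cycle toward the commanded angle."""
        error = command - self.angle
        delta = max(-self.max_step, min(self.max_step, self.gain * error))
        self.angle += delta
        return self.angle

# A robot with n joint degrees of freedom has n drive units (n = 3 here).
drive_units = [DriveUnit() for _ in range(3)]
commands = [10.0, -20.0, 0.0]
for _ in range(50):
    angles = [unit.step(cmd) for unit, cmd in zip(drive_units, commands)]
```

Each unit only needs its own command value and encoder reading, which is why the control unit can address the n axes independently.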

[0040] The power supply unit 60 is, as its name suggests, a functional module that supplies power to the electric circuits and other components in the walking robot 1. The walking robot 1 of this embodiment is an autonomously driven robot that runs on a battery, and the power supply unit 60 consists of a rechargeable battery 61 and a charge/discharge control unit 62 that manages the charge/discharge state of the rechargeable battery 61.

[0041] The rechargeable battery 61 takes the form of, for example, a "battery pack" in which a plurality of nickel-cadmium battery cells are packaged as a cartridge.

[0042] The charge/discharge control unit 62 determines the remaining capacity of the battery 61 by measuring its terminal voltage, its charge/discharge current, the ambient temperature around it, and so on, and decides when to start and stop charging.

[0043] The control unit 20 corresponds to the "brain" of a human or dog and is mounted, for example, in the head unit 3 or the body unit 2 of the walking robot 1.

[0044] FIG. 3 illustrates the configuration of the control unit 20 in more detail. As shown in the figure, the control unit 20 has a configuration in which a CPU (Central Processing Unit) 21 serving as the main controller is connected by a bus to memory, other circuit components, and peripheral devices. Each device on the bus 28 is assigned a unique address (memory address or I/O address), and the CPU 21 can communicate with a specific device on the bus 28 by specifying its address.

[0045] The RAM (Random Access Memory) 22 is a writable memory made up of volatile memory such as DRAM (Dynamic RAM). It is used to load the robot control program code executed by the CPU 21 and to temporarily store working data.

[0046] The ROM (Read Only Memory) 23 is a read-only memory that permanently stores programs and data. The program code stored in the ROM 23 includes a self-diagnostic test program executed when the walking robot 1 is powered on and a control program that defines the operation of the walking robot 1.

[0047] In this embodiment, a learning function based on a recurrent neural network is applied to the control program of the walking robot 1. A recurrent neural network makes time-series learning possible. That is, the robot can learn an association between time-series input information, such as music, and time-series joint angle parameters, such as a dance matched to that music. The learning mechanism and the teaching mechanism that use the recurrent neural network are described in detail later.

[0048] The nonvolatile memory 24 consists of electrically erasable and rewritable memory elements, such as EEPROM (Electrically Erasable and Programmable ROM), and is used to hold, in a nonvolatile manner, data that must be updated continually. Such data includes, for example, the learning model, emotion model, instinct model, and action plan model that define the behavior patterns of the walking robot 1.

[0049] The interface 25 is a device for interconnecting with equipment outside the control unit 20 and enabling data exchange. The interface 25 inputs and outputs data to and from, for example, the camera 15, the microphone 16, and the speaker 17. The interface 25 also inputs and outputs data and commands to and from the drivers 53-1… in the drive unit 50. The interface 25 can also exchange charging start and charging end signals with the power supply unit 60.

[0050] The interface 25 may include general-purpose interfaces for connecting computer peripherals, such as a serial interface like RS (Recommended Standard)-232C, a parallel interface like IEEE (Institute of Electrical and Electronics Engineers) 1284, a USB (Universal Serial Bus) interface, an i-Link (IEEE 1394) interface, or a SCSI (Small Computer System Interface) interface, so that programs and data can be transferred to and from locally connected external devices.

[0051] An infrared communication (IrDA) interface may also be provided as one of the interfaces 25 to communicate wirelessly with external devices. From the standpoint of reception sensitivity, the transceiver for infrared communication is preferably installed at an extremity of the body of the walking robot 1, such as the head unit 3 or the tail 4.

[0052] Furthermore, the control unit 20 includes a wireless communication interface 26 and a network interface card (NIC) 27, and can exchange data with an external host computer 100 via short-range wireless communication such as "bluetooth" or ".11B", or via a LAN (Local Area Network, for example Ethernet (registered trademark)) or the Internet.

[0053] One purpose of such data communication between the walking robot 1 and the host computer is to remotely control the operation of the walking robot 1 using remote computer resources. Another purpose is to supply the walking robot 1, over the network, with data and programs needed to control its operation, such as motion models and other program code.

[0054] In practice, operation control of the walking robot 1 is realized by executing predetermined software programs on the CPU 21. FIG. 4 schematically shows the software control configuration that runs on the robot 1.

[0055] As shown in the figure, the robot control software has a hierarchical structure made up of multiple layers of software. Object-oriented programming can be adopted for the control software. In that case, each piece of software is handled in modular units called "objects", which combine data with the procedures that process that data.

[0056] The device drivers in the lowest layer are objects that are permitted to access the hardware directly, for example to drive the joint actuators and receive sensor outputs, and they perform the corresponding processing in response to interrupt requests from the hardware.

[0057] The virtual robot is an object that mediates between the various device drivers and the objects that operate according to a predetermined inter-object communication protocol. Access to each hardware device making up the robot 1 is performed through this virtual robot.

[0058] The service manager is a system object that prompts each object to connect based on the connection information between objects described in a connection file.

[0059] The software above the system layer is modularized object by object (process by process), so that objects can easily be selected and replaced for each required function. Consequently, by rewriting the connection file, the inputs and outputs of objects with matching data types can be connected freely.
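The idea of wiring objects together from a connection file, with a data-type check on each link, can be sketched as below. The file syntax (`Owner.port -> Owner.port`), the `Port` class, and the object names used in the usage example are hypothetical; the specification does not give the connection file format.

```python
from dataclasses import dataclass, field

@dataclass
class Port:
    owner: str
    name: str
    data_type: str
    subscribers: list = field(default_factory=list)

def connect_from_file(connection_lines, outputs, inputs):
    """Wire output ports to input ports listed in the connection file.

    Each line has the (assumed) form 'Owner.out_port -> Owner.in_port'.
    """
    for line in connection_lines:
        source_key, sink_key = (part.strip() for part in line.split("->"))
        source, sink = outputs[source_key], inputs[sink_key]
        if source.data_type != sink.data_type:
            raise TypeError(f"{source_key} and {sink_key} carry different data types")
        source.subscribers.append(sink)

# Usage: swapping one object for another only requires editing the connection lines.
outputs = {"Camera.image": Port("Camera", "image", "Image")}
inputs = {"ColorDetector.image": Port("ColorDetector", "image", "Image")}
connect_from_file(["Camera.image -> ColorDetector.image"], outputs, inputs)
```

Because the wiring lives in data rather than in the objects themselves, an object can be replaced by any other object whose ports carry the same data types.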

【００６０】デバイス・ドライバ層とシステム層以外の
ソフトウェア・モジュールは、ミドルウェア層とアプリ
ケーション層に大別される。Software modules other than the device driver layer and the system layer are roughly divided into a middleware layer and an application layer.

【００６１】図５には、ミドルウェア層の内部構成を模
式的に図解している。FIG. 5 schematically illustrates the internal configuration of the middleware layer.

【００６２】ミドルウェア層は、ロボット１の基本的な
機能を提供するソフトウェア・モジュールの集まりであ
り、各モジュールの構成はロボット１の機械的・電気的
な特性や仕様、形状などハードウェア属性の影響を受け
る。The middleware layer is a collection of software modules that provide the basic functions of the robot 1, and the configuration of each module is affected by hardware attributes of the robot 1 such as its mechanical and electrical characteristics, specifications, and shape.

【００６３】ミドルウェア層は、機能的に、認識系のミ
ドルウェア（図５の左半分）と、出力系のミドルウェア
（図５の右半分）に分けることができる。The middleware layer can be functionally divided into middleware for recognition (left half in FIG. 5) and middleware for output (right half in FIG. 5).

【００６４】認識系のミドルウェアでは、画像データや
音声データ、その他のセンサから得られる検出データな
ど、ハードウェアからの生データを仮想ロボット経由で
受け取ってこれらを処理する。すなわち、各種入力情報
に基づき、音声認識、距離検出、姿勢検出、接触、動き
検出、色認識などの処理を行い、認識結果を得る（例え
ば、ボールを検出した、転倒を検出した、撫でられた、
叩かれた、ドミソの音階が聞こえた、動く物体を検出し
た、障害物を検出した、障害物を認識した、など）。認
識結果は、入力セマンティクス・コンバータを介して上
位のアプリケーション層に通知され、行動計画などに利
用される。The recognition middleware receives raw data from the hardware, such as image data, audio data, and detection data obtained from other sensors, via the virtual robot and processes it. That is, based on the various kinds of input information, it performs processing such as voice recognition, distance detection, posture detection, contact detection, motion detection, and color recognition, and obtains recognition results (for example: a ball was detected, a fall was detected, the robot was stroked, the robot was hit, the do-mi-so musical notes were heard, a moving object was detected, an obstacle was detected, an obstacle was recognized). The recognition results are passed to the upper application layer via the input semantics converter and are used for action planning and the like.

【００６５】一方、出力系のミドルウェアでは、歩行、
動きの再生、出力音の合成、目に相当するＬＥＤの点灯
制御などの機能を提供する。すなわち、アプリケーショ
ン層において立案された行動計画を出力セマンティクス
・コンバータを介して受信処理して、ロボット１の各機
能毎にロボットの各ジョイントのサーボ指令値や出力
音、出力光（ＬＥＤ）、出力音声などを生成して、出力
すなわち仮想ロボットを介してロボット１上で実演す
る。このような仕組みにより、より抽象的な行動コマン
ド（例えば、前進、後退、喜ぶ、吼える、寝る、体操す
る、驚く、トラッキングするなど）を与えることで、ロ
ボット１の各関節による動作を制御することができる。The output middleware, on the other hand, provides functions such as walking, motion playback, output sound synthesis, and lighting control of the LEDs that correspond to the eyes. That is, it receives the action plan drawn up in the application layer via the output semantics converter, generates, for each function of the robot 1, the servo command values for each joint, the output sound, the output light (LEDs), the output voice, and so on, and outputs them, that is, performs them on the robot 1 via the virtual robot. With this mechanism, the motion of each joint of the robot 1 can be controlled by giving more abstract action commands (for example, move forward, move backward, express joy, bark, sleep, do exercises, express surprise, track).

【００６６】また、図６には、アプリケーション層の内
部構成を模式的に図解している。FIG. 6 schematically illustrates the internal structure of the application layer.

【００６７】アプリケーションは、入力セマンティクス
・コンバータ経由で受け取った認識結果を使って、ロボ
ット１の行動計画を決定して、出力セマンティクス・コ
ンバータ経由で決定された行動を返すようになってい
る。The application determines the action plan of the robot 1 using the recognition result received via the input semantics converter, and returns the action determined via the output semantics converter.

【００６８】アプリケーションは、ロボット１の感情を
モデル化した感情モデルと、本能をモデル化した本能モ
デルと、外部事象とロボット１がとる行動との因果関係
を逐次記憶していく学習モジュールと、行動パターンを
モデル化した行動モデルと、行動モデルによって決定さ
れた行動の出力先を切り替える行動切替部とで構成され
る。The application consists of an emotion model that models the emotions of the robot 1, an instinct model that models its instincts, a learning module that successively stores the causal relationships between external events and the actions taken by the robot 1, a behavior model that models behavior patterns, and a behavior switching unit that switches the output destination of the behavior determined by the behavior model.

【００６９】入力セマンティクス・コンバータ経由で入
力される認識結果は、感情モデル、本能モデル、行動モ
デルに入力されるとともに、学習モジュールには学習教
示信号として入力される。The recognition result input via the input semantics converter is input to an emotion model, an instinct model, and an action model, and is input to the learning module as a learning instruction signal.

【００７０】行動モデルによって決定されたロボット１
の行動は、行動切替部並びに出力セマンティクス・コン
バータ経由でミドルウェアに送信され、ロボット１上で
実行される。あるいは、行動切替部を介して、行動履歴
として感情モデル、本能モデル、学習モジュールに、行
動履歴として供給される。The action of the robot 1 determined by the behavior model is transmitted to the middleware via the behavior switching unit and the output semantics converter, and is executed on the robot 1. Alternatively, it is supplied via the behavior switching unit to the emotion model, the instinct model, and the learning module as an action history.

【００７１】感情モデルと本能モデルは、それぞれ認識
結果と行動履歴を入力に持ち、感情値と本能値を管理し
ている。行動モデルは、これら感情値や本能値を参照す
ることができる。また、学習モジュールは、学習教示信
号に基づいて行動選択確率を更新して、更新内容を行動
モデルに供給する。Each of the emotion model and the instinct model has a recognition result and an action history as inputs, and manages emotion values and instinct values. The behavior model can refer to these emotion values and instinct values. The learning module updates the action selection probability based on the learning instruction signal, and supplies the updated content to the action model.

【００７２】本実施例に係る学習モジュールは、音楽デ
ータのような時系列データと、関節角度パラメータとを
関連付けて、時系列データとして学習することができ
る。時系列データの学習のために、リカレント・ニュー
ラル・ネットワークを採用する。リカレント・ニューラ
ル・ネットワークは、内部にフィードバック結合を備え
ることで、１周期前の情報をネットワーク内に持ち、こ
れによって時系列データの履歴を把握することができる
仕組みになっている。The learning module according to this embodiment can learn as time-series data by associating time-series data such as music data with joint angle parameters. A recurrent neural network is employed for learning time series data. The recurrent neural network has a mechanism in which information of one cycle before is stored in the network by providing a feedback connection inside, and the history of the time series data can be grasped by this.

【００７３】図７には、リカレント型のニューラル・ネ
ットワークの構成例を模式的に図解している。同図に示
すように、このネットワークは、入力データが入力され
るユニット群である入力層と、ネットワークの出力を出
すユニット群である出力層と、それら以外のユニット群
である中間層で構成される。FIG. 7 schematically illustrates a configuration example of a recurrent neural network. As shown in the figure, this network consists of an input layer, which is the group of units that receive the input data; an output layer, which is the group of units that produce the network's output; and an intermediate layer, which is the group of all other units.

【００７４】リカレント・ニューラル・ネットワークで
は、各ユニットにおける過去の出力がネットワーク内の
他のユニット（あるいは自分自身）に戻されるような結
合関係が許容される。したがって、時間に依存して各ニ
ューロンの状態が変化するような動的性質をネットワー
ク内に含めることができ、時系列パターンの認識や予測
を行うことができる。In the recurrent neural network, a connection relationship is allowed in which the past output in each unit is returned to another unit (or itself) in the network. Therefore, a dynamic property in which the state of each neuron changes depending on time can be included in the network, and time-series patterns can be recognized and predicted.

【００７５】図７に示す例では、リカレント・ニューラ
ル・ネットワークは、所定数の入力層のニューロンを有
している。各ニューロンには、センサの状態に相当する
ｓ_tと、モータすなわち関節アクチュエータの状態に相
当するｍ_tが入力されている。また、入力層のニューロ
ンの出力は、中間層のニューロンを介して、出力層のニ
ューロンに供給されている。In the example shown in FIG. 7, the recurrent neural network has a predetermined number of input-layer neurons. The inputs to these neurons are s_t, which corresponds to the sensor state, and m_t, which corresponds to the state of the motors, that is, the joint actuators. The outputs of the input-layer neurons are supplied to the output-layer neurons via the intermediate-layer neurons.

【００７６】出力層のニューロンからは、リカレント・
ニューラル・ネットワークのセンサの状態に対応する出
力Ｓ_t+1が出力される。また、出力の一部は、コンテク
ストＣ_tとして、入力層のニューロンにフィードバック
されている。The output-layer neurons output S_{t+1}, the recurrent neural network's output corresponding to the sensor state. In addition, part of the output is fed back to the input-layer neurons as the context C_t.

【００７７】図示のリカレント・ニューラル・ネットワ
ークを用いた学習は、出力されたセンサの予測値Ｓ_t+1
と、実際に次の時刻で計測されたセンサの値ｓ_t+1との
誤差に基づいて、バック・プロパゲーション法により実
行される。このような学習機構により、入力されたセン
サとモータと時系列データに対して、次のセンサ情報を
予測することが可能になる。Learning with the illustrated recurrent neural network is carried out by the back-propagation method, based on the error between the output predicted sensor value S_{t+1} and the sensor value s_{t+1} actually measured at the next time step. With this learning mechanism, the next sensor information can be predicted from the input sensor, motor, and time-series data.
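
The forward pass and the error that drives learning can be sketched as below. This is a minimal illustration under stated assumptions, not the network of FIG. 7 itself: the layer sizes, the tanh activation, the random weight initialization, the squared-error form, and the names `ForwardRNN` and `prediction_error` are all assumptions introduced here, and the back-propagation weight update is omitted.

```python
import math
import random


def _matrix(rows, cols, rng):
    # Small random weights; the initialization scheme is an assumption.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def _matvec(weights, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


class ForwardRNN:
    """Toy forward model: (s_t, m_t, C_t) -> (S_{t+1}, next context)."""

    def __init__(self, n_sensor, n_motor, n_context, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_sensor = n_sensor
        self.w_hidden = _matrix(n_hidden, n_sensor + n_motor + n_context, rng)
        self.w_out = _matrix(n_sensor + n_context, n_hidden, rng)

    def step(self, s_t, m_t, c_t):
        x = list(s_t) + list(m_t) + list(c_t)          # input layer
        h = [math.tanh(v) for v in _matvec(self.w_hidden, x)]  # intermediate layer
        y = [math.tanh(v) for v in _matvec(self.w_out, h)]     # output layer
        # The first part of the output is the sensor prediction; the rest is
        # fed back to the input layer as context on the next step.
        return y[:self.n_sensor], y[self.n_sensor:]


def prediction_error(predicted, measured):
    """Squared error between S_{t+1} and the measured s_{t+1} (assumed form)."""
    return 0.5 * sum((p - q) ** 2 for p, q in zip(predicted, measured))
```

In a training loop, the value returned by `prediction_error` would be the quantity propagated back through the weights at each time step.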

【００７８】図８には、リカレント・ニューラル・ネッ
トワークのインバース・ダイナミクスを示している。こ
れは、時刻ｔにおけるセンサ予測出力とコンテクストＣ
_tを与えて、時刻ｔ−１のセンサ入力とモータの状態入
力、コンテクストＣ_t-1を得るネットワーク構造であ
る。FIG. 8 shows the inverse dynamics of the recurrent neural network. This is a network structure that, given the predicted sensor output at time t and the context C_t, yields the sensor input and motor state input at time t-1 and the context C_{t-1}.

【００７９】インバース・ダイナミクスによる学習は、
図７に示したフォワード・ダイナミクスの出力を入力と
して、その出力結果とフォワード・ダイナミクスへの入
力との誤差を使って、同様にバック・プロパゲーション
法により実現する。したがって、図７に示したフォワー
ド・ダイナミクスによる学習と同時に、図８に示すイン
バース・ダイナミクスによる学習を行なうことができ
る。Learning of the inverse dynamics takes the output of the forward dynamics shown in FIG. 7 as its input and is likewise carried out by the back-propagation method, using the error between the inverse network's output and the input to the forward dynamics. Therefore, learning of the inverse dynamics shown in FIG. 8 can be performed at the same time as learning of the forward dynamics shown in FIG. 7.

【００８０】このインバース・リカレント・ニューラル
・ネットワークを用いて、得られたセンサ入力とコンテ
クストを順次入力にフィードバックしていくことで、時
間を遡ってモータの状態を順に得ることができる。最終
的に、時刻ｔのセンサ出力ｓ _tを得るためのモータの時
系列ｍ₁，ｍ₂，…，ｍ_t-1を得ることができる。By using this inverse recurrent neural network and successively feeding the obtained sensor input and context back into its input, the motor states can be obtained one after another going backward in time. Ultimately, the motor time series m_1, m_2, ..., m_{t-1} needed to obtain the sensor output s_t at time t can be obtained.
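
The backward roll-out described in this paragraph can be sketched as follows. The inverse network itself is abstracted as a callable; the name `recall_motor_sequence`, the `inverse_step` signature, and the fixed step count are assumptions made for illustration only.

```python
def recall_motor_sequence(inverse_step, s_goal, c_goal, n_steps):
    """Roll an inverse model backward in time from a desired sensor state.

    inverse_step(s, c) is assumed to return (s_prev, m_prev, c_prev): the
    sensor input, motor state, and context one time step earlier.
    Returns the motor states in forward (chronological) order.
    """
    s, c = s_goal, c_goal
    motors = []
    for _ in range(n_steps):
        # Feed the recovered sensor input and context back in as the next query.
        s, m, c = inverse_step(s, c)
        motors.append(m)
    motors.reverse()  # collected newest-first, so flip to oldest-first
    return motors


def toy_inverse_step(s, c):
    # Stand-in for a trained inverse network: the sensor state is a counter
    # and each motor command is labelled by the state it was issued from.
    return s - 1, f"m{s - 1}", c
```

With the toy stand-in, asking for the sequence that reaches state 3 in three steps yields the commands issued from states 0, 1, and 2, in that order.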

【００８１】図７に示すフォワード・ダイナミクスと図
８に示すインバース・ダイナミクスとを組み合わせ、さ
らにロボット制御用ソフトウェア（図４を参照のこと）
の枠組みに従ってモジュール化することより、ロボット
の学習機構を実現するリカレント・ニューラル・ネット
ワーク（ＲＮＮ）モジュールを構築することができる。
図９には、このＲＮＮモジュールの構成を図解してい
る。また、図１０には、ＲＮＮモジュールを搭載したロ
ボット制御用ソフトウェアの構成を図解している。By combining the forward dynamics shown in FIG. 7 with the inverse dynamics shown in FIG. 8 and modularizing them according to the framework of the robot control software (see FIG. 4), a recurrent neural network (RNN) module that implements the robot's learning mechanism can be constructed. FIG. 9 illustrates the configuration of this RNN module, and FIG. 10 illustrates the configuration of robot control software in which the RNN module is installed.

【００８２】行動計画モジュールは、外部事象や感情モ
デル、本能モデル等に基づいてロボット１がとるべき行
動計画を立案し、コマンドすなわち各モータ（関節アク
チュエータ）に対する制御指示ｍ_tを発行するととも
に、ＲＮＮモジュールにもコマンドｍ_tを入力する。The action planning module draws up the action plan that the robot 1 should follow based on external events, the emotion model, the instinct model, and so on, issues commands, that is, control instructions m_t for each motor (joint actuator), and also inputs the command m_t to the RNN module.

【００８３】姿勢管理モジュールは、コマンドｍ_tに従
って、仮想ロボットを介して、トラッキング、モーショ
ン再生、転倒復帰、歩行などの該当する動作を実現す
る。The posture management module carries out the corresponding operation, such as tracking, motion playback, fall recovery, or walking, via the virtual robot in accordance with the command m_t.

【００８４】また、画像や音声、その他のセンサから得
られる検出データは、仮想ロボット経由で認識系のミド
ルウェア（前述）において処理され、それぞれの特徴量
が抽出される。これらセンサ特徴量Ｓ_tはＲＮＮモジュ
ールに入力される。Detection data obtained from images, sound, and other sensors is processed by the recognition middleware (described above) via the virtual robot, and the respective feature quantities are extracted. These sensor feature quantities S_t are input to the RNN module.

【００８５】ＲＮＮモジュールは、学習フェーズでは、
コマンドｍ_tとセンサ特徴量Ｓ_tという２つの入力を用い
て、フォワード・モデル及びインバース・モデルの学習
を行う。In the learning phase, the RNN module trains the forward model and the inverse model using two inputs: the command m_t and the sensor feature quantity S_t.

【００８６】行動計画モジュールは、ＲＮＮモジュール
のフォワード・ダイナミクスからの出力として、次の時
刻におけるセンサの予測値Ｓ_t+1とコンテクストＣ_t+1を
観測することができる。学習フェーズでは、行動計画モ
ジュールは、自らの行動計画に基づいて、ロボット１の
行動を決定する。The action planning module can observe the predicted sensor value S_{t+1} and the context C_{t+1} at the next time step as outputs of the forward dynamics of the RNN module. In the learning phase, the action planning module determines the action of the robot 1 based on its own action plan.

【００８７】これに対し、ＲＮＮモジュールによりロボ
ット１の学習がある程度進行した状態では、行動計画モ
ジュールは、ＲＮＮモジュールによるセンサ予測値Ｓ
_t+1とコンテクストＣ_t+1を必要に応じて内部状態に関連
付けて記憶する、という作業を行う。In contrast, once the robot 1's learning through the RNN module has progressed to some extent, the action planning module stores the predicted sensor value S_{t+1} and the context C_{t+1} from the RNN module, associating them with its internal state as necessary.

【００８８】そして、行動計画モジュールの内部に記憶
されたセンサ値とコンテクストを想起させる必要が発生
したときには、行動計画モジュールは、想起させたいセ
ンサ値ＳとコンテクストＣを取り出して、ＲＮＮモジュ
ールのインバース・モデルに対してこれを与える。これ
に対し、ＲＮＮモジュールでは、これらセンサ値Ｓ及び
コンテクストＣの入力を実現するアクションの時系列
を、インバース・ダイナミクス（図８を参照のこと）を
用いて順次計算し、この計算結果を姿勢管理モジュール
に送信する。この結果、行動計画モジュールが期待する
入力が得られるように、ロボット１が行動を行うように
なる。When it becomes necessary to recall a sensor value and context stored in the action planning module, the action planning module retrieves the sensor value S and context C to be recalled and gives them to the inverse model of the RNN module. The RNN module then uses the inverse dynamics (see FIG. 8) to compute, step by step, the time series of actions that brings about these inputs of sensor value S and context C, and sends the result to the posture management module. As a result, the robot 1 acts so that the input expected by the action planning module is obtained.

【００８９】本実施例に係るロボットは、上述したよう
に、カメラを始めとする知覚センサと、学習機構として
のリカレント・ニューラル・ネットワークを備えてい
る。そして、ロボット自身の持つ制御可能な部分によっ
て外界の移動可能な対象物を動かし、知覚センサによっ
て対象物のおかれている環境と、対象物の動きを知覚し
て、ロボットの各関節部の動かし方と対象物の動きのと
の関連を学習するようになっている。また、対象物の動
きを予測して、ノベルティ・リワーディングにより対象
物を動かすモーションを自己学習することができる。As described above, the robot according to this embodiment has perception sensors, including a camera, and a recurrent neural network as its learning mechanism. It moves a movable object in the outside world with its own controllable parts, perceives through the perception sensors the environment in which the object is placed and the movement of the object, and learns the relationship between how each of its joints is moved and how the object moves. In addition, by predicting the movement of the object, it can teach itself motions for moving the object through novelty rewarding.

【００９０】ノベルティ・リワーディングとは、要する
にセンサ予測値がセンサの実測値とかけ離れているほど
高い報酬を与えるように報酬の誤差を設定して学習を行
う方式である。Novelty rewarding is, in short, a scheme in which learning is performed with the reward error set so that the further the predicted sensor value is from the measured sensor value, the higher the reward.
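
One way to express such a reward is sketched below. The text only requires that the reward grow as the prediction moves away from the measurement; the squared-distance form and the name `novelty_reward` are assumptions chosen for illustration.

```python
def novelty_reward(predicted, measured):
    """Reward that grows with the gap between predicted and measured sensor values.

    The sum of squared differences is an assumed choice; any measure that
    increases with the mismatch would fit the description.
    """
    return sum((p - q) ** 2 for p, q in zip(predicted, measured))
```

A perfectly predicted outcome earns zero reward under this definition, so well-understood actions stop being attractive once they have been learned.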

【００９１】図１０に示したＲＮＮモジュールのうち、
インバースＲＮＮによる学習を行う代わりに、図１１に
示すようにリカレント・ニューラル・ネットワークの入
力に対する報酬を出力するようなＲＮＮモジュールを用
意する。Instead of training the inverse RNN of the RNN module shown in FIG. 10, an RNN module that outputs a reward for the input to the recurrent neural network is prepared, as shown in FIG. 11.

【００９２】各関節アクチュエータへのコマンドｍ
_tと、各部からのセンサ出力値（例えばカメラからの入
力画像）Ｓ_tが入力として与えられ、センサの予測値
（例えば、次の場面でのカメラの入力画像の予測）Ｓ
_t+1が出力される。センサ予測値は、センサ実測値と比
較され、両者の差が大きいほど大きくなる報酬Ｒ_t+1が
得られ、次の関節アクチュエータのコマンドに影響を与
える。また、出力の一部は、コンテクストとして、コン
テクスト・ループによりフィードバックされる。The command m_t to each joint actuator and the sensor output value S_t from each part (for example, the input image from the camera) are given as inputs, and the predicted sensor value S_{t+1} (for example, a prediction of the camera's input image in the next scene) is output. The predicted sensor value is compared with the measured sensor value to obtain a reward R_{t+1} that grows as the difference between the two grows, and this reward influences the next joint actuator command. Part of the output is fed back as context through the context loop.

【００９３】図１１に示すようなＲＮＮモジュールの出
力の評価は、同時に出力されたセンサ予測値がセンサの
実測値とかけ離れているほど高い報酬を与えるように報
酬の誤差を設定して学習を行う。したがって、このＲＮ
Ｎモジュールにとって不測の事態が起きるような行動を
取ると、高い報酬を得ることができる。The output of the RNN module shown in FIG. 11 is evaluated, and learning is performed, with the reward error set so that the further the simultaneously output predicted sensor value is from the measured sensor value, the higher the reward. Consequently, a high reward is obtained by taking an action that leads to a situation the RNN module did not anticipate.

【００９４】図１２には、ノベルティ・リワーディング
を採り入れたＲＮＮモジュールの構成例を図解してい
る。図示のような構成によれば、リカレント・ニューラ
ル・ネットワークによる学習が進んだ状態で、行動計画
モジュールが幾つかの行動の候補をＲＮＮモジュールに
与えると、このような行動を与えた結果の報酬を予測し
て、モータ動作選択部では報酬が最も大きくなるような
モータ動作すなわち行動計画を選ぶことができる。FIG. 12 illustrates a configuration example of an RNN module that incorporates novelty rewarding. With the illustrated configuration, once learning by the recurrent neural network has progressed, when the action planning module gives several candidate actions to the RNN module, the module predicts the reward that would result from each action, and the motor operation selection unit can select the motor operation, that is, the action plan, with the largest reward.
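
The selection step can be sketched as a simple maximization over the candidates. The reward predictor is abstracted as a callable; `select_motor_command`, `predict_reward`, and the example command names are hypothetical and stand in for the trained RNN module and its inputs.

```python
def select_motor_command(candidates, predict_reward):
    """Pick the candidate whose predicted reward is largest.

    candidates:     iterable of candidate motor commands from the action planner
    predict_reward: callable mapping a command to its predicted reward
                    (here a stand-in for the trained RNN module)
    """
    return max(candidates, key=predict_reward)


# Hypothetical predicted rewards for three candidate kicks.
PREDICTED = {"kick_left": 0.1, "kick_right": 0.9, "push_forward": 0.3}
```

For the hypothetical table above, `select_motor_command(PREDICTED, PREDICTED.get)` picks `kick_right`, the command with the largest predicted reward.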

【００９５】したがって、ノベルティ・リワーディング
機構をリカレント・ニューラル・ネットワークに採り入
れることにより、ロボット１は今までに経験したことの
ない行動や外界の状態（センサ入力）を選ぶことがで
き、この結果、ロボット１の行動範囲が拡大する。Therefore, by incorporating the novelty rewarding mechanism into the recurrent neural network, the robot 1 can choose actions and states of the outside world (sensor inputs) that it has never experienced before, and as a result the range of actions of the robot 1 expands.

【００９６】図１３には、ノベルティ・リワーディング
を採り入れたロボット１の動作制御システムの構成う
ち、特にモーションの創発と学習に関連する部分を中心
に図解している。FIG. 13 illustrates the configuration of the operation control system of the robot 1 adopting novelty rewarding, particularly focusing on parts related to emergence and learning of motion.

【００９７】ロボット１は、身体すなわち各関節アクチ
ュエータを自由に制御することができ、目の前の対象物
を含む環境情報を入力するための視覚センサと、外部事
象と動作との因果関係を学習するとともに、予期しない
状況に対処した動作を生成するための機構を備えてい
る。図示の通り、該制御システムでは、学習並びに予期
しない状況への対処のために、予測機構と学習機構を備
えている。予測機構と学習機構は、いずれも図７に示す
ようなリカレント・ニューラル・ネットワークで構成さ
れる。The robot 1 can freely control its body, that is, each joint actuator, and has a visual sensor for taking in environmental information, including the object in front of it, and a mechanism for learning the causal relationships between external events and its actions and for generating actions that cope with unexpected situations. As illustrated, the control system has a prediction mechanism and a learning mechanism for learning and for coping with unexpected situations. Both the prediction mechanism and the learning mechanism are implemented as recurrent neural networks like the one shown in FIG. 7.

【００９８】予測機構は、各関節アクチュエータごとに
配置された関節角度センサからの関節角度データと、カ
メラなどの視覚センサから得られた画像データとをそれ
ぞれ入力して、作業環境における次のシーン、すなわち
次の時刻における画像データをフォワード・ダイナミク
スにより予測して、後段の比較器に出力する。The prediction mechanism inputs joint angle data from a joint angle sensor arranged for each joint actuator and image data obtained from a visual sensor such as a camera, and inputs the next scene in the work environment, That is, the image data at the next time is predicted by the forward dynamics and output to the comparator at the subsequent stage.

【００９９】比較器は、図１１に示すようなノベルティ
・リワーディング機構で構成されており、予測機構から
出力される予測された次の時刻の画像データと、視覚セ
ンサによって得られた次の時刻における実際の画像デー
タを、それぞれ入力して比較を行う。The comparator consists of a novelty rewarding mechanism like the one shown in FIG. 11. It takes as inputs the predicted image data for the next time step output by the prediction mechanism and the actual image data at the next time step obtained by the visual sensor, and compares them.

【０１００】そして、比較器は、視覚センサの予測値が
視覚センサの実測値とかけ離れているほど高い報酬を与
えるように報酬の誤差を設定して、次の各関節アクチュ
エータの動かし方すなわち行動計画を学習機構に出力す
る。The comparator then sets the reward error so that the further the visual sensor's predicted value is from its measured value, the higher the reward, and outputs to the learning mechanism how each joint actuator should be moved next, that is, the action plan.

【０１０１】学習機構は、関節角度センサからの関節角
度データ、ならびに、視覚センサからの画像データを入
力して、フォワード・ダイナミクスにより、比較器が出
力する報酬に従って自己学習を行う。The learning mechanism inputs the joint angle data from the joint angle sensor and the image data from the visual sensor, and performs self-learning by forward dynamics according to the reward output from the comparator.

【０１０２】本実施例に係るロボット１は、作業環境内
にある適当な対象物を選択して、その対象物に対して、
ロボット１の動作と対象物の動かし方の学習を始めるよ
うになっている。The robot 1 according to this embodiment selects a suitable object in its working environment and begins learning, for that object, the relationship between its own actions and the way the object is moved.

【０１０３】このようなロボット１の動作段階は、ある
対象物に対してまったく経験を持たない状態の「学習フ
ェーズ」と、ロボット１の動作と対象物の動きに関する
１つ又はそれ以上の関係を既に学習している状態の「新
規性探索フェーズ」に区分される。The operating stages of the robot 1 are divided into a "learning phase", in which the robot has no experience at all with a given object, and a "novelty search phase", in which it has already learned one or more relationships between its actions and the movement of the object.

【０１０４】図１４には、ロボット１が動作段階を切り
替えるための手順をフローチャートの形式で図解してい
る。FIG. 14 illustrates, in the form of a flowchart, a procedure for the robot 1 to switch the operation stage.

【０１０５】ロボット１が目の前にある対象物（例えば
ボール）を発見し、その動かし方を学習するという場面
が、作業途上で発生したとする（ステップＳ１）。Suppose that, in the course of its work, the robot 1 finds an object (for example, a ball) in front of it and is to learn how to move it (step S1).

【０１０６】このような場合、まず、その対象物の動か
し方を既に学習したことがあるか否かを判別する（ステ
ップＳ２）。そして、既に学習していれば、新規性探索
フェーズに遷移し（ステップＳ３）、未だ学習したこと
がなければ、学習フェーズに遷移する（ステップＳ
４）。In this case, it is first determined whether the robot has already learned how to move the object (step S2). If it has, the process transitions to the novelty search phase (step S3); if it has not, the process transitions to the learning phase (step S4).
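
The branch in this flowchart can be sketched in a few lines. How the robot records which objects it has already learned is not specified in the text; the set `learned_objects`, the object identifiers, and the function name `choose_phase` are assumptions made for illustration.

```python
LEARNING = "learning"
NOVELTY_SEARCH = "novelty_search"


def choose_phase(object_id, learned_objects):
    """Decide the operating phase for a newly found object.

    learned_objects is an assumed record (here a set) of objects for which at
    least one way of moving them has already been learned.
    """
    if object_id in learned_objects:   # already learned: search for novelty
        return NOVELTY_SEARCH
    return LEARNING                    # never learned: start from scratch
```

For example, with `learned_objects = {"ball"}`, a ball leads to the novelty search phase and a cube to the learning phase.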

【０１０７】図１５には、歩行ロボット１が、対象物と
してのボールの動かし方を初めて学習するとき、すなわ
ち学習フェーズにおける動作の様子を図解している。FIG. 15 illustrates the behavior of the walking robot 1 when learning how to move a ball as an object for the first time, that is, in the learning phase.

【０１０８】（１）ロボット１は、初めての対象物に遭
遇すると、与えられたパラメータ、若しくは得られた画
像データに基づいて、ロボット１自身と対象物との位置
関係を把握して、対象物に接触する可能性の高いパラメ
ータによって各脚部を駆動して、対象物を動かしてみ
る。(1) When the robot 1 encounters an object for the first time, it determines the positional relationship between itself and the object based on given parameters or on the image data it has obtained, drives its legs with parameters that are likely to bring them into contact with the object, and tries to move the object.

【０１０９】（２）対象物に接触した結果、対象物が動
いた場合、何度か同様の動作を繰り返して、対象物の動
き方に再現性があるか否かを検証する。例えば、対象物
がボールである場合には、同じような蹴り方を数回繰り
返すことにより、同じ方向に安定して転がるという現象
を導出することができる。(2) When the object moves as a result of contact with the object, the same operation is repeated several times to verify whether the movement of the object has reproducibility. For example, when the target object is a ball, a phenomenon of stably rolling in the same direction can be derived by repeating the same kicking method several times.

【０１１０】（３）他方、対象物の動き方に再現性を見
出せなかった場合、例えば数回ボールを繰り返し蹴って
も安定して転がらない場合には、最初のステージに戻っ
て、対象物への接触並びに動かし方を変更してみる。例
えば、対象物に対するロボット１の初期位置、ロボット
１の動作パターンの組み合わせ方を変更して、再度対象
物を動かしてみる。(3) On the other hand, if no reproducibility is found in the way the object moves, for example if the ball does not roll consistently even after being kicked several times, the robot returns to the first stage and changes how it contacts and moves the object. For example, it changes the initial position of the robot 1 relative to the object or the combination of motion patterns of the robot 1, and tries moving the object again.

【０１１１】（４）対象物の初期位置とロボット１の動
作、それによって得られる対象物の動き方に対して再現
性を確認することができた場合には、ロボット１は、そ
の初期位置、ロボット１の動作、対象物の動き方の組み
合わせを学習データとして記憶する。(4) If reproducibility is confirmed for the initial position of the object, the action of the robot 1, and the resulting movement of the object, the robot 1 stores that combination of initial position, robot action, and object movement as learning data.

【０１１２】上述したような学習フェーズにおいては、
ロボット１は、未知の対象物に対して１つ目の動かし方
を学習することができる訳である。In the learning phase described above, the robot 1 can thus learn a first way of moving an unknown object.

【０１１３】また、図１６には、ロボット１が初期状態
と対象物の一連の動き方を認識するための、視覚センサ
によって得られる画像の例を示している。FIG. 16 shows an example of an image obtained by the visual sensor for the robot 1 to recognize the initial state and a series of movements of the object.

【０１１４】ロボット１と対象物の初期相対位置情報
は、ロボット１の動作前の状態で撮像された画像データ
を基に得ることができる。The initial relative position information between the robot 1 and the object can be obtained based on image data captured in a state before the operation of the robot 1.

【０１１５】また、ロボット１の動作後すなわち動作に
より対象物が動いた結果は、ロボット１の動作後の状態
で撮像された画像データを基に得ることができる。The result after the robot 1 acts, that is, how the object has moved as a result of the action, can be obtained from image data captured after the robot 1 has acted.

【０１１６】これらロボット１の動作前後における各画
像データを、リカレント・ニューラル・ネットワークの
入力データとすることで、計算コストを軽減するととも
に、画像データのうち対象物以外の部分によって得られ
る環境情報も同時にニューラル・ネットワークの学習デ
ータとして入力することができる。この結果、ＲＮＮモ
ジュールは、対象物の動きと環境の様子を学習すること
ができる。By using each image data before and after the operation of the robot 1 as input data of the recurrent neural network, the calculation cost can be reduced, and environmental information obtained by a portion other than the object in the image data can be obtained. At the same time, it can be input as learning data of the neural network. As a result, the RNN module can learn the movement of the object and the state of the environment.

【０１１７】リカレント・ニューラル・ネットワーク
は、コンテクスト・ループを特徴とするニューラル・ネ
ットワークであり、時系列事象の学習を行うことができ
る（前述、並びに図７を参照のこと）。The recurrent neural network is a neural network characterized by a context loop, and is capable of learning a time-series event (see the above and FIG. 7).

【０１１８】本実施例では、学習フェーズにおいて、ロ
ボット１の動作前の画像データと各関節アクチュエータ
の動きパラメータをリカレント・ニューラル・ネットワ
ークへの入力とし、ロボットの動作後に得られる画像デ
ータを教示データとする。このようなリカレント・ニュ
ーラル・ネットワークにより学習がなされていれば、ロ
ボット１は、動作前の画像データを得ることによって、
例えばボールの蹴り方などの対象物に印加する動作を１
つ決めることで、対象物のその後の動き方を予測するこ
とができる。In this embodiment, in the learning phase, the image data from before the robot 1 acts and the motion parameters of each joint actuator are used as inputs to the recurrent neural network, and the image data obtained after the robot acts is used as the teaching data. Once the recurrent neural network has been trained in this way, the robot 1 can, by obtaining the image data from before the action and deciding on one action to apply to the object (for example, a way of kicking the ball), predict how the object will subsequently move.

【０１１９】上述したような学習フェーズを経て、ロボ
ット１がある対象物に関する１以上の動かし方を学習し
た後、ロボット１は新規性探索フェーズに遷移する。図
１７には、新規性探索フェーズにおける動作手順をフロ
ーチャートの形式で示している。以下、このフローチャ
ートに従って、新規性探索フェーズについて説明する。After the robot 1 has learned one or more ways of moving a certain object through the learning phase as described above, the robot 1 transitions to the novelty search phase. FIG. 17 is a flowchart illustrating an operation procedure in the novelty search phase. Hereinafter, the novelty search phase will be described with reference to this flowchart.

【０１２０】ロボット１は、既に１回以上の学習がなさ
れている。ここでは、ノベルティ・リワーディングに基
づき対象物を動かしてみる（ステップＳ１１）。すなわ
ち、ＲＮＮモジュールが返す報酬Ｒが大きくなりそうな
動作を選択して、そのパラメータに従って各関節アクチ
ュエータを駆動して、対象物を動かしてみる。The robot 1 has already learned one or more times. Here, the target object is moved based on novelty rewarding (step S11). That is, an operation in which the reward R returned by the RNN module is likely to be large is selected, and each joint actuator is driven according to the parameter to move the object.

【０１２１】次いで、各関節アクチュエータを駆動させ
た結果、対象物に接触してこれを動かすことができたか
否かをチェックする（ステップＳ１２）。Next, it is checked whether, as a result of driving the joint actuators, the robot made contact with the object and was able to move it (step S12).

【０１２２】対象物に接触していなかったり、あるいは
対象物に充分な力を印加することができず、動かすこと
ができなかった場合には、ステップＳ１１に戻り、制御
パラメータを変更して試行を繰り返す。If the robot did not make contact with the object, or could not apply enough force to move it, the process returns to step S11, and the trial is repeated with changed control parameters.

【０１２３】他方、対象物を動かすことができたなら
ば、ロボット１の脚部など各関節アクチュエータの制御
パラメータや、対象物を撮像した画像データを基に、リ
カレント・ニューラル・ネットワークによって対象物の
動きを予測し（ステップＳ１３）、さらに、予測される
動き方と現実に観察される対象物の動きとを比較する
（ステップＳ１４）。On the other hand, if the object was moved, the movement of the object is predicted by the recurrent neural network based on the control parameters of the joint actuators, such as those of the legs of the robot 1, and on image data of the object (step S13), and the predicted movement is then compared with the movement of the object actually observed (step S14).

【０１２４】対象物がリカレント・ニューラル・ネット
ワークによる予測に近い動きをした場合、新規性が低い
と判断して、ロボット１は、ステップＳ１１に戻り、制
御パラメータを変更して違う動き方を試行する。If the target object moves close to the prediction by the recurrent neural network, the robot 1 determines that the novelty is low, and returns to step S11 to change the control parameters and try a different way of moving. .

【０１２５】他方、対象物が予測とは大きく相違する動
きをした場合には、ロボット１は、さらに数回同様の動
きを繰り返して、その動きに再現性があるか否かを検証
する（ステップＳ１５）。On the other hand, if the object moves in a way that differs greatly from the prediction, the robot 1 repeats the same action several more times to verify whether the movement is reproducible (step S15).

【０１２６】そして、再現性があると確認された場合に
は、その結果をリカレント・ニューラル・ネットワーク
に学習する（ステップＳ１６）。If reproducibility is confirmed, the recurrent neural network is trained on the result (step S16).
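
Steps S11 to S16 can be sketched as a loop. Everything the robot does physically is abstracted into callables, and the function name, the two numeric thresholds, the repeat count, and the trial limit are assumptions; the text does not say how "close to the prediction" or "reproducible" are quantified.

```python
def novelty_search(propose, execute, predict, distance, learn,
                   novelty_threshold, repeat_tolerance,
                   repeats=3, max_trials=20):
    """Sketch of the novelty search loop (S11 to S16) under stated assumptions.

    propose()             -> next command to try (assumed to favor high reward)
    execute(command)      -> observed object motion, or None if it did not move
    predict(command)      -> motion predicted by the forward model
    distance(a, b)        -> non-negative mismatch between two motions
    learn(command, motion)-> train the network on a confirmed result
    """
    for _ in range(max_trials):
        command = propose()                                   # S11
        observed = execute(command)
        if observed is None:                                  # S12: not moved
            continue
        predicted = predict(command)                          # S13
        if distance(predicted, observed) <= novelty_threshold:  # S14: low novelty
            continue
        # S15: repeat the same command and check that the motion recurs.
        reruns = [execute(command) for _ in range(repeats)]
        if all(r is not None and distance(r, observed) <= repeat_tolerance
               for r in reruns):
            learn(command, observed)                          # S16
            return command
    return None  # nothing novel and reproducible found within the trial limit
```

The trial limit is a practical addition so that the sketch terminates; the flowchart as described simply loops back to step S11.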

【０１２７】図１８には、２回目以降すなわち新規性探
索フェーズにおいて、歩行ロボット１が対象物としての
ボールを動かし方と対象物の動きを学習する様子を図解
している。FIG. 18 illustrates how, on the second and subsequent occasions, that is, in the novelty search phase, the walking robot 1 learns how to move a ball as the object and how the object moves.

【０１２８】（１）ロボット１は、既に知っているボー
ルなどの対象物に遭遇すると、対象物に対して既に学習
した動かし方を適用して、ＲＮＮモジュールが返す報酬
Ｒを予測して、報酬の大きな行動を選択して、その場合
における対象物の動きを予測する。(1) When the robot 1 encounters an object it already knows, such as a ball, it applies the ways of moving the object that it has already learned, predicts the reward R returned by the RNN module, selects an action with a large reward, and predicts the movement of the object in that case.

【０１２９】（２）同様の動かし方を数回繰り返して適
用して、予測した通りに対象物が動くかどうかを検証す
る。(2) The same moving method is repeated several times and applied to verify whether the object moves as predicted.

【０１３０】（３）数回繰り返して対象物を動かしてみ
て、予測通りに動く場合には、最初のステージに戻っ
て、対象物への接触並びに動かし方を変更して、より高
い報酬が得られるように新規性探索を行う。(3) If the object moves as predicted after being moved several times, the robot returns to the first stage, changes how it contacts and moves the object, and searches for novelty so as to obtain a higher reward.

【０１３１】（４）同様の動かし方を数回繰り返して適
用しても、予測した通りに対象物が動かない場合には、
新規性があると判断して、その動かし方と対象物の動き
方を、リカレント・ニューラル・ネットワークに学習す
る。(4) If the object does not move as predicted even when the same way of moving it is applied several times, the robot judges that there is novelty and trains the recurrent neural network on that way of moving the object and on how the object moves.

【０１３２】なお、図１５や図１８に示す例では、対象
物をボールのような球形を用いて説明したが、本発明の
要旨はこれに限定されない。例えば、図１９に例示する
ように、立方体、三角柱、三角錐、円錐、円柱、あるい
はその他の多面体など、さまざまな形状の対象物に対し
ても、上述したような本実施例に係るノベルティ・リワ
ーディングを利用したリカレント・ニューラル・ネット
ワークによる学習を適用することができる。In the examples shown in FIGS. 15 and 18, the object was described as a sphere such as a ball, but the gist of the present invention is not limited to this. For example, as illustrated in FIG. 19, learning by a recurrent neural network using novelty rewarding according to this embodiment, as described above, can also be applied to objects of various shapes, such as cubes, triangular prisms, triangular pyramids, cones, cylinders, and other polyhedra.

[0133] It should also be understood that the learning by a recurrent neural network using novelty rewarding described above can likewise be applied when the floor surface on which the robot 1 and the object are placed varies, as shown in FIG. 20, or when the working environment of the robot 1 otherwise differs.

[0134] [Supplement] The present invention has been described in detail above with reference to specific embodiments. However, it is self-evident that those skilled in the art can modify or substitute the embodiments without departing from the gist of the present invention.

[0135] The gist of the present invention is not necessarily limited to products called "robots". That is, the present invention can likewise be applied to products belonging to other industrial fields, such as toys, provided that they are mechanical devices that use electrical or magnetic action to perform movements resembling human motion.

[0136] In short, the present invention has been disclosed by way of example and should not be interpreted restrictively. To determine the gist of the present invention, the claims section set out at the beginning should be taken into consideration.

【０１３７】[0137]

[Effects of the Invention] As described in detail above, the present invention can provide an excellent learning system and learning method for a legged robot that realize time-series learning and teaching using a recurrent neural network.

[0138] Further, the present invention can provide an excellent learning system and learning method for a legged robot that predict the movement of an object with a recurrent neural network and, through novelty rewarding, allow the robot to self-learn a variety of motions for moving a given object.

[0139] Further, the present invention can provide a learning system and learning method for a legged robot in which the recurrent neural network generates new motions suited to the current environment, so that the robot can cope with unexpected situations and express itself in diverse ways.

[0140] According to the present invention, the robot can generate new motions suited to the environment in which it is currently placed through the emergence and learning of motions, which makes diverse expression possible. As a result, the robot's ability to adapt to its environment and its range of action are expanded, and diversity of expression through motion can be ensured.

[0141] Further, according to the present invention, the variety of the robot's motions is given limitless possibilities, and as a result the user can be entertained for a long time.

[0142] Further, according to the present invention, motions can emerge in response to the environment, so there is no need to input motions to the robot in advance. In addition, because motions adapted to the environment can be generated, a robot that behaves differently in each user's environment can be realized.

[Brief description of the drawings]

FIG. 1 is a diagram showing the external configuration of a walking robot 1 that performs legged walking on four limbs and is used to implement the present invention.

FIG. 2 is a diagram schematically showing the configuration of the electrical and control systems of the legged walking robot 1.

FIG. 3 is a diagram showing the configuration of the control unit 20 in further detail.

FIG. 4 is a diagram schematically showing the software control configuration running on the robot 1.

FIG. 5 is a diagram schematically showing the internal configuration of the middleware layer.

FIG. 6 is a diagram schematically showing the internal configuration of the application layer.

FIG. 7 is a diagram schematically showing a configuration example of a recurrent neural network (forward dynamics).

FIG. 8 is a diagram showing the inverse dynamics of a recurrent neural network.

FIG. 9 is a diagram showing the configuration of an RNN module modularized according to the framework of the robot control software.

FIG. 10 is a diagram showing the configuration of robot control software equipped with the RNN module shown in FIG. 10.

FIG. 11 is a diagram schematically showing the mechanism of novelty rewarding.

FIG. 12 is a diagram showing a configuration example of an RNN module incorporating novelty rewarding.

FIG. 13 is a diagram schematically showing the configuration of a robot motion control system incorporating novelty rewarding.

FIG. 14 is a flowchart showing the procedure by which the robot 1 switches between operation stages.

FIG. 15 is a diagram depicting the series of operations by which the walking robot 1 learns for the first time how to move a ball as the object (that is, its behavior in the learning phase).

FIG. 16 is a diagram showing examples of images obtained by the visual sensor, from which the robot 1 recognizes the initial state and the series of movements of the object.

FIG. 17 is a flowchart showing the operation procedure in the novelty search phase.

FIG. 18 is a diagram depicting how the walking robot 1 moves a ball as the object in the novelty search phase.

FIG. 19 is a diagram illustrating differences in the environment in which the robot and the object are placed.

FIG. 20 is a diagram illustrating differences in the environment in which the robot and the object are placed.

[Explanation of symbols]

1 ... walking robot
2 ... torso unit
3 ... head unit
4 ... tail
6A to 6D ... leg units
7 ... neck joint
8 ... tail joint
9A to 9D ... thigh units
10A to 10D ... shin units
11A to 11D ... hip joints
12A to 12D ... knee joints
15 ... CCD camera
16 ... microphone
17 ... speaker
18 ... touch sensor
20 ... control unit
21 ... CPU
22 ... RAM
23 ... ROM
24 ... nonvolatile memory
25 ... interface
26 ... wireless communication interface
27 ... network interface card
28 ... bus
29 ... keyboard
40 ... input/output unit
50 ... drive unit
51 ... motor (joint actuator)
52 ... encoder (joint angle sensor)
53 ... driver
60 ... power supply unit
61 ... rechargeable battery
02 ... charge/discharge control unit

Continued from the front page

(72) Inventor: Masato Ito, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Jun Yokono, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
F-terms (reference): 3F059 AA00 BA02 BB06 BC07 BC09 CA05 CA06 DA02 DA05 DA08 DB02 DB09 DD01 DD06 FA03 FA05 FA08 FA10 FC07 FC14 FC15; 3F060 AA00 CA14 CA26 GA05 GA13 HA02

Claims

[Claims]

1. A learning system for a robot having a plurality of joints, comprising: control means for controlling the operation of the robot by driving each joint so as to move an object in a work space; perception means for detecting events occurring in the work space; and learning means for training a recurrent neural network on the motion of the robot expressed by the control means, the way the object moves during that motion, and the events perceived by the perception means.

2. The learning system for a robot according to claim 1, wherein the learning means includes a prediction unit using a recurrent neural network and a learning unit using a recurrent neural network, the prediction unit predicts the event at the next time on the basis of the motion expressed by the control means and the event perceived by the perception means, and the learning unit, when the predicted event differs from the event actually perceived by the perception means at the next time, learns the perceived event as a motion of the robot.

3. The learning system for a robot according to claim 1, wherein the learning means has a learning phase, in which there is no experience of how to move the object, and a novelty search phase, in which one or more relationships between the operation of the robot and the way the object moves have already been learned.

4. The learning system for a robot according to claim 3, wherein, in the learning phase, the same robot operation is applied to the object a predetermined number of times and, when reproducibility between the operation of the robot and the movement of the object is confirmed, the initial position and motion of the robot and the way the object moves are learned.

5. The learning system for a robot according to claim 4, wherein, if reproducibility of the movement of the object cannot be confirmed even after the operation of the robot has been tried a predetermined number of times, a retry is performed with a different combination of the initial position and motion of the robot.

6. The learning system for a robot according to claim 3, wherein, in the novelty search phase, the movement of the object is predicted for the initial position and motion of the robot and, when the predicted movement of the object differs from the movement of the object perceived by the perception means, novelty is recognized and the initial position and motion of the robot and the way the object moves are learned.

7. The learning system for a robot according to claim 6, wherein, when novelty is recognized, the same robot operation is applied to the object a predetermined number of times and, when reproducibility between the operation of the robot and the movement of the object can be confirmed, the initial position and motion of the robot and the way the object moves are learned.

8. A learning method for a robot having a plurality of joints, comprising: a control step of controlling the operation of the robot by driving each joint so as to move an object in a work space; a perception step of detecting events occurring in the work space; and a learning step of training a recurrent neural network on the motion of the robot expressed in the control step, the way the object moves during that motion, and the events perceived in the perception step.

9. The learning method for a robot according to claim 8, wherein the learning step includes a prediction sub-step using a recurrent neural network and a learning sub-step using a recurrent neural network, the prediction sub-step predicts the event at the next time on the basis of the motion expressed in the control step and the event perceived in the perception step, and the learning sub-step, when the predicted event differs from the event actually perceived at the next time in the perception step, learns the perceived event as a motion of the robot.

10. The learning method for a robot according to claim 8, wherein the learning step has a learning phase, in which there is no experience of how to move the object, and a novelty search phase, in which one or more relationships between the operation of the robot and the way the object moves have already been learned.

11. The learning method for a robot according to claim 10, wherein, in the learning phase, the same robot operation is applied to the object a predetermined number of times and, when reproducibility between the operation of the robot and the movement of the object is confirmed, the initial position and motion of the robot and the way the object moves are learned.

12. The learning method for a robot according to claim 11, wherein, if reproducibility of the movement of the object cannot be confirmed even after the operation of the robot has been tried a predetermined number of times, a retry is performed with a different combination of the initial position and motion of the robot.

13. The learning method for a robot according to claim 10, wherein, in the novelty search phase, the movement of the object is predicted for the initial position and motion of the robot and, when the predicted movement of the object differs from the movement of the object perceived in the perception step, novelty is recognized and the initial position and motion of the robot and the way the object moves are learned.

14. The learning method for a robot according to claim 13, wherein, when novelty is recognized, the same robot operation is applied to the object a predetermined number of times and, when reproducibility between the operation of the robot and the movement of the object can be confirmed, the initial position and motion of the robot and the way the object moves are learned.
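By way of illustration only, and not as part of the claims, the learning-phase logic recited in claims 4, 5, 11 and 12 might be sketched in Python as follows. The function name `learning_phase`, the callback `act` and the parameter `repeats` are hypothetical, and exact equality of outcomes is assumed as the test for reproducibility; the claims do not specify how reproducibility is measured.

```python
from itertools import product
from typing import Callable, Hashable, Optional, Sequence, Tuple

Position = Hashable   # hypothetical encoding of the robot's initial position
Motion = Hashable     # hypothetical encoding of the robot's motion
Movement = Hashable   # hypothetical encoding of how the object moved


def learning_phase(
    initial_positions: Sequence[Position],
    motions: Sequence[Motion],
    act: Callable[[Position, Motion], Movement],
    repeats: int = 3,
) -> Optional[Tuple[Position, Motion, Movement]]:
    """Return the first reproducible (position, motion, movement), or None."""
    # Try each combination of initial position and motion in turn; moving on
    # to the next combination is the "retry" of claims 5 and 12.
    for position, motion in product(initial_positions, motions):
        # Apply the same robot operation a predetermined number of times.
        outcomes = [act(position, motion) for _ in range(repeats)]

        # Reproducibility is assumed here to mean identical outcomes every time.
        if all(outcome == outcomes[0] for outcome in outcomes):
            # This triple is what would be passed to the recurrent
            # neural network for training.
            return position, motion, outcomes[0]

    # No combination produced a reproducible movement of the object.
    return None
```

In a real system `act` would drive the joints and return a perceived object trajectory, and the equality check would be replaced by a comparison within some tolerance.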