JP2007305072A - Information processor, information processing method and program

Info

Publication number: JP2007305072A
Application number: JP2006135715A
Authority: JP
Inventors: Atsushi Tani; 淳谷; Takanosuke Nishimoto; 隆之助西本; Masato Ito; 真人伊藤
Original assignee: Sony Corp; RIKEN Institute of Physical and Chemical Research
Current assignee: Sony Corp; RIKEN Institute of Physical and Chemical Research
Priority date: 2006-05-15
Filing date: 2006-05-15
Publication date: 2007-11-22

Abstract

PROBLEM TO BE SOLVED: To efficiently learn added time-series data by making the weight coefficients of an already trained RNN difficult to change. SOLUTION: In additional learning processing in which RNNs 71-1 to 71-N learn a sensor motor signal as time-series data, the information processor 51 gives each RNN 71-n a learning weight μn that makes its weight coefficients harder to change the larger its use frequency FREQn in the learning so far, and the weight coefficients of the RNN 71-n are learned accordingly. The present invention is applicable to, for example, an information processor incorporated in a robot or the like. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an information processing apparatus, an information processing method, and a program, and in particular to an information processing apparatus, an information processing method, and a program that make it possible to learn added time-series data efficiently by making the weight coefficients of an already trained RNN difficult to change.

The present applicant has previously proposed generating time-series data according to the result of learning, using recurrent neural networks (see, for example, Patent Document 1).

In this proposal, as shown in FIG. 1, the information processing apparatus basically consists of a lower-layer network having recurrent neural networks (hereinafter referred to as RNNs) 1-1 to 1-v and an upper-layer network having RNNs 11-1 to 11-v.

In the lower-layer network, the outputs of the RNNs 1-1 to 1-v are supplied to a synthesis circuit 3 via the corresponding gates 2-1 to 2-v and combined.

In the upper-layer network as well, the outputs of the RNNs 11-1 to 11-v are supplied to a synthesis circuit 13 via the corresponding gates 12-1 to 12-v and combined. The gates 2-1 to 2-v of the lower layer are then turned on and off based on the output of the synthesis circuit 13 of the upper layer.

In the information processing apparatus configured as shown in FIG. 1, the lower-layer RNNs 1-1 to 1-v generate time-series data P1 to Pv, respectively, and predetermined ones of the lower-layer gates 2-1 to 2-v are turned on or off based on the output of the upper-layer synthesis circuit 13. The synthesis circuit 3 can thereby selectively output any of the time-series data P1 to Pv generated by predetermined ones of the RNNs 1-1 to 1-v.

As a result, for example, as shown in FIG. 2, time-series data can be generated by producing the time-series data P1 for a predetermined time, then the time-series data P2 for the next predetermined time, and then the time-series data P1 again for the predetermined time after that.
Patent Document 1: JP-A-11-126198

However, when a plurality of RNNs learn time-series data as in the information processing apparatus of FIG. 1, and new time-series data is added for learning after predetermined time-series data has been learned, a problem arises. If the new time-series data resembles already learned time-series data in its features, the weight coefficients of the RNN that learned the similar data end up being changed, even though the new data is in fact different time-series data.

The present invention has been made in view of this situation, and makes it possible to learn added time-series data efficiently by making the weight coefficients of an already trained RNN difficult to change.

An information processing apparatus according to one aspect of the present invention includes learning weight determining means that, when new time-series data is added for learning by a plurality of recurrent neural networks after those networks have learned predetermined time-series data, determines learning weights that are negatively correlated with the usage frequency of each of the plurality of recurrent neural networks during the earlier learning of the predetermined time-series data. Each of the plurality of recurrent neural networks learns the new time-series data according to the learning weight determined by the learning weight determining means.

An information processing method according to one aspect of the present invention includes the steps of, when new time-series data is added for learning by a plurality of recurrent neural networks after those networks have learned predetermined time-series data, determining learning weights that are negatively correlated with the usage frequency of each of the plurality of recurrent neural networks during the earlier learning of the predetermined time-series data, and learning the new time-series data according to the determined learning weights.

A program according to one aspect of the present invention causes a computer to execute processing including the steps of, when new time-series data is added for learning by a plurality of recurrent neural networks after those networks have learned predetermined time-series data, determining learning weights that are negatively correlated with the usage frequency of each of the plurality of recurrent neural networks during the earlier learning of the predetermined time-series data, and learning the new time-series data according to the determined learning weights.

In one aspect of the present invention, when new time-series data is added for learning by a plurality of recurrent neural networks after those networks have learned predetermined time-series data, learning weights that are negatively correlated with the usage frequency of each of the plurality of recurrent neural networks during the earlier learning of the predetermined time-series data are determined, and the new time-series data is learned according to the determined learning weights.

According to one aspect of the present invention, added time-series data can be learned efficiently by making the weight coefficients of an already trained RNN difficult to change.
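As a rough illustration of a learning weight that is negatively correlated with usage frequency, the Python sketch below gives smaller weights to RNNs that were used more often in earlier learning. The mapping itself (linear in the frequency, scaled by the largest frequency) is an assumption made for illustration; this part of the text does not give the actual formula, and the names `learning_weights`, `freqs`, and `floor` are hypothetical.

```python
def learning_weights(freqs, floor=0.0):
    """Map per-RNN usage frequencies to learning weights in [floor, 1].

    Assumed rule (not taken from the source): the most-used RNN gets the
    smallest weight and an unused RNN gets weight 1.
    """
    if not freqs:
        return []
    peak = max(freqs)
    if peak <= 0:
        # No RNN has been used yet, so every RNN may learn freely.
        return [1.0 for _ in freqs]
    return [floor + (1.0 - floor) * (1.0 - f / peak) for f in freqs]


# Hypothetical usage counts FREQ_1..FREQ_4 from earlier learning.
weights = learning_weights([120, 30, 0, 60])
```

With these made-up counts, the RNN used 120 times receives the smallest weight, so its weight coefficients would change least during additional learning, while the unused RNN remains fully trainable.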

Embodiments of the present invention are described below. The correspondence between the constituent features of the present invention and the embodiments described in the specification or drawings is illustrated as follows. This description is intended to confirm that embodiments supporting the present invention are described in the specification or drawings. Accordingly, even if an embodiment is described in the specification or drawings but is not described here as corresponding to a constituent feature of the present invention, that does not mean the embodiment does not correspond to that constituent feature. Conversely, even if an embodiment is described here as corresponding to a constituent feature, that does not mean the embodiment does not correspond to constituent features other than that one.

An information processing apparatus according to one aspect of the present invention (for example, the information processing apparatus 51 of FIG. 3) includes learning weight determining means (for example, the control circuit 76 of FIG. 3) that, when new time-series data is added for learning by a plurality of recurrent neural networks after those networks have learned predetermined time-series data, determines learning weights that are negatively correlated with the usage frequency of each of the plurality of recurrent neural networks during the earlier learning of the predetermined time-series data. Each of the plurality of recurrent neural networks learns the new time-series data according to the learning weight determined by the learning weight determining means.

An information processing method or program according to one aspect of the present invention (for example, the additional learning processing of FIG. 11) includes the steps of, when new time-series data is added for learning by a plurality of recurrent neural networks after those networks have learned predetermined time-series data, determining learning weights that are negatively correlated with the usage frequency of each of the plurality of recurrent neural networks during the earlier learning of the predetermined time-series data (for example, step S102 of FIG. 11), and learning the new time-series data according to the determined learning weights (for example, step S103 of FIG. 11).

Embodiments of the present invention are described below with reference to the drawings.

FIG. 3 shows a configuration example of an information processing apparatus to which the present invention is applied.

The information processing apparatus 51 of FIG. 3 is incorporated in, for example, a robot. The robot in which the information processing apparatus 51 is incorporated has at least a sensor that detects an object to be visually recognized and a motor that is driven to move the robot (neither is shown), and a sensor motor signal, which is the signal from the sensor and the motor, is supplied to the information processing apparatus 51.

The information processing apparatus 51 consists of a lower time-series prediction generator 61, an upper time-series prediction generator 62, and a gate signal conversion unit 63. It executes learning processing, in which it learns time-series data given as teacher data, and generation processing, in which it generates (reproduces) time-series data for an input according to the result of that learning.

This embodiment describes an example in which the information processing apparatus 51 learns and generates action sequences, each of which is a series of motions performed by a humanoid robot.

In the following example, the information processing apparatus 51 learns three action sequences A, B, and C.

In action sequence A, the humanoid robot starts from an initial state with both arms spread to the left and right, visually recognizes a square object placed on a table in front of it, repeats several times the motion of grasping the object with both hands, lifting it to a predetermined height, and placing it back on the table, and then returns both arms to the initial position (hereinafter also referred to as the home position).

In action sequence B, the humanoid robot starts from the initial state, visually recognizes the square object placed on the table in front of it, and repeats several times the motion of touching the object with its right hand, returning to the home position, touching the object with its left hand, and returning to the home position; that is, it alternately touches the object with one hand at a time.

In action sequence C, the humanoid robot starts from the initial state, visually recognizes the square object placed on the table in front of it, touches the object once with both hands at the same time, and returns to the home position.

The information processing apparatus 51 learns and generates the signals of the sensor (for example, a visual sensor) and the motor when each of the action sequences A to C described above is executed.

The lower time-series prediction generator 61 consists of N recurrent neural networks (hereinafter referred to as RNNs) 71-1 to 71-N, gates 72-1 to 72-N arranged downstream of the RNNs 71-1 to 71-N, a synthesis circuit 73, an arithmetic circuit 74, a memory 75, and a control circuit 76. When there is no particular need to distinguish the RNNs 71-1 to 71-N, they are referred to simply as the RNN 71. The same applies to the gates 72 and the other elements.

The sensor motor signal from the sensor and the motor of the humanoid robot is input to the lower time-series prediction generator 61. Here, the sensor motor signal input to the lower time-series prediction generator 61 at time t is denoted sm(t).

For the sensor motor signal sm(t) at time t input to it, the lower time-series prediction generator 61 predicts and outputs the sensor motor signal sm(t+1) at time t+1 according to the result of earlier learning.

Specifically, for the input sensor motor signal sm(t) at time t, the RNN 71-n (n = 1, 2, ..., N) generates the sensor motor signal sm(t+1) at time t+1 according to the result of earlier learning and outputs it to the gate 72-n.

An action sequence can be regarded as a collection (succession) of various action components (motion primitives). For example, action sequence A described above can be regarded as a collection of action components such as visually recognizing the object, bringing both arms toward the object (until it is grasped), lifting the object, lowering the lifted object, and returning to the home position. Each of the RNNs 71-1 to 71-N exclusively learns the time-series data of the sensor motor signal corresponding to one action component.

Because the RNNs 71-1 to 71-N have each learned a different action component, the sensor motor signals sm(t+1) they output differ from one another even though the same sensor motor signal sm(t) is input to all of them. Here, the sensor motor signal sm(t+1) output by the RNN 71-n is denoted sm_n(t+1).

In addition to the sensor motor signal sm_n(t+1) at time t+1 from the RNN 71-n, the gate 72-n arranged downstream of the RNN 71-n is supplied with the gate signals gate[N] = {g_1, g_2, ..., g_N} from the gate signal conversion unit 63; these are the control signals for the open/closed states of the gates 72-1 to 72-N. As described later, the gate signals g_n that make up gate[N] sum to 1 (Σ g_n = 1).

The gate 72-n opens or closes the output of the sensor motor signal sm_n(t+1) from the RNN 71-n according to the gate signal g_n. That is, at time t+1 the gate 72-n outputs g_n × sm_n(t+1) to the synthesis circuit 73.

The synthesis circuit 73 combines the outputs from the gates 72-1 to 72-N and outputs the result as the sensor motor signal sm(t+1) at time t+1. That is, the synthesis circuit 73 outputs the sensor motor signal sm(t+1) expressed by the following equation (1).
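Equation (1) itself is not reproduced in this text. Based on the gate output g_n × sm_n(t+1) described above and the later statement that the output supplied to the robot is the sum of the gated RNN outputs, it can be reconstructed as follows; this is a reconstruction from the surrounding description rather than a copy of the original figure.

```latex
sm(t+1) = \sum_{n=1}^{N} g_n \, sm_n(t+1) \tag{1}
```

Because the gate signals sum to 1, this is a weighted average of the N RNN predictions.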

When time-series data of the sensor motor signal is learned, the arithmetic circuit 74 calculates the prediction errors errorL^{t+1}[N] = {errorL^{t+1}_1, errorL^{t+1}_2, ..., errorL^{t+1}_N} between the sensor motor signals sm_1(t+1) to sm_N(t+1) at time t+1, which the RNNs 71-1 to 71-N respectively output for the sensor motor signal sm(t) at time t, and the teacher sensor motor signal sm*(t+1) at time t+1 given to the lower time-series prediction generator 61 as teacher data. As expressed by equation (16) described later, the prediction error errorL^{t+1}[N] is calculated not from the error at time t+1 alone but as an error that also takes into account the past L steps from time t+1.

The prediction error errorL^{t+1}_n of the RNN 71-n at time t+1 calculated by the arithmetic circuit 74 is supplied to the memory 75 and stored there.

The arithmetic circuit 74 repeats the calculation of the prediction error errorL^{t+1}[N] along the time series and stores the results in the memory 75, so that the memory 75 holds the time-series data errorL[N] of the prediction errors for the teacher data. This time-series data errorL[N] is supplied to the upper time-series prediction generator 62. The arithmetic circuit 74 normalizes the time-series data errorL[N] to values in the range 0 to 1 before outputting it.
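The text states only that the prediction-error time series is normalized to the range 0 to 1, not how. The Python sketch below assumes min-max scaling over the whole stored series, which is one simple way to obtain that range; the function name `normalize_errors` and the sample values are hypothetical.

```python
def normalize_errors(error_series):
    """Scale a prediction-error time series into the range [0, 1].

    error_series: list of per-step lists, one error value per RNN.
    Min-max scaling over the whole series is an assumption; the source
    only states that the values are normalized to the range 0 to 1.
    """
    flat = [e for step in error_series for e in step]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # All errors are equal: map everything to 0 to avoid dividing by zero.
        return [[0.0 for _ in step] for step in error_series]
    return [[(e - lo) / span for e in step] for step in error_series]


# Hypothetical errors for N = 3 RNNs over two time steps.
normalized = normalize_errors([[2.0, 4.0, 6.0], [10.0, 2.0, 6.0]])
```

Scaling over the whole series, rather than per time step, keeps the relative size of errors across time, which matters if the upper network is to learn how the errors evolve.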

As described above, the memory 75 stores the time-series data errorL[N] of the prediction errors for the teacher data. The memory 75 also stores the usage frequencies FREQ_1 to FREQ_N of the RNNs 71-1 to 71-N. The usage frequencies FREQ_1 to FREQ_N are described later with reference to FIG. 6.

The control circuit 76 controls each part of the lower time-series prediction generator 61, such as the RNNs 71-1 to 71-N, the arithmetic circuit 74, and the memory 75.

The upper time-series prediction generator 62, on the other hand, consists of a single continuous-time RNN (Continuous Time RNN; hereinafter referred to as CTRNN) 81.

The CTRNN 81 of the upper time-series prediction generator 62 estimates (predicts) and outputs how large a prediction error each of the RNNs 71-1 to 71-N of the lower time-series prediction generator 61 will produce during generation.

That is, the CTRNN 81 learns using the time-series data errorL[N] of the prediction errors of the RNNs 71-1 to 71-N as teacher data and, based on the result of that learning, generates and outputs the estimated prediction errors errorPredH[N] = {errorPredH_1, errorPredH_2, ..., errorPredH_N} of the RNNs 71-1 to 71-N. Here, the estimated prediction errors errorPredH[N] at time t are denoted errorPredH^t[N] = {errorPredH^t_1, errorPredH^t_2, ..., errorPredH^t_N}.

The CTRNN 81 is also given a task ID as a task switching signal that selects which of the action sequences A to C the output estimated prediction errors errorPredH[N] are for.

Using a softmax function, the gate signal conversion unit 63 converts the estimated prediction errors errorPredH^t[N] at time t into the gate signals gate^t[N] = {g^t_1, g^t_2, ..., g^t_N} and outputs the result to the gates 72-1 to 72-N.

The gate signal g^t_n for the gate 72-n at time t is expressed by the following equation (2).

Equation (2) applies a nonlinear conversion in which a small prediction error yields a large value and a large prediction error yields a small value. As a result, the gates 72-1 to 72-N of the lower time-series prediction generator 61 are controlled so that a gate opens wider the smaller the prediction error and opens less the larger the prediction error.
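Equation (2) is not reproduced in this text. As a minimal sketch of a softmax conversion with the properties described (values sum to 1, smaller error gives a larger gate value), the Python function below applies a softmax to the negated errors. The negation and the `temperature` parameter are assumptions, and the original equation may differ in its exact form.

```python
import math


def gate_signals(pred_errors, temperature=1.0):
    """Convert estimated prediction errors into gate values that sum to 1.

    A softmax is applied to the negated errors so that a smaller error
    gives a larger gate value. The negation and the temperature are
    assumptions, not taken from the source.
    """
    scaled = [-e / temperature for e in pred_errors]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [x / total for x in exps]


# Hypothetical estimated prediction errors for N = 3 RNNs.
gates = gate_signals([0.1, 0.5, 0.9])
```

A smaller `temperature` would make the selection sharper, so that the gate of the RNN with the lowest estimated error approaches 1 and the others approach 0.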

In the information processing apparatus 51 configured as described above, the upper time-series prediction generator 62 outputs the estimated prediction errors errorPredH[N], which are estimates of the prediction errors that the RNNs 71-1 to 71-N of the lower time-series prediction generator 61 will produce during generation, and these estimated prediction errors are converted into the gate signals gate[N] that control the open/closed states of the gates 72-1 to 72-N. The sum, expressed by equation (1) above, of the output signals sm_1(t+1) to sm_N(t+1) of the RNNs 71-1 to 71-N passed through the gates 72-1 to 72-N whose open/closed states are thus controlled is supplied to the sensor and motor of the humanoid robot as the sensor motor signal sm(t+1) at time t+1.

Because its output, the estimated prediction errors errorPredH[N], is converted into the gate signals gate[N] by the downstream gate signal conversion unit 63, the upper time-series prediction generator 62 can also be said to predict which of the gates 72-1 to 72-N will be opened (widely) at time t.

FIG. 4 shows a detailed configuration example of the RNN 71-n.

As shown in FIG. 4, the RNN 71-n consists of an input layer 101, an intermediate layer (hidden layer) 102, and an output layer 103. The input layer 101 has a predetermined number of nodes 111, the intermediate layer (hidden layer) 102 has a predetermined number of nodes 112, and the output layer 103 has a predetermined number of nodes 113.

The nodes 111 of the input layer 101 receive the sensor motor signal sm(t) at time t, together with the data that was output from some of the nodes 113 of the output layer 103 at time t-1, one step before time t, and fed back as the context c(t) representing the internal state of the RNN 71-n.

Each node 112 of the intermediate layer 102 performs weighted addition, a sum of products of the data input from the nodes 111 of the input layer 101 and the weight coefficients between itself and the nodes 111 obtained in advance by learning, and outputs the result to the nodes 113 of the output layer 103.

Each node 113 of the output layer 103 performs weighted addition, a sum of products of the data input from the nodes 112 of the intermediate layer 102 and the weight coefficients between itself and the nodes 112 obtained in advance by learning. Some of the nodes 113 of the output layer 103 output the result as the sensor motor signal sm_n(t+1) at time t+1. The remaining nodes 113 of the output layer 103 feed the result back to the nodes 111 of the input layer 101 as the context c(t+1) at time t+1.

As described above, the RNN 71-n predicts and outputs the sensor motor signal sm_n(t+1) at time t+1 for the input sensor motor signal sm(t) at time t by weighted addition using the weight coefficients between nodes obtained in advance by learning.
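The forward computation just described (signal and context in, weighted sums through a hidden layer, prediction and new context out) can be sketched in plain Python as follows. The tanh activation, the layer sizes, and the weight values are assumptions for illustration; the text describes only the weighted sums, and the name `rnn_step` is hypothetical.

```python
import math


def rnn_step(sm_t, context_t, w_in, w_out, n_sm):
    """One forward step: (sm(t), c(t)) -> (sm_n(t+1), c(t+1)).

    w_in:  hidden x (len(sm_t) + len(context_t)) weight rows
    w_out: (n_sm + len(context_t)) x hidden weight rows
    The tanh activation is an assumption; the source only describes
    weighted sums.
    """
    x = list(sm_t) + list(context_t)  # input layer: signal plus context
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in w_in]
    out = [math.tanh(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    # The first n_sm outputs are the prediction; the rest is fed back.
    return out[:n_sm], out[n_sm:]


# Toy sizes: 2 signal values, 1 context value, 2 hidden nodes.
w_in = [[0.5, -0.5, 0.1], [0.2, 0.3, -0.4]]
w_out = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]]
context = [0.0]
for sm in ([1.0, 0.0], [0.0, 1.0]):
    pred, context = rnn_step(sm, context, w_in, w_out, n_sm=2)
```

The loop shows the context loop in action: the context returned at one step is passed in as part of the input at the next step, so the prediction depends on the history of inputs and not only on the current sm(t).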

The learning that determines the weight coefficients between nodes uses the BPTT (Back Propagation Through Time) method. BPTT is a learning algorithm for RNNs with a context loop; it unfolds the propagation of signals over time into space so that the back-propagation (BP) method for ordinary layered neural networks can be applied. The same applies when the weight coefficients of the CTRNN 81, described next, are determined.

FIG. 5 shows a detailed configuration example of the CTRNN adopted as the CTRNN 81.

The CTRNN 141 of FIG. 5 consists of an input layer 151, an intermediate layer (hidden layer) 152, an output layer 153, and arithmetic units 154 and 155.

The input layer 151 has input nodes 160-i (i = 1, ..., I), parameter nodes 161-r (r = 1, ..., R), and context input nodes 162-k (k = 1, ..., K), and the intermediate layer 152 has hidden nodes 163-j (j = 1, ..., J). The output layer 153 has output nodes 164-i (i = 1, ..., I) and context output nodes 165-k (k = 1, ..., K).

When there is no need to distinguish the individual input nodes 160-i, parameter nodes 161-r, context input nodes 162-k, hidden nodes 163-j, output nodes 164-i, and context output nodes 165-k, they are referred to simply as the input nodes 160, parameter nodes 161, context input nodes 162, hidden nodes 163, output nodes 164, and context output nodes 165.

The CTRNN 141 learns to predict and output the state vector x^u(t+1) at time t+1 from the state vector x^u(t) at time t that is input to it. The CTRNN 141 has a recurrent loop, called a context loop, that represents the internal state of the network. Because processing is performed on the basis of that internal state, the CTRNN 141 can learn the time-evolution law of the target time-series data.

The state vector x^u(t) at time t supplied to the CTRNN 141 is input to the input nodes 160. A parameter tsdata^u is input to the parameter nodes 161. The parameter tsdata^u is data that identifies the type (time-series pattern) of the state vector x^u(t) supplied to the CTRNN 141; in the CTRNN 81, it is data that identifies the action sequence. Although the parameter tsdata^u is a fixed value, the same value can be regarded as being input continuously, so the data (vector) input to the parameter nodes 161 at time t is written as the parameter tsdata^u(t).

The input node 160-i receives the data x^u_i(t), the i-th element of the state vector x^u(t) at time t. The parameter node 161-r receives the data tsdata^u_r(t), the r-th element of the parameter tsdata^u(t) at time t. The context input node 162-k receives the data c^u_k(t), the k-th element of the internal state vector c^u(t) of the CTRNN 141 at time t.

When the data x^u_i(t), tsdata^u_r(t), and c^u_k(t) are input to the input node 160-i, the parameter node 161-r, and the context input node 162-k, respectively, the data x_i(t), tsdata_r(t), and c_k(t) that these nodes output are given by the following equations (3), (4), and (5), respectively.

The function f in equations (3) to (5) is a differentiable continuous function such as a sigmoid function. Equations (3) to (5) express that the data x^u_i(t), tsdata^u_r(t), and c^u_k(t) input to the input node 160-i, the parameter node 161-r, and the context input node 162-k are activated by the function f and output from those nodes as the data x_i(t), tsdata_r(t), and c_k(t). The superscript u on x^u_i(t), tsdata^u_r(t), and c^u_k(t) denotes the internal state of a node before activation (the same applies to the other nodes).
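The equation images for (3) to (5) are not reproduced in this text. Based solely on the description above, they presumably take the following form; this is a reconstruction from the prose, not the original typeset equations.

```latex
\begin{align}
x_i(t) &= f\bigl(x^u_i(t)\bigr) \tag{3}\\
tsdata_r(t) &= f\bigl(tsdata^u_r(t)\bigr) \tag{4}\\
c_k(t) &= f\bigl(c^u_k(t)\bigr) \tag{5}
\end{align}
```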

The data h^u_j(t) input to the hidden node 163-j can be expressed by equation (6), using the weight coefficient w^h_ij of the connection between the input node 160-i and the hidden node 163-j, the weight coefficient w^h_jr of the connection between the parameter node 161-r and the hidden node 163-j, and the weight coefficient w^h_jk of the connection between the context input node 162-k and the hidden node 163-j. The data h_j(t) output by the hidden node 163-j can be expressed by equation (7).

In equation (6), the Σ in the first term on the right-hand side denotes summation over all i = 1 to I, the Σ in the second term denotes summation over all r = 1 to R, and the Σ in the third term denotes summation over all k = 1 to K.

Similarly, the data y^u_i(t) input to the output node 164-i, the data y_i(t) output by the output node 164-i, the data o^u_k(t) input to the context output node 165-k, and the data o_k(t) output by the context output node 165-k can be expressed by the following equations.

In equation (8), w^y_ij is the weight coefficient of the connection between the hidden node 163-j and the output node 164-i, and Σ denotes summation over all j = 1 to J. In equation (10), w^o_jk is the weight coefficient of the connection between the hidden node 163-j and the context output node 165-k, and Σ denotes summation over all j = 1 to J.
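The forward computation described by equations (3) to (11) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function and argument names are hypothetical, the sigmoid is only one possible choice of f, and the output-layer activations in (9) and (11) are assumed to use the same f because the text says they are obtained "similarly".

```python
import math

def sigmoid(v):
    """One possible differentiable activation f (an assumption; the text says 'such as a sigmoid')."""
    return 1.0 / (1.0 + math.exp(-v))

def ctrnn_forward(x_u, ts_u, c_u, w_h_x, w_h_r, w_h_c, w_y, w_o, f=sigmoid):
    """Single forward pass through input, hidden, and output layers.

    Hypothetical weight layout: w_h_x[j][i], w_h_r[j][r], w_h_c[j][k]
    feed hidden node j; w_y[i][j] and w_o[k][j] feed the output nodes.
    """
    # Eqs (3)-(5): activate the pre-activation inputs.
    x = [f(v) for v in x_u]
    ts = [f(v) for v in ts_u]
    c = [f(v) for v in c_u]

    # Eqs (6)-(7): weighted sums into each hidden node, then activation.
    h = []
    for j in range(len(w_h_x)):
        h_u = (sum(w * a for w, a in zip(w_h_x[j], x))
               + sum(w * a for w, a in zip(w_h_r[j], ts))
               + sum(w * a for w, a in zip(w_h_c[j], c)))
        h.append(f(h_u))

    # Eqs (8)-(11): output nodes and context output nodes (assumed to use f as well).
    y = [f(sum(w * a for w, a in zip(row, h))) for row in w_y]
    o = [f(sum(w * a for w, a in zip(row, h))) for row in w_o]
    return y, o
```

The outputs y and o are the quantities that the arithmetic units 154 and 155 then convert into the next-step values.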

From the data y_i(t) output by the output node 164-i, the arithmetic unit 154 obtains, by equation (12), the difference Δx^u_i(t+1) between the data x^u_i(t) at time t and the data x^u_i(t+1) at time t+1. It then calculates the data x^u_i(t+1) at time t+1 by equation (13) and outputs it.

Here, α and τ denote arbitrary coefficients.

Therefore, when the data x^u_i(t) at time t is input to the CTRNN 141, the data x^u_i(t+1) at time t+1 is output from the arithmetic unit 154 of the CTRNN 141. The data x^u_i(t+1) output from the arithmetic unit 154 is also supplied (fed back) to the input node 160-i.

From the data o_k(t) output by the context output node 165-k, the arithmetic unit 155 obtains, by equation (14), the difference Δc^u_k(t+1) between the data c^u_k(t) at time t and the data c^u_k(t+1) at time t+1. It then calculates the data c^u_k(t+1) at time t+1 by equation (15) and outputs it.

The data c^u_k(t+1) at time t+1 output from the arithmetic unit 155 is fed back to the context input node 162-k.

Equation (15) means that the internal state vector c^u(t+1) of the network at the next time t+1 is obtained by adding the data o_k(t) output by the context output node 165-k, weighted by the coefficient α, to the internal state vector c^u(t) that represents the current internal state of the network (that is, by mixing it in at a predetermined ratio). In that sense, the CTRNN 141 in FIG. 5 can be said to be a continuous-time RNN.
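The exact form of equations (14) and (15) is not reproduced in this text, so the following is only a sketch of one common leaky-integrator update that matches the verbal description (the new output is mixed into the current internal state at a fixed ratio). The specific formula, the function name `update_context`, and the roles given to `alpha` and `tau` are assumptions for illustration.

```python
def update_context(c_u, o, alpha, tau):
    """Leaky update of the internal state (assumed form, not the patent's equations).

    delta_k = (-c_u[k] + alpha * o[k]) / tau   # plays the role of eq (14)
    c_next  = c_u[k] + delta_k                 # plays the role of eq (15)
    """
    c_next = []
    for c_k, o_k in zip(c_u, o):
        delta = (-c_k + alpha * o_k) / tau
        c_next.append(c_k + delta)
    return c_next
```

With a large `tau` the internal state changes slowly, and with `tau = 1` it is replaced outright by `alpha * o`, which is one way to read "adding at a predetermined ratio".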

As described above, when the data x^u(t) and c^u(t) at time t are input, the CTRNN 141 sequentially generates and outputs the data x^u(t+1) and c^u(t+1) at time t+1. Therefore, once the weight coefficients w^h_ij, w^h_jr, w^h_jk, w^y_ij, and w^o_jk have been obtained by learning, time-series data can be generated by giving the initial value x^u(t_0) = X0 of the input data x^u(t) input to the input nodes 160, the parameter tsdata^u input to the parameter nodes 161, and the initial value c^u(t_0) = C0 of the context input data c^u(t) input to the context input nodes 162.

The CTRNN 141 shown in FIG. 5 is adopted as the CTRNN 81 in FIG. 3. errorL[N] is given to the input nodes 160 of the CTRNN 141, and the task ID is given to the parameter nodes 161. Accordingly, the number I of input nodes 160 in FIG. 5 equals the number N of RNNs 71 in the lower time-series prediction generator 61. The initial value c^u(t_0) = C0 of the context input data c^u(t) input to the context input nodes 162 is given, for example, a predetermined random value.

Next, the learning process in the lower time-series prediction generator 61 for the time-series data of sensor motor signals corresponding to an action sequence is described with reference to the flowchart of FIG. 6.

First, in step S1, the control circuit 76 of the lower time-series prediction generator 61 reads the input data for a given time supplied as teacher data. As described above, the input data here is a sensor motor signal; assume, for example, that the sensor motor signal sm(t) at time t has been read. The control circuit 76 supplies the sensor motor signal sm(t) to each of the N RNNs 71-1 to 71-N that make up the lower time-series prediction generator 61.

In step S2, each RNN 71-n (n = 1, 2, ..., N) of the lower time-series prediction generator 61 calculates the sensor motor signal sm_n(t+1) at time t+1 from the sensor motor signal sm(t) at time t.

Also in step S2, the arithmetic circuit 74 calculates the prediction error errorL^{t+1}_n of the RNN 71-n. Specifically, as the prediction error errorL^{t+1}_n, the arithmetic circuit 74 calculates the prediction error for the sensor motor signals over the past L time steps from time t+1, as expressed by equation (16).

In equation (16), sm_{n,i'}(T) denotes the sensor motor signal output by the i'-th of the I' nodes 113 (FIG. 4) in the output layer 103 of the RNN 71-n, which outputs the sensor motor signal sm(T) at time T, and sm*_{n,i'}(T) denotes the corresponding sensor motor signal serving as teacher data.

According to equation (16), the prediction error errorL^{t+1}_n of the RNN 71-n at time t+1 is the sum, over times T = t+1-L to t+1, of the errors between the sensor motor signal sm_{n,i'}(T) of the i'-th node 113 of the output layer 103 of the RNN 71-n and the teacher data sm*_{n,i'}(T). When fewer than L time steps of past sensor motor signals exist, the prediction error errorL^{t+1}_n is obtained from the data of only the time steps that exist.
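The windowed error described above can be sketched as follows. Equation (16) itself is not reproduced in this text, so the use of a halved sum of squared differences is an assumption; the window bounds and the handling of a short history follow the prose. The function name `window_error` and the list-of-lists layout are hypothetical.

```python
def window_error(pred, teacher, t_next, L):
    """Prediction error of one RNN over times T = t_next - L .. t_next.

    pred[T][i] and teacher[T][i] hold the predicted and teacher sensor motor
    signals at time step T for output node i. If fewer than L past steps
    exist, only the available steps are used, as the text specifies.
    The 0.5 * squared-error form is an assumption about equation (16).
    """
    start = max(0, t_next - L)
    total = 0.0
    for T in range(start, t_next + 1):
        total += sum((s_star - s) ** 2 for s, s_star in zip(pred[T], teacher[T]))
    return 0.5 * total
```

For example, with predictions `[[0.0], [1.0], [2.0]]`, all-zero teacher data, `t_next = 2`, and `L = 1`, the window covers T = 1 and T = 2.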

In step S3, the arithmetic circuit 74 supplies the prediction error errorL^{t+1}_n of the RNN 71-n at time t+1 to the memory 75. The memory 75 thus receives the N prediction errors errorL^{t+1}_1 to errorL^{t+1}_N of the RNNs 71-1 to 71-N and stores them as errorL^{t+1}[N] = {errorL^{t+1}_1, errorL^{t+1}_2, ..., errorL^{t+1}_N}. When the determination in step S7, described later, is NO, the processing of step S3 is repeated for a predetermined number of time steps, so the memory 75 ends up storing the time-series data errorL[N] of the prediction errors with respect to the teacher data.

In step S4, the control circuit 76 calculates the learning weight γ_n of the RNN 71-n according to the prediction error errorL^{t+1}_n. Specifically, the control circuit 76 calculates the learning weight γ_n by equation (17), which uses a softmax function.
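A softmax-based learning weight can be sketched as below. Equation (17) is not reproduced in this text; the sketch assumes that the softmax is taken over negated prediction errors so that an RNN with a smaller error receives a larger learning weight. That sign convention, the absence of any scaling factor, and the name `learning_weights` are assumptions.

```python
import math

def learning_weights(errors):
    """Softmax over negated prediction errors (assumed form of eq (17)).

    Returns one weight per RNN; the weights are positive and sum to 1.
    Subtracting the minimum error first keeps exp() numerically stable
    without changing the result.
    """
    m = min(errors)
    exps = [math.exp(-(e - m)) for e in errors]
    z = sum(exps)
    return [v / z for v in exps]
```

Under this assumption, the RNN that currently predicts the teacher data best is also the one whose weight coefficients are updated most strongly in step S5.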

In step S5, the control circuit 76 updates the weight coefficients w_{ab,n} of the RNN 71-n by the BPTT (Back Propagation Through Time) method. Here, the weight coefficient w_{ab,n} denotes a weight coefficient between a node 111 of the input layer 101 and a node 112 of the intermediate layer 102 of the RNN 71-n, or a weight coefficient between a node 112 of the intermediate layer 102 and a node 113 of the output layer 103 of the RNN 71-n.

In updating the weight coefficients w_{ab,n} of the RNN 71-n, the weight coefficients are calculated according to the learning weight γ_n calculated in step S4. Specifically, by the following equations (18) and (19), the weight coefficient w_{ab,n}(s+1) at iteration s+1 of the iterative BPTT calculation can be obtained from the weight coefficient w_{ab,n}(s) at iteration s.

In equation (18), η_1 denotes a learning coefficient and α_1 denotes an inertia coefficient. In equation (18), Δw_{ab,n}(s) for s = 1 is set to 0.
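Equations (18) and (19) are not reproduced in this text. The sketch below assumes the usual momentum form, in which the error amount obtained by BPTT is scaled by the learning coefficient and by the learning weight γ_n, and the previous change is carried over with the inertia coefficient. The function name, argument names, and default values are hypothetical.

```python
def update_weight(w, delta_prev, grad, gamma, eta=0.1, alpha=0.9):
    """One gamma-weighted momentum step for a single weight coefficient.

    Assumed form:
        delta(s+1) = eta * gamma * grad + alpha * delta(s)   # role of eq (18)
        w(s+1)     = w(s) + delta(s+1)                       # role of eq (19)
    `grad` is the error amount from BPTT, already oriented in the direction
    in which the weight should move. Pass delta_prev = 0.0 at s = 1.
    """
    delta = eta * gamma * grad + alpha * delta_prev
    return w + delta, delta
```

With `gamma` close to 0 the weight barely moves, which is how a small learning weight protects an RNN from being overwritten by data it does not predict well.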

In step S6, the control circuit 76 supplies the usage frequencies FREQ_1 to FREQ_N of the RNNs 71-1 to 71-N to the memory 75, and the memory 75 stores them. The larger the learning weight γ_n in step S5 described above, the more the weight coefficients w_{ab,n} of that RNN 71-n are updated, which means that the RNN 71-n has been used. Accordingly, the control circuit 76, for example, counts up the usage frequency FREQ_n of each RNN 71-n whose learning weight γ_n is equal to or greater than a predetermined value. The usage frequencies FREQ_1 to FREQ_N are used in the additional learning described later with reference to FIG. 10.
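The counting rule given as an example in step S6 can be sketched as follows. The threshold value and the function name `count_usage` are hypothetical; the text only says "a predetermined value".

```python
def count_usage(freq, gammas, threshold=0.5):
    """Increment the usage frequency of every RNN whose learning weight
    is at or above the threshold (threshold value is an assumption)."""
    return [f + 1 if g >= threshold else f for f, g in zip(freq, gammas)]
```

Calling this once per time step accumulates, for each RNN, how often it was the one being trained.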

In step S7, the control circuit 76 of the lower time-series prediction generator 61 determines whether the supply of input data has ended.

If it is determined in step S7 that the supply of input data has not ended, that is, if input data for the time following the input data supplied in step S1 has been supplied, the process returns to step S1 and the subsequent processing is repeated.

On the other hand, if it is determined in step S7 that the supply of input data has ended, the learning process ends.

Next, the learning of the time-series data of prediction errors in the CTRNN 81 of the upper time-series prediction generator 62 is described.

When a humanoid robot equipped with the information processing device 51 is made to learn a plurality of action sequences, the weight coefficients w^h_ij, w^h_jr, and w^h_jk between the nodes of the input layer 151 and the intermediate layer 152, and the weight coefficients w^y_ij and w^o_jk between the nodes of the intermediate layer 152 and the output layer 153, obtained as a result of learning, must be values that can handle all of the action sequences.

Therefore, in the learning process, the time-series data corresponding to the plurality of action sequences are learned simultaneously. That is, as many CTRNNs 141 (FIG. 5) as there are action sequences to be learned are prepared, and the weight coefficients w^h_ij, w^h_jr, w^h_jk, w^y_ij, and w^o_jk are obtained for each action sequence. Their averages are then taken as the single set of weight coefficients w^h_ij, w^h_jr, w^h_jk, w^y_ij, and w^o_jk. By repeating this processing, the weight coefficients w^h_ij, w^h_jr, w^h_jk, w^y_ij, and w^o_jk of the CTRNN 81 used in the generation process are obtained.

FIG. 7 is a flowchart of the learning process of the upper time-series prediction generator 62, which learns Q time series of prediction errors corresponding to Q action sequences. In the present embodiment, the action sequences to be learned are the three action sequences A, B, and C, so Q = 3.

First, in step S31, the upper time-series prediction generator 62 reads the Q time series errorL[N] of prediction errors, which serve as teacher data, from the memory 75 of the lower time-series prediction generator 61. The upper time-series prediction generator 62 then supplies the Q time series errorL[N] it has read to the Q CTRNNs 141, one to each.

In step S32, the upper time-series prediction generator 62 reads the task IDs that identify each of the Q action sequences. In the present embodiment, it reads the task IDs identifying the three action sequences A, B, and C. The upper time-series prediction generator 62 then supplies the task ID identifying action sequence A to the CTRNN 141 to which the teacher data of action sequence A was supplied, the task ID identifying action sequence B to the CTRNN 141 to which the teacher data of action sequence B was supplied, and the task ID identifying action sequence C to the CTRNN 141 to which the teacher data of action sequence C was supplied.

In step S33, the upper time-series prediction generator 62 assigns 1 to the variable s, which represents the number of learning iterations.

In step S34, in the CTRNN 141 corresponding to each of the Q time series, the upper time-series prediction generator 62 uses the BPTT method to calculate the error amounts δw^h_ij, δw^h_jr, and δw^h_jk of the weight coefficients w^h_ij(s), w^h_jr(s), and w^h_jk(s) between the nodes of the input layer 151 and the intermediate layer 152, and the error amounts δw^y_ij and δw^o_jk of the weight coefficients w^y_ij(s) and w^o_jk(s) between the nodes of the intermediate layer 152 and the output layer 153. The error amounts δw^h_ij, δw^h_jr, δw^h_jk, δw^y_ij, and δw^o_jk obtained by BPTT in the CTRNN 141 that receives the q-th (q = 1, ..., Q) time series are written as δw^h_{ij,q}, δw^h_{jr,q}, δw^h_{jk,q}, δw^y_{ij,q}, and δw^o_{jk,q}, respectively.

In the BPTT calculation of step S34, when the upper time-series prediction generator 62 back-propagates the error amount δc^u_k(t+1) of the data c^u_k(t+1) of the context input node 162-k at time t+1 to the error amount δo_k(t) of the data o_k(t) of the context output node 165-k at time t, it divides by an arbitrary positive coefficient m, thereby adjusting the time constant of the context data.

That is, the upper time-series prediction generator 62 obtains the error amount δo_k(t) of the data o_k(t) of the context output node 165-k at time t by equation (20), which uses the error amount δc^u_k(t+1) of the data c^u_k(t+1) of the context input node 162-k at time t+1.
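Equation (20) is not reproduced in this text. From the description of dividing the back-propagated error amount by the positive coefficient m, it presumably has the following form; this is a reconstruction from the prose, not the original typeset equation.

```latex
\begin{equation}
\delta o_k(t) = \frac{1}{m}\,\delta c^u_k(t+1) \tag{20}
\end{equation}
```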

By adopting equation (20) in the BPTT method, the degree of influence one time step ahead of the context data, which represents the internal state of the CTRNN 141, can be adjusted.

In step S35, the upper time-series prediction generator 62 averages over the Q time series, and updates, each of the weight coefficients w^h_ij, w^h_jr, and w^h_jk between the nodes of the input layer 151 and the intermediate layer 152 and each of the weight coefficients w^y_ij and w^o_jk between the nodes of the intermediate layer 152 and the output layer 153.

That is, by equations (21) to (30), the upper time-series prediction generator 62 obtains the weight coefficients w^h_ij(s+1), w^h_jr(s+1), and w^h_jk(s+1) between the nodes of the input layer 151 and the intermediate layer 152, and the weight coefficients w^y_ij(s+1) and w^o_jk(s+1) between the nodes of the intermediate layer 152 and the output layer 153.

Here, η_2 denotes a learning coefficient and α_2 denotes an inertia coefficient. In equations (21), (23), (25), (27), and (29), Δw^h_ij(s), Δw^h_jr(s), Δw^h_jk(s), Δw^y_ij(s), and Δw^o_jk(s) for s = 1 are set to 0.
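Equations (21) to (30) are not reproduced in this text. The sketch below assumes that, for each weight coefficient, the error amounts from the Q sequence-specific CTRNNs are averaged and then applied with a momentum term, which is one way to realize the averaging of step S35. The function name, argument names, and default values are hypothetical.

```python
def averaged_update(w, delta_prev, grads_per_seq, eta=0.1, alpha=0.9):
    """Update one shared weight coefficient from Q per-sequence error amounts.

    Assumed form:
        delta(s+1) = eta * mean_q(grad_q) + alpha * delta(s)
        w(s+1)     = w(s) + delta(s+1)
    grads_per_seq holds the error amount for this weight from each of the
    Q CTRNNs. Pass delta_prev = 0.0 at s = 1.
    """
    q = len(grads_per_seq)
    mean_grad = sum(grads_per_seq) / q
    delta = eta * mean_grad + alpha * delta_prev
    return w + delta, delta
```

Because every sequence contributes to the same update, the resulting weight coefficients are a compromise across all action sequences rather than a fit to any single one.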

In step S36, the upper time-series prediction generator 62 determines whether the variable s is less than or equal to a predetermined number of learning iterations. The predetermined number set here is a number of iterations at which the learning error is considered to have become sufficiently small.

If it is determined in step S36 that the variable s is less than or equal to the predetermined number of learning iterations, that is, if learning has not yet been performed enough times for the learning error to be considered sufficiently small, then in step S37 the upper time-series prediction generator 62 increments the variable s by 1 and returns the process to step S34. The processing of steps S34 to S36 is thereby repeated. On the other hand, if it is determined in step S36 that the variable s is greater than the predetermined number of learning iterations, the learning process ends.

In step S36, instead of determining the end of processing by the number of learning iterations, the end of processing may be determined by whether the learning error has fallen within a predetermined reference value.
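The control flow of steps S33 to S37, including the alternative stopping criterion, can be sketched as follows. The callback `step_fn`, which stands in for steps S34 and S35 and returns the current learning error, is a hypothetical placeholder, as are the other names.

```python
def run_learning(step_fn, max_count, error_tol=None):
    """Iterate the learning step until a stopping criterion is met.

    step_fn(s) performs one learning iteration (steps S34 and S35) and
    returns the learning error. The loop mirrors the flowchart: after each
    iteration, s is incremented while s <= max_count (steps S36 and S37).
    If error_tol is given, learning also stops once the error is within it.
    Returns the value of s at which learning stopped.
    """
    s = 1                                   # step S33
    while True:
        err = step_fn(s)                    # steps S34 and S35
        if error_tol is not None and err <= error_tol:
            return s                        # alternative criterion
        if s <= max_count:                  # step S36
            s += 1                          # step S37
        else:
            return s
```

Note that, read literally, the flowchart performs one more iteration than `max_count`, because the comparison in step S36 is made after the iteration has run.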

As described above, in the learning process of the upper time-series prediction generator 62, the weight coefficients w^h_ij, w^h_jr, w^h_jk, w^y_ij, and w^o_jk are obtained for each action sequence and their averages are computed. By repeating this processing, the weight coefficients w^h_ij, w^h_jr, w^h_jk, w^y_ij, and w^o_jk of the CTRNN 81 used in the generation process are obtained.

In the learning process described above, the processing that averages the weight coefficients w^h_ij, w^h_jr, w^h_jk, w^y_ij, and w^o_jk of the action sequences is executed at every iteration, but it may instead be executed once every predetermined number of iterations. For example, when the predetermined number of iterations at which the learning process ends is 10,000, the averaging of the weight coefficients w^h_ij, w^h_jr, w^h_jk, w^y_ij, and w^o_jk of the action sequences may be executed once every 10 iterations.

Next, the generation process that generates time-series data is described with reference to the flowchart of FIG. 8. This process is performed by the information processing device 51 of FIG. 3, which includes the RNNs 71-1 to 71-N and the CTRNN 81 in which the weight coefficients obtained by the learning processes described with reference to FIGS. 6 and 7 have been set.

First, in step S51, the CTRNN 81 of the upper time-series prediction generator 62 reads the initial values of the input data. The initial values of the input data here are the initial values supplied to the input nodes 160 and the context input nodes 162; for example, predetermined random values are supplied.

In step S52, the CTRNN 81 of the upper time-series prediction generator 62 reads the task ID that identifies the action sequence. The task ID that has been read is supplied to the parameter nodes 161.

In step S53, the CTRNN 81 of the upper time-series prediction generator 62 executes the process of generating the estimated prediction errors errorPredH[N] of the RNNs 71-1 to 71-N at a given time. The details of this generation process are described later with reference to FIG. 9. The CTRNN 81 generates, for example, the estimated prediction error errorPredH^{t+1}[N] at time t+1 and outputs it to the gate signal conversion unit 63.

In step S54, the gate signal conversion unit 63 converts the supplied estimated prediction error errorPredH^{t+1}[N] into the gate signal gate^{t+1}[N] by equation (2) described above, and outputs the result to the gates 72-1 to 72-N.

In step S55, the sensor motor signal sm(t) at time t is input to each RNN 71-n of the lower time-series prediction generator 61. From the input sensor motor signal sm(t), the RNN 71-n generates the sensor motor signal sm_n(t+1) at time t+1 and outputs it to the gate 72-n.

In step S56, the gate 72-n outputs the sensor motor signal sm_n(t+1) according to the gate signal g^{t+1}_n among the gate signals gate^{t+1}[N] supplied from the gate signal conversion unit 63. That is, the gate 72-n is opened wide when the gate signal g^{t+1}_n is large and opened only slightly when the gate signal g^{t+1}_n is small. The synthesis circuit 73 is supplied with the sensor motor signal sm_n(t+1) in proportion to how far the gate 72-n is open.

In step S57, the synthesis circuit 73 combines the outputs from the gates 72-1 to 72-N by equation (1) and outputs the combined result as the sensor motor signal sm(t+1) at time t+1.
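Steps S56 and S57 can be sketched as a gate-weighted sum of the per-RNN predictions. Equation (1) is defined earlier in the document and is not reproduced here, so the plain weighted-sum form below is an assumption, as are the function and argument names.

```python
def combine_outputs(gates, predictions):
    """Combine the N RNN outputs into one sensor motor signal.

    gates[n] is the gate signal g_n for RNN n, and predictions[n][i] is
    element i of that RNN's predicted signal sm_n(t+1). Each prediction is
    scaled by how far its gate is open, and the scaled signals are summed
    (assumed form of equation (1)).
    """
    dim = len(predictions[0])
    return [sum(g * p[i] for g, p in zip(gates, predictions)) for i in range(dim)]
```

If one gate signal is 1 and the rest are 0, the combined signal is exactly the prediction of that single RNN; intermediate gate values blend the predictions.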

ステップＳ５８において、情報処理装置５１は、時系列データの生成を終了するかを判定する。ステップＳ５８で、時系列データの生成を終了しないと判定された場合、処理はステップＳ５３に戻り、それ以降の処理が繰り返される。その結果、上位時系列予測生成器６２では、前回のステップＳ５３で処理した時刻ｔ＋１の次の時刻ｔ＋２おける推定予測誤差errorPredH^t+2［Ｎ］が生成され、下位時系列予測生成器６１では、時刻ｔ＋１のセンサモータ信号ｓｍ（ｔ＋１）に対するセンサモータｓｍ（ｔ＋２）が生成される。 In step S58, the information processing apparatus 51 determines whether to end the generation of time series data. If it is determined in step S58 that the generation of time series data is not to be ended, the process returns to step S53, and the subsequent processes are repeated. As a result, the upper time series prediction generator 62 generates the estimated prediction error errorPredH ^{t + 2} [N] at time t + 2, the time following the time t + 1 processed in the previous step S53, and the lower time series prediction generator 61 generates the sensor motor signal sm (t + 2) from the sensor motor signal sm (t + 1) at time t + 1.

一方、ステップＳ５８で、例えば、所定の時間ステップ数に到達するなどして、時系列データの生成を終了すると判定された場合、生成処理は終了する。 On the other hand, if it is determined in step S58 that generation of time-series data is to be terminated, for example, when a predetermined number of time steps has been reached, the generation process ends.

次に、図９のフローチャートを参照して、図８のステップＳ５３における、推定予測誤差errorPredH［Ｎ］の生成処理について説明する。図９では、時刻ｔ＋１における推定予測誤差errorPredH^t+1［Ｎ］を生成する例について説明する。 Next, the generation process of the estimated prediction error errorPredH [N] in step S53 of FIG. 8 will be described with reference to the flowchart of FIG. FIG. 9 illustrates an example of generating the estimated prediction error errorPredH ^{t + 1} [N] at time t + 1.

初めに、ステップＳ７１において、入力ノード１６１−ｉは、データｘ_i（ｔ）を式（３）により計算し、パラメータノード１６１−ｒは、データtsdata_r（ｔ）を式（４）により計算し、コンテキスト入力ノード１６２−ｋは、データｃ_k（ｔ）を式（５）により計算して、それぞれ出力する。 First, in step S71, the input node 161-i calculates the data x _i (t) by equation (3), the parameter node 161-r calculates the data tsdata _r (t) by equation (4), and the context input node 162-k calculates the data c _k (t) by equation (5); each node outputs its result.

ステップＳ７２において、隠れノード１６３−ｊは、式（６）を計算することによりデータｈ^u _j（ｔ）を得て、データｈ_j（ｔ）を式（７）により計算して出力する。 In step S72, the hidden node 163-j obtains the data h ^u _j (t) by calculating the expression (6), and calculates and outputs the data h _j (t) by the expression (7).

ステップＳ７３において、出力ノード１６４−ｉは、式（８）を計算することによりデータｙ^u _i（ｔ）を得て、データｙ_i（ｔ）を式（９）により計算して出力する。 In step S73, the output node 164-i obtains the data y ^u _i (t) by calculating equation (8), and calculates and outputs the data y _i (t) by equation (9).

ステップＳ７４において、コンテキスト出力ノード１６５−ｋは、式（１０）を計算することによりデータｏ^u _k（ｔ）を得て、データｏ_k（ｔ）を式（１１）により計算して出力する。 In step S74, the context output node 165-k obtains the data o ^u _k (t) by calculating equation (10), and calculates and outputs the data o _k (t) by equation (11).

ステップＳ７５において、演算部１５４は、差分△ｘ^u _i（ｔ＋１）を式（１２）により求め、時刻ｔ＋１のデータｘ^u _i（ｔ＋１）を式（１３）により計算し、ゲート信号変換部６３に出力する。 In step S75, the arithmetic unit 154 obtains the difference Δx ^u _i (t + 1) by equation (12), calculates the data x ^u _i (t + 1) at time t + 1 by equation (13), and outputs it to the gate signal conversion unit 63.

ステップＳ７６において、演算部１５５は、差分△ｃ^u _k（ｔ＋１）を式（１４）により求め、時刻ｔ＋１のデータｃ^u _k（ｔ＋１）を式（１５）により計算する。また、演算部１５５は、式（１５）による計算の結果得られた時刻ｔ＋１のデータｃ^u _k（ｔ＋１）を、コンテキスト入力ノード１６２−ｋにフィードバックする。 In step S76, the arithmetic unit 155 obtains the difference Δc ^u _k (t + 1) by equation (14) and calculates the data c ^u _k (t + 1) at time t + 1 by equation (15). The arithmetic unit 155 also feeds back the data c ^u _k (t + 1) at time t + 1 obtained by the calculation of equation (15) to the context input node 162-k.

ステップＳ７７において、演算部１５４は、式（１３）による計算の結果得られた時刻ｔ＋１のデータｘ^u _i（ｔ＋１）を、入力ノード１６１−ｉにフィードバックする。そして、処理は図８のステップＳ５３に戻り、ステップＳ５４に進む。 In step S77, the arithmetic unit 154 feeds back the data x ^u _i (t + 1) at time t + 1 obtained by the calculation of equation (13) to the input node 161-i. The process then returns to step S53 of FIG. 8 and proceeds to step S54.

以上のように、図８の生成処理によれば、上位時系列予測生成器６２が、下位時系列生成器６１のRNN７１−１乃至７１−Ｎが生成時に発生させる予測誤差の推定値である推定予測誤差errorPredH［Ｎ］を出力し、この推定予測誤差errorPredH［Ｎ］が、ゲート７２−１乃至７２−Ｎの開閉状態を制御するゲート信号gate［Ｎ］に変換される。そして、上述の（１）式で表される、開閉状態が制御されたゲート７２−１乃至７２−Ｎから出力されたRNN７１−１乃至７１−Ｎの出力信号ｓｍ₁（ｔ＋１）乃至ｓｍ_N（ｔ＋１）の総和が、時刻ｔ＋１のセンサモータ信号ｓｍ（ｔ＋１）として、ヒューマノイドロボットのセンサおよびモータに供給され、タスクIDで指定された行動シーケンスが実行される。 As described above, in the generation process of FIG. 8, the upper time series prediction generator 62 outputs the estimated prediction error errorPredH [N], which is an estimate of the prediction errors that the RNNs 71-1 to 71-N of the lower time series generator 61 produce during generation, and this estimated prediction error errorPredH [N] is converted into the gate signal gate [N] that controls the open / closed states of the gates 72-1 to 72-N. The sum, given by the above-described equation (1), of the output signals sm ₁ (t + 1) to sm _N (t + 1) of the RNNs 71-1 to 71-N passed through the gates 72-1 to 72-N whose open / closed states are thus controlled is then supplied to the sensors and motors of the humanoid robot as the sensor motor signal sm (t + 1) at time t + 1, and the action sequence specified by the task ID is executed.

次に、情報処理装置５１に、これまで学習させた行動シーケンスA，B、およびC以外の行動シーケンスを追加して学習させる追加学習について説明する。以下では、ホームポジションにいるロボットが、物体を両手で掴んで所定の高さだけ持ち上げ、物体が元々置かれていたテーブルより一段高い前方のテーブルに置いて、ホームポジションに戻る動作となる行動シーケンスDを追加学習させる。 Next, additional learning, in which the information processing apparatus 51 is made to learn an action sequence in addition to the action sequences A, B, and C learned so far, will be described. In the following, the apparatus additionally learns an action sequence D in which the robot at the home position grasps an object with both hands, lifts it to a predetermined height, places it on a table in front that is one step higher than the table on which the object was originally placed, and returns to the home position.

下位時系列予測生成器６１のRNN７１−１乃至７１−Ｎには、上述したように、それぞれ異なる行動部品が学習されている。また、一般的には、RNN７１の個数であるＮ個は行動部品の数よりも十分大きく用意されるため、RNN７１−１乃至７１−Ｎの中には、行動部品が学習されていないRNN７１（以下、適宜、未使用のRNN７１とも称する）も存在する。 As described above, the RNNs 71-1 to 71-N of the lower time series prediction generator 61 have each learned a different action part. In general, the number N of RNNs 71 is made sufficiently larger than the number of action parts, so the RNNs 71-1 to 71-N also include RNNs 71 that have not learned any action part (hereinafter also referred to as unused RNNs 71 as appropriate).

これまで学習させた行動シーケンスA，B、およびCに追加して、新たな行動シーケンスDを学習させる場合、既に行動部品が学習されているRNN７１は、そのままにして、未使用のRNN７１に、追加の行動シーケンスDに含まれる新たな行動部品を学習させるのが効率が良い。この場合、追加の行動シーケンスDの学習によってこれまで学習させたRNN７１を壊す（RNN７１の重み係数を変更する）ことがなく、新たな行動シーケンスDに、これまで学習させた行動部品が含まれていた場合、その行動部品を共通に利用することもできる。 When a new action sequence D is learned in addition to the action sequences A, B, and C learned so far, it is efficient to leave the RNNs 71 that have already learned action parts as they are and to have the unused RNNs 71 learn the new action parts contained in the additional action sequence D. In this case, learning the additional action sequence D does not destroy the RNNs 71 trained so far (that is, does not change their weighting coefficients), and if the new action sequence D contains action parts learned so far, those action parts can be shared.

そこで、下位時系列予測生成器６１は、行動シーケンスDを追加学習する際、既に行動部品が学習されているRNN７１には、その重み係数を変更しにくくするような抵抗を与える。 Therefore, when additionally learning the action sequence D, the lower time series prediction generator 61 applies, to each RNN 71 that has already learned an action part, a resistance that makes its weighting coefficients difficult to change.

既に行動部品が学習されているRNN７１とは、即ち、図６のステップＳ６の処理により、メモリ７５に記憶されている利用頻度FREQ_nが大きいRNN７１−ｎである。 An RNN 71 that has already learned an action part is an RNN 71-n whose usage frequency FREQ _n, stored in the memory 75 by the process of step S6 in FIG. 6, is large.

従って、下位時系列予測生成器６１の制御回路７６は、図１０に示すような、利用頻度FREQ_nが少ないRNN７１−ｎほど重み係数を更新し易く、利用頻度FREQ_nが大きいRNN７１−ｎは、重み係数を更新しにくい、換言すれば、利用頻度FREQ_nに負の相関を有する関数ｈ₁によって学習重みμ_nを決定する。図１０に示す関数ｈ₁が表す曲線は、利用頻度FREQ_nが小さいほど傾きが大きく、利用頻度FREQ_nが大きいほど傾きが小さくなる曲線である。なお、図１０では、関数ｈ₁が非線形な曲線として示されているが、負の相関を有する関数であれば、線形な直線であっても勿論よい。 Accordingly, the control circuit 76 of the lower time series prediction generator 61 determines the learning weight μ _n by a function h ₁ such as that shown in FIG. 10, under which the weighting coefficients of an RNN 71-n are easier to update the smaller its usage frequency FREQ _n is and harder to update the larger its usage frequency FREQ _n is; in other words, a function that has a negative correlation with the usage frequency FREQ _n. The curve represented by the function h ₁ shown in FIG. 10 has a steeper slope the smaller the usage frequency FREQ _n is and a gentler slope the larger the usage frequency FREQ _n is. Although FIG. 10 shows the function h ₁ as a nonlinear curve, it may of course be a straight line as long as it is a function having a negative correlation.
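
As a minimal sketch of a function with the shape described for h ₁, the following uses an exponential decay. The exact curve of FIG. 10 is not given in the text, so the exponential form, the `decay` parameter, and the assumption that FREQ _n is a non-negative number are illustrative choices only.

```python
import math


def learning_weight(freq, decay=3.0):
    """Hypothetical h1: a learning weight that falls as usage frequency rises.

    The curve is steep near freq = 0 and flattens as freq grows, matching
    the qualitative description of FIG. 10. The exponential form and the
    decay constant are assumptions, not values from the source.
    """
    if freq < 0.0:
        raise ValueError("usage frequency must be non-negative")
    return math.exp(-decay * freq)


# An unused RNN (freq = 0) keeps the full learning weight, while a
# heavily used RNN receives a weight close to zero.
mu_unused = learning_weight(0.0)
mu_heavy = learning_weight(1.0)
```

A straight line with negative slope would satisfy the same negative-correlation requirement, as the text notes.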

図１１のフローチャートを参照して、情報処理装置５１の追加学習処理について説明する。 The additional learning process of the information processing apparatus 51 will be described with reference to the flowchart of FIG.

初めに、ステップＳ１０１において、下位時系列予測生成器６１の制御回路７６は、メモリ７５に記憶されているRNN７１−１乃至７１−Ｎの利用頻度FREQ₁乃至FREQ_Nを読み出す。 First, in step S101, the control circuit 76 of the lower time series prediction generator 61 reads the usage frequencies FREQ ₁ to FREQ _N of the RNNs 71-1 to 71-N stored in the memory 75.

ステップＳ１０２において、下位時系列予測生成器６１の制御回路７６は、図１０に示した関数ｈ₁を用いて、RNN７１−ｎの利用頻度FREQ_nに応じた学習重みμ_nを決定する。決定された学習重みμ_nは、RNN７１−ｎに供給される。 In step S102, the control circuit 76 of the low-order time-series prediction generator 61 determines a learning weight μ _n according to the usage frequency FREQ _n of the RNN 71-n using the function h ₁ shown in FIG. The determined learning weight μ _n is supplied to the RNN 71-n.

ステップＳ１０３において、情報処理装置５１は、行動シーケンスDに対応するセンサモータ信号の時系列データを学習する、図６の下位時系列予測生成器６１の学習処理、即ち、ステップＳ１乃至Ｓ７の処理を実行する。但し、ステップＳ１０３の処理での図６のステップＳ５においては、式（１８）に代えて、学習重みμ_nが含まれる次式（３１）を採用する。 In step S103, the information processing apparatus 51 executes the learning process of the lower time series prediction generator 61 in FIG. 6, that is, the processes of steps S1 to S7, to learn the time series data of the sensor motor signal corresponding to the action sequence D. However, in step S5 of FIG. 6 as executed in step S103, the following equation (31), which includes the learning weight μ _n, is used instead of equation (18).

ステップＳ１０３の処理後、行動シーケンスDの予測誤差の時系列データerrorL[Ｎ]がメモリ７５に記憶される。 After the process of step S103, the time series data errorL [N] of the prediction error of the action sequence D is stored in the memory 75.

ステップＳ１０４において、情報処理装置５１は、行動シーケンスA，B、およびCに、追加された行動シーケンスDの予測誤差の時系列データerrorL[Ｎ]をメモリ７５から読み出し、その４個の予測誤差の時系列データについて、図７の上位時系列予測生成器６２の学習処理、即ち、ステップＳ３１乃至Ｓ３７の処理を実行する。そして、追加学習処理は終了する。 In step S104, the information processing apparatus 51 reads from the memory 75 the time series data errorL [N] of the prediction errors of the action sequences A, B, and C together with that of the added action sequence D, and executes, on these four sets of prediction-error time series data, the learning process of the upper time series prediction generator 62 in FIG. 7, that is, the processes of steps S31 to S37. The additional learning process then ends.

以上のように、情報処理装置５１の追加学習処理では、これまでの学習で利用頻度FREQ_nが大きいRNN７１−ｎについて、その重み係数を変更しにくくするような学習重みμ_nを与えて、RNN７１−ｎの重み係数を学習する。これにより、追加の行動シーケンスDの学習によってこれまで学習させたRNN７１の重み係数をできるだけ変更せずに、追加される行動シーケンスを効率的に学習することができる。 As described above, in the additional learning process of the information processing apparatus 51, an RNN 71-n whose usage frequency FREQ _n in the learning so far is large is given a learning weight μ _n that makes its weighting coefficients difficult to change, and the weighting coefficients of the RNN 71-n are then learned. As a result, the added action sequence can be learned efficiently while the weighting coefficients of the RNNs 71 trained so far are changed as little as possible by learning the additional action sequence D.
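
The effect of the learning weight μ _n can be illustrated with a generic gradient step. This is not equation (31), which is not reproduced in this passage; it is only a sketch of the idea, assuming that μ _n simply scales the size of each weight update. The function `scaled_update`, the learning rate `lr`, and the sample values are hypothetical.

```python
def scaled_update(weights, grads, mu, lr=0.1):
    """Generic gradient step whose size is scaled by the learning weight mu.

    mu near 0 leaves the weights almost unchanged (a frequently used RNN);
    mu near 1 allows a full update (an unused RNN). This is an assumed
    illustration, not the update rule of equation (31).
    """
    return [w - lr * mu * g for w, g in zip(weights, grads)]


weights = [0.5, -0.2]
grads = [1.0, -1.0]
protected = scaled_update(weights, grads, mu=0.0)  # heavily used RNN
free = scaled_update(weights, grads, mu=1.0)       # unused RNN
```

Under this assumption an RNN with μ _n = 0 keeps its weights exactly, so the action parts it has already learned are preserved while the unused RNNs absorb the new sequence.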

次に、本発明を適用した情報処理装置のその他の構成例について説明する。 Next, another configuration example of the information processing apparatus to which the present invention is applied will be described.

図１２は、情報処理装置５１のその他の構成例を示している。図１２において、図３の情報処理装置５１と対応する部分については同一の符号を付してあり、その説明は省略する。 FIG. 12 shows another configuration example of the information processing apparatus 51. In FIG. 12, parts corresponding to those of the information processing apparatus 51 in FIG. 3 are denoted by the same reference numerals, and their description is omitted.

図１２の情報処理装置５１は、時間フィルタ部２０１と非線形フィルタ部２０２が新たに設けられている点を除いては、図３の情報処理装置５１と同様に構成されている。 The information processing apparatus 51 in FIG. 12 is configured in the same manner as the information processing apparatus 51 in FIG. 3 except that a time filter unit 201 and a nonlinear filter unit 202 are newly provided.

時間フィルタ部２０１には、下位時系列予測生成器６１が出力する予測誤差の時系列データerrorL［Ｎ］が入力される。時間フィルタ部２０１と非線形フィルタ部２０２は、そこに入力される時系列データに所定のフィルタ処理を施し、処理後の時系列データを後段に出力する。非線形フィルタ部２０２は、処理後の時系列データを、予測誤差の時系列データerrorL’［Ｎ］として、上位時系列予測生成器６２に供給する。 The time filter unit 201 receives time series data errorL [N] of prediction errors output from the lower time series prediction generator 61. The time filter unit 201 and the non-linear filter unit 202 perform predetermined filter processing on the time series data input thereto, and output the processed time series data to the subsequent stage. The nonlinear filter unit 202 supplies the processed time series data to the upper time series prediction generator 62 as the time series data errorL ′ [N] of the prediction error.

上位時系列予測生成器６２は、予測誤差の時系列データを学習するが、ある程度長い時間ステップでのRNN７１−１乃至７１−Ｎの予測誤差の大まかな変動が分かればよく、短時間での微小な変動はあまり関係しない。 The high-order time-series prediction generator 62 learns time-series data of prediction errors, but only needs to know rough fluctuations in the prediction errors of the RNNs 71-1 to 71-N in a somewhat long time step. Such fluctuations are not so relevant.

時間フィルタ部２０１は、下位時系列予測生成器６１が出力する予測誤差の時系列データerrorL［Ｎ］に対して、時間フィルタ処理を施す。即ち、時間フィルタ部２０１は、下位時系列予測生成器６１が出力する予測誤差の時系列データerrorL［Ｎ］に、いわゆるローパスフィルタ処理を施し、処理後の時系列データを非線形フィルタ部２０２に供給する。例えば、ローパスフィルタ処理としては、所定の時間ステップ数の移動平均などを用いることができる。これにより、短時間での微小な変動が抑制された、RNN７１−１乃至７１−Ｎの予測誤差の時系列データを上位時系列予測生成器６２に供給することができる。 The time filter unit 201 performs time filter processing on the prediction-error time series data errorL [N] output from the lower time series prediction generator 61. That is, the time filter unit 201 applies so-called low-pass filtering to the prediction-error time series data errorL [N] output from the lower time series prediction generator 61 and supplies the processed time series data to the nonlinear filter unit 202. For example, a moving average over a predetermined number of time steps can be used as the low-pass filtering. This makes it possible to supply the upper time series prediction generator 62 with prediction-error time series data of the RNNs 71-1 to 71-N in which small short-term fluctuations have been suppressed.
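
The moving average mentioned above can be sketched as follows. The text only says that a moving average over a predetermined number of time steps may be used, so the trailing (causal) window, the handling of the first few samples, and the names `moving_average` and `window` are assumptions.

```python
def moving_average(series, window):
    """Trailing moving average used here as a simple low-pass filter.

    Each output is the mean of the current sample and up to window - 1
    preceding samples; the first outputs average over fewer samples.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for t, value in enumerate(series):
        running += value
        if t >= window:
            # Drop the sample that has just left the window.
            running -= series[t - window]
        smoothed.append(running / min(t + 1, window))
    return smoothed


# A prediction error that flips every step is flattened by a 2-step window.
noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
smooth = moving_average(noisy, window=2)
```

Applied to each of the N prediction-error series, a filter of this kind removes step-to-step jitter while keeping the slower trend that the upper generator needs.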

なお、ある程度長い時間ステップでのRNN７１−１乃至７１−Ｎの予測誤差の大まかな変動を上位時系列予測生成器６２が学習するためには、上位時系列予測生成器６２のCTRNN８１が時系列データをサンプリングするときのサンプリングレートを、下位時系列予測生成器６１のRNN７１のサンプリングレートよりも大きくすることによっても実現可能である。例えば、上位時系列予測生成器６２は、下位時系列予測生成器６１のRNN７１の時系列データを所定の時間間隔で間引いた時系列データを学習することで、RNN７１−１乃至７１−Ｎの予測誤差の大まかな変動を学習することができる。また、式（１３）および式（１５）の係数τを調整することにより、時間サンプリングを調整することができる。この場合、係数τが大きいほど、RNN７１−１乃至７１−Ｎの予測誤差の大まかな変動を学習することができる。 Note that the upper time series prediction generator 62 can also be made to learn the rough fluctuations in the prediction errors of the RNNs 71-1 to 71-N over reasonably long time steps by having the CTRNN 81 of the upper time series prediction generator 62 sample the time series data at a longer sampling interval than the RNNs 71 of the lower time series prediction generator 61. For example, the upper time series prediction generator 62 can learn the rough fluctuations in the prediction errors of the RNNs 71-1 to 71-N by learning time series data obtained by thinning out the time series data of the RNNs 71 of the lower time series prediction generator 61 at a predetermined time interval. The time sampling can also be adjusted through the coefficient τ in equations (13) and (15). In this case, the larger the coefficient τ is, the coarser the fluctuations in the prediction errors of the RNNs 71-1 to 71-N that are learned.
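
Both options in this paragraph can be illustrated with a short sketch. Thinning is shown as plain decimation. The role of τ is shown with a generic leaky-integrator update, which is only an assumed stand-in because equations (13) and (15) are not reproduced in this passage; the names `thin` and `leaky_integrate` are hypothetical.

```python
def thin(series, interval):
    """Keep every interval-th sample (decimation)."""
    return series[::interval]


def leaky_integrate(inputs, tau, state=0.0):
    """Generic leaky integrator: state <- (1 - 1/tau) * state + (1/tau) * u.

    Assumed illustration of a time constant; a larger tau makes the state
    follow its input more slowly, so only slow trends are tracked.
    """
    trace = []
    for u in inputs:
        state = (1.0 - 1.0 / tau) * state + (1.0 / tau) * u
        trace.append(state)
    return trace


coarse = thin(list(range(10)), 3)
step_input = [1.0] * 5
fast = leaky_integrate(step_input, tau=1.0)  # follows the input at once
slow = leaky_integrate(step_input, tau=4.0)  # approaches it gradually
```

In this sketch the state with τ = 4 has still not reached the input after five steps, which is the sense in which a larger τ favors slow, coarse variation.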

非線形フィルタ部２０２は、図１３に示すような、入力される予測誤差errorL_nが小さい範囲では傾きが大きく、入力される予測誤差errorL_nが大きくなるほど傾きが小さくなる非線形の曲線で表される関数ｈ₂によって、入力される予測誤差errorL_nを変換する。非線形フィルタ部２０２は、変換処理後の予測誤差errorL’[Ｎ]を上位時系列予測生成器６２に供給する。 The non-linear filter unit 202 has a function represented by a non-linear curve as shown in FIG. 13 in which the slope is large in the range where the input prediction error errorL _n is small, and the slope is small as the input prediction error errorL _n is large. The input prediction error errorL _n is converted by h ₂ . The nonlinear filter unit 202 supplies the prediction error errorL ′ [N] after the conversion process to the upper time series prediction generator 62.

情報処理装置５１の生成処理では、図８を参照して説明したように、予測誤差errorL［Ｎ］の学習によって得られる推定予測誤差errorPredH_nがより小さいRNN７２−ｎほどゲートが大きく開くように制御される。反対に、推定予測誤差errorPredH_nが大きいRNN７２−ｎが出力するセンサモータ信号ｓｍ_n（ｔ＋１）は、ほとんど利用されない。 In the generation process of the information processing apparatus 51, as described with reference to FIG. 8, the gates are controlled so that the gate opens wider for an RNN 72-n whose estimated prediction error errorPredH _n, obtained by learning the prediction error errorL [N], is smaller. Conversely, the sensor motor signal sm _n (t + 1) output by an RNN 72-n with a large estimated prediction error errorPredH _n is hardly used.

従って、推定予測誤差errorPredH_nがより小さいRNN７２−ｎほど、下位時系列予測生成器６１が出力するセンサモータ信号ｓｍ（ｔ＋１）への寄与率は高く、重要であると言うことができる。 Therefore, the smaller the estimated prediction error errorPredH _n of an RNN 72-n, the higher its contribution to the sensor motor signal sm (t + 1) output by the lower time series prediction generator 61, and the more important it can be said to be.

例えば、RNN７２−１の予測誤差errorL₁とRNN７２−ｎの予測誤差errorL_nが、０乃至１の間の小さい値（例えば、０．３など）で拮抗していた場合と、０乃至１の間の大きい値（例えば、０．９など）で拮抗していた場合とを考えると、RNN７２−１の予測誤差errorL₁とRNN７２−ｎの予測誤差errorL_nが０乃至１の間の小さい値で拮抗していた場合、生成時に、RNN７２−１またはRNN７２−ｎが出力するセンサモータ信号ｓｍ₁（ｔ＋１）またはｓｍ_n（ｔ＋１）の、下位時系列予測生成器６１が出力するセンサモータ信号ｓｍ（ｔ＋１）への寄与率は高いので、RNN７２−１とRNN７２−ｎのセンサモータ信号のどちらが優位であるかは重要になってくる。 For example, consider the case where the prediction error errorL ₁ of the RNN 72-1 and the prediction error errorL _n of the RNN 72-n are close to each other at a small value between 0 and 1 (for example, 0.3) and the case where they are close to each other at a large value between 0 and 1 (for example, 0.9). When the two prediction errors are close at a small value, the sensor motor signal sm ₁ (t + 1) or sm _n (t + 1) output by the RNN 72-1 or the RNN 72-n contributes heavily, during generation, to the sensor motor signal sm (t + 1) output by the lower time series prediction generator 61, so which of the sensor motor signals of the RNN 72-1 and the RNN 72-n dominates becomes important.

一方、RNN７２−１の予測誤差errorL₁とRNN７２−ｎの予測誤差errorL_nが０乃至１の間の大きい値で拮抗していた場合、RNN７２−１とRNN７２−ｎ以外に、より小さい予測誤差を有するRNN７２がいると考えられ、生成時に、RNN７２−１またはRNN７２−ｎが出力するセンサモータ信号ｓｍ₁（ｔ＋１）またはｓｍ_n（ｔ＋１）が、下位時系列予測生成器６１が出力するセンサモータ信号ｓｍ（ｔ＋１）に含まれる率は少ないので、RNN７２−１とRNN７２−ｎのセンサモータ信号のどちらが優位であるかは、さほど重要ではない。 On the other hand, when the prediction error errorL ₁ of the RNN 72-1 and the prediction error errorL _n of the RNN 72-n are close at a large value between 0 and 1, some RNN 72 other than the RNN 72-1 and the RNN 72-n is likely to have a smaller prediction error. During generation, the sensor motor signal sm ₁ (t + 1) or sm _n (t + 1) output by the RNN 72-1 or the RNN 72-n then makes up only a small share of the sensor motor signal sm (t + 1) output by the lower time series prediction generator 61, so which of the sensor motor signals of the RNN 72-1 and the RNN 72-n dominates is not very important.

非線形フィルタ部２０２は、関数ｈ₂によって、センサモータ信号ｓｍ（ｔ＋１）の生成に重要な予測誤差errorLの小さいRNN７２どうしの優位差を大きくし、センサモータ信号ｓｍ（ｔ＋１）の生成に重要ではない予測誤差errorLの大きいRNN７２どうしの優位差を小さくする処理を行う。これにより、上位時系列予測生成器６２において、学習に重要なRNN７１が出力した予測誤差errorLを効率的に学習することができる。 The nonlinear filter unit 202 uses the function h ₂ to enlarge the differences between RNNs 72 with small prediction errors errorL, which are important for generating the sensor motor signal sm (t + 1), and to reduce the differences between RNNs 72 with large prediction errors errorL, which are not important for generating it. As a result, the upper time series prediction generator 62 can efficiently learn the prediction errors errorL output by the RNNs 71 that matter for learning.
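
The effect described for h ₂ can be checked numerically with any increasing function that is steep for small inputs and flat for large ones. The saturating exponential below is an assumed stand-in for the curve of FIG. 13, and the `gain` value is hypothetical; the prediction errors are assumed to lie in the range 0 to 1.

```python
import math


def nonlinear_filter(error, gain=3.0):
    """Hypothetical h2: increasing and concave, mapping [0, 1] onto [0, 1].

    Steep for small errors and flat for large ones, so gaps between small
    errors are stretched and gaps between large errors are compressed.
    """
    return (1.0 - math.exp(-gain * error)) / (1.0 - math.exp(-gain))


# Two pairs of prediction errors with the same raw gap of 0.05.
small_gap = nonlinear_filter(0.35) - nonlinear_filter(0.30)
large_gap = nonlinear_filter(0.95) - nonlinear_filter(0.90)
```

With this choice the gap near 0.3 grows beyond its raw size of 0.05, while the gap near 0.9 shrinks well below it, which matches the stated purpose of the filter.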

時間フィルタ部２０１と非線形フィルタ部２０２の動作は、図７を参照して説明したフローチャートのステップＳ３１の、上位時系列予測生成器６２が、教師データとしての、Ｑ個の予測誤差の時系列データerrorL[Ｎ]を下位時系列予測生成器６１のメモリ７５から読み込む場合において、時間フィルタ部２０１と非線形フィルタ部２０２によって処理された後のＱ個の予測誤差の時系列データerrorL’[Ｎ]を読み込む動作となる。 The time filter unit 201 and the nonlinear filter unit 202 come into play in step S31 of the flowchart described with reference to FIG. 7, where the upper time series prediction generator 62 reads the Q sets of prediction-error time series data errorL [N] from the memory 75 of the lower time series prediction generator 61 as teacher data: in that step it instead reads the Q sets of prediction-error time series data errorL ′ [N] that have been processed by the time filter unit 201 and the nonlinear filter unit 202.

なお、時間フィルタ部２０１および非線形フィルタブ２０２は、必ずしも両方が同時に設けられる必要はなく、いずれか一方のみでもよい。 Note that the time filter unit 201 and the nonlinear filter unit 202 do not both have to be provided; only one of them may be provided.

ところで、図３および図１２に示した情報処理装置５１では、複数のRNN７１−１乃至７１−ｎを有する下位時系列生成器６１の構成として、複数のRNNの出力をゲート機構により統合して最終的な出力を決定するMixture of RNN Expertというモデルを採用したが、Mixture of RNN Expert以外の構成を採用することもできる。 Incidentally, the information processing apparatus 51 shown in FIGS. 3 and 12 adopts, as the configuration of the lower time series generator 61 having the plurality of RNNs 71-1 to 71-n, a model called Mixture of RNN Expert, which integrates the outputs of a plurality of RNNs through a gate mechanism to determine the final output; however, a configuration other than Mixture of RNN Expert can also be adopted.

Mixture of RNN Expert以外の構成としては、例えば、ベクトルパターンのカテゴリ学習に用いられる自己組織化マップ（self-organization map）（以下、ＳＯＭという）を導入し、SOMの各ノードにRNNを採用し、自己組織的に外部入力に対し適切なRNNを選択し、RNNのパラメータ学習を行うRNN-SOMなどを採用することができる。なお、SOMについては、例えば、「T.コホネン、「自己組織化マップ」、シュプリンガー・フェアラーク東京」などにその詳細が記載されている。 As a configuration other than Mixture of RNN Expert, it is possible to adopt, for example, an RNN-SOM, which introduces a self-organization map (hereinafter referred to as SOM) used for category learning of vector patterns, uses an RNN for each node of the SOM, selects an appropriate RNN for an external input in a self-organizing manner, and performs parameter learning of the RNN. Details of the SOM are described in, for example, T. Kohonen, "Self-Organizing Map", Springer-Verlag Tokyo.

図３および図１２に示したMixture of RNN Expertのモデルでは、ある新しい学習サンプル（即ち、時系列データ）に対して、全てのRNNが学習エラー（予測誤差）を算出し、その学習エラーの度合いに応じて各RNNが学習サンプルを学習する。 In the Mixture of RNN Expert model shown in FIGS. 3 and 12, all RNNs calculate a learning error (prediction error) for a new learning sample (that is, time series data), and each RNN learns the learning sample according to the degree of its learning error.

これに対して、RNN-SOMでは、ある新しい学習サンプル（即ち、時系列データ）に対して、全てのRNNが学習エラー（予測誤差）を算出し、その中で、最も学習エラーの小さいRNNが勝者に決定される。勝者のRNNが決定された後は、各RNNの学習エラーは関係なく、勝者のRNNと距離が近いRNNが、勝者との近傍度合いに応じて学習サンプルを学習するという、各RNNに対して自分以外のRNNとの距離空間の概念が導入されたものである。 In the RNN-SOM, by contrast, all RNNs calculate a learning error (prediction error) for a new learning sample (that is, time series data), and the RNN with the smallest learning error is determined to be the winner. Once the winner has been determined, the learning errors of the individual RNNs no longer matter: the RNNs close to the winner learn the learning sample according to their degree of proximity to the winner. In other words, the model introduces, for each RNN, the concept of a metric space relating it to the other RNNs.

図１４は、下位時系列生成器６１の構成としてRNN-SOMを採用した場合の、行動シーケンスに対応するセンサモータ信号の時系列データの学習処理のフローチャートである。 FIG. 14 is a flowchart of the learning process of the time series data of the sensor motor signal corresponding to the action sequence when the RNN-SOM is adopted as the configuration of the lower time series generator 61.

図１４に示される処理は、ステップＳ１２４の処理が、図６のステップＳ４の処理と異なる以外は、図６に示した学習処理と同様である。 The process shown in FIG. 14 is the same as the learning process shown in FIG. 6 except that the process in step S124 is different from the process in step S4 in FIG.

即ち、図１４のステップＳ１２１乃至Ｓ１２３およびＳ１２５乃至Ｓ１２７は、図６のステップＳ１乃至Ｓ３およびＳ５乃至Ｓ７と、それぞれ同様である。 That is, steps S121 to S123 and S125 to S127 in FIG. 14 are the same as steps S1 to S3 and S5 to S7 in FIG.

ステップＳ１２４では、下位時系列予測生成器６１は、予測誤差errorL^t+1が最小のRNN７１を勝者とし、図１５に示す近傍関数ｈ₃に基づいて、勝者からの距離（DISTANCE_n）に応じた学習重みγ_nを算出する。 In step S124, the low-order time-series prediction generator 61 uses the RNN 71 with the smallest prediction error errorL ^{t + 1} as the winner, and responds to the distance (DISTANCE _n ) from the winner based on the neighborhood function h ₃ shown in FIG. A learning weight γ _n is calculated.

近傍関数ｈ₃は、図１５に示されるように、勝者からの距離（DISTANCE_n）が近いRNN７１−ｎほど大きい学習重みγ_nが割り当てられる。 As shown in FIG. 15, the neighborhood function h ₃ assigns a larger learning weight γ _n to an RNN 71-n the shorter its distance (DISTANCE _n) from the winner.
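
Step S124 can be sketched as winner selection followed by a distance-based weight. The Gaussian neighborhood, the one-dimensional map positions, and the `width` parameter are assumptions chosen for illustration; the text only states that the weight grows as the distance from the winner shrinks.

```python
import math


def neighborhood_weights(errors, positions, width=1.0):
    """Pick the RNN with the smallest prediction error as the winner and
    return (winner index, learning weights gamma_n).

    positions gives each RNN's coordinate on an assumed 1-D map; the
    Gaussian form of the neighborhood function h3 is also an assumption.
    """
    winner = min(range(len(errors)), key=lambda n: errors[n])
    weights = []
    for pos in positions:
        distance = abs(pos - positions[winner])
        weights.append(math.exp(-(distance ** 2) / (2.0 * width ** 2)))
    return winner, weights


# Hypothetical prediction errors of four RNNs placed at positions 0..3.
errors = [0.4, 0.1, 0.3, 0.8]
positions = [0, 1, 2, 3]
winner, gamma = neighborhood_weights(errors, positions)
```

Here the RNN at position 1 wins and receives the largest weight; its two direct neighbors receive equal, smaller weights regardless of their own prediction errors.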

次に、図１６乃至図１９を参照して、上述した情報処理装置５１に、ヒューマノイドロボットが行う行動シーケンスを学習および生成させた実験結果について説明する。 Next, with reference to FIG. 16 to FIG. 19, an experimental result in which the information processing apparatus 51 described above learns and generates an action sequence performed by the humanoid robot will be described.

なお、この実験では、下位時系列予測生成器６１が出力する予測誤差の時系列データerrorL［Ｎ］に対して時間フィルタと非線形フィルタを施した、図１２の情報処理装置５１による例を示している。また、下位時系列生成器６１のRNN７１の個数Ｎは、１６（Ｎ＝１６）となっている。 This experiment shows an example of the information processing apparatus 51 of FIG. 12 in which a time filter and a non-linear filter are applied to the time series data errorL [N] of the prediction error output from the lower time series prediction generator 61. Yes. The number N of RNNs 71 in the lower time series generator 61 is 16 (N = 16).

図１６は、行動シーケンスA，B、およびCを学習後、情報処理装置５１が行動シーケンスAを生成した結果を示している。 FIG. 16 shows a result of the information processing apparatus 51 generating the action sequence A after learning the action sequences A, B, and C.

図１６Ａは、生成処理時の、上位時系列予測生成器６２のCTRNN８１としてのCTRNN１４１のコンテキスト出力ノード１６５の出力データを示している。 FIG. 16A shows the output data of the context output node 165 of the CTRNN 141 as the CTRNN 81 of the higher time series prediction generator 62 during the generation process.

図１６Ｂは、上位時系列予測生成器６２のCTRNN８１が出力する推定予測誤差errorPredH［Ｎ］を示している。 FIG. 16B shows the estimated prediction error errorPredH [N] output from the CTRNN 81 of the higher-order time-series prediction generator 62.

図１６Ｃは、図１６Ｂに示される推定予測誤差errorPredH［Ｎ］がゲート信号変換部６３によって変換されたゲート信号gate［Ｎ］を示している。 FIG. 16C shows the gate signal gate [N] obtained by converting the estimated prediction error errorPredH [N] shown in FIG. 16B by the gate signal conversion unit 63.

図１６Ｄは、下位時系列予測生成器６１の合成回路７３から出力されたセンサモータ信号ｓｍ（ｔ）のうちのモータ信号を、図１６Ｅは、下位時系列予測生成器６１の合成回路７３から出力されたセンサモータ信号ｓｍ（ｔ）のうちのセンサ信号を、それぞれ示している。なお、図１６Ｄおよび図１６Ｅでは、４つのモータ信号と２つのセンサ信号のデータが図示されているが、図を見やすくするため、実際のモータ信号およびセンサ信号よりも少ない数のデータを図示している。 FIG. 16D shows the motor signals, and FIG. 16E the sensor signals, of the sensor motor signal sm (t) output from the synthesis circuit 73 of the lower time series prediction generator 61. Although FIGS. 16D and 16E show data for four motor signals and two sensor signals, fewer signals than the actual motor and sensor signals are shown to keep the figures legible.

図１６Ａ乃至図１６Ｅの横軸は、時間ステップ（step）を表す。また、図１６Ａ，図１６Ｄ，および図１６Ｅの縦軸は、コンテキスト出力ノード１６５、モータ信号、およびセンサ信号それぞれの出力値を表し、０乃至１の範囲の値である。図１６Ｂおよび図１６Ｃは、下位時系列予測生成器６１のRNN７１の番号（１乃至１６）を表している。 The horizontal axes of FIGS. 16A to 16E represent the time step. The vertical axes of FIGS. 16A, 16D, and 16E represent the output values of the context output node 165, the motor signals, and the sensor signals, respectively, each in the range of 0 to 1. The vertical axes of FIGS. 16B and 16C represent the numbers (1 to 16) of the RNNs 71 of the lower time series prediction generator 61.

図１６ＢおよびＣにおいては、RNN７１−ｎに対応するerrorPredH_nまたはゲート信号ｇ^t _nの値とグレイレベルとが対応しており、図１６Ｂでは、errorPredH_nの値が小さい（即ち、０に近い）ほど黒く（濃く）表されており、図１６Ｃでは、ゲート信号ｇ^t _nの値が大きい（即ち、１に近い）ほど黒く（濃く）表されている。 In FIGS. 16B and C, the value of errorPredH _n or gate signal g ^t _n corresponding to RNN 71-n corresponds to the gray level, and in FIG. 16B, the value of errorPredH _n is small (ie, close to 0). In FIG. 16C, the larger the value of the gate signal g ^t _n (that is, closer to 1), the darker (darker) it is.

図１７は、行動シーケンスA，B、およびCを学習後、情報処理装置５１が行動シーケンスBを生成した結果を、図１８は、行動シーケンスCを生成した結果を、それぞれ示している。 FIG. 17 shows the result of generating the action sequence B by the information processing apparatus 51 after learning the action sequences A, B, and C, and FIG. 18 shows the result of generating the action sequence C.

また、図１９は、行動シーケンスA，B、およびCを学習後に行動シーケンスDを追加学習させた後、情報処理装置５１が行動シーケンスDを生成した結果を示している。 FIG. 19 shows the result of the information processing apparatus 51 generating the behavior sequence D after learning the behavior sequence D after learning the behavior sequences A, B, and C.

図１７乃至図１９において、図示されたデータが、行動シーケンスB乃至Dに関するものである以外は、同様である。 FIGS. 17 to 19 are the same as FIG. 16 except that the illustrated data relate to the action sequences B to D, respectively.

行動シーケンスAに対応する時系列データの生成では、図１６Ｃを見て分かるように、シーケンスの前半では、ゲート７２−１４が開かれることによりRNN７１−１４が有効となり、その後、シーケンスの後半部分では、ゲート７２−４が開かれることによりRNN７１−４が有効となっている。 In generating the time series data corresponding to the action sequence A, as can be seen from FIG. 16C, the RNN 71-14 is active in the first half of the sequence because the gate 72-14 is open, and the RNN 71-4 is then active in the second half because the gate 72-4 is open.

但し、図１６Ｂに示すデータから図１６Ｃに示すデータへの変換、即ち、推定予測誤差errorPredH［Ｎ］からゲート信号gate［Ｎ］への変換は、errorPredH₁乃至errorPredH₁₆のうちの最も値の小さいものが唯一の勝者となるウィナーテイクオール（Winner-take-all）の原理ではなく、上述した式（２）のソフトマックス関数を用いて行われるため、所定の時刻（時間ステップ）から、離散的にRNN７１−１４からRNN７１−４に有効なRNN７１が切替わるのではなく、RNN７１−１４からRNN７１−４への切替が時間の経過とともに緩やかに行われている。 However, the conversion from the data shown in FIG. 16B to the data shown in FIG. 16C, that is, the conversion from the estimated prediction error errorPredH [N] to the gate signal gate [N], is performed not by the winner-take-all principle, in which the smallest of errorPredH ₁ to errorPredH ₁₆ is the sole winner, but by the softmax function of equation (2) described above. Therefore, the active RNN 71 does not switch discretely from the RNN 71-14 to the RNN 71-4 at a given time (time step); instead, the switch from the RNN 71-14 to the RNN 71-4 takes place gradually over time.

従って、errorPredH₁乃至errorPredH₁₆のうちの複数の値が拮抗しているような場合であっても、勝者が頻繁に交替することはなく、拮抗している状態では、そのまま拮抗している状態として出力を行うことができ、これにより、学習された時系列データを正しく生成することができる。 Therefore, even when several of the values errorPredH ₁ to errorPredH ₁₆ are close to one another, the winner does not change frequently; a closely matched state can be output as it is, so the learned time series data can be generated correctly.
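
The contrast between the two conversion schemes can be sketched as follows. Equation (2) is not reproduced in this passage, so the softmax over negated errors and the temperature `sigma` are assumptions consistent with the description that a smaller estimated error opens the gate wider; the function names are hypothetical.

```python
import math


def softmax_gates(err_pred, sigma=0.1):
    """Assumed form of equation (2): g_n proportional to exp(-err_n / sigma)."""
    scores = [-e / sigma for e in err_pred]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def winner_take_all(err_pred):
    """Hard alternative: only the RNN with the smallest error is gated on."""
    winner = min(range(len(err_pred)), key=lambda n: err_pred[n])
    return [1.0 if n == winner else 0.0 for n in range(len(err_pred))]


# Two RNNs with nearly equal estimated errors and one clearly worse RNN.
close = [0.30, 0.31, 0.90]
soft = softmax_gates(close)
hard = winner_take_all(close)
```

With the softmax form, the two closely matched RNNs share the output almost evenly, so a small change in their errors shifts the gates only slightly. Under winner-take-all the same small change would flip the output entirely from one RNN to the other.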

行動シーケンスBの生成では、図１７Ｃを見て分かるように、RNN７１−１４、RNN７１−２、RNN７１−１３、RNN７１−１、RNN７１−１１が、その順で有効となっている。 In the generation of the action sequence B, as can be seen from FIG. 17C, the RNN 71-14, the RNN 71-2, the RNN 71-13, the RNN 71-1, and the RNN 71-11 are effective in that order.

行動シーケンスCの生成では、図１８Ｃを見て分かるように、RNN７１−２、RNN７１−１２、RNN７１−３が、その順で有効となっている。 In the generation of the action sequence C, as can be seen from FIG. 18C, the RNN 71-2, the RNN 71-12, and the RNN 71-3 are effective in that order.

行動シーケンスDの生成では、図１９Ｃを見て分かるように、RNN７１−５、RNN７１−１５、RNN７１−３、RNN７１−１６が、その順で有効となっている。 In the generation of the action sequence D, as can be seen from FIG. 19C, the RNN 71-5, the RNN 71-15, the RNN 71-3, and the RNN 71-16 are valid in that order.

行動シーケンスB乃至Dのゲート７２の切替においても、図１６の行動シーケンスAにおける場合と同様のことが言える。 The same applies to the switching of the gate 72 of the action sequences B to D as in the case of the action sequence A of FIG.

That is, when the gate signal gate[N] switches from RNN 71-n, whose estimated prediction error errorPredH_n is the largest at a given time, to RNN 71-n' (n ≠ n'), whose estimated prediction error errorPredH_n' is the largest after a given period, the gate signal g_n gradually decreases while the gate signal g_n' gradually increases. In other words, gate 72-n gradually suppresses the output of the sensor-motor signal sm_n(t+1), and gate 72-n' gradually releases the output of the sensor-motor signal sm_n'(t+1).
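The gradual hand-over between two gates can be sketched as follows. This is only an illustration: it assumes the gated outputs are combined by a weighted sum, and it replaces the gate trajectory that the description obtains from the softmax conversion with a simple linear ramp. The function names and the output values 0.2 and 0.8 are hypothetical.

```python
def mix_outputs(gates, outputs):
    """Weighted sum of per-module outputs sm_n(t+1), each scaled by its gate g_n."""
    return sum(g * sm for g, sm in zip(gates, outputs))

def crossfade(steps):
    """Yield (g_n, g_n') pairs that move linearly from (1, 0) to (0, 1)."""
    for k in range(steps + 1):
        a = k / steps
        yield (1.0 - a, a)

# Module n predicts 0.2 and module n' predicts 0.8 (illustrative values).
trajectory = [mix_outputs(g, (0.2, 0.8)) for g in crossfade(4)]
```

The combined output moves smoothly from the prediction of module n to that of module n' instead of jumping at a single time step.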

Furthermore, in the generation result for action sequence D, which was learned by additional learning and is shown in FIG. 19, RNN 71-5, RNN 71-15, and RNN 71-16, none of which is effective in action sequences A to C, have become effective. This shows that new RNNs 71 have learned the action parts that do not appear in the previously learned action sequences A to C.
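In the additional learning, the learning weight μ_n is made smaller for an RNN 71-n with a larger use frequency FREQ_n, so that already-used modules change little and unused modules take up the new action parts. A minimal sketch of that idea follows. The shape of the function h₁ is not given in this part of the description, so the reciprocal form, the normalization of the frequency to [0, 1], the `steepness` and `base_rate` parameters, and the plain gradient step are all assumptions for illustration.

```python
def learning_weight(freq, steepness=5.0):
    """Hypothetical stand-in for h1: decreases as the use frequency rises.

    `freq` is assumed to be normalized to [0, 1]; the result lies in (0, 1].
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie in [0, 1]")
    return 1.0 / (1.0 + steepness * freq)

def scaled_update(weight, gradient, freq, base_rate=0.1):
    """One gradient step whose size is scaled by the learning weight."""
    return weight - base_rate * learning_weight(freq) * gradient

# A heavily used module moves less than an unused one for the same gradient.
used = scaled_update(1.0, 1.0, freq=0.9)
unused = scaled_update(1.0, 1.0, freq=0.0)
```

Any monotonically decreasing function would serve the same purpose here; the reciprocal is chosen only because it is simple and stays positive.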

The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed from a program recording medium into a computer built into dedicated hardware, or into a computer that can execute various functions when various programs are installed, such as a general-purpose personal computer.

FIG. 20 is a block diagram showing an example of the configuration of a personal computer that executes the series of processes described above by means of a program. A CPU (Central Processing Unit) 301 executes various processes in accordance with a program stored in a ROM (Read Only Memory) 302 or a storage unit 308. A RAM (Random Access Memory) 303 stores, as appropriate, the programs executed by the CPU 301 and associated data. The CPU 301, ROM 302, and RAM 303 are connected to one another by a bus 304.

An input/output interface 305 is also connected to the CPU 301 via the bus 304. Connected to the input/output interface 305 are an input unit 306, comprising a keyboard, a mouse, a microphone, and the like, and an output unit 307, comprising a display such as a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display), a speaker, and the like. The CPU 301 executes various processes in response to commands input from the input unit 306 and outputs the results of the processes to the output unit 307.

The storage unit 308 connected to the input/output interface 305 comprises, for example, a hard disk, and stores the programs executed by the CPU 301 and various data. The communication unit 309 communicates with external devices either via a network such as the Internet or a local area network, or over a direct connection.

When a removable medium 321 such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory is loaded, the drive 310 connected to the input/output interface 305 drives the medium and acquires the programs and data recorded on it. The acquired programs and data are transferred to the storage unit 308 and stored there as necessary. Programs and data may also be acquired via the communication unit 309 and stored in the storage unit 308.

As shown in FIG. 20, the program recording medium, which stores a program that is installed in a computer and made executable by the computer, is constituted by the removable medium 321, which is a package medium such as a magnetic disk (including a flexible disk), an optical disc (including a CD-ROM (Compact Disc-Read Only Memory) and a DVD (Digital Versatile Disc)), a magneto-optical disk, or a semiconductor memory; by the ROM 302, in which the program is stored temporarily or permanently; or by the hard disk constituting the storage unit 308. The program is stored in the program recording medium, as necessary, via the communication unit 309, which is an interface such as a router or a modem, using a wired or wireless communication medium such as a local area network, the Internet, or digital satellite broadcasting.

In the example described above, switching among action sequences A to C at generation time is performed by changing the task ID of the CTRNN 81. Alternatively, the CTRNN 81 may be given no task ID input, and the switching among action sequences A to C at generation time may be performed by changing the initial value supplied to the context input node 162.

In this specification, the steps described in the flowcharts include not only processes performed in time series in the described order, but also processes that are executed in parallel or individually without necessarily being processed in time series.

Embodiments of the present invention are not limited to the embodiment described above, and various modifications can be made without departing from the gist of the present invention.

FIG. 1 is a diagram showing an example of a conventional information processing apparatus.
FIG. 2 is a diagram showing an example of time-series data generated by the information processing apparatus of FIG. 1.
FIG. 3 is a diagram showing a configuration example of an embodiment of an information processing apparatus to which the present invention is applied.
FIG. 4 is a diagram showing a detailed configuration example of the RNN used in the lower time-series prediction generator.
FIG. 5 is a diagram showing a detailed configuration example of the RNN used in the upper time-series prediction generator.
FIG. 6 is a flowchart explaining the learning process of the lower time-series prediction generator.
FIG. 7 is a flowchart explaining the learning process of the upper time-series prediction generator.
FIG. 8 is a flowchart explaining the generation process of the information processing apparatus of FIG. 3.
FIG. 9 is a flowchart explaining the generation process in step S53 of FIG. 8.
FIG. 10 is a diagram explaining the function h₁ that determines the learning weight μ_n according to the use frequency FREQ_n.
FIG. 11 is a flowchart explaining the additional learning process of the information processing apparatus of FIG. 3.
FIG. 12 is a diagram showing another configuration example of an information processing apparatus to which the present invention is applied.
FIG. 13 is a diagram explaining the function h₂ that performs a nonlinear conversion according to the magnitude of the prediction error errorL_n.
FIG. 14 is a flowchart explaining another learning process of the lower time-series prediction generator.
FIG. 15 is a diagram explaining the neighborhood function h₃ used in the learning process of FIG. 14.
FIG. 16 is a diagram showing experimental results of the information processing apparatus 51.
FIG. 17 is a diagram showing experimental results of the information processing apparatus 51.
FIG. 18 is a diagram showing experimental results of the information processing apparatus 51.
FIG. 19 is a diagram showing experimental results of the information processing apparatus 51.
FIG. 20 is a block diagram showing a configuration example of an embodiment of a computer to which the present invention is applied.

Explanation of symbols

51 information processing apparatus, 61 lower time-series prediction generator, 62 upper time-series prediction generator, 63 gate signal conversion unit, 71-1 to 71-N RNNs, 72-1 to 72-N gates, 73 synthesis circuit, 74 arithmetic circuit, 75 memory, 76 control circuit, 81 CTRNN, 201 time filter unit, 202 nonlinear filter unit, 301 CPU, 302 ROM, 303 RAM, 308 storage unit

Claims

An information processing apparatus comprising learning weight determining means for determining, when a plurality of recurrent neural networks that have learned predetermined time-series data are made to additionally learn new time-series data, a learning weight having a negative correlation with the frequency of use of each of the plurality of recurrent neural networks in the earlier learning of the predetermined time-series data,
wherein each of the plurality of recurrent neural networks learns the new time-series data in accordance with the learning weight determined by the learning weight determining means.

An information processing method including the steps of: determining, when a plurality of recurrent neural networks that have learned predetermined time-series data are made to additionally learn new time-series data, a learning weight having a negative correlation with the frequency of use of each of the plurality of recurrent neural networks in the earlier learning of the predetermined time-series data; and
learning the new time-series data in accordance with the determined learning weight.

A program for causing a computer to execute a process including the steps of: determining, when a plurality of recurrent neural networks that have learned predetermined time-series data are made to additionally learn new time-series data, a learning weight having a negative correlation with the frequency of use of each of the plurality of recurrent neural networks in the earlier learning of the predetermined time-series data; and
learning the new time-series data in accordance with the determined learning weight.