JP2007276052A - Control system, record system, information processor and method, program, and recording medium

Info

Publication number: JP2007276052A
Application number: JP2006105543A
Authority: JP
Inventors: Kuniaki Noda; 邦昭野田; Kenichi Hidai; 健一日台; Kenta Kawamoto; 献太河本
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2006-04-06
Filing date: 2006-04-06
Publication date: 2007-10-25

Abstract

PROBLEM TO BE SOLVED: To provide a system that improves tolerance to communication and processing delays in control and to changes in the environment, and that suppresses control load and control failure. SOLUTION: A user operates a master 12, a robot having the same geometric shape as a slave 13, and controls the slave 13 via a control device 11. The control device 11 records the control commands to the slave 13 generated by operating the master 12, including the control delay that arises in the system, together with the sensor information supplied by the slave 13 at the time each control command is generated. Based on the control commands and the sensor information, the control device 11 predicts future control commands as a time series, taking the control delay into account, and controls the slave 13 with the predicted control commands. The invention can be applied to an information processing system. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a control system, a recording system, an information processing apparatus and method, a program, and a recording medium, and in particular to ones that improve tolerance to communication and processing delays in control and to changes in the environment, and that suppress control load and control failure.

Conventionally, in controlling the posture and motion of a robot with multiple movable joints, the robot to be controlled generally has many joints and thus many degrees of freedom. To make instruction input easier for the operator, one method uses a controller that accepts fewer instructions than the robot has degrees of freedom (joints) (see, for example, Patent Documents 1 to 3) (hereinafter, the first method). In such a controller, a single operation (one instruction) may be assigned a plurality of pre-programmed motions; a switching function may change which degree of freedom (joint) a single operation controls; or only a predetermined subset of the robot's degrees of freedom (joints) may be controllable, with the remaining degrees of freedom (joints) controlled autonomously.

For example, Patent Document 1 describes a remote control device in which specific motions of a biped walking robot to be controlled are assigned to specific operation methods. Patent Document 2 describes an operating device for a robot with a very large number of degrees of freedom, in which operating a foot-operated input device gives simple commands to the robot's autonomous control mechanism, and depressing a clutch pedal switches the target of the foot operation. Patent Document 3 describes a remote control method that simplifies the operator's work: the operator operates only the part that is directly related to the intended task and on which attention is concentrated, and the other parts are operated autonomously in a subordinate manner.

As another robot control method, information obtained by image processing of human motion, or by sensing it with a sensor suit or the like, is geometrically transformed to match the robot's degree-of-freedom configuration, and the robot is operated with the transformed control information (see, for example, Patent Document 4) (hereinafter, the second method).

In yet another method, the motion of the controlled object is taught in advance, by creating joint angle trajectories with software such as a motion creator or by specifying via points on an ideal trajectory, and the teaching data is then played back for autonomous control. Some such methods also have the robot interact with the environment during operation, for example by having each arm of a pair measure its position relative to the other and correct its position accordingly (see, for example, Patent Documents 5 and 6) (hereinafter, the third method).

Patent Document 1: JP 2005-111661 A
Patent Document 2: JP 2005-066752 A
Patent Document 3: JP 2004-276123 A
Patent Document 4: JP 2005-046931 A
Patent Document 5: JP 2004-114161 A
Patent Document 6: JP 63-216105 A

However, with the first method as described in Patent Documents 1 to 3, the operator's movement and the resulting movement of the controlled object neither match nor resemble each other, so the user may be unable to grasp intuitively how to operate the controller to produce a desired motion. In addition, processing and communication delays may cause a time lag between the autonomous control and the user's control, and the control may diverge.

With the second method as described in Patent Document 4, human movement must be converted to the robot's geometric model. If the design of the robot to be controlled differs greatly from a human in scale or geometric shape, the geometric transformation may have no solution, or a solution may be obtained but be hard to understand intuitively. The user must therefore operate while keeping the robot's model in mind, which may increase the burden on the user.

With the third method as described in Patent Documents 5 and 6, the robot is controlled autonomously after teaching, so control operation is easier than with the first and second methods. In a real environment that changes in many ways and is hard to predict, however, an ideal trajectory cannot be set in advance, nor can the taught behavior be adapted to changes in the environment, so diverse interactions between the robot and the environment could not be taught. For example, when the relative position between arms is measured and corrected as in Patent Documents 5 and 6, communication, information processing, and the like introduce a delay between measurement and correction. If the arms move quickly relative to that delay, the control may diverge. Likewise, when the arm moves a ball that is not itself directly controlled, the arm's motion cannot be corrected to follow the ball's movement (a change in the environment), and appropriate motion control may not be possible.

FIG. 1 is a graph illustrating control delay. It shows how the state quantity of each variable changes over time when a robot, the directly controlled object, is made to act according to the position of a ball, which is not directly controlled.

In the upper graph of FIG. 1, the joint angle command θm, shown by the dotted line, is a control command supplied to the robot that specifies the angle of a given joint, and the joint angle sensor data θs, shown by the solid line, is the measured angle (joint angle) of that joint. That is, the joint angle command θm is control information indicating a target value, and the joint angle sensor data θs is result information indicating the outcome of control by θm. The lower graph of FIG. 1 shows the position coordinates of the ball manipulated by the robot: the dash-dot line shows the ball's x coordinate (ball coordinate x), the dotted line its y coordinate (ball coordinate y), and the solid line its z coordinate (ball coordinate z).

As the upper graph of FIG. 1 shows, the joint angle sensor data θs lags the joint angle command θm by a control delay, indicated by double-headed arrow 1. Consequently, at each time there is a deviation between θs and θm, indicated by double-headed arrow 2.

This control delay arises from factors such as the communication delay in transferring joint angle sensor data observed at the robot to a time-series predictor on a remote computer outside the robot, the communication delay in transferring the joint angle command computed on that remote computer back to the robot, and the actuation delay until the actuator actually rotates to the joint angle specified by the transferred command.

When such a control delay exists, consider autonomous control that collects the data at time t into a vector and computes the joint angle command for the next time step from the vector data (θm_t, θs_t, x_t, y_t, z_t) at time t. Because the elements of the vector deviate from one another, not only can appropriate control not be performed, but the solution may diverge and control may become impossible.
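The delay and deviation described for FIG. 1 can be illustrated with a minimal sketch. The sine-wave command, the fixed delay of 5 samples, and the names `delayed`, `theta_m`, `theta_s`, and `DELAY_STEPS` are all assumptions made for illustration; none of them come from the specification.

```python
import math

def delayed(signal, delay_steps):
    """Return `signal` shifted later by `delay_steps` samples (first value held)."""
    return [signal[max(t - delay_steps, 0)] for t in range(len(signal))]

# Hypothetical joint angle command: a slow sine wave sampled at 100 time steps.
theta_m = [math.sin(0.1 * t) for t in range(100)]

# Hypothetical sensor trace: the joint follows the command 5 samples late.
DELAY_STEPS = 5
theta_s = delayed(theta_m, DELAY_STEPS)

# Deviation between the command and the measured angle at each time step.
deviation = [m - s for m, s in zip(theta_m, theta_s)]

# A vector assembled at time t pairs the current command with a measurement
# that still reflects an older command, so its elements disagree.
t = 20
vector_at_t = (theta_m[t], theta_s[t])
```

In this sketch the measurement at step 20 equals the command issued at step 15, which is the kind of mismatch between vector elements that the paragraph above describes.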

As described above, conventional control methods cannot perform autonomous control that is robust to control delay and environmental change, and they require difficult control operations. They may place a heavy load on the control device and the operator, or the control may fail.

The present invention has been made in view of this situation. It improves tolerance to communication and processing delays in control and to changes in the environment, and suppresses control load and control failure.

One aspect of the present invention is a control system comprising an input device operated by a user, a controlled device to be controlled, and a control device that controls the controlled device based on information input from the input device. The input device and the controlled device have the same geometric shape. The control device comprises: acquisition means for acquiring sensor information that is supplied from the input device operated by the user and output by a sensor, provided in the input device, that measures the surrounding environment; conversion means for converting the sensor information acquired by the acquisition means into a control command for controlling the controlled device; and supply means for supplying the resulting control command to the controlled device.
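The acquire, convert, and supply steps of this aspect can be sketched as follows. This is only an illustration under stated assumptions: the classes `Master` and `Slave`, the functions `to_command` and `control_step`, and the command format are hypothetical, and the conversion is assumed to be a per-joint copy because the two devices share one geometric shape.

```python
from dataclasses import dataclass, field

@dataclass
class Master:
    """Hypothetical input device that reports its joint angles."""
    joint_angles: list

    def read_sensors(self):
        return list(self.joint_angles)

@dataclass
class Slave:
    """Hypothetical controlled device that stores the commands it receives."""
    received: list = field(default_factory=list)

    def apply(self, command):
        self.received.append(command)

def to_command(sensor_angles):
    # Assumption: identical geometry lets each measured joint angle be used
    # directly as the target for the matching joint, with no geometric transform.
    return {"joint_targets": list(sensor_angles)}

def control_step(master, slave):
    sensor_info = master.read_sensors()   # acquisition
    command = to_command(sensor_info)     # conversion
    slave.apply(command)                  # supply
    return command

master = Master(joint_angles=[0.1, -0.2])
slave = Slave()
command = control_step(master, slave)
```

Calling `control_step` once per control period would stream the master's posture to the slave.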

One aspect of the present invention is a control system comprising a controlled device to be controlled and a control device that controls the controlled device. The control device comprises: acquisition means for acquiring sensor information output by a sensor that is provided in the controlled device and measures the surrounding environment; prediction means for predicting and generating, based on the sensor information acquired by the acquisition means and the control commands that control the controlled device, a control command for controlling the controlled device at a time a predetermined interval ahead of the sensor information; and supply means for supplying the new control command predicted and generated by the prediction means to the controlled device.

One aspect of the present invention is a recording system comprising an input device operated by a user and a recording device that records information input from the input device, the recording system further having a controlled device to be controlled. The control device comprises: acquisition means for acquiring sensor information that is supplied from the input device operated by the user and output by a sensor, provided in the input device, that measures the surrounding environment; conversion means for converting the sensor information acquired by the acquisition means into a control command for controlling the controlled device; recording means for recording the resulting control command as time-series data at predetermined intervals; and supply means for supplying the resulting control command to the controlled device.

One aspect of the present invention is an information processing apparatus that controls a controlled device to be controlled, comprising: first acquisition means for acquiring sensor information output by a sensor that measures the surrounding environment and is provided in an input device operated by a user, the input device having the same geometric shape as the controlled device; conversion means for converting the sensor information acquired by the first acquisition means into a control command for controlling the controlled device; and first supply means for supplying the resulting control command to the controlled device.

The information processing apparatus may further comprise gain adjusting means for adjusting the input gain of each input unit provided in the input device independently of the others.
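One way to picture independent per-joint gains is the sketch below. It assumes a simple proportional holding law, which the specification does not state, and the function `servo_torques` and all numeric values are hypothetical.

```python
def servo_torques(gains, hold_angles, angles):
    """Proportional holding torque per joint: gain * (hold angle - current angle)."""
    return [g * (h - a) for g, h, a in zip(gains, hold_angles, angles)]

# Hypothetical two-joint master: joint 0 is left free for teaching (gain 0),
# while joint 1 is held in place by its servo (gain 5).
gains = [0.0, 5.0]
hold_angles = [0.0, 0.0]
angles = [0.3, 0.1]  # both joints displaced from their hold angles

torques = servo_torques(gains, hold_angles, angles)
```

Under this assumed model, the zero-gain joint produces no restoring torque and can be moved freely by hand, while the held joint resists displacement without the teacher having to support it.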

The information processing apparatus may further comprise second acquisition means for acquiring sensor information output by a sensor that is provided in the controlled device and measures the surrounding environment, and presentation means for presenting the sensor information acquired by the second acquisition means to the user of the input device.

The information processing apparatus may further comprise recording means for recording, as time-series data, the control commands generated by the conversion means.

The information processing apparatus may further comprise reproduction means for reproducing the control commands recorded in the recording means in time series and outputting them to the controlled device.

The information processing apparatus may further comprise second acquisition means for acquiring sensor information output by a sensor that is provided in the controlled device and measures the surrounding environment, and the recording means may record the sensor information acquired by the second acquisition means together with the control commands.
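The recording and reproduction means described above can be sketched as a small time-series store. The class `TimeSeriesRecorder`, its methods, the 10 ms period, and the sample values are hypothetical names and numbers chosen for illustration only.

```python
class TimeSeriesRecorder:
    """Stores (time_ms, command, sensor) samples taken at a fixed period."""

    def __init__(self, period_ms):
        self.period_ms = period_ms
        self.samples = []

    def record(self, command, sensor):
        # The timestamp is derived from the sample index and the fixed period.
        time_ms = len(self.samples) * self.period_ms
        self.samples.append((time_ms, command, sensor))

    def replay(self, send):
        """Pass the recorded commands, in time order, to the `send` callback."""
        for time_ms, command, _sensor in self.samples:
            send(time_ms, command)

recorder = TimeSeriesRecorder(period_ms=10)
recorder.record(command=[0.0], sensor=[0.0])
recorder.record(command=[0.1], sensor=[0.02])
recorder.record(command=[0.2], sensor=[0.09])

sent = []
recorder.replay(lambda time_ms, command: sent.append((time_ms, command)))
```

Storing the command and the sensor reading in the same sample keeps the two streams tied to one time index, so whatever lag exists between them is preserved in the recorded data.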

The information processing apparatus may further comprise prediction means that uses a predetermined prediction model to predict and generate, based on the sensor information acquired by the second acquisition means and past control commands, a control command for controlling the controlled device at a time a predetermined interval ahead of the sensor information, and second supply means for supplying the new control command generated by the prediction means to the controlled device.

The information processing apparatus may further comprise learning means for training the prediction model using the sensor information and control commands recorded by the recording means.
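As a rough illustration of how recorded data could feed such learning, the sketch below pairs each recorded (sensor, command) sample with the command recorded a fixed number of steps later. The function `make_training_pairs`, the `horizon` parameter, and the sample values are assumptions; the specification does not prescribe this data layout.

```python
def make_training_pairs(commands, sensors, horizon):
    """Pair (sensor_t, command_t) with the command recorded `horizon` steps later."""
    pairs = []
    for t in range(len(commands) - horizon):
        model_input = (sensors[t], commands[t])
        target = commands[t + horizon]
        pairs.append((model_input, target))
    return pairs

# Hypothetical recorded time series for a single joint.
commands = [0.0, 0.1, 0.2, 0.3, 0.4]
sensors = [0.0, 0.0, 0.1, 0.2, 0.3]

pairs = make_training_pairs(commands, sensors, horizon=2)
```

Here `horizon` plays the role of the predetermined time ahead: a model fitted to these pairs would output the command expected two steps after the sensor reading it was given.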

The prediction model may be a recurrent neural network.
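For readers unfamiliar with recurrent networks, the following is a minimal sketch of one forward step of an Elman-style network, in which the hidden state carries information from earlier inputs. The network shape, the function `rnn_step`, and the weight values are placeholders chosen for illustration; they are not the network of the specification and are not trained.

```python
import math

def rnn_step(x, h, w_in, w_rec, w_out):
    """One step of an Elman-style recurrent network.

    x is the input vector and h the previous hidden state.
    Returns (output vector, new hidden state).
    """
    new_h = [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[j], x))
            + sum(w * hi for w, hi in zip(w_rec[j], h))
        )
        for j in range(len(h))
    ]
    y = [sum(w * hj for w, hj in zip(row, new_h)) for row in w_out]
    return y, new_h

# Placeholder (untrained) weights: 2 inputs, 2 hidden units, 1 output.
w_in = [[0.5, -0.3], [0.1, 0.8]]    # hidden x input
w_rec = [[0.0, 0.2], [-0.2, 0.0]]   # hidden x hidden
w_out = [[1.0, -1.0]]               # output x hidden

# Feed a short sequence of hypothetical (sensor, command) inputs.
h = [0.0, 0.0]
outputs = []
for x in [(0.0, 0.0), (0.0, 0.1), (0.1, 0.2)]:
    y, h = rnn_step(x, h, w_in, w_rec, w_out)
    outputs.append(y[0])
```

Because the hidden state `h` is fed back at every step, the output for a given input depends on the inputs that preceded it, which is the property that makes such a model suited to time-series prediction.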

The learning means may train the recurrent neural network using the self-organizing map technique used for category learning of vector patterns.

The input device and the controlled device may be robot devices each having a plurality of joints.

The information processing apparatus may further comprise first wireless communication means for communicating with the input device and second wireless communication means for communicating with the controlled device.

One aspect of the present invention is an information processing method for an information processing apparatus that controls a controlled device to be controlled, the method executing the steps of: acquiring sensor information output by a sensor that measures the surrounding environment and is provided in an input device operated by a user, the input device having the same geometric shape as the controlled device; converting the acquired sensor information into a control command for controlling the controlled device; and supplying the resulting control command to the controlled device.

A program for performing processing to control a controlled device may cause a computer to execute the steps of: acquiring sensor information output by a sensor that measures the surrounding environment and is provided in an input device operated by a user, the input device having the same geometric shape as the controlled device; converting the acquired sensor information into a control command for controlling the controlled device; and supplying the resulting control command to the controlled device.

A recording medium may have the program according to claim 17 recorded on it.

In one aspect of the present invention, the input device and the controlled device are configured as devices having the same geometric shape. In the control device, sensor information that is supplied from the input device operated by the user and output by a sensor, provided in the input device, that measures the surrounding environment is acquired; the acquired sensor information is converted into a control command for controlling the controlled device; and the resulting control command is supplied to the controlled device.

In one aspect of the present invention, the control device acquires sensor information output by a sensor that is provided in the controlled device and measures the surrounding environment. Based on the acquired sensor information and the control commands that control the controlled device, a control command for controlling the controlled device at a time a predetermined interval ahead of the sensor information is predicted and generated, and the new control command is supplied to the controlled device.

In one aspect of the present invention, the control device acquires sensor information that is supplied from the input device operated by the user and output by a sensor, provided in the input device, that measures the surrounding environment. The acquired sensor information is converted into a control command for controlling the controlled device, and the resulting control command is recorded as time-series data at predetermined intervals and supplied to the controlled device.

In one aspect of the present invention, sensor information is acquired that is output by a sensor that measures the surrounding environment and is provided in an input device operated by a user, the input device having the same geometric shape as the controlled device. The acquired sensor information is converted into a control command for controlling the controlled device, and the resulting control command is supplied to the controlled device.

According to the aspects of the present invention, tolerance to communication and processing delays in control and to changes in the environment can be improved, and control load and control failure can be suppressed.

In particular, when information for autonomous control is collected, the control commands sent to the controlled object (the robot, in the case of robot motion control) and the outputs (sensor information) of the various sensors received from that controlled object (robot), such as joint angle sensors, visual sensors, and audio sensors, are integrated and stored. Control commands and sensor information can thereby be collected as synchronized time-series information.

Furthermore, by replaying the collected control commands on the controlled object (robot), the robot can be made to interact with the environment, and time-series teaching data for autonomous control that reflects the system's control delay can be created.

Moreover, control commands are entered using a device with substantially the same geometric shape as the controlled object (robot), that is, a robot sharing its geometric model, and are generated from that device's sensor information. Processing such as conversion to a geometric model becomes unnecessary, which makes control commands easy to generate. In addition, each control parameter of the input device (the robot sharing the controlled object's geometric model), which for a robot is the servo gain of each joint, can be set individually. The gain of each joint can thus be set appropriately, and teaching can be performed without moving unneeded joints. Because there is no need to control surplus degrees of freedom during teaching (for a robot, to support joints by hand), the burden on the teacher is reduced.

By replaying in real time, on the controlled object, the control commands generated by operating a device with substantially the same geometric shape as the controlled object (robot), that is, a robot with the same geometric model, the operator can enter information and have control commands created while checking the control result (movement) of the controlled object. In other words, the operator can generate teaching data in real time while checking the control result (movement). This makes it possible to teach complex interactions between the robot and the environment, even in an environment that changes in many ways and is hard to predict.

Embodiments of the present invention are described below. The correspondence between the constituent features of the present invention and the embodiments described in the detailed description of the invention is illustrated as follows. This description is intended to confirm that embodiments supporting the present invention are described in the detailed description of the invention. Accordingly, even if an embodiment is described in the detailed description but is not listed here as corresponding to a constituent feature of the present invention, that does not mean the embodiment does not correspond to that feature. Conversely, even if an embodiment is listed here as corresponding to a constituent feature, that does not mean the embodiment does not correspond to other constituent features.

Furthermore, this description does not cover every invention described in this specification. In other words, it does not deny the existence of inventions that are described in this specification but not claimed in this application, that is, inventions that may in the future be claimed in a divisional application or added by amendment.

One aspect of the present invention is a control system (for example, the control system 10A of FIG. 3) comprising an input device operated by a user (for example, the master 12 of FIG. 3), a controlled device to be controlled (for example, the slave 13 of FIG. 3), and a control device (for example, the control device 11 of FIG. 3) that controls the controlled device based on information input from the input device. The input device and the controlled device have the same geometric shape. The control device comprises: acquisition means (for example, the receiving unit 72 of FIG. 7) for acquiring sensor information that is supplied from the input device operated by the user and output by a sensor, provided in the input device, that measures the surrounding environment; conversion means (for example, the sensor data/command conversion unit 63 of FIG. 7) for converting the sensor information acquired by the acquisition means into a control command for controlling the controlled device; and supply means (for example, the transmission unit 73 of FIG. 7) for supplying the resulting control command to the controlled device.

One aspect of the present invention is a control system (for example, the learning system 10C of FIG. 5) comprising a controlled device to be controlled (for example, the slave 13 of FIG. 5) and a control device (for example, the control device 11 of FIG. 5) that controls the controlled device. The control device comprises: acquisition means (for example, the receiving unit 102 of FIG. 8) for acquiring sensor information output by a sensor that is provided in the controlled device and measures the surrounding environment; prediction means (for example, the time-series predictor 93 of FIG. 8) for predicting and generating, based on the sensor information acquired by the acquisition means and the control commands that control the controlled device, a control command for controlling the controlled device at a time a predetermined interval ahead of the sensor information; and supply means (for example, the transmission unit 101 of FIG. 8) for supplying the new control command predicted and generated by the prediction means to the controlled device.

One aspect of the present invention is a recording system (for example, the recording system 10B in FIG. 4) comprising an input device operated by a user (for example, the master 12 in FIG. 4) and a recording device (for example, the control device 11 in FIG. 4) that records information input from the input device, the recording system further including a controlled device to be controlled (for example, the slave 13 in FIG. 4). The recording device comprises: acquisition means (for example, the receiving unit 72 in FIG. 7) for acquiring sensor information that is supplied from the input device operated by the user and is output from a sensor provided in the input device to measure the surrounding environment; conversion means (for example, the sensor data/command conversion unit 63 in FIG. 7) for converting the sensor information acquired by the acquisition means into a control command for controlling the controlled device; recording means (for example, the data recording unit 22 in FIG. 7) for recording the control command obtained by the conversion as time-series data at predetermined time intervals; and supply means (for example, the transmission unit 73 in FIG. 7) for supplying the control command obtained by the conversion to the controlled device.

One aspect of the present invention is an information processing apparatus (for example, the control device 11 in FIG. 2) that controls a controlled device to be controlled (for example, the slave 13 in FIG. 2). The information processing apparatus comprises: first acquisition means (for example, the receiving unit 72 in FIG. 7) for acquiring sensor information output from a sensor that measures the surrounding environment and is provided in an input device (for example, the master 12 in FIG. 2) that is operated by a user and has the same geometric shape as the controlled device; conversion means (for example, the sensor data/command conversion unit 63 in FIG. 7) for converting the sensor information acquired by the first acquisition means into a control command for controlling the controlled device; and first supply means (for example, the transmission unit 73 in FIG. 7) for supplying the control command obtained by the conversion to the controlled device.

The information processing apparatus may further comprise gain adjustment means (for example, the servo gain setting unit 61 in FIG. 7) for adjusting the input gain of each input unit provided in the input device independently of the others.

The information processing apparatus may further comprise second acquisition means (for example, the receiving unit 74 in FIG. 7) for acquiring sensor information output from a sensor provided in the controlled device to measure the surrounding environment, and presentation means (for example, the display unit 53 in FIG. 7) for presenting the sensor information acquired by the second acquisition means to the user of the input device.

The information processing apparatus may further comprise recording means (for example, the data recording unit 22 in FIG. 7) for recording the control commands generated by the conversion means as time-series data.

The information processing apparatus may further comprise reproduction means (for example, the reproduction unit 66 in FIG. 7) for reproducing the control commands recorded in the recording means in time-series order and outputting them to the controlled device.

The information processing apparatus may further comprise second acquisition means (for example, the receiving unit 74 in FIG. 7) for acquiring sensor information output from a sensor provided in the controlled device to measure the surrounding environment, and the recording means may record the sensor information acquired by the second acquisition means together with the control commands (for example, the data integration unit 65 in FIG. 7).

The information processing apparatus may further comprise prediction means (for example, the time-series predictor 93 in FIG. 8) for predicting and generating, using a predetermined prediction model and based on the sensor information acquired by the second acquisition means and on past control commands, a control command that controls the controlled device at a time a predetermined period ahead of the sensor information, and second supply means (for example, the transmission unit 101 in FIG. 8) for supplying the new control command generated by the prediction means to the controlled device.

The information processing apparatus may further comprise learning means (for example, the data learning unit 23 in FIG. 8) for learning the prediction model using the sensor information and control commands recorded by the recording means.

The prediction model may be a recurrent neural network (for example, FIG. 9).

The learning means may learn the recurrent neural network using the self-organizing map technique used for category learning of vector patterns (for example, the dynamics storage network 131 in FIG. 10).

The input device and the controlled device may be robot devices having a plurality of joints (for example, the robot in FIG. 15).

The information processing apparatus may further comprise first wireless communication means (for example, the wireless communication unit 62 in FIG. 7) for communicating with the input device, and second wireless communication means (for example, the wireless communication unit 64 in FIG. 7) for communicating with the controlled device.

本発明の一側面は、制御対象である被制御装置を制御する情報処理装置の情報処理方法であって、ユーザが操作する、被制御装置と互いに同一の幾何形状を有する入力装置に設けられた、周囲の環境を計測するセンサより出力されるセンサ情報を取得し（例えば、図２２のステップＳ２２）、取得されたセンサ情報を、被制御装置を制御する制御コマンドに変換し（例えば、図２２のステップＳ２４）、変換されて得られた制御コマンドを、被制御装置に供給する（例えば、図２２のステップＳ２５）ステップを実行する情報処理方法である。 One aspect of the present invention is an information processing method for an information processing device that controls a controlled device that is a control target, and is provided in an input device that is operated by a user and has the same geometric shape as the controlled device. The sensor information output from the sensor that measures the surrounding environment is acquired (for example, step S22 in FIG. 22), and the acquired sensor information is converted into a control command for controlling the controlled device (for example, FIG. 22). Step S24) is an information processing method for executing the step of supplying the control command obtained by the conversion to the controlled device (for example, Step S25 in FIG. 22).

次に、本発明を適用した実施の形態について、図面を参照して説明する。 Next, an embodiment to which the present invention is applied will be described with reference to the drawings.

図２は、本発明を適用した情報処理システムの構成例を示すブロック図である。 FIG. 2 is a block diagram showing a configuration example of an information processing system to which the present invention is applied.

The information processing system 10 in FIG. 2 is a system in which the control device 11 controls a master 12 and a slave 13, which are humanoid biped walking robots that each have a plurality of joints (degrees of freedom) and have the same geometric shape as each other. The control device 11, the master 12, and the slave 13 exchange various kinds of information, such as control commands and sensor information, with one another by wireless communication.

The control device 11 is configured by an information processing device such as a personal computer, for example, and performs processing related to the control of the entire information processing system 10. The master 12 is configured as an input interface that is operated by the user and accepts input of information about movement, and the slave 13 is configured as the control target. Each joint of the master 12 and the slave 13 is provided with an actuator using a servo motor, and the angle of the joint is determined by the operation of this actuator. The master 12 and the slave 13 are also provided with various sensors: joint angle sensors that measure the angle of each joint (joint angle), visual sensors such as cameras using a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) that acquire image information of the surroundings, and audio sensors such as microphones that acquire audio information of the surroundings.

The master 12 and the slave 13 are robots with a common arrangement of degrees of freedom, having joints (actuators) and joint angle sensors at the same positions as each other. In other words, with respect to the control of robot movement and posture by the control device 11 (the information processing system 10 in FIG. 2), the master 12 and the slave 13 have the same geometric shape and, as seen from the control device 11, the same geometric characteristics (that is, they share the geometric model used for control). For example, the control device 11 can control the operation of either the master 12 or the slave 13 with a common control command. Furthermore, the sensor information supplied from the master 12 to the control device 11 and the sensor information supplied from the slave 13 to the control device 11 are common to each other (outputs of the same types of sensors). The shape, color, size, and so on of the housings of the master 12 and the slave 13 may of course also be identical.

The information processing system 10 is a system in which the user operates the master 12 as a controller, the control device 11 generates control commands according to the movement of the master 12, and the slave 13 is made to perform the same movement and thereby interact with the environment. For example, in the information processing system 10, the control device 11 uses a time-series predictor to control the slave 13 based on the sensor information of the slave 13 (autonomous control). The control device 11 also learns the prediction model of the time-series predictor using time-series data such as sensor information and control commands as learning data (teaching data). Furthermore, the control device 11 causes the slave 13 to reproduce the information that the user, acting as a teacher, inputs by operating the master 12 (that is, the movement of the master 12), integrates the sensor information supplied from the slave 13 reproducing that movement with the control commands, and records the result as time-series data. This time-series data is later used, for example, as learning data (teaching data) for the learning described above.

Accordingly, the information processing system 10 has, for example, the following aspects: an aspect as a control system in which the user operates the slave 13 by operating the master 12 as a controller; an aspect as a recording system that records the history of the operations the user performed on the master 12; and an aspect as a learning system that performs autonomous control based on the sensor information of the slave 13 and learns the prediction model of the time-series predictor used to generate control commands. Each aspect is described below.

FIG. 3 is a diagram illustrating the aspect of the information processing system 10 of FIG. 2 as a control system.

The control system 10A shown in FIG. 3 is one aspect of the information processing system 10, and is a system in which the user 12A operates the slave 13 via the control device 11 by operating the master 12 as a controller.

この制御システム１０Ａにおける制御装置１１は、マスタ１２およびスレーブ１３を制御するマスタスレーブ制御部２１を有する。そのマスタスレーブ制御部２１は、マスタ１２に対して、マスタ１２の各関節に設けられたサーボモータのゲインを設定するサーボゲイン設定コマンド３１を無線通信により供給することにより、マスタ１２の各関節の硬さ（動きやすさ、すなわち、関節の角度の変えやすさ）の程度をそれぞれ制御する。つまり、マスタスレーブ制御部２１は、ユーザ１２Ａの操作に対して、動きやすい関節や硬い関節（動かしてもよい関節や固定すべき関節）を設定することができる。つまり、ユーザは、効率よく教示を行うことができる。 The control device 11 in the control system 10 A includes a master / slave control unit 21 that controls the master 12 and the slave 13. The master slave control unit 21 supplies a servo gain setting command 31 for setting a gain of a servo motor provided at each joint of the master 12 to the master 12 by wireless communication. The degree of hardness (ease of movement, that is, ease of changing the joint angle) is controlled. That is, the master-slave control unit 21 can set a movable joint or a hard joint (a joint that may be moved or a joint that should be fixed) in response to the operation of the user 12A. That is, the user can teach efficiently.

The master 12 is configured as an interface with which the user, by directly moving its joints (for example, both arms), teaches the motion the slave 13 should perform in terms of the angle of each joint. Changes in the posture of the master 12 moved by the user are transmitted to the control device 11 by wireless communication sequentially (periodically or irregularly) as sensor information (time-series information) from the joint angle sensors.

マスタスレーブ制御部２１は、そのマスタ１２から供給された関節角センサデータ３２に基づいて、スレーブ１３の各関節（各関節に設けられたアクチュエータ）の動作を制御する関節角コマンド３３を生成し、その関節角コマンド３３を無線通信によりスレーブ１３に供給する。 Based on the joint angle sensor data 32 supplied from the master 12, the master slave control unit 21 generates a joint angle command 33 that controls the operation of each joint (actuator provided at each joint) of the slave 13, The joint angle command 33 is supplied to the slave 13 by wireless communication.
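The conversion described above can be sketched as follows. This is only an illustrative sketch, assuming that joint angles are exchanged as a mapping from joint name to angle in radians; the function name `sensor_to_command` and the joint names are hypothetical and do not appear in the specification. Because the master 12 and the slave 13 share the same joint layout, the sketch simply makes each measured angle the target angle of the same joint.

```python
from typing import Dict


def sensor_to_command(joint_angles: Dict[str, float]) -> Dict[str, float]:
    """Turn master joint angle readings into a slave joint angle command.

    Assumption: master and slave use identical joint names, so no
    geometric remapping between the two robots is required.
    """
    return {joint: float(angle) for joint, angle in joint_angles.items()}


# Hypothetical reading from the master's joint angle sensors (radians).
reading = {"r_shoulder_pitch": 0.30, "r_elbow": -1.10}
command = sensor_to_command(reading)
```

With robots of differing geometry, this step would instead need a mapping between two kinematic models, which is the load the shared geometry is said to avoid.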

The slave 13 reproduces the movement corresponding to the joint angle command generated from the operation of the master 12 and interacts with the real environment, so that sensor information indicating the environment can be acquired together with its own joint angle sensor data and command information. Based on the supplied joint angle command 33, the slave 13 operates the actuator of each joint and performs the motion corresponding to the joint angle command 33. That is, the slave 13 traces the movement of the master 12 operated by the user 12A. The user 12A watches the motion of the slave 13 and, based on that motion, inputs the next motion to the master 12 (operates the master 12). In other words, because the user 12A watches the slave 13, the control result (the movement of the slave 13) is fed back to the input of control instructions (the operation of the master 12).

以上のように、ユーザ１２Ａは、制御結果を確認しながら制御指示を入力することができるので、例えば、直接の制御対象でないボール１４に対する動作をスレーブ１３に行わせるような、多様に変化して予測が困難な環境においても、ボール１４とスレーブ１３の動きを確認しながら容易に適切な制御指示を入力することができる。 As described above, the user 12A can input a control instruction while confirming the control result. For example, the user 12A can change variously to cause the slave 13 to perform an operation on the ball 14 that is not directly controlled. Even in an environment where prediction is difficult, it is possible to easily input an appropriate control instruction while checking the movement of the ball 14 and the slave 13.

Furthermore, since the master 12 serving as the input interface and the slave 13 serving as the control target are robots having the same geometric shape and share the controlled parameters (joints), the user 12A can intuitively understand how to operate the master 12 and can therefore easily control the slave 13. In addition, because the control instructions of the user 12A need not be converted into the geometric model of the control target, the load on the master-slave control unit 21 is reduced.

さらに、マスタスレーブ制御部２１がサーボゲイン設定コマンド３１を無線通信により供給することにより、ユーザ１２Ａは、例えば、マスタ１２の全身の関節のうち、胴体や足等の動かす必要のない関節を動かさずに、目的の双腕の関節のみを動かす等、適切な操作を容易に実現することができる。このとき、ユーザ１２Ａは、スレーブ１３やボール１４の現在の動きから、その将来の動きを予測し、その動きをマスタ１２に入力することができる。つまり、ユーザ１２Ａのこのような操作により、制御装置１１は、制御における通信や処理の遅延や環境の変化に対する耐性を向上させ、制御の負荷や破綻を抑制することができる。 Further, the master slave control unit 21 supplies the servo gain setting command 31 by wireless communication, so that the user 12A does not move, for example, the joints of the whole body of the master 12 that do not need to be moved, such as the trunk and legs. In addition, it is possible to easily realize an appropriate operation such as moving only the target double-arm joint. At this time, the user 12 A can predict the future movement from the current movement of the slave 13 and the ball 14 and input the movement to the master 12. That is, by such an operation by the user 12A, the control device 11 can improve resistance to communication and processing delays in control and environmental changes, and can suppress a control load and failure.

FIG. 4 is a diagram illustrating the aspect of the information processing system 10 of FIG. 2 as a recording system.

図４に示される記録システム１０Ｂは、情報処理システム１０の一側面であり、ユーザ１２Ａのマスタ１２の操作履歴を記録するシステムである。 A recording system 10B shown in FIG. 4 is an aspect of the information processing system 10, and is a system that records an operation history of the master 12 of the user 12A.

図３の場合と同様に、制御装置１１のマスタスレーブ制御部２１は、サーボゲイン設定コマンド３１をマスタ１２に供給することによって、マスタ１２の必要な関節のみを可動とし、マスタ１２は、ユーザ１２Ａの操作に対して、各関節の角度（関節角）を示す関節角センサデータ３２を制御装置１１に供給する。 As in the case of FIG. 3, the master slave control unit 21 of the control device 11 supplies the servo gain setting command 31 to the master 12 so that only the necessary joints of the master 12 are movable. In response to the operation, joint angle sensor data 32 indicating the angle (joint angle) of each joint is supplied to the control device 11.

また、制御装置１１のマスタスレーブ制御部２１は、その関節角センサデータ３２から関節角コマンド３３を生成し、スレーブ１３に供給する。スレーブ１３は、その関節角コマンド３３に応じて各関節を動作させ、マスタ１２の動きをトレースする。ユーザ１２Ａは、そのスレーブ１３の動きとボール１４の位置等を視認しながらマスタ１２を操作することにより、制御結果を制御指示入力にフィードバックさせる。 The master slave control unit 21 of the control device 11 generates a joint angle command 33 from the joint angle sensor data 32 and supplies the joint angle command 33 to the slave 13. The slave 13 operates each joint according to the joint angle command 33 and traces the movement of the master 12. The user 12A feeds back the control result to the control instruction input by operating the master 12 while visually recognizing the movement of the slave 13, the position of the ball 14, and the like.

The above is the same as in the case of FIG. 3, but in the case of FIG. 4 the slave 13 additionally supplies joint angle sensor data 34, visual sensor data 35, and audio sensor data 36 to the control device 11. While executing the specified joint angle command 33, the slave 13 sequentially observes its own posture information and external environment information, such as the ball 14, with its various sensors. Examples are the joint angle sensor data 34 consisting of joint angle information from encoders, the visual sensor data 35 consisting of image information from the camera (the coordinates of the ball 14), and the audio sensor data 36 consisting of audio information from the microphone (changes in sound pressure). The observed sensor information is sequentially transmitted to the control device 11 by wireless communication.

The master-slave control unit 21 of the control device 11 generates, at predetermined time intervals, vector data that integrates the joint angle command 33 supplied to the slave 13 with the joint angle sensor data 34, the visual sensor data 35, and the audio sensor data 36 supplied from the slave 13, and supplies each vector to the data recording unit 22 to be recorded as time-series data. The data recording unit 22 has a predetermined recording medium such as a hard disk, a semiconductor memory, or a magneto-optical disk, and records the time-series data supplied from the master-slave control unit 21 on that recording medium.
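The integration and recording step can be sketched as follows. This is a minimal sketch under stated assumptions: every data stream is already a flat sequence of numbers, the integrated vector is a plain concatenation in a fixed order, and the names `integrate` and `Recorder` are hypothetical rather than taken from the specification.

```python
from typing import List, Sequence, Tuple


def integrate(command: Sequence[float], joint: Sequence[float],
              vision: Sequence[float], audio: Sequence[float]) -> List[float]:
    """Concatenate the command and the three sensor streams into one vector."""
    return [*command, *joint, *vision, *audio]


class Recorder:
    """In-memory stand-in for the data recording unit (assumption)."""

    def __init__(self) -> None:
        self.series: List[Tuple[float, List[float]]] = []

    def record(self, t: float, vector: Sequence[float]) -> None:
        # Store a copy so later changes by the caller do not alter the log.
        self.series.append((t, list(vector)))


recorder = Recorder()
# One sample: 2 commanded angles, 2 measured angles, ball (x, y), sound level.
recorder.record(0.0, integrate([0.3, -1.1], [0.28, -1.05], [120.0, 80.0], [0.02]))
recorder.record(0.1, integrate([0.4, -1.0], [0.31, -1.08], [118.0, 82.0], [0.03]))
```

Keeping the command at the front of each vector makes it easy to slice the command back out later, for instance when the recorded motion is replayed.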

このデータ記録部２２に記録された時系列データは、様々な処理に利用することができる。例えば、マスタスレーブ制御部２１は、記録時より時間的に後の、任意の時刻に、データ記録部２２より関節角コマンド３３を時系列に沿って読み出し、スレーブ１３に供給することにより、スレーブ１３を利用してマスタ１２の動きを任意の時刻に再現することができる。すなわち、その再現を複数回行うことも可能である。また、記録された時系列データの編集も可能になる。さらに、他の装置や他のシステムでの利用も可能になる。 The time series data recorded in the data recording unit 22 can be used for various processes. For example, the master-slave control unit 21 reads the joint angle command 33 from the data recording unit 22 along the time series at an arbitrary time later than the time of recording, and supplies the joint angle command 33 to the slave 13. Can be used to reproduce the movement of the master 12 at an arbitrary time. That is, the reproduction can be performed a plurality of times. In addition, the recorded time series data can be edited. Further, it can be used in other devices and other systems.
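Replaying recorded commands might look like the sketch below. It assumes each record is a `(time, vector)` pair whose first `command_dim` entries are the joint angle command, and that `send` is some caller-supplied function that delivers a command to the slave; both conventions, and the name `replay`, are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Record = Tuple[float, Sequence[float]]


def replay(series: Sequence[Record], command_dim: int,
           send: Callable[[float, List[float]], None]) -> None:
    """Send the command part of every record in time-series order."""
    for t, vector in sorted(series, key=lambda record: record[0]):
        send(t, list(vector[:command_dim]))


# Hypothetical recorded data: 2 command values followed by 1 sensor value.
recorded = [(0.1, [0.4, -1.0, 0.31]), (0.0, [0.3, -1.1, 0.28])]
sent: List[Tuple[float, List[float]]] = []
replay(recorded, command_dim=2, send=lambda t, cmd: sent.append((t, cmd)))
```

Since the recorded series is not modified, the same call can be repeated to reproduce the motion any number of times.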

Of course, the control device 11 itself can also use this time-series data as teaching data for the learning function described later. The data recording unit 22 records, as vector data, the joint angle command 33 together with the joint angle sensor data 34, the visual sensor data 35, and the audio sensor data 36 obtained from the slave 13 at the same time as the joint angle command was generated. That is, the data recording unit 22 records the time-series data in a state that includes the communication delays and processing delays occurring in the system. As described later, this allows the control device 11 to learn, through the learning process, a prediction model that takes the control delay into account. The control device 11 can thus improve resistance to communication and processing delays in control and suppress control load and failure.

また、その関節角コマンドの記録の際に、スレーブ１３がその関節角コマンドによって動作するので、ユーザ１２Ａは、制御結果を視認しながらマスタ１２を操作することができ、データ記録部２２は、適切な制御が行われたときの関節角コマンド３３を記録することができる。 Further, since the slave 13 operates according to the joint angle command when the joint angle command is recorded, the user 12A can operate the master 12 while visually checking the control result, and the data recording unit 22 It is possible to record the joint angle command 33 when a proper control is performed.

As in the control system of FIG. 3, the master-slave control unit 21 of the control device 11 can, for example, turn off the servo gain for joint angle control only for the joints of both arms of the master 12 so that they can be moved freely from the outside, and have the user, acting as a teacher, hold both arms of the master 12 and operate only the arm joint angles so as to remotely command the action the slave should perform. The servo gain can be set individually for each joint. By lowering the gain only for the joints needed to execute the task while keeping the gain applied to the other joints so that they do not move, or by setting a low gain only for joints whose range of motion should be widened, the master-slave control unit 21 allows the user to teach without moving unnecessary degrees of freedom.
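The per-joint gain setting can be illustrated with a short sketch. The gain values (0.0 for a joint that should move freely, 1.0 for a joint that should be held), the joint names, and the function name `servo_gain_command` are all assumptions chosen for illustration; the specification does not give concrete values or a command format.

```python
from typing import Dict, Iterable


def servo_gain_command(all_joints: Iterable[str], free_joints: Iterable[str],
                       hold_gain: float = 1.0,
                       free_gain: float = 0.0) -> Dict[str, float]:
    """Build a per-joint gain table: low gain for joints the teacher moves."""
    joints = list(all_joints)
    free = set(free_joints)
    unknown = free - set(joints)
    if unknown:
        raise ValueError(f"unknown joints: {sorted(unknown)}")
    return {j: (free_gain if j in free else hold_gain) for j in joints}


# Free only the arm joints; the neck and knee stay stiff.
gains = servo_gain_command(
    ["neck", "l_elbow", "r_elbow", "l_knee"],
    free_joints=["l_elbow", "r_elbow"],
)
```

Passing an intermediate `free_gain` rather than 0.0 would correspond to the case in which a joint is given a low, but nonzero, gain to widen its range of motion.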

図５は、図２の情報処理システム１０の、学習システムとしての側面を説明する図である。 FIG. 5 is a diagram for explaining an aspect of the information processing system 10 of FIG. 2 as a learning system.

The learning system 10C shown in FIG. 5 is one aspect of the information processing system 10, and is a learning system that autonomously controls the slave 13 based on its sensor information and learns the prediction model of the time-series predictor used when generating the control commands.

The data learning unit 23 of the control device 11 learns the prediction model of the time-series predictor based on the time-series data recorded in the data recording unit 22, such as that recorded in the recording system 10B of FIG. 4, that is, vector data including the joint angle command 33, the joint angle sensor data 34, the visual sensor data 35, and the audio sensor data 36.

この予測モデルは、スレーブ１３より供給されるセンサ情報や過去の制御コマンドに基づいて、スレーブ１３を制御する次のステップの制御コマンドを生成する。ところで現実の学習システム１０Ｃにおいては、スレーブ１３において生成された関節角センサデータ３４、視覚センサデータ３５、および音声センサデータ３６が、制御装置１１に供給され、制御装置１１がそれらから関節角コマンド３３を生成し、その関節角コマンド３３がスレーブ１３に供給され、スレーブ１３がその関節角コマンドを実行し、実際の動作となるまでに、タイムラグ、すなわち制御遅延が発生する。 This prediction model generates a control command for the next step for controlling the slave 13 based on sensor information supplied from the slave 13 and past control commands. By the way, in the actual learning system 10C, the joint angle sensor data 34, the visual sensor data 35, and the audio sensor data 36 generated in the slave 13 are supplied to the control device 11, and the control device 11 uses the joint angle command 33 from them. The joint angle command 33 is supplied to the slave 13, and the slave 13 executes the joint angle command, and a time lag, that is, a control delay occurs until an actual operation is performed.

この制御遅延は、例えば、スレーブ１３で観測された関節角センサデータ３４を制御装置１１の時系列予測器に転送する際の通信遅延、制御装置１１において算出された関節角コマンド３３をスレーブ１３に転送する際の通信遅延、および、スレーブ１３において、関節角コマンド３３に基づいて実際にその関節角度にアクチュエータが回転されるまでの制御動作遅延などの要因によるものである。 This control delay is, for example, a communication delay when transferring the joint angle sensor data 34 observed by the slave 13 to the time series predictor of the control device 11, and the joint angle command 33 calculated by the control device 11 to the slave 13. This is due to a communication delay at the time of transfer and a control operation delay until the actuator is actually rotated to the joint angle based on the joint angle command 33 in the slave 13.

Therefore, in order to suppress divergence of control caused by this control delay and to improve resistance to the control delay, the data learning unit 23 uses the time-series data described above, which already contains the control delay, as teaching data. With this data it trains the prediction model of a time-series predictor that takes the control delay into account and predicts and generates an appropriate up-to-date joint angle command 33, that is, a joint angle command 33 that predicts the motion the control delay time ahead of the various sensor information and causes the slave 13 to perform that motion. In other words, the data learning unit 23 performs learning in order to improve the prediction accuracy of the prediction model of such a time-series predictor. Details of this learning are described later.
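One way to picture how delay-aware teaching data could be arranged is sketched below. It assumes that the control delay is expressed as a whole number of sampling steps (`delay_steps`), that the predictor input is the sensor vector concatenated with the command at the same step, and that the target is the recorded command `delay_steps` later. These choices and the function name `make_training_pairs` are assumptions for illustration, not the training procedure of the specification, which is described in later sections.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[List[float], List[float]]


def make_training_pairs(commands: Sequence[Sequence[float]],
                        sensors: Sequence[Sequence[float]],
                        delay_steps: int) -> List[Pair]:
    """Pair (sensors, command) at step t with the command at t + delay_steps."""
    if len(commands) != len(sensors):
        raise ValueError("commands and sensors must have the same length")
    if delay_steps < 0:
        raise ValueError("delay_steps must be non-negative")
    pairs: List[Pair] = []
    for t in range(len(commands) - delay_steps):
        model_input = list(sensors[t]) + list(commands[t])
        target = list(commands[t + delay_steps])
        pairs.append((model_input, target))
    return pairs


# Four recorded steps with one command value and one sensor value each.
cmds = [[0.0], [0.1], [0.2], [0.3]]
sens = [[1.0], [1.1], [1.2], [1.3]]
pairs = make_training_pairs(cmds, sens, delay_steps=2)
```

Any supervised time-series model could then be fitted to such pairs; the last `delay_steps` samples have no target and are dropped.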

スレーブ制御部２４は、このデータ学習部２３により学習された予測モデルを用いて、スレーブ１３より供給される、関節角センサデータ３４、視覚センサデータ３５、および音声センサデータ３６等のセンサ情報より、制御遅延時間分先の時刻に対応する関節角コマンド３３を予測し、それをスレーブ１３に供給する。 Using the prediction model learned by the data learning unit 23, the slave control unit 24 uses sensor information such as the joint angle sensor data 34, the visual sensor data 35, and the audio sensor data 36 supplied from the slave 13. A joint angle command 33 corresponding to the time ahead of the control delay time is predicted and supplied to the slave 13.

このように制御装置１１は、時系列予測器により予測モデルを用いて、制御遅延時間分先の時刻に対応する関節角コマンド３３を予測して生成することにより、制御における通信や処理の遅延に対する耐性を向上させ、制御の負荷や破綻を抑制することができる。また、制御装置１１は、予め制御遅延を含む時系列データを教示データとして予測モデルの学習を行うので、予測精度をさらに向上させることができる。 As described above, the control device 11 predicts and generates the joint angle command 33 corresponding to the time ahead of the control delay time by using the prediction model by the time series predictor, thereby preventing communication and processing delay in control. Tolerance can be improved and control load and failure can be suppressed. In addition, since the control device 11 learns the prediction model using time-series data including a control delay in advance as teaching data, the prediction accuracy can be further improved.

As described above, the information processing system 10 has aspects of a control system, a recording system, and a learning system. Accordingly, for autonomous control of the slave 13, the information processing system 10 can perform two kinds of processing: a teaching process that generates and records teaching data for training a prediction model that predicts, from sensor information, the control command corresponding to the time one control delay time ahead; and a learning control process that trains the prediction model using the generated teaching data and autonomously controls the slave 13 based on the sensor information of the slave 13.

Next, each device constituting the information processing system 10 will be described. FIG. 6 is a block diagram illustrating a configuration example of the control device 11 of FIG. 2.

In FIG. 6, the control device 11 includes: a main control unit 51 that controls each unit of the control device 11; an input unit 52 that has input devices such as a keyboard and a mouse, accepts input information from the user of the control device 11, and supplies it to the main control unit 51; a display unit 53 that has a monitor such as a CRT (Cathode Ray Tube) monitor, an LCD (Liquid Crystal Display), or an organic EL display (OELD (Organic ElectroLuminescence Display)) and displays image information supplied from the main control unit 51; a master-slave control unit 21 that controls the master 12 and the slave 13; a data recording unit 22 that has a recording medium such as a hard disk or a semiconductor memory and records time series data; a data learning unit 23 that learns the prediction model through learning processing using the time series data; a slave control unit 24 that uses the prediction model to predict a control command from the sensor information of the slave 13 and controls the slave 13; and a data display control unit 25 that controls display on the display unit 53.

The input unit 52, the display unit 53, and the master-slave control unit 21 through the data display control unit 25 each perform their processing under the control of the main control unit 51.

FIG. 7 shows the main configuration of the control device 11 of FIG. 6 relating to the teaching process. Portions unnecessary for describing the teaching process are omitted.

The control device 11A during teaching includes the main control unit 51, the input unit 52, the display unit 53, the master-slave control unit 21, the data recording unit 22, and the data display control unit 25. Here, the master-slave control unit 21 includes a servo gain setting unit 61, a wireless communication unit 62, a sensor data/command conversion unit 63, a wireless communication unit 64, a data integration unit 65, and a reproduction unit 66.

The servo gain setting unit 61 is a processing unit that performs processing related to setting the gain (servo gain) of the servo motor of each joint of the master 12. Specifically, it causes the display unit 53, via the main control unit 51, to display a GUI (Graphical User Interface) screen on which the user specifies the servo gain, accepts the user instruction input from the input unit 52, sets the servo gain based on that instruction, and transmits a servo gain setting command 31 to the master 12 via the transmission unit 71 of the wireless communication unit 62.

The wireless communication unit 62 is a processing unit that communicates wirelessly with the master 12 by a method conforming to a predetermined wireless communication standard such as wireless LAN. It includes a transmission unit 71 that transmits data to the master 12 and a reception unit 72 that receives data transmitted from the master 12. The transmission unit 71 transmits, for example, the servo gain setting command 31 supplied from the servo gain setting unit 61 to the master 12, as described above. The reception unit 72 receives, for example, the joint angle sensor data 32 supplied from the master 12 and supplies it to the sensor data/command conversion unit 63.

The sensor data/command conversion unit 63 is a processing unit that converts the joint angle sensor data 32 of the master 12, supplied via the reception unit 72, into a joint angle command 33, which is a control command for realizing those joint angles. The sensor data/command conversion unit 63 transmits the joint angle command 33 obtained by the conversion to the slave 13 via the transmission unit 73 of the wireless communication unit 64, and also supplies it to the data integration unit 65.

The wireless communication unit 64 is a communication unit similar to the wireless communication unit 62 and communicates wirelessly with the slave 13 by a method conforming to a predetermined wireless communication standard such as wireless LAN. It includes a transmission unit 73 that transmits data to the slave 13 and a reception unit 74 that receives data transmitted from the slave 13. The transmission unit 73 transmits, for example, the joint angle command 33 supplied from the sensor data/command conversion unit 63 to the slave 13, as described above. The reception unit 74 receives, for example, the joint angle sensor data 34 supplied from the slave 13 and ball coordinate data 37, which is coordinate information indicating the position of the ball 14 relative to the slave 13, and supplies them to the data integration unit 65. The ball coordinate data 37 is information obtained from the various sensors provided in the slave 13; for example, it is calculated from the visual sensor data 35 and the audio sensor data 36. That is, the control device 11 processes the ball coordinate data 37 as sensor information in the same way as the visual sensor data 35 and the audio sensor data 36 described above.

At predetermined time intervals, the data integration unit 65 integrates the joint angle command 33 supplied from the sensor data/command conversion unit 63 with the joint angle sensor data 34 and the ball coordinate data 37 supplied from the reception unit 74, and supplies the result to the data recording unit 22 as vector data. The data recording unit 22 sequentially records and manages the vector data supplied at each interval (a set of the joint angle command 33, the joint angle sensor data 34, and the ball coordinate data 37) as time series data.
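The integration and recording step described above can be sketched roughly as follows. This is only an illustration: the class `TimeSeriesRecorder`, the function `integrate`, the vector layout, and all numerical values are assumptions introduced here, not details given in the description.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class TimeSeriesRecorder:
    """Hypothetical stand-in for the data recording unit 22."""
    samples: List[List[float]] = field(default_factory=list)

    def record(self, vector: Sequence[float]) -> None:
        # Append one integrated vector per sampling interval, in time order.
        self.samples.append(list(vector))


def integrate(joint_angle_command, joint_angle_sensor, ball_coordinates):
    """Concatenate the three data items of one time step into a flat vector
    (assumed layout: command, then joint angles, then ball coordinates)."""
    return [*joint_angle_command, *joint_angle_sensor, *ball_coordinates]


recorder = TimeSeriesRecorder()
# Two sampling intervals with made-up values (2 joints, 2-D ball position).
recorder.record(integrate([0.10, 0.20], [0.08, 0.18], [1.5, -0.3]))
recorder.record(integrate([0.12, 0.22], [0.10, 0.20], [1.4, -0.2]))
```

Because the command and the sensor readings are stored side by side at each interval, whatever delay exists between issuing a command and observing its effect is preserved in the recorded sequence.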

The reproduction unit 66 sequentially reads, in time series order, the joint angle commands 33 recorded as time series data in the data recording unit 22 as described above, and supplies the read joint angle commands 33 to the slave 13 via the transmission unit 73. That is, the reproduction unit 66 causes the slave 13 to reproduce the series of motions represented by the group of joint angle commands 33 recorded in the data recording unit 22. Through this reproduction, the user can check the content of the data recorded in the data recording unit 22 by observing the movement of the slave 13. For example, with reference to the reproduced result, the user can perform new teaching, or discard part or all of the recorded data and redo the teaching.

The output of the data integration unit 65 described above is also supplied to the data display control unit 25. The data display control unit 25 visualizes the supplied vector data (a set of the joint angle command 33, the joint angle sensor data 34, and the ball coordinate data 37) as numerical values or graphs, and supplies the resulting image information to the display unit 53 via the main control unit 51. The display unit 53 displays the supplied image information on, for example, a predetermined GUI screen. By visualizing the data to be recorded and displaying it on the display unit 53 in this way, the user can operate the master 12 and perform teaching while checking the content of the data being recorded.

FIG. 8 shows the main configuration of the control device 11 of FIG. 6 relating to the learning control process. Portions unnecessary for describing the learning control process are omitted.

The control device 11B during learning control includes the main control unit 51, the input unit 52, the display unit 53, the data recording unit 22, the data learning unit 23, the slave control unit 24, and the data display control unit 25.

The slave control unit 24 includes a wireless communication unit 91, a data integration unit 92, and a time series predictor 93.

The wireless communication unit 91 is a communication unit similar to the wireless communication unit 64 of FIG. 7 and communicates wirelessly with the slave 13 by a method conforming to a predetermined wireless communication standard such as wireless LAN. It includes a transmission unit 101 that transmits data to the slave 13 and a reception unit 102 that receives data transmitted from the slave 13. The transmission unit 101 transmits, for example, the joint angle command 33 supplied from the data integration unit 92 to the slave 13. The reception unit 102 receives, for example, the joint angle sensor data 34 and the ball coordinate data 37 supplied from the slave 13 and supplies them to the data integration unit 92.

At predetermined time intervals, the data integration unit 92 integrates the joint angle sensor data 34 and the ball coordinate data 37 supplied from the reception unit 102 with the joint angle command 33 predicted by the time series predictor 93 at the previous time step, and supplies the result to the time series predictor 93 as vector data. When the data integration unit 92 obtains, from the time series predictor 93, vector data containing the joint angle command prediction data 111 (the predicted joint angle command 33), it supplies that joint angle command prediction data 111 to the slave 13 via the transmission unit 101 as a new joint angle command 33.

Using the prediction model supplied from the data learning unit 23, the time series predictor 93 performs one step of prediction based on the vector data supplied from the data integration unit 92 (the joint angle command 33 predicted by the time series predictor 93 at the previous time step, together with the joint angle sensor data 34 and the ball coordinate data 37 supplied from the reception unit 102). It generates joint angle command prediction data 111, which is the prediction of the joint angle command; joint angle prediction data 112, which is the prediction of the joint angle sensor data; and ball coordinate prediction data 113, which is the prediction of the ball coordinate data, and supplies them to the data integration unit 92 as vector data.

In other words, when the joint angle sensor data 34 and the ball coordinate data 37 supplied from the slave 13 are supplied to the time series predictor 93 together with the joint angle command 33 of the previous step, the time series predictor 93 predicts the joint angle command 33 of the next step so as to cancel out the control delay. The new joint angle command 33 is supplied to the slave 13 and executed. The new joint angle sensor data 34 and ball coordinate data 37 obtained during that execution are then supplied to the control device 11B. By repeating this processing, the control device 11B autonomously controls the slave 13 based on the slave 13's own sensor information.
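The closed loop described above can be outlined as a short sketch. The function `run_autonomous_control`, the toy slave, and the toy predictor below are all hypothetical placeholders; in the described system the predictor is a learned model and the slave is a physical robot reached over a wireless link.

```python
from typing import Callable, List

Vector = List[float]


def run_autonomous_control(
    predict: Callable[[Vector], Vector],
    read_sensors: Callable[[Vector], Vector],
    initial_command: Vector,
    steps: int,
) -> List[Vector]:
    """Repeat: execute command, integrate command and sensors, predict next command."""
    n_cmd = len(initial_command)
    command = list(initial_command)
    sent = []
    for _ in range(steps):
        sensors = read_sensors(command)  # slave executes the command, returns sensor data
        state = command + sensors        # role played by the data integration unit 92
        predicted = predict(state)       # one-step prediction (time series predictor 93)
        command = predicted[:n_cmd]      # only the command part is sent to the slave
        sent.append(command)
    return sent


def toy_slave(command: Vector) -> Vector:
    # Assumed ideal slave: joint angle equals the command; ball fixed at 1.0.
    return [command[0], 1.0]


def toy_predictor(state: Vector) -> Vector:
    # Hand-written rule standing in for a learned model: move halfway to the ball.
    cmd, angle, ball = state
    next_cmd = cmd + 0.5 * (ball - angle)
    return [next_cmd, next_cmd, ball]


commands = run_autonomous_control(toy_predictor, toy_slave, [0.0], steps=3)
```

The sketch keeps the previous command as part of the predictor input, mirroring the description, so the predictor always sees both what was last commanded and what the sensors reported.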

The prediction model of the time series predictor 93 is a calculation algorithm that takes the state at the current time as input and, based on that information, predicts and outputs the state at the next time. For example, an algorithm that solves differential equations with a method such as the Runge-Kutta method can perform highly accurate time series prediction. The prediction model may be realized by any method: any model is acceptable as long as the correspondence between certain input vector data and the corresponding output vector data is given in advance as teaching data and, after learning with that teaching data, the model produces the same output information as at teaching time from the same input information given at teaching time. For example, a recurrent neural network such as the one shown in FIG. 9 may be used as the prediction model.
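As a small illustration of the Runge-Kutta style of one-step prediction mentioned above, the following sketch advances a state vector by one time step with the classical fourth-order method. The harmonic oscillator used as the system, the step size, and the function names are assumptions chosen for the example; the description does not specify any particular differential equation.

```python
import math


def rk4_step(f, x, dt):
    """Advance state x by dt under dx/dt = f(x) using classical 4th-order Runge-Kutta."""
    k1 = f(x)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
    return [
        xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def harmonic_oscillator(x):
    # Assumed example system: d(position)/dt = velocity, d(velocity)/dt = -position.
    position, velocity = x
    return [velocity, -position]


# Predict the state step by step from t = 0 to t = 1.
state = [1.0, 0.0]
for _ in range(100):
    state = rk4_step(harmonic_oscillator, state, 0.01)
```

Such a predictor requires the governing equations to be known in advance, which is the main difference from the learned models discussed in the surrounding text.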

A recurrent neural network is a neural network of the ordinary backpropagation type to which a structure (context nodes) has been added, in which part of the vector output from the output layer is recursively fed back to the input layer. That is, a recurrent neural network has an internal state quantity (context information) at each time that reflects the past time series of states leading up to that time and that is not reflected in the quantities observed at any single isolated instant. Because this context information reflects the past history with depth in the time direction, it is a state quantity needed for time series prediction and similar calculations in which the state cannot be uniquely determined from spatially determined state quantities alone. The recurrent neural network generates and outputs a predicted state vector based not only on the input time series state vector but also on this context information. In doing so, it updates the context information and feeds it back as new context information (the context loop). Thanks to this recursive loop (context loop), the recurrent neural network can generate context information in the context nodes in a self-organizing manner and make distinctions even for time series information for which the output cannot be uniquely determined from the input vector.

For example, even in two cases in which time series state vectors that happen to have the same values are input, the recurrent neural network can use the context information to distinguish the differences in the time series vectors input up to that point in each case, and can therefore derive different predicted state vectors for the two cases.
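The effect of the context loop can be seen in a very small sketch. The network below has one input, one context value, two hidden units, and two outputs, of which the first is treated as the prediction and the second is fed back as context. The weights are arbitrary, untrained values chosen only for illustration, and the function `rnn_step` is a hypothetical name.

```python
import math


def rnn_step(x, context, w_hidden, w_out, n_pred):
    """One forward step: the input layer receives x plus the fed-back context.
    The first n_pred outputs are the prediction; the rest become the new context."""
    inputs = list(x) + list(context)
    hidden = [math.tanh(sum(w * v for w, v in zip(row, inputs))) for row in w_hidden]
    out = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return out[:n_pred], out[n_pred:]


# Arbitrary illustrative weights (not trained).
W_HIDDEN = [[0.8, 0.5], [-0.3, 0.9]]
W_OUT = [[1.0, 0.5], [0.7, -0.4]]


def run(sequence):
    context = [0.0]
    prediction = None
    for value in sequence:
        prediction, context = rnn_step([value], context, W_HIDDEN, W_OUT, n_pred=1)
    return prediction


# Both sequences end with the same input 0.5 but have different histories.
pred_a = run([1.0, 0.5])
pred_b = run([-1.0, 0.5])
```

Although the final input is identical in both runs, `pred_a` and `pred_b` differ because the context value carried over from the first step differs.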

The data learning unit 23 of FIG. 8 trains such a recurrent neural network using, as teaching data, the vector data (a set of the joint angle command 33, the joint angle sensor data 34, and the ball coordinate data 37) recorded as time series data in the data recording unit 22, thereby generates a prediction model of the joint angle command 33, and supplies it to the time series predictor 93 of the slave control unit 24 via the main control unit 51.

FIG. 10 is a block diagram illustrating a detailed configuration example of the data learning unit 23.

In FIG. 10, the data learning unit 23 includes a dynamics storage network acquisition unit 121, a dynamics storage network holding unit 122, a time series data acquisition unit 123, a learning unit 124, and a dynamics storage network supply unit 125.

The dynamics storage network acquisition unit 121 acquires the dynamics storage network 131 from the time series predictor 93 of the slave control unit 24 via the main control unit 51 and supplies it to the dynamics storage network holding unit 122. As will be described in detail later, the dynamics storage network 131 is a network whose nodes are dynamical system approximation models having internal state quantities (a network made up of nodes that each hold (store) a dynamical system approximation model having internal state quantities); it is the prediction model, based on recurrent neural networks, that is used in the time series predictor 93.

The dynamics storage network holding unit 122 temporarily holds the dynamics storage network 131 supplied from the dynamics storage network acquisition unit 121. The dynamics storage network 131 held in the dynamics storage network holding unit 122 is used by the learning unit 124.

The time series data acquisition unit 123 acquires time series data such as the joint angle command 33, the joint angle sensor data 34, and the ball coordinate data 37 from the data recording unit 22 via the main control unit 51 and supplies it to the learning unit 124.

When the learning unit 124 acquires the dynamics storage network 131 from the dynamics storage network holding unit 122, it performs learning processing on the dynamics storage network 131 using the supplied time series data as teaching data, and updates the dynamics storage network 131.

No label (information indicating which category the data belongs to) is attached to the data contained in this time series data. The learning unit 124 therefore performs learning on the assumption that the categories of data contained in the time series data, and the number of those categories, are unknown.

In the example of FIG. 10, the dynamics storage network 131 is made up of nodes 131-4 to 131-9, each of which has a recurrent neural network.

The learning unit 124 performs learning so that the dynamics storage network 131 as a whole can appropriately represent the characteristics of the time series data. Each of the nodes 131-4 to 131-9 constituting the dynamics storage network 131 is trained in a self-organizing manner.

Note that one node does not necessarily correspond to one category. Rather, a category can be regarded as being made up of a plurality of nodes. For example, when the time series data contains data of three categories "A", "B", and "C", each of the categories may be learned by a plurality of nodes. Learning is also possible even when the data contained in the time series data cannot be clearly categorized (that is, when a human cannot categorize it).

The dynamics storage network 131 is a network composed of a plurality of nodes. Each node is used to hold a time series pattern. Nodes can be connected to one another, and such a connection is called a link. In the dynamics storage network 131 of FIG. 10, for example, the node 131-4 is connected to the node 131-5 and the node 131-6, and these connections are links.

FIGS. 11 and 12 show typical examples of dynamics storage networks.

In the dynamics storage network 140 of FIG. 11, none of the nodes 141 to 147 has a link.

In contrast, in the dynamics storage network 150 of FIG. 12, all the nodes 151 to 159 are arranged two-dimensionally, and links are provided between nodes adjacent in the vertical and horizontal directions. Here, the links are used to give a structure in which the nodes are arranged in a space. That is, the dynamics storage network 150 of FIG. 12 is an example of a dynamics storage network given a two-dimensional node arrangement structure, whereas the dynamics storage network 140 of FIG. 11 is an example given a structure with no spatial constraint on node arrangement.

The distance relationship in that space is determined based on the spatial arrangement structure of the nodes given by the links. For example, in the dynamics storage network 150 of FIG. 12, when attention is paid to a certain node, the nodes adjacent to it, which are directly connected to it by a link, are the closest to it, and nodes reached by following further links in order from those adjacent nodes are progressively farther from it.
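One way to make this link-based distance concrete is to count the minimum number of links between two nodes. The sketch below does so with a breadth-first search on a 3-by-3 grid; the use of (row, column) tuples as node identifiers and the function names are assumptions made for the example and do not correspond to the reference numerals 151 to 159.

```python
from collections import deque


def grid_links(rows, cols):
    """Build links between vertically and horizontally adjacent nodes."""
    links = {(r, c): [] for r in range(rows) for c in range(cols)}
    for r, c in links:
        for dr, dc in ((1, 0), (0, 1)):  # look down and right so each link is added once
            neighbor = (r + dr, c + dc)
            if neighbor in links:
                links[(r, c)].append(neighbor)
                links[neighbor].append((r, c))
    return links


def link_distance(links, start):
    """Minimum number of links from start to every reachable node (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


distances = link_distance(grid_links(3, 3), (1, 1))
```

With the center node as the node of interest, the four directly linked nodes are at distance 1 and the corner nodes at distance 2. In a network without links, no node other than the start is reachable, so no distance relationship arises.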

In contrast, in the dynamics storage network 140 of FIG. 11, no distance relationship in space is given.

Beyond the examples of FIGS. 11 and 12, the arrangement structure of the nodes in space can be changed by the way the links are configured, and any arrangement structure can be set by using links.

FIG. 13 is a diagram illustrating a detailed example of one node of the dynamics storage network 131.

One node of the dynamics storage network has a dynamical system approximation model 161 having internal state quantities and a learning data storage unit 162 that stores learning data (teaching data) for learning the parameters of the dynamical system approximation model 161. A recurrent neural network, for example, is used as the dynamical system approximation model 161 having internal state quantities.

In FIG. 13, a recurrent neural network (RNN) having a feedback loop from the output layer to the input layer of a three-layer neural network (NN) is used as the dynamical system approximation model 161. By training this recurrent neural network to take the state vector X_t at time t in the time series data as input and to predict and output the state vector X_{t+1} at time t+1 (prediction learning), the time evolution law of the target time series data can be learned.
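For prediction learning, each state vector in a recorded sequence serves as an input and its successor serves as the target. A minimal sketch of how such input and target pairs might be formed is shown below; the function name `prediction_pairs` and the sample values are hypothetical.

```python
def prediction_pairs(series):
    """Pair each state vector X_t with its successor X_{t+1}."""
    return list(zip(series[:-1], series[1:]))


# Made-up three-step sequence of two-dimensional state vectors.
series = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
pairs = prediction_pairs(series)
```

A sequence of T vectors yields T-1 pairs; the network parameters are then adjusted so that the output for each X_t approaches the corresponding X_{t+1}.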

The Back-Propagation Through Time method is used to estimate the parameters of a dynamical system approximation model having internal state quantities, such as a recurrent neural network. (Reference: D. E. Rumelhart, G. E. Hinton & R. E. Williams, 1986 "Learning internal representations by error propagation", In D. E. Rumelhart & J. McClelland, "Parallel distributed processing", pp. 318-364, Cambridge, MA: MIT Press)

The dynamical system approximation model 161 having internal state quantities learns the dynamical characteristics of the learning data stored in the learning data storage unit 162, so that the model 161 and the data in the learning data storage unit 162 come to correspond to each other.

Because the data used for learning is time series data, the dynamical system approximation model 161 having internal state quantities learns dynamics.

Returning to FIG. 10, when the update of the dynamics storage network 131 described above is complete, the learning unit 124 returns the updated dynamics storage network 131 to the dynamics storage network holding unit 122.

The dynamics storage network supply unit 125 acquires the updated dynamics storage network 131 from the dynamics storage network holding unit 122 and supplies it to the time series predictor 93 of the slave control unit 24 via the main control unit 51.

FIG. 14 is a block diagram illustrating a detailed configuration example of the time series predictor 93.

As shown in FIG. 14, the time series predictor 93 includes an input unit 171, a feature extraction unit 172, a recognition unit 173, a network storage unit 174, an internal state storage unit 175, and a generation unit 176.

The input unit 171 accepts the time series data supplied from the data integration unit 92 (vector data consisting of the previous joint angle command 33 and the joint angle sensor data 34 and ball coordinate data 37 supplied from the slave 13) and supplies it to the feature extraction unit 172.

The feature extraction unit 172 extracts feature quantities from the time series data supplied from the input unit 171. For example, an audio signal, which is one of the sensor signals, is subjected to processing such as frequency analysis at fixed time intervals, and feature quantities such as the mel cepstrum are extracted in time series.

Here, the mel cepstrum is a feature quantity widely used in speech recognition and similar fields. The time-series data of feature quantities, which the feature extraction unit 172 obtains by extracting feature quantities in time series from the time-series data, is supplied to the internal state quantity update unit 181 of the recognition unit 173 and to the time-series data generation unit 193 of the generation unit 176.

認識部１７３は、特徴抽出部１７２より供給された時系列データに対して、それまでの学習の結果である、ネットワーク記憶部１７４に記憶されているダイナミクス記憶ネットワークに保持されたダイナミクスと照らし合わせ、最も類似したダイナミクスを決定し、その結果を認識結果として生成部１７６に供給する。 The recognizing unit 173 compares the time series data supplied from the feature extracting unit 172 with the dynamics held in the dynamics storage network stored in the network storage unit 174, which is the result of learning so far. The most similar dynamics are determined, and the result is supplied to the generation unit 176 as a recognition result.

認識部１７３は、内部状態量更新部１８１、スコア計算部１８２、勝者ノード決定部１８３、および認識結果出力部１８４を有している。 The recognition unit 173 includes an internal state quantity update unit 181, a score calculation unit 182, a winner node determination unit 183, and a recognition result output unit 184.

The internal state quantity update unit 181 reads the previously updated internal state quantities stored in the internal state storage unit 175 into the dynamical system approximation model of each node of the dynamics storage network stored in the network storage unit 174, and updates each internal state quantity based on the input time-series data.

スコア計算部１８２は、学習時に勝者ノードを決定するために行う処理と同じスコア計算を行う。スコア計算部１８２のスコア計算の結果、各ノードにはスコアが付与される。上述したように、内部状態量を持つ力学系近似モデルがリカレントニューラルネットワークで与えられる場合には、予測出力の平均二乗誤差がスコアとして利用される。 The score calculation unit 182 performs the same score calculation as the process performed to determine the winner node during learning. As a result of the score calculation by the score calculation unit 182, a score is assigned to each node. As described above, when a dynamic system approximation model having an internal state quantity is given by a recurrent neural network, the mean square error of the predicted output is used as a score.

つまり、認識部１７３においては、内部状態量を更新しながら、スコアの計算が行われる。勝者ノード決定部１８３は、スコア計算部１８２において得られるスコアに基づき、最もスコアの良いノード、すなわち勝者ノードを決定する。さらに、勝者ノード決定部１８３は、この最もスコアの良いノード（勝者ノード）に対応するダイナミクスを、入力された時系列データに最も適合するダイナミクスとして選択する。 That is, the recognition unit 173 calculates the score while updating the internal state quantity. The winner node determination unit 183 determines the node with the best score, that is, the winner node, based on the score obtained by the score calculation unit 182. Further, the winner node determination unit 183 selects the dynamics corresponding to the node with the highest score (winner node) as the dynamics most suitable for the input time-series data.
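
The scoring and winner selection described above can be sketched as follows. This is a minimal illustration only: the function names, the list-of-vectors data layout, and the toy data are assumptions made for this example, and each node's predicted output is taken as given rather than computed by a recurrent neural network.

```python
from typing import List, Sequence, Tuple

Series = Sequence[Sequence[float]]  # one vector per time step


def mean_squared_error(predicted: Series, observed: Series) -> float:
    """Mean of the squared differences over all time steps and dimensions."""
    squared = [
        (p - o) ** 2
        for p_vec, o_vec in zip(predicted, observed)
        for p, o in zip(p_vec, o_vec)
    ]
    return sum(squared) / len(squared)


def determine_winner(node_predictions: Sequence[Series],
                     observed: Series) -> Tuple[int, List[float]]:
    """Score every node and return (index of the best node, all scores).

    A lower mean squared error is treated as a better score.
    """
    scores = [mean_squared_error(pred, observed) for pred in node_predictions]
    winner = min(range(len(scores)), key=lambda i: scores[i])
    return winner, scores


# Toy data (assumed): two nodes predicting a one-dimensional series.
observed = [[0.0], [1.0], [2.0]]
node_predictions = [
    [[0.0], [0.0], [0.0]],  # node 0: poor fit
    [[0.0], [1.0], [2.5]],  # node 1: close fit
]
winner, scores = determine_winner(node_predictions, observed)
```

In this toy case node 1 has the smaller error and is selected as the winner node.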

The internal state quantity update unit 181 described above stores, in the internal state storage unit 175, the updated value of the internal state quantity at the time the winner node was determined (the updated internal state quantity) and the initial value of the internal state quantity at the time that winner node was determined.

ここで、内部状態記憶部１７５に記憶された内部状態量の更新値は、認識部１７３での次回のスコア計算に利用される。また、内部状態記憶部１７５に記憶された内部状態量の初期値は、生成部１７６において利用される。 Here, the updated value of the internal state quantity stored in the internal state storage unit 175 is used for the next score calculation in the recognition unit 173. The initial value of the internal state quantity stored in the internal state storage unit 175 is used in the generation unit 176.

認識部１７３の認識結果出力部１８４は、勝者ノード決定部１８３においてどのノードが選択されたかという情報を認識結果として出力し、生成部１７６の生成ノード決定部１９１に供給する。 The recognition result output unit 184 of the recognition unit 173 outputs information indicating which node has been selected by the winner node determination unit 183 as a recognition result, and supplies the information to the generation node determination unit 191 of the generation unit 176.

ネットワーク記憶部１７４は、時系列予測モデルとしてのダイナミクス記憶ネットワークを記憶する記憶部である。なお、上述した、ダイナミクス記憶ネットワークのノードの一部である学習データ記憶部１６２は、ここでは、ネットワーク記憶部１７４の記憶領域の一部として構成される。 The network storage unit 174 is a storage unit that stores a dynamics storage network as a time series prediction model. The learning data storage unit 162 that is a part of the nodes of the dynamics storage network described above is configured as a part of the storage area of the network storage unit 174 here.

The internal state storage unit 175 holds the internal state quantities (internal state) of the dynamical system approximation models, that is, of the dynamics storage network in the network storage unit 174, as updated in the processing of the recognition unit 173 described above. These internal state quantities are updated by the recognition unit 173 and used by the generation unit 176 for generation processing.

生成部１７６は、ダイナミクス記憶ネットワークに保持されたダイナミクスから、必要に応じて次の時系列データ（予測情報）を生成する。生成部１７６は、生成ノード決定部１９１、内部状態読み込み部１９２、時系列データ生成部１９３、および生成結果出力部１９４を有している。 The generation unit 176 generates the next time series data (prediction information) as needed from the dynamics stored in the dynamics storage network. The generation unit 176 includes a generation node determination unit 191, an internal state reading unit 192, a time-series data generation unit 193, and a generation result output unit 194.

生成ノード決定部１９１は、認識部１７３より供給される認識結果に基づき、時系列データを生成すべきノード（生成ノード）を決定する。つまり、認識部１７３の認識の処理において決定された勝者ノードが生成ノードに決定される。生成部１７６は、認識部１７３の認識の処理において決定された勝者ノードから時系列データ（予測情報）を生成する。 The generation node determination unit 191 determines a node (generation node) where time series data should be generated based on the recognition result supplied from the recognition unit 173. That is, the winner node determined in the recognition process of the recognition unit 173 is determined as the generation node. The generation unit 176 generates time-series data (prediction information) from the winner node determined in the recognition process of the recognition unit 173.

The internal state reading unit 192 reads the value stored in the internal state storage unit 175 into the dynamical system approximation model of the generation node of the dynamics storage network stored in the network storage unit 174, as the initial value of the internal state quantity. That is, from among the values stored in the internal state storage unit 175, the internal state reading unit 192 reads out the initial value of the internal state quantity at the time the generation node was determined to be the winner node by the recognition unit 173, and sets it as the initial value of the internal state quantity of the dynamical system approximation model of the generation node.

The time-series data generation unit 193 reads the time-series data of the feature quantities extracted by the feature extraction unit 172 and, based on that time-series data and on the dynamical system approximation model whose initial internal state quantity has been set by the internal state reading unit 192, generates time-series data while updating the internal state quantity.
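
The generation step, in which an initial internal state is set and then updated while outputs are produced, can be illustrated with the sketch below. `ToyDynamicsNode` is a hypothetical one-dimensional linear model that stands in for the recurrent neural network of a node; its parameters `decay` and `gain` and the scalar state are assumptions made for this example and do not appear in the description.

```python
from typing import List, Sequence, Tuple


class ToyDynamicsNode:
    """Hypothetical stand-in for a node's dynamical system approximation model."""

    def __init__(self, decay: float, gain: float) -> None:
        self.decay = decay  # how much of the previous internal state is kept
        self.gain = gain    # maps the internal state to the predicted output

    def step(self, state: float, feature: float) -> Tuple[float, float]:
        """Update the internal state from one input; return (state, output)."""
        new_state = self.decay * state + feature
        return new_state, self.gain * new_state


def generate(node: ToyDynamicsNode, initial_state: float,
             features: Sequence[float]) -> List[float]:
    """Generate predictions while updating the internal state at each step."""
    state = initial_state  # corresponds to the value set by the reading step
    outputs = []
    for feature in features:
        state, output = node.step(state, feature)
        outputs.append(output)
    return outputs


node = ToyDynamicsNode(decay=0.5, gain=2.0)
predictions = generate(node, initial_state=0.0, features=[1.0, 1.0, 1.0])
```

The same input series produces a different output series when a different initial internal state is set, which is why the initial value stored at recognition time is carried over to generation.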

生成結果出力部１９４は、その時系列データ生成部１９３により生成された時系列データ（予測情報）を、生成結果としてデータ統合部９２（図８）に出力する。 The generation result output unit 194 outputs the time series data (prediction information) generated by the time series data generation unit 193 to the data integration unit 92 (FIG. 8) as a generation result.

Here, since the number of dynamics learned by the dynamics storage network equals the number of nodes in the dynamics storage network, it is possible to recognize as many kinds of time-series data as there are nodes and to generate time-series data according to the recognition result.

図１５は、マスタ１２やスレーブ１３として利用されるロボットの斜視図である。 FIG. 15 is a perspective view of a robot used as the master 12 or the slave 13.

As shown in FIG. 15, this embodiment describes a case in which bipedal humanoid robot apparatuses are used as the master 12 and the slave 13. In practice, however, the robot apparatus is not limited to a bipedal one and may be any robot apparatus, such as one that moves on four legs or on wheels. Any device other than a robot, such as a drive device, a communication device, an information processing device, or AV equipment, may of course also be used.

The humanoid robot apparatus shown in FIG. 15 is a practical robot that supports human activities in the living environment and in various other situations of daily life. It acquires information about its environment through vision and hearing, and can reproduce behavior suited to the state of the environment in accordance with taught experience or a pre-programmed action plan.

図１５において、マスタ１２とされるロボット装置は、体幹部ユニット２０１の所定の位置に頭部ユニット２０２が連結されると共に、左右２つの腕部ユニット２０３Ｒおよび２０３Ｌと、左右２つの脚部ユニット２０４Ｒおよび２０４Ｌが連結されて構成されている。これらのＲ及びＬは、それぞれ、右または左を示す接尾辞であり、以下において、左右を互いに区別する必要のない場合は省略される。 In FIG. 15, the robot device as the master 12 has a head unit 202 coupled to a predetermined position of the trunk unit 201, two left and right arm units 203R and 203L, and two left and right leg units 204R. And 204L are connected to each other. These R and L are suffixes indicating right or left, respectively, and will be omitted below when it is not necessary to distinguish left and right.

図１５のロボット装置が具備する関節自由度構成を図１６に模式的に示す。 FIG. 16 schematically shows a joint degree-of-freedom configuration included in the robot apparatus of FIG.

頭部ユニット２０２を支持する首関節は、首関節ヨー軸２１１、首関節ピッチ軸２１２、および首関節ロール軸２１３という３自由度を有する。 The neck joint that supports the head unit 202 has three degrees of freedom: a neck joint yaw axis 211, a neck joint pitch axis 212, and a neck joint roll axis 213.

Each of the arm units 203R and 203L constituting the upper limbs is made up of a shoulder joint pitch axis 217, a shoulder joint roll axis 218, an upper arm yaw axis 219, an elbow joint pitch axis 220, a forearm yaw axis 221, a wrist joint pitch axis 222, a wrist joint roll axis 223, and a hand portion 224. The hand portion 224 is in reality a multi-joint, multi-degree-of-freedom structure including a plurality of fingers. However, because the motion of the hand portion 224 contributes to and affects the posture control and walking control of the robot apparatus as a whole only slightly, it is treated below as having zero degrees of freedom to simplify the description. Each of the left and right arms therefore has seven degrees of freedom.

体幹部ユニット２０１は、体幹ピッチ軸２１４、体幹ロール軸２１５、および体幹ヨー軸２１６という３自由度を有する。 The trunk unit 201 has three degrees of freedom: a trunk pitch axis 214, a trunk roll axis 215, and a trunk yaw axis 216.

Each of the leg units 204R and 204L constituting the lower limbs is made up of a hip joint yaw axis 225, a hip joint pitch axis 226, a hip joint roll axis 227, a knee joint pitch axis 228, an ankle joint pitch axis 229, an ankle joint roll axis 230, and a foot portion 231. The intersection of the hip joint pitch axis 226 and the hip joint roll axis 227 defines the hip joint position of the robot apparatus as a whole. The foot portion 231 is in reality a structure including a multi-joint, multi-degree-of-freedom sole, but to simplify the description the sole of the robot apparatus is treated below as having zero degrees of freedom. Each of the left and right legs therefore has six degrees of freedom.

As described above, the robot apparatus of FIG. 16 as a whole has a total of 3 + 7 × 2 + 3 + 6 × 2 = 32 degrees of freedom. This is merely an example, however, and the degrees of freedom of the master 12 and the slave 13 are not limited to 32. That is, the number of control parameters of the master 12 and the slave 13 (for a robot, the number of degrees of freedom, that is, the number of joints) may be any number and can be increased or decreased as appropriate according to, for example, design and manufacturing constraints and required specifications.
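
The degree-of-freedom total can be checked by listing the joint axes named above for each unit and counting them. The dictionary layout and variable names below are assumptions made for this illustration; the axis names and reference numerals are those given in the description, with the hand portion 224 and the foot portion 231 counted as zero degrees of freedom.

```python
# Joint axes per unit, as named in the description of FIG. 16.
JOINT_AXES = {
    "neck": ["neck yaw 211", "neck pitch 212", "neck roll 213"],
    "arm": [
        "shoulder pitch 217", "shoulder roll 218", "upper arm yaw 219",
        "elbow pitch 220", "forearm yaw 221", "wrist pitch 222",
        "wrist roll 223",
    ],
    "trunk": ["trunk pitch 214", "trunk roll 215", "trunk yaw 216"],
    "leg": [
        "hip yaw 225", "hip pitch 226", "hip roll 227",
        "knee pitch 228", "ankle pitch 229", "ankle roll 230",
    ],
}

# Arms and legs exist as left and right units; neck and trunk exist once.
UNIT_COUNT = {"neck": 1, "arm": 2, "trunk": 1, "leg": 2}

dof_per_unit = {name: len(axes) for name, axes in JOINT_AXES.items()}
total_dof = sum(dof_per_unit[name] * UNIT_COUNT[name] for name in JOINT_AXES)
```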

Each degree of freedom of the robot apparatus described above is actually implemented using an actuator. Because of requirements such as eliminating excess bulges in appearance so as to approximate the natural human form, and performing posture control on the inherently unstable structure of a biped walker, the actuators provided at the joints are preferably small and lightweight.

Such a robot apparatus has a control system that controls the operation of the entire robot apparatus, provided for example in the trunk unit 201. FIG. 17 is a schematic diagram showing the configuration of the control system of the robot apparatus used as the master 12 and the slave 13. As shown in FIG. 17, the control system in the robot apparatus consists of a thought control module 300, which responds dynamically to user input and the like and governs emotional judgment and emotional expression, and a motion control module 400, which controls the whole-body coordinated motion of the robot apparatus, such as driving the actuators 450.

The thought control module 300 consists of a CPU (Central Processing Unit) 311 that executes arithmetic processing related to emotional judgment and emotional expression, a RAM (Random Access Memory) 312, a ROM (Read Only Memory) 313, an external storage device (such as a hard disk drive) 314, and the like, connected to one another by a bus I/F (InterFace) 301. It is an independently driven information processing apparatus capable of performing self-contained processing within the module.

The thought control module 300 determines the current emotion and intention of the robot apparatus in accordance with stimuli from the outside world, such as image data input from the image input device 351 via the bus I/F 301 and audio data input from the audio input device 352 via the bus I/F 301. That is, as described above, by recognizing the user's facial expression from the input image data and reflecting that information in the emotion and intention of the robot apparatus, the robot apparatus can exhibit behavior that corresponds to the user's facial expression. Here, the image input device 351 has, for example, a plurality of CCD (Charge Coupled Device) cameras, and the audio input device 352 has, for example, a plurality of microphones.

また、思考制御モジュール３００は、意思決定に基づいた動作又は行動シーケンス、すなわち四肢の運動を実行するように、運動制御モジュール４００に対して指令を発行する。 In addition, the thought control module 300 issues a command to the motion control module 400 to execute an action or action sequence based on decision making, that is, exercise of the limbs.

運動制御モジュール４００は、バスI/F４０１により互いに接続された、ロボット装置の全身協調運動を制御するCPU４１１、RAM４１２、ROM４１３、および外部記憶装置（ハード・ディスク・ドライブ等）４１４等で構成され、モジュール内で自己完結した処理を行うことができる独立駆動型の情報処理装置である。また、外部記憶装置４１４には、例えば、オフラインで算出された歩行パターンや目標とするZMP（Zero Moment Point）軌道や、その他の行動計画を蓄積することができる。 The motion control module 400 includes a CPU 411, a RAM 412, a ROM 413, an external storage device (hard disk drive, etc.) 414, etc., which are connected to each other by a bus I / F 401 and control the whole body cooperative motion of the robot device. It is an independent drive type information processing apparatus capable of performing self-contained processing. The external storage device 414 can store, for example, a walking pattern calculated offline, a target ZMP (Zero Moment Point) trajectory, and other action plans.

この運動制御モジュール４００には、図１６に示したロボット装置の全身に分散するそれぞれの関節自由度を実現するアクチュエータ４５０、対象物との距離を測定する距離計測センサ（図示せず）、体幹部ユニット２０１の姿勢や傾斜を計測する姿勢センサ４５１、左右の足底の離床又は着床を検出する接地確認センサ４５２および４５３、足部２３１の足底に設けられる荷重センサ、並びに、バッテリ等の電源を管理する電源制御装置４５４等の各種の装置が、バスI/F４０１経由で接続されている。ここで、姿勢センサ４５１は、例えば、加速度センサとジャイロ・センサの組み合わせによって構成され、接地確認センサ４５２および４５３は、近接センサ又はマイクロ・スイッチ等で構成される。もちろん、それら以外のセンサにより構成されるようにしてもよい。 The motion control module 400 includes an actuator 450 that realizes joint degrees of freedom distributed throughout the body of the robot apparatus shown in FIG. 16, a distance measurement sensor (not shown) that measures the distance to the object, and a trunk A posture sensor 451 for measuring the posture and inclination of the unit 201, grounding confirmation sensors 452 and 453 for detecting left or right foot sole leaving or landing, a load sensor provided on the foot sole of the foot portion 231, and a power source such as a battery Various devices such as a power supply control device 454 that manages the above are connected via the bus I / F 401. Here, the posture sensor 451 is configured by, for example, a combination of an acceleration sensor and a gyro sensor, and the grounding confirmation sensors 452 and 453 are configured by proximity sensors, micro switches, or the like. Of course, other sensors may be used.

思考制御モジュール３００と運動制御モジュール４００は、共通のプラットフォーム上で構築され、両者間はバスI/F３０１およびバスI/F４０１を介して相互接続されている。 The thought control module 300 and the motion control module 400 are constructed on a common platform, and are interconnected via a bus I / F 301 and a bus I / F 401.

The motion control module 400 controls the whole-body coordinated motion produced by the actuators 450 so as to embody the behavior instructed by the thought control module 300. That is, the CPU 411 retrieves from the external storage device 414 a motion pattern corresponding to the behavior instructed by the thought control module 300, or generates a motion pattern internally. Then, in accordance with the specified motion pattern, the CPU 411 sets the foot motion, ZMP trajectory, trunk motion, upper limb motion, waist horizontal position and height, and the like, and transfers to each actuator 450 a command value instructing motion in accordance with these settings.

The CPU 411 also detects the posture and inclination of the trunk unit 201 of the robot apparatus from the output signal of the posture sensor 451, and detects from the output signals of the grounding confirmation sensors 452 and 453 whether each of the leg units 204R and 204L is in the swing-leg or stance-leg state. In this way it can adaptively control the whole-body coordinated motion of the robot apparatus. Furthermore, the CPU 411 controls the posture and motion of the robot apparatus so that the ZMP position always moves toward the center of the ZMP stable region.

また、運動制御モジュール４００は、思考制御モジュール３００において決定された意思通りの行動がどの程度発現されたか、すなわち処理の状況を、思考制御モジュール３００に返すようになされている。このようにしてロボット装置は、制御プログラムに基づいて自己及び周囲の状況を判断し、自律的に行動することができる。 In addition, the motion control module 400 is configured to return to the thought control module 300 the degree to which the intended behavior determined by the thought control module 300 has been expressed, that is, the processing status. In this way, the robot apparatus can autonomously act by judging itself and the surrounding situation based on the control program.

次に、以上のような情報処理システム１０のユーザ、すなわち、制御装置１１やマスタ１２を操作するユーザ（何名でもよい）に対して、表示部５３に表示されるGUI画面について説明する。 Next, a GUI screen displayed on the display unit 53 for the user of the information processing system 10 as described above, that is, a user who operates the control device 11 or the master 12 (any number of users) will be described.

図１８は、モードコントロールコマンダ（Mode Control Commander）画面の例を説明する図である。 FIG. 18 is a diagram illustrating an example of a mode control commander (Mode Control Commander) screen.

The mode control commander screen 510 is a GUI screen through which the user inputs the servo gain settings for each joint of the master 12. As shown in FIG. 18, the mode control commander screen 510 has three columns. The left column lists the name of each joint. The center column displays a mode selection field for choosing the operation mode of each joint from among a plurality of modes, such as a mode that operates according to control commands and a mode that operates according to external manipulation by the user or the like. The right column displays a setting field for setting the servo gain value of each joint.

When the user performs, on the input unit 52, an input operation for the mode control commander screen 510 displayed on the display unit 53 of the control device 11, the setting instruction is supplied to the servo gain setting unit 61 via the main control unit 51. In other words, by performing input operations on the mode control commander screen 510, the user can easily set the gain value of each joint of the master 12.

FIG. 19 is a diagram illustrating an example of the sensor motor viewer (Sensor Motor Viewer) screen.

センサモータビューア画面５２０は、制御対象であるスレーブ１３より供給される各種のセンサ情報を画像情報として表示するGUI画面である。つまり、センサモータビューア画面５２０は、図７のデータ統合部６５や図８のデータ統合部９２がデータ表示制御部２５に供給した各種情報を、ユーザが視覚的に確認するためのビューアである。つまり、ユーザは、このセンサモータビューア画面５２０に表示される情報を参照することにより、容易に、スレーブ１３の状態をより正確に把握することができる。 The sensor motor viewer screen 520 is a GUI screen that displays various types of sensor information supplied from the slave 13 to be controlled as image information. That is, the sensor motor viewer screen 520 is a viewer for the user to visually confirm various information supplied to the data display control unit 25 by the data integration unit 65 of FIG. 7 or the data integration unit 92 of FIG. That is, the user can easily grasp the state of the slave 13 more accurately by referring to the information displayed on the sensor motor viewer screen 520.

図２０は、センサモータレコーダ（Sensor Motor Recorder）画面の例を説明する図である。 FIG. 20 is a diagram for explaining an example of a sensor motor recorder screen.

The sensor motor recorder screen 530 is a GUI screen through which the user inputs instructions concerning the recording and reproduction of time-series data by the data recording unit 22. The left part of the sensor motor recorder screen 530 displays a list of files of recorded time-series data, and the right part provides functions for accepting various operation instructions, such as GUI buttons and check boxes. By inputting operations for the sensor motor recorder screen 530 displayed on the display unit 53 through the input unit 52, the user can easily cause the data recording unit 22 to record time-series data, or cause the reproduction unit 66 to reproduce the time-series data recorded in the data recording unit 22.

As described above, the display unit 53 displays the various GUI screens to the user and the input unit 52 accepts user instructions for those GUI screens, so the user can easily input instructions and grasp the situation.

次に、以上のような情報処理システム１０において実行される各種処理の流れについて説明する。 Next, the flow of various processes executed in the information processing system 10 as described above will be described.

最初に、図２１のフローチャートを参照して、図７の制御装置１１（制御装置１１Ａ）による、サーボゲイン設定処理の流れの例を説明する。 First, an example of the flow of servo gain setting processing by the control device 11 (control device 11A) of FIG. 7 will be described with reference to the flowchart of FIG.

When the servo gain setting process is started, in step S1 the display unit 53 displays the mode control commander screen 510 (FIG. 18) under the control of the main control unit 51. In step S2, the main control unit 51 determines whether to end the servo gain setting process based on the user instruction for the mode control commander screen 510 supplied from the input unit 52. If no end instruction has been received from the user and the main control unit 51 determines that the process is not to end, it advances the process to step S3.

In step S3, the main control unit 51 determines whether setting of the servo gain has been instructed, based on the user instruction for the mode control commander screen 510 supplied from the input unit 52. If it determines that the user has instructed the setting, it advances the process to step S4. In step S4, the servo gain setting unit 61 transmits the input servo gain setting command 31 to the master 12 via the transmission unit 71.

ステップＳ４の処理を終了すると、サーボゲイン設定部６１は、処理をステップＳ２に戻し、それ以降の処理を繰り返す。また、ステップＳ３において、サーボゲインの設定が指示されていないと判定した場合、主制御部５１は、ステップＳ４の処理を省略し、処理をステップＳ２に戻し、それ以降の処理を繰り返す。 When the process of step S4 is completed, the servo gain setting unit 61 returns the process to step S2, and repeats the subsequent processes. If it is determined in step S3 that the setting of the servo gain is not instructed, the main control unit 51 omits the process in step S4, returns the process to step S2, and repeats the subsequent processes.

また、ステップＳ２において、ユーザより終了指示を受け、終了すると判定した場合、主制御部５１は、サーボゲイン設定処理を終了する。 In step S2, when receiving an end instruction from the user and determining to end, the main control unit 51 ends the servo gain setting process.
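
The loop formed by steps S2 to S4 can be sketched as follows. The tuple layout of an instruction, the instruction kinds `"end"` and `"set_gain"`, the joint names, and the `send_to_master` callback are hypothetical names introduced for this illustration; step S1 (displaying the screen) is omitted.

```python
from typing import Callable, Iterable, List, Tuple

Instruction = Tuple[str, str, float]  # (kind, joint name, gain value)


def servo_gain_setting_process(
        instructions: Iterable[Instruction],
        send_to_master: Callable[[Tuple[str, float]], None]) -> int:
    """Process user instructions until an "end" instruction arrives.

    Returns the number of servo gain setting commands that were sent.
    """
    sent_count = 0
    for kind, joint, gain in instructions:
        if kind == "end":                  # step S2: end instructed?
            break
        if kind == "set_gain":             # step S3: gain setting instructed?
            send_to_master((joint, gain))  # step S4: send to the master
            sent_count += 1
        # Any other instruction skips step S4 and returns to step S2.
    return sent_count


sent: List[Tuple[str, float]] = []
count = servo_gain_setting_process(
    [
        ("set_gain", "elbow_pitch", 0.5),
        ("other", "", 0.0),
        ("end", "", 0.0),
        ("set_gain", "wrist_roll", 1.0),  # never reached
    ],
    sent.append,
)
```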

次に、図２２のフローチャートを参照して、図７の制御装置１１（制御装置１１Ａ）による、時系列データの、記録処理の流れの例を説明する。 Next, an example of the flow of recording processing of time series data by the control device 11 (control device 11A) of FIG. 7 will be described with reference to the flowchart of FIG.

表示部５３に表示されたセンサモータレコーダ画面５３０に対する操作入力が、入力部５２に入力され、記録処理が開始されると、主制御部５１は、ステップＳ２１において、記録処理を終了するか否かを判定する。ユーザが、センサモータレコーダ画面５３０において記録処理の終了を指示しておらず、記録処理を終了しないと判定した場合、主制御部５１は、処理をステップＳ２２に進める。 When an operation input for the sensor motor recorder screen 530 displayed on the display unit 53 is input to the input unit 52 and the recording process is started, the main control unit 51 determines whether or not to end the recording process in step S21. Determine. If the user has not instructed to end the recording process on the sensor motor recorder screen 530 and determines not to end the recording process, the main control unit 51 advances the process to step S22.

In step S22, the receiving unit 72 of the wireless communication unit 62 accepts the joint angle sensor data 32 of the master 12. In step S23, it determines whether the joint angle sensor data 32 has been acquired. If it determines that the data has not been acquired, it returns the process to step S21, and the subsequent processing is repeated.

また、ステップＳ２３において、関節角センサデータ３２を取得したと判定した場合、受信部７２は、処理をステップＳ２４に進める。ステップＳ２４において、センサデータ・コマンド変換部６３は、受信部７２より関節角センサデータ３２を取得すると、その関節角センサデータ３２を関節角コマンド３３に変換する。ステップＳ２５において、無線通信部６４の送信部７３は、センサデータ・コマンド変換部６３より供給された関節角コマンド３３をスレーブ１３に供給する。 If it is determined in step S23 that the joint angle sensor data 32 has been acquired, the receiving unit 72 advances the process to step S24. In step S 24, when the sensor data / command converter 63 acquires the joint angle sensor data 32 from the receiver 72, the sensor data / command converter 63 converts the joint angle sensor data 32 into a joint angle command 33. In step S 25, the transmission unit 73 of the wireless communication unit 64 supplies the joint angle command 33 supplied from the sensor data / command conversion unit 63 to the slave 13.

ステップＳ２６において、無線通信部６４の受信部７４は、スレーブ１３の関節角センサデータ３４およびボール座標データ３７を受け付け、ステップＳ２７において、その関節角センサデータ３４およびボール座標データ３７を取得したか否かを判定し、取得していないと判定した場合、処理をステップＳ２６に戻し、取得したと判定されるまで繰り返す。ステップＳ２７において、関節角センサデータ３４およびボール座標データ３７を取得したと判定した場合、受信部７４は、処理をステップＳ２８に進める。 In step S26, the receiving unit 74 of the wireless communication unit 64 receives the joint angle sensor data 34 and the ball coordinate data 37 of the slave 13, and whether or not the joint angle sensor data 34 and the ball coordinate data 37 have been acquired in step S27. If it is determined that it has not been acquired, the process returns to step S26 and is repeated until it is determined that it has been acquired. If it is determined in step S27 that the joint angle sensor data 34 and the ball coordinate data 37 have been acquired, the receiving unit 74 advances the process to step S28.

In step S28, the data integration unit 65 integrates the joint angle sensor data 34 and the ball coordinate data 37 of the slave 13, supplied from the receiving unit 74, with the joint angle command 33 supplied from the sensor data/command conversion unit 63.

データ記録部２２は、ステップＳ２９において、その統合された統合情報（ベクトルデータ）を時系列データの１つとして記録し、処理をステップＳ２１に戻し、それ以降の処理を繰り返させる。 In step S29, the data recording unit 22 records the integrated integrated information (vector data) as one of the time series data, returns the processing to step S21, and repeats the subsequent processing.

ステップＳ２１において、ユーザがセンサモータレコーダ画面５３０において記録処理の終了を指示しており、記録処理を終了すると判定した場合、主制御部５１は、記録処理を終了する。 In step S21, when the user has instructed to end the recording process on the sensor motor recorder screen 530 and determines that the recording process is to be ended, the main control unit 51 ends the recording process.

次に、図２３のフローチャートを参照して、図７の制御装置１１（制御装置１１Ａ）の再生部６６による、再生処理の流れの例を説明する。 Next, an example of the flow of the reproduction processing performed by the reproduction unit 66 of the control device 11 (control device 11A) of FIG. 7 will be described with reference to the flowchart of FIG. 23.

表示部５３に表示されたセンサモータレコーダ画面５３０に対する操作入力が、入力部５２に入力され、再生処理が開始されると、再生部６６は、ステップＳ４１において、指定されたファイルに含まれる関節角コマンド３３をデータ記録部２２より読み出す。 When an operation input on the sensor motor recorder screen 530 displayed on the display unit 53 is entered into the input unit 52 and the reproduction processing is started, the reproduction unit 66 reads, in step S41, the joint angle command 33 contained in the specified file from the data recording unit 22.

データ記録部２２には、ユーザがマスタ１２を操作して入力した情報より生成された関節角コマンド３３を含む時系列データが、その記録単位毎にファイル化されて管理されている。つまり、上述した記録処理１回分の時系列データが１つのファイルとして管理されている。再生処理においては、このようなファイル内の時系列データを時系列に沿って順次再生する。もちろん複数のファイルの時系列データを連続して再生するようにしてもよいし、ファイルに含まれる時系列データの一部のみを再生するようにしてもよい。 In the data recording unit 22, time series data including the joint angle command 33 generated from information input by operating the master 12 by the user is filed and managed for each recording unit. That is, the time series data for one recording process described above is managed as one file. In the reproduction process, the time series data in such a file is sequentially reproduced along the time series. Of course, the time series data of a plurality of files may be reproduced continuously, or only part of the time series data included in the file may be reproduced.

ステップＳ４２において、再生部６６は、その読み出した関節角コマンド３３を、送信部７３を介してスレーブ１３に供給する。関節角コマンド３３を供給されたスレーブ１３は、その関節角コマンド３３に基づいた動作を行う。再生部６６は、ステップＳ４３において、指定されたファイルの全ての関節角コマンドを供給したか否かを判定し、未処理の関節角コマンドが存在すると判定した場合、処理をステップＳ４１に戻し、それ以降の処理を繰り返す。また、ステップＳ４３において、ファイル内の全ての関節角コマンドを供給したと判定した場合、再生部６６は、再生処理を終了する。 In step S42, the reproduction unit 66 supplies the read joint angle command 33 to the slave 13 via the transmission unit 73. The slave 13 supplied with the joint angle command 33 performs an operation based on that joint angle command 33. In step S43, the reproduction unit 66 determines whether all joint angle commands in the specified file have been supplied. If it determines that unprocessed joint angle commands remain, it returns the process to step S41 and repeats the subsequent processing. If it determines in step S43 that all joint angle commands in the file have been supplied, the reproduction unit 66 ends the reproduction processing.

次に、図２４のフローチャートを参照して、学習処理の流れの例を説明する。 Next, an example of the flow of the learning processing will be described with reference to the flowchart of FIG. 24.

学習処理が開始されると、データ学習部２３のダイナミクス記憶ネットワーク取得部１２１は、ステップＳ６１において、ダイナミクス記憶ネットワークを時系列予測器９３より取得し、ダイナミクス記憶ネットワーク保持部１２２に保持させる。 When the learning process is started, the dynamics storage network acquisition unit 121 of the data learning unit 23 acquires the dynamics storage network from the time series predictor 93 and stores the dynamics storage network in the dynamics storage network holding unit 122 in step S61.

ステップＳ６２において、学習部１２４は、ダイナミクス記憶ネットワーク保持部１２２に保持されているダイナミクス記憶ネットワーク１３１を取得し、その全てのパラメータの初期化を行う。具体的には、ダイナミクス記憶ネットワーク１３１の各ノードの内部状態量を持つ力学系近似モデルのパラメータに、適当な値が初期値として付与される。 In step S62, the learning unit 124 acquires the dynamics storage network 131 held in the dynamics storage network holding unit 122, and initializes all the parameters thereof. Specifically, an appropriate value is assigned as an initial value to the parameter of the dynamic system approximation model having the internal state quantity of each node of the dynamics storage network 131.

ステップＳ６３において、学習部１２４は、学習処理を終了するか否かを判定し、時系列データもまだ供給され続けており、終了しないと判定した場合、処理をステップＳ６４に進める。 In step S63, the learning unit 124 determines whether or not to end the learning process, and if the time-series data is still supplied and is determined not to end, the process proceeds to step S64.

ステップＳ６４において、時系列データ取得部１２３は、関節角コマンド３３、関節角センサデータ３４、およびボール座標データ３７等をまとめたベクトルデータよりなる時系列データの、新たな時刻のデータを取得する。学習部１２４は、ステップＳ６５において、その時系列データに対して、ダイナミクス記憶ネットワークに含まれる各ノードに対応する内部状態量を持つ力学系近似モデルとのスコア計算を、内部状態量を更新しながら行う。 In step S64, the time-series data acquisition unit 123 acquires the data for a new time step of the time-series data, which consists of vector data combining the joint angle command 33, the joint angle sensor data 34, the ball coordinate data 37, and the like. In step S65, the learning unit 124 calculates, for that time-series data, a score against the dynamic system approximation model having an internal state quantity that corresponds to each node of the dynamics storage network, updating the internal state quantity as it does so.

内部状態量を持つ力学系近似モデルがリカレントニューラルネットワークで与えられる場合には、出力誤差がスコアとして利用される。出力誤差の計算方法には、一般的に平均二乗誤差が用いられる。スコア計算の結果、入力データに対して、全てのノードにスコアが付与されることになる。 When a dynamic system approximation model having an internal state quantity is given by a recurrent neural network, an output error is used as a score. As a method for calculating the output error, a mean square error is generally used. As a result of the score calculation, scores are assigned to all nodes for the input data.
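
As an illustration of the score described above, the following sketch computes a mean squared error between a node's predicted sequence and the observed sequence. The function name `mse_score` and the list-of-vectors data layout are assumptions made for this example; how the recurrent neural network produces the predicted sequence is outside the sketch.

```python
from typing import Sequence


def mse_score(predicted: Sequence[Sequence[float]],
              observed: Sequence[Sequence[float]]) -> float:
    """Mean squared output error over all time steps and vector elements.

    A lower value means the node's model reproduces the data better.
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    total = 0.0
    count = 0
    for p_t, o_t in zip(predicted, observed):
        if len(p_t) != len(o_t):
            raise ValueError("vectors at each time step must match in size")
        for p, o in zip(p_t, o_t):
            total += (p - o) ** 2
            count += 1
    return total / count
```

Because the score is an error, "best score" in the following steps corresponds to the smallest value.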

学習部１２４は、ステップＳ６６において、ダイナミクス記憶ネットワークを構成するノードそれぞれのスコアを比較することによって、最もスコアの良いノード、すなわち勝者ノードを決定する。さらに、学習部１２４は、ステップＳ６７において、勝者ノードを中心として各ノードの学習の重みを決定し、ステップＳ６８において、各ノードの内部状態量を持つ力学系近似モデルのパラメータの更新を、学習の重みに応じて行う。 In step S66, the learning unit 124 determines the node with the best score, that is, the winner node, by comparing the scores of the nodes constituting the dynamics storage network. Further, in step S67, the learning unit 124 determines the learning weight of each node, centered on the winner node, and in step S68 it updates the parameters of each node's dynamic system approximation model having an internal state quantity in accordance with that learning weight.

ここで、勝者ノードのパラメータだけを更新する方法はWTA(winner-take-all)に対応し、勝者ノードの近傍のノードに対してもパラメータの更新を行う方法がSMA(soft-max adaptation)に対応する。学習部１２４は、SMAで、パラメータの更新を行う。 Here, updating only the parameters of the winner node corresponds to WTA (winner-take-all), whereas also updating the parameters of nodes in the vicinity of the winner node corresponds to SMA (soft-max adaptation). The learning unit 124 updates the parameters using SMA.
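
The winner determination of step S66 amounts to picking the node with the best score. A minimal sketch, assuming the score is an output error (so smaller is better) and that scores are held in a dictionary keyed by a node identifier; the name `select_winner` and this data layout are hypothetical.

```python
from typing import Dict, Hashable


def select_winner(scores: Dict[Hashable, float]) -> Hashable:
    """Return the node whose score (an error) is smallest."""
    if not scores:
        raise ValueError("at least one node score is required")
    return min(scores, key=lambda node: scores[node])


winner = select_winner({"n1": 0.40, "n2": 0.10, "n3": 0.30})
```

Under WTA only `winner` would then be updated; under SMA the other nodes are updated as well, with weights that depend on their distance from `winner`.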

図２５は、ノードのパラメータをSMAで更新するときに用いられる学習の重みを示している。 FIG. 25 shows learning weights used when updating the parameter of a node with SMA.

図２５の左において、ノード５４１乃至ノード５４６は、ダイナミクス記憶ネットワークを構成するノードである。ノード５４１乃至ノード５４６のうちのノード５４１は、勝者ノードであり、ノード５４２乃至ノード５４６は、勝者ノード５４１からの距離が近い順に並べられている。 On the left side of FIG. 25, the nodes 541 to 546 are nodes constituting the dynamics storage network. Of the nodes 541 to 546, the node 541 is a winner node, and the nodes 542 to 546 are arranged in order of increasing distance from the winner node 541.

図２５の右のグラフは、学習の重みと勝者ノードからの距離の関係を示しており、横軸は学習の重みを、縦軸は勝者ノードからの距離を、それぞれ示している。 The right graph of FIG. 25 shows the relationship between the learning weight and the distance from the winner node. The horizontal axis shows the learning weight, and the vertical axis shows the distance from the winner node.

図２５の右のグラフによれば、勝者ノード５４１に対しては、学習の重みを最も大きくし、他のノード５４２乃至ノード５４６それぞれに対しては、勝者ノード５４１からの距離が離れるにしたがって、学習の重みが小さくなるように学習の重みが決定される。 According to the graph on the right side of FIG. 25, for the winner node 541, the learning weight is maximized, and for each of the other nodes 542 to 546, as the distance from the winner node 541 increases, The learning weight is determined so that the learning weight becomes small.

勝者ノードからの距離は、ダイナミクス記憶ネットワークのリンクによって与えられる空間上のノードの配置構造に基づいて決定される。例えば、図１２の２次元上にノード１４１乃至ノード１５９が配置されたダイナミクス記憶ネットワーク１５０において、勝者ノードが、例えばノード１５６であれば、その勝者ノード１５６に隣接するノード１５３、ノード１５５、およびノード１５９が最も近く、ノード１５２、ノード１５４、およびノード１５８がその次に近く、ノード１５１とノード１５７が最も遠いものとなる。この場合、ノードとノードをつなぐ最小のリンク数を距離として利用すると、近い順に距離は１、２、３として与えられることになる。 The distance from the winner node is determined based on the arrangement of the nodes in the space given by the links of the dynamics storage network. For example, in the dynamics storage network 150 of FIG. 12, in which the nodes 141 to 159 are arranged in two dimensions, if the winner node is, say, the node 156, then the nodes 153, 155, and 159 adjacent to the winner node 156 are the closest, the nodes 152, 154, and 158 are the next closest, and the nodes 151 and 157 are the farthest. In this case, if the minimum number of links connecting two nodes is used as the distance, the distances are given as 1, 2, and 3 in order of closeness.

図１１のようにリンクを与えない場合には、入力データ（ノードのスコアの計算に用いられる時系列データ）に基づき各ノードにおいて計算されたスコアの良い順にノードを並べ、その順位が勝者ノードからの距離として利用される。つまり、勝者ノードから順に、０、１、２、３、・・・が距離として与えられる。このような勝者ノードからの距離の与え方は、ベクトル・パターンのカテゴリー学習に用いられる自己組織化マップ（SOM（self-organization map），例えば、『T.コホネン、「自己組織化マップ」、シュプリンガー・フェアラーク東京』参照）やNeural-Gas algorithmで利用されている方法と同じである。この勝者ノードからの距離と学習の重みの関係を示したのが次式である。 When no links are given, as in FIG. 11, the nodes are ranked from best to worst by the score calculated at each node from the input data (the time-series data used to calculate the node scores), and that rank is used as the distance from the winner node. That is, 0, 1, 2, 3, ... are given as distances in order, starting from the winner node. This way of assigning the distance from the winner node is the same as the method used in the self-organizing map (SOM) employed for category learning of vector patterns (see, for example, T. Kohonen, "Self-Organizing Maps", Springer-Verlag Tokyo) and in the Neural-Gas algorithm. The following expression shows the relationship between the distance from the winner node and the learning weight.
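
The link-count distance in this example can be reproduced with a breadth-first search. The sketch below assumes a 3 x 3 grid whose nodes are labelled 151 to 159 row by row with links only between horizontal and vertical neighbours; that layout is an assumption chosen because it is consistent with the distances stated above, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List


def grid_links(first_label: int, rows: int, cols: int) -> Dict[int, List[int]]:
    """Build links between horizontally and vertically adjacent grid nodes."""
    links: Dict[int, List[int]] = {}
    for r in range(rows):
        for c in range(cols):
            label = first_label + r * cols + c
            neighbours = []
            if r > 0:
                neighbours.append(label - cols)
            if r < rows - 1:
                neighbours.append(label + cols)
            if c > 0:
                neighbours.append(label - 1)
            if c < cols - 1:
                neighbours.append(label + 1)
            links[label] = neighbours
    return links


def link_distances(links: Dict[int, List[int]], winner: int) -> Dict[int, int]:
    """Minimum number of links from the winner to every node (BFS)."""
    dist = {winner: 0}
    queue = deque([winner])
    while queue:
        node = queue.popleft()
        for neighbour in links[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


dist = link_distances(grid_links(151, 3, 3), winner=156)
```

With this layout, `dist` assigns 1 to nodes 153, 155, and 159, 2 to nodes 152, 154, and 158, and 3 to nodes 151 and 157, matching the text.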

$$\alpha = G \times \gamma^{d/\Delta} \tag{1}$$

式（１）において、αは学習の重み、Ｇは（学習の重みαのうちの）勝者ノードに与える学習の重み、γは減衰係数で０＜γ＜１の範囲の定数、ｄは勝者ノードからの距離、ΔはSMAにおける近傍に対する学習の重みを調整するための変数を、それぞれ示している。 In Expression (1), α is the learning weight, G is the learning weight given to the winner node, γ is an attenuation coefficient that is a constant in the range 0 < γ < 1, d is the distance from the winner node, and Δ is a variable for adjusting the learning weight given to the neighborhood in SMA.

今、距離ｄに関しては、勝者ノードからの距離が近い順に１、２、３で与えられるとし、勝者ノードに対してはｄ＝０が与えられたとする。この時、例えば、Ｇ＝８、γ＝０．５、Δ＝１とすれば、学習の重みαは、勝者ノードからの距離ｄが離れるにしたがって、８、４、２、１と求まることになる。ここで、変数Δを少しずつ０に近づけていくと、学習の重みαは勝者ノードから離れるにしたがってより小さい値となる。そして、変数Δが０に近くなると、勝者ノード以外のノードの学習重みはほとんど０となり、これはWTAと同様となる。このように、変数Δを調整することで、SMAにおける勝者ノードの近傍に対する学習の重みαを調整することが可能となる。基本的には、変数Δは学習の開始時は大きくし、時間の経過と伴に小さくなるように調整が行われる。 Now suppose that the distance d is given as 1, 2, 3 in order of closeness to the winner node, and that d = 0 is given to the winner node itself. Then, for example, with G = 8, γ = 0.5, and Δ = 1, the learning weight α is obtained as 8, 4, 2, 1 as the distance d from the winner node increases. If the variable Δ is brought gradually closer to 0, the learning weight α falls off more steeply with distance from the winner node. When the variable Δ is close to 0, the learning weights of all nodes other than the winner node become almost 0, which is equivalent to WTA. In this way, adjusting the variable Δ makes it possible to adjust the learning weight α given to the neighborhood of the winner node in SMA. Basically, the variable Δ is made large at the start of learning and is adjusted so that it becomes smaller as time passes.
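
The numbers in this paragraph follow directly from Expression (1), read as α = G × γ^(d/Δ). The sketch below evaluates it for the stated values and also shows the rank-based distance used when no links are given. The function names are hypothetical, and treating a smaller score as better is an assumption that fits an error-based score.

```python
from typing import Dict, Hashable


def rank_distances(scores: Dict[Hashable, float]) -> Dict[Hashable, int]:
    """Distance = rank when nodes are ordered from best (smallest) score."""
    ordered = sorted(scores, key=lambda node: scores[node])
    return {node: rank for rank, node in enumerate(ordered)}


def learning_weight(d: float, G: float = 8.0,
                    gamma: float = 0.5, delta: float = 1.0) -> float:
    """Expression (1): alpha = G * gamma ** (d / delta)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must satisfy 0 < gamma < 1")
    if delta <= 0.0:
        raise ValueError("delta must be positive")
    return G * gamma ** (d / delta)


# G = 8, gamma = 0.5, delta = 1 for distances 0..3
weights = [learning_weight(d) for d in range(4)]
# A smaller delta concentrates the weight on the winner (towards WTA).
narrow = [learning_weight(d, delta=0.25) for d in range(4)]
```

Here `weights` evaluates to 8, 4, 2, 1, and with Δ = 0.25 the weight at distance 1 has already dropped to 0.5, which illustrates how shrinking Δ narrows the neighborhood.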

このような学習の重みαに基づき、勝者ノードのパラメータは入力データの影響を最も強く受け、勝者ノードから離れるにしたがって、その影響が小さくなるように、他のノード（勝者ノード以外のノード）のパラメータの更新が行われる。 Based on such learning weight α, the parameter of the winner node is most strongly influenced by the input data, and the influence of other nodes (nodes other than the winner node) is reduced so that the influence decreases as the distance from the winner node increases. The parameter is updated.

図２６は、ノードのパラメータの更新の方法を説明する図である。 FIG. 26 is a diagram for explaining a method for updating a parameter of a node.

いま、あるノードのパラメータ更新前の内部状態量を持つ力学系近似モデル１６１のパラメータの学習に使われた学習データ（教示データ）である時系列データが学習データ記憶部１６２に格納されているとする。 Suppose now that the time-series data used as learning data (teaching data) for learning the parameters of a node's dynamic system approximation model 161 having an internal state quantity, before the parameter update, is stored in the learning data storage unit 162.

この更新前の学習データを旧学習データと呼ぶものとする。 The learning data before update is referred to as old learning data.

ノードのパラメータの更新は、例えば、そのノードに対して決定された学習の重みαに応じて、入力データ５５１を、旧学習データ５５２に追加し、その結果得られる新学習データを用いて行われる。即ち、学習の重みαに応じて、入力データ５５１と旧学習データ５５２を足し合わせる（混合する）ことで、新学習データが構成され、この新学習データが学習データ記憶部１６２に記憶される。そして、その新学習データによって、内部状態量を持つ力学系近似モデル１６１のパラメータが更新される。 The node parameter is updated using, for example, new learning data obtained by adding the input data 551 to the old learning data 552 in accordance with the learning weight α determined for the node. . That is, by adding (mixing) the input data 551 and the old learning data 552 according to the learning weight α, new learning data is configured, and this new learning data is stored in the learning data storage unit 162. Then, the parameters of the dynamical approximate model 161 having the internal state quantity are updated with the new learning data.

パラメータの更新には、例えば、Back-Propagation Through Time 法が適用される。その場合、具体的には、更新前の内部状態量を持つ力学系近似モデル１６１のパラメータを初期値とし、新学習データに基づくパラメータの推定がBack-Propagation Through Time 法によって行われる。 For example, the Back-Propagation Through Time method is applied to update the parameters. In that case, specifically, the parameter of the dynamical approximate model 161 having the internal state quantity before update is set as an initial value, and the parameter estimation based on the new learning data is performed by the Back-Propagation Through Time method.

ここで、新学習データを構成する際の、入力データ５５１と旧学習データ５５２とを足し合わせる比率に関して説明する。 Here, the ratio of adding the input data 551 and the old learning data 552 when configuring the new learning data will be described.

仮に、入力データ５５１と旧学習データ５５２との比率を１：０にすると、新学習データは完全に入力データ５５１だけで構成されることになる。 If the ratio between the input data 551 and the old learning data 552 is 1: 0, the new learning data is completely composed of only the input data 551.

一方、入力データ５５１と旧学習データ５５２との比率を０：１にすると、新学習データには入力データ５５１は追加されず、旧学習データ５５２だけで構成されることになる。つまり、入力データ５５１と旧学習データ５５２との比率を変えることで、パラメータに与える入力データ５５１の影響の強さを変えることができる。 On the other hand, when the ratio of the input data 551 and the old learning data 552 is set to 0: 1, the input data 551 is not added to the new learning data, and only the old learning data 552 is configured. That is, by changing the ratio between the input data 551 and the old learning data 552, the strength of the influence of the input data 551 on the parameters can be changed.

入力データ５５１と旧学習データ５５２との比率を、前に述べた学習の重みαに基づいて適切に調整することによって、入力データの影響を適切にパラメータに与える学習を行うことができる。その調整方法の１つのやり方について説明する。 By appropriately adjusting the ratio of the input data 551 and the old learning data 552 based on the previously described learning weight α, it is possible to perform learning that appropriately affects the influence of the input data on the parameters. One method of the adjustment method will be described.

まず、ノードが学習データ記憶部１６２に保持できる時系列データの個数を一定とし、その値をＨとする。つまり、Ｈ個の時系列データで内部状態量を持つ力学系近似モデル１６１のパラメータが学習されるものとする。そして、入力データ５５１と旧学習データ５５２との比率を、ノードの学習の重みαに応じて、α：Ｈ−αとなるように調整する。例えば、Ｈ＝１００とすれば、α＝８の場合、入力データ５５１と旧学習データ５５２との比率は、８：９２となるように調整が行われることになる。そして、このような比率で、入力データ５５１と旧学習データ５５２とを足し合わせることで、Ｈ個の新学習データが構成される。 First, let the number of time-series data that a node can hold in the learning data storage unit 162 be constant, and let that value be H. That is, it is assumed that the parameters of the dynamical approximate model 161 having an internal state quantity are learned from H time-series data. Then, the ratio between the input data 551 and the old learning data 552 is adjusted to be α: H−α according to the learning weight α of the node. For example, if H = 100, when α = 8, the ratio between the input data 551 and the old learning data 552 is adjusted to be 8:92. Then, by adding the input data 551 and the old learning data 552 at such a ratio, H new learning data are configured.

α：Ｈ−αの比率で、入力データ５５１と旧学習データ５５２とを足し合わせる方法としては、例えば、以下のような方法を採用することができる。 As a method of adding the input data 551 and the old learning data 552 at a ratio of α: H−α, for example, the following method can be employed.

即ち、まず、入力データ５５１については、時系列データが１つ与えられるだけなので、これをα倍したデータを追加する。例えば、α＝８の場合、入力データ５５１としての同一の時系列データを８個追加する。 That is, first, as the input data 551, only one time-series data is given, so data obtained by multiplying this by α is added. For example, when α = 8, 8 pieces of the same time series data as the input data 551 are added.

一方、旧学習データ５５２については、その個数はＨであり、これをＨ−αに調整する必要がある。例えば、上述したように、α＝８の場合、旧学習データ５５２を、１００から９２に減らす必要がある。そこで、学習データ記憶部１６２に記憶された旧学習データ５５２としての１００の時系列データの順番に応じて、最も古いものからα個だけ除去することで、旧学習データ５５２の個数をＨ−α個に調整する。 The old learning data 552, on the other hand, consists of H items, and this number must be adjusted to H-α. For example, as described above, when α = 8 the old learning data 552 must be reduced from 100 to 92 items. Therefore, following the order of the 100 time-series data items stored in the learning data storage unit 162 as the old learning data 552, the α oldest items are removed, which adjusts the number of old learning data 552 items to H-α.

以上のようにして個数を調整した入力データ５５１と旧学習データ５５２とを足し合わせて新学習データとすることにより、学習データ記憶部１６２には、常に最新のＨ個の時系列データだけが学習データとして保持される。このように、学習データ（新学習データ）に占める入力データ５５１の割合を学習の重みαによって調整することができる。 By combining the input data 551 and the old learning data 552, whose numbers have been adjusted as described above, into the new learning data, the learning data storage unit 162 always holds only the latest H time-series data items as learning data. In this way, the proportion of the learning data (new learning data) occupied by the input data 551 can be adjusted by the learning weight α.

なお、ここで説明した方法以外にも、学習の重みαに応じて入力データ５５１をパラメータに反映させる方法であればどのような方法を用いても良い。重要なのは、新しいデータ（入力データ５５１）が与えられるたびにパラメータを少しずつ修正することと、その際に、学習の重みαに応じて入力データ５５１の学習に与える影響の強さを調整することである。 Besides the method described here, any method may be used as long as it reflects the input data 551 in the parameters in accordance with the learning weight α. What matters is that the parameters are corrected little by little each time new data (input data 551) is given, and that the strength of the influence of the input data 551 on learning is adjusted in accordance with the learning weight α at that time.
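
The α : H-α mixing described above behaves like a fixed-size queue: α copies of the input are appended and the α oldest items fall out. A minimal sketch under that reading follows; the function name `mix_learning_data` is hypothetical, and rounding α to a whole number of copies is an assumption, since the text does not say how a non-integer α is handled.

```python
from collections import deque
from typing import Any, Deque, Iterable


def mix_learning_data(old: Iterable[Any], new_series: Any,
                      alpha: float, capacity: int) -> Deque[Any]:
    """Return new learning data holding at most `capacity` (H) items.

    `old` is ordered oldest first. `alpha` copies of `new_series` are
    appended; a bounded deque discards the oldest items automatically.
    """
    copies = max(0, min(capacity, int(round(alpha))))  # assumption: round
    updated: Deque[Any] = deque(old, maxlen=capacity)
    for _ in range(copies):
        updated.append(new_series)
    return updated


old_data = list(range(100))           # H = 100 old items, 0 is the oldest
new_data = mix_learning_data(old_data, "input", alpha=8, capacity=100)
```

With H = 100 and α = 8, `new_data` holds the 92 most recent old items followed by 8 copies of the input, so the ratio of input to old data is 8:92 as in the example.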

また、学習を適切に行うには、学習の重みαを時間の経過とともに適切に調整することが非常に重要であり、本実施の形態では、変数Δによって、学習の重みαを調整する方法を述べたが、基本的には、入力データ５５１の影響を受けるノードが、勝者ノードを中心とする広い範囲のノードから徐々に狭い範囲のノードへになるように、学習の重みαを調整していくことが重要であり、それを実現する方法であれば、どのような方法を用いても良い。 Also, in order to perform learning appropriately, it is very important to adjust the learning weight α appropriately with the passage of time. In this embodiment, a method for adjusting the learning weight α by the variable Δ is used. As described above, basically, the learning weight α is adjusted so that the node affected by the input data 551 gradually changes from a wide range of nodes centering on the winner node to a narrow range of nodes. It is important to use any method as long as it is a method for realizing it.

以上のような学習手法により、ダイナミクス記憶ネットワークの各ノードのパラメータは、学習部１２４に時系列データ（入力データ）が入力されるたびに、自己組織的に学習されることになる。 With the learning method described above, the parameters of each node of the dynamics storage network are learned in a self-organized manner every time time-series data (input data) is input to the learning unit 124.

図２４に戻り、ステップＳ６８の処理を終了すると、学習部１２４は、処理をステップＳ６３に戻し、それ以降の処理を繰り返す。すなわち、データ学習部２３は、時系列データの取得を終了するまで、ステップＳ６３乃至ステップＳ６８の処理を繰り返す。 Returning to FIG. 24, when the process of step S68 is completed, the learning unit 124 returns the process to step S63 and repeats the subsequent processes. That is, the data learning unit 23 repeats the processing from step S63 to step S68 until the acquisition of time series data is completed.

ステップＳ６３において、学習処理を終了すると判定した場合、学習部１２４は、更新したダイナミクス記憶ネットワーク１３１をダイナミクス記憶ネットワーク保持部１２２に戻し、処理をステップＳ６９に進める。ステップＳ６９において、ダイナミクス記憶ネットワーク供給部１２５は、ダイナミクス記憶ネットワーク保持部１２２に保持されているダイナミクス記憶ネットワーク１３１を、主制御部５１を介して、時系列予測器９３に供給し、学習処理を終了する。 If it is determined in step S63 that the learning processing is to be ended, the learning unit 124 returns the updated dynamics storage network 131 to the dynamics storage network holding unit 122 and advances the process to step S69. In step S69, the dynamics storage network supply unit 125 supplies the dynamics storage network 131 held in the dynamics storage network holding unit 122 to the time-series predictor 93 via the main control unit 51, and the learning processing ends.

次に、図２７のフローチャートを参照して、図８の制御装置１１（制御装置１１Ｂ）による、スレーブ１３の制御処理の流れの例を説明する。 Next, an example of the flow of the control processing of the slave 13 by the control device 11 (control device 11B) of FIG. 8 will be described with reference to the flowchart of FIG. 27.

制御処理が開始されると、主制御部５１は、ステップＳ８１において、制御処理を終了するか否かを判定し、ユーザより終了指示が入力されたりしておらず、制御処理を終了しないと判定した場合、処理をステップＳ８２に進める。 When the control process is started, the main control unit 51 determines whether or not to end the control process in step S81, and determines that the end instruction is not input from the user and the control process is not ended. If so, the process proceeds to step S82.

ステップＳ８２において、受信部１０２は、スレーブ１３の、関節角センサデータ３４およびボール座標データ３７を受け付け、ステップＳ８３において、そのスレーブ１３の関節角センサデータ３４およびボール座標データ３７を取得したか否かを判定する。取得していないと判定した場合、処理をステップＳ８１に戻し、それ以降の処理を繰り返す。 In step S82, the receiving unit 102 accepts the joint angle sensor data 34 and the ball coordinate data 37 from the slave 13, and in step S83 it determines whether the joint angle sensor data 34 and the ball coordinate data 37 of the slave 13 have been acquired. If it determines that they have not been acquired, it returns the process to step S81 and repeats the subsequent processing.

ステップＳ８３において、スレーブ１３の関節角センサデータ３４およびボール座標データ３７を取得したと判定した場合、受信部１０２は、処理をステップＳ８４に進める。ステップＳ８４において、データ統合部９２は、関節角センサデータ３４、ボール座標データ３７、および、１つ前のステップの関節角コマンド３３を統合し、ベクトルデータを生成し、そのベクトルデータを時系列データとして時系列予測器９３に供給する。ステップＳ８５において、時系列予測器９３は、現在の時系列データに基づいて、次の時刻の時系列データ（予測情報）を予測し、生成する。この予測処理の詳細については、後述する。 If it is determined in step S83 that the joint angle sensor data 34 and the ball coordinate data 37 of the slave 13 have been acquired, the receiving unit 102 advances the process to step S84. In step S84, the data integration unit 92 integrates the joint angle sensor data 34, the ball coordinate data 37, and the joint angle command 33 of the preceding step to generate vector data, and supplies that vector data to the time-series predictor 93 as time-series data. In step S85, the time-series predictor 93 predicts and generates the time-series data for the next time step (prediction information) based on the current time-series data. Details of this prediction processing will be described later.

ステップＳ８６において、データ統合部９２は、ステップＳ８５において生成された次の時刻の予測情報に含まれる、次の時刻の関節角コマンド３３を、送信部１０１を介してスレーブ１３に供給し、処理をステップＳ８１に戻す。 In step S86, the data integration unit 92 supplies the joint angle command 33 for the next time step, which is contained in the prediction information for the next time step generated in step S85, to the slave 13 via the transmission unit 101, and returns the process to step S81.

すなわち、スレーブ制御部２４の各部は、主制御部５１がステップＳ８１において終了すると判定するまで、ステップＳ８２乃至ステップＳ８６の処理を繰り返す。ステップＳ８１において、主制御部５１がユーザ指示等に基づいて、制御処理を終了すると判定した場合、制御処理を終了する。 That is, each unit of the slave control unit 24 repeats the processing from step S82 to step S86 until the main control unit 51 determines that the process ends in step S81. In step S81, when the main control unit 51 determines to end the control process based on a user instruction or the like, the control process ends.
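
The loop of steps S81 to S86 can be sketched as below. This is an illustration under stated assumptions rather than the control device's actual code: the callables `read_sensor`, `predict`, `send_command`, and `should_stop` are hypothetical stand-ins for the receiving unit 102, the time-series predictor 93, the transmission unit 101, and the end determination by the main control unit 51, and the vector is assumed to place the command elements first.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Sensor = Tuple[List[float], List[float]]  # (joint angles, ball coordinates)


def control_loop(read_sensor: Callable[[], Optional[Sensor]],
                 predict: Callable[[List[float]], List[float]],
                 send_command: Callable[[List[float]], None],
                 should_stop: Callable[[], bool],
                 n_joints: int,
                 initial_command: Sequence[float]) -> None:
    command = list(initial_command)
    while not should_stop():                  # S81: end determination
        sensor = read_sensor()                # S82: accept sensor data
        if sensor is None:                    # S83: nothing acquired yet
            continue
        joints, ball = sensor
        vector = [*command, *joints, *ball]   # S84: integrate with last command
        prediction = predict(vector)          # S85: predict next time step
        command = prediction[:n_joints]       # next joint angle command
        send_command(command)                 # S86: supply it to the slave


# Tiny demonstration with fake components.
readings: List[Optional[Sensor]] = [([0.0], [1.0, 2.0]), None, ([0.1], [1.5, 2.5])]
sent: List[List[float]] = []
control_loop(read_sensor=lambda: readings.pop(0),
             predict=lambda v: [x + 1.0 for x in v],
             send_command=sent.append,
             should_stop=lambda: not readings,
             n_joints=1,
             initial_command=[0.0])
```

The point of the structure is that the command sent at each step comes from the predictor's output for the next time step, so a model trained on delay-containing data issues commands that already account for that delay.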

次に、図２７のステップＳ８５において実行される予測処理の詳細の流れについて、図２８のフローチャートを参照して説明する。 Next, the detailed flow of the prediction processing executed in step S85 of FIG. 27 will be described with reference to the flowchart of FIG. 28.

ステップＳ１０１において、時系列予測器９３は、予測処理を終了するか否かを判定し、終了しないと判定した場合、ステップＳ１０２に処理を進める。ステップＳ１０２において、入力部１７１が時系列データを取得し、ステップＳ１０３において、特徴抽出部１７２が時系列データより特徴を抽出する。内部状態量更新部１８１が内部状態記憶部１７５に記憶されている内部状態量を更新した後、ステップＳ１０４において、スコア計算部１８２がスコア計算を行い、ステップＳ１０５において、勝者ノード決定部１８３が勝者ノードを決定し、ステップＳ１０６において、認識結果出力部１８４が認識結果を出力する。 In step S101, the time-series predictor 93 determines whether to end the prediction processing, and if it determines not to end it, advances the process to step S102. In step S102, the input unit 171 acquires the time-series data, and in step S103, the feature extraction unit 172 extracts features from the time-series data. After the internal state quantity update unit 181 updates the internal state quantity stored in the internal state storage unit 175, the score calculation unit 182 calculates the scores in step S104, the winner node determination unit 183 determines the winner node in step S105, and the recognition result output unit 184 outputs the recognition result in step S106.

生成部１７６は、ステップＳ１０７において、その認識結果を制御信号として取得する。ステップＳ１０８において、生成ノード決定部１９１が予測情報の生成に用いるノードを決定し、ステップＳ１０９において、内部状態読み込み部１９２がダイナミクス記憶ネットワークの内部状態を読み込み、ステップＳ１１０において、時系列データ生成部１９３が予測情報を生成し、ステップＳ１１１において、生成結果出力部１９４が予測情報をデータ統合部９２に出力する。 In step S107, the generation unit 176 acquires the recognition result as a control signal. In step S108, the generation node determination unit 191 determines the node to be used for generating the prediction information; in step S109, the internal state reading unit 192 reads the internal state of the dynamics storage network; in step S110, the time-series data generation unit 193 generates the prediction information; and in step S111, the generation result output unit 194 outputs the prediction information to the data integration unit 92.

ステップＳ１１１の処理を終了すると、生成結果出力部１９４は、処理をステップＳ１０１に戻し、それ以降の処理を繰り返し実行させる。すなわち、時系列予測器９３の各部は、ステップＳ１０１において、予測処理を終了すると判定されるまで、ステップＳ１０１乃至ステップＳ１１１の処理を繰り返し実行する。ステップＳ１０１において、予測処理を終了すると判定した場合、時系列予測器９３は、予測処理を終了し、処理を図２７のステップＳ８５に戻し、ステップＳ８６以降の処理を実行させる。 When the process of step S111 ends, the generation result output unit 194 returns the process to step S101, and repeatedly executes the subsequent processes. That is, each unit of the time-series predictor 93 repeatedly executes the processing from step S101 to step S111 until it is determined in step S101 that the prediction processing is to be ended. If it is determined in step S101 that the prediction process is to be ended, the time-series predictor 93 ends the prediction process, returns the process to step S85 in FIG. 27, and executes the processes after step S86.

以上のようにして、各処理が実行される。これにより、情報処理システム１０は、制御における通信や処理の遅延や環境の変化に対する耐性を向上させ、制御の負荷や破綻を抑制することができる。 Each process is performed as described above. As a result, the information processing system 10 can improve resistance to communication and processing delays in control and environmental changes, and can suppress control load and failure.

特に、スレーブ１３に対して送信される関節角コマンド３３と、そのスレーブ１３から受信された関節角センサデータ３４およびボール座標データ３７を統合化して記録することにより、これらのデータを、同期を取りながら時系列データとして収集することができる。 In particular, by integrating and recording the joint angle command 33 transmitted to the slave 13 together with the joint angle sensor data 34 and the ball coordinate data 37 received from the slave 13, these data can be collected as time-series data while being kept synchronized.

また、その収集された関節角コマンド３３をスレーブ１３において再生することによって環境とのインタラクションを行わせることができ、情報処理システムが有する制御遅延を反映した自律制御用の時系列教示データを作成することができる。つまり、制御装置１１は、制御遅延を含む関係の制御コマンドおよびセンサ情報の時系列データを教示データ（学習データ）として、時系列予測器の予測モデルの学習を行うことにより、その時系列予測器を用いてスレーブ１３のセンサ情報に対して制御遅延時間分先の時刻にスレーブ１３を制御する制御コマンドを予測して生成することができ、制御における通信や処理の遅延に対する耐性を向上させ、制御の負荷や破綻を抑制することができる。 Further, by reproducing the collected joint angle commands 33 on the slave 13, the slave can be made to interact with the environment, and time-series teaching data for autonomous control that reflects the control delay of the information processing system can be created. That is, the control device 11 trains the prediction model of the time-series predictor using, as teaching data (learning data), time-series data in which the control commands and the sensor information are related in a way that includes the control delay. Using that time-series predictor, it can then predict and generate, from the sensor information of the slave 13, the control command that will control the slave 13 at a time later by the control delay. This improves resistance to communication and processing delays in control and suppresses control load and failure.

さらに、マスタ１２に、スレーブ１３と実質的に同一の幾何形状を有する装置（幾何モデルが共通のロボット）を適用することにより、幾何モデルの変換等の処理が不要になるので、関節角コマンド３３の生成が容易になる。また、制御対象であるスレーブ１３の動きを確認しながら、コントローラであるマスタ１２を操作するユーザも、直感的にマスタ１２の操作方法を理解することができ、容易に、スレーブ１３をユーザの思い通りに制御するように、マスタ１２を操作することができる。さらに、そのマスタ１２の各関節のサーボのゲインをそれぞれ個別に値を設定することにより、各関節のゲインを適切に設定することができ、不必要な関節を動かさずに教示を行うことが可能になる。つまり、教示時に、不要な関節を手で支える等の、余分な自由度を制御する必要がなくなるので、教示者に及ぼす負担を軽減させることができる。 Furthermore, by using as the master 12 a device having substantially the same geometric shape as the slave 13 (a robot with a common geometric model), processing such as conversion of the geometric model becomes unnecessary, so the joint angle command 33 is easier to generate. In addition, a user who operates the master 12, which is the controller, while watching the movement of the slave 13, which is the control target, can understand intuitively how to operate the master 12 and can easily operate it so that the slave 13 is controlled as intended. Further, by setting the servo gain of each joint of the master 12 individually, the gain of each joint can be set appropriately, and teaching can be performed without moving unnecessary joints. That is, there is no need to control extra degrees of freedom during teaching, for example by supporting unneeded joints by hand, so the burden on the teacher can be reduced.

また、マスタ１２の操作によって生成された関節角コマンド３３をスレーブ１３において再生することにより、ユーザは、スレーブ１３の制御結果（動き）を確認しながらマスタ１２を操作し、制御装置１１に、関節角コマンド３３を生成させることができる。つまり、ユーザは、スレーブ１３の動きを確認しながらマスタ１２の操作（つまり、スレーブ１３の制御や、予測モデルの学習のための教示データの生成）を行うこともできる。このようにすることにより、多様に変化して予測が困難な環境においても、ユーザは、容易に、スレーブ１３と環境との複雑な相互作用の教示を行うことができる。すなわち、環境の変化への耐性を向上させることができる。 In addition, because the joint angle command 33 generated by operating the master 12 is reproduced on the slave 13, the user can operate the master 12 while checking the control result (movement) of the slave 13 and cause the control device 11 to generate the joint angle command 33. That is, the user can operate the master 12 (in other words, control the slave 13 and generate teaching data for learning the prediction model) while checking the movement of the slave 13. In this way, even in an environment that changes in various ways and is difficult to predict, the user can easily teach complex interactions between the slave 13 and the environment. That is, resistance to changes in the environment can be improved.

なお、このような教示者の負担を軽減や操作性の向上は、スレーブ１３の制御操作や教示データの生成を容易にし、適切な予測モデルの生成、すなわち、制御における通信や処理の遅延や環境の変化に対する耐性を向上させ、制御の負荷や破綻を抑制することにも寄与する。 Note that reducing the burden on the teacher and improving operability in this way make the control operation of the slave 13 and the generation of teaching data easier, and thus also contribute to generating an appropriate prediction model, that is, to improving resistance to communication and processing delays in control and to changes in the environment, and to suppressing control load and failure.

なお、以上においては、時系列予測器９３は、入力された時系列データより予測情報を生成するのみであるように説明したが、これに限らず、この予測情報を生成するとともに、入力された時系列データを教示データとして学習を行うようにしてもよい。 In the above description, the time series predictor 93 has been described so as to only generate the prediction information from the input time series data. However, the present invention is not limited to this, and the prediction information is generated and input. Learning may be performed using time-series data as teaching data.

図２９は、時系列予測器の他の構成例を示すブロック図である。 FIG. 29 is a block diagram illustrating another configuration example of the time-series predictor.

図２９に示される時系列予測器６９３は、基本的に図１４に示される時系列予測器９３と同様の構成を有しているが、時系列予測器９３の構成に加えて、データ学習部２３を有している点で異なる。 The time-series predictor 693 shown in FIG. 29 has basically the same configuration as the time-series predictor 93 shown in FIG. 14, but differs in that it has the data learning unit 23 in addition to the configuration of the time-series predictor 93.

このデータ学習部２３は、図８に示されるデータ学習部２３と同様のものであるのでその詳細についての説明は省略する。 Since the data learning unit 23 is the same as the data learning unit 23 shown in FIG. 8, a detailed description thereof is omitted.

図２９において、データ学習部２３は、時系列データ生成部１９３に供給されるのと同一の、特徴抽出部１７２において特徴を抽出された時系列データに基づいてネットワーク記憶部１７４に記憶されているダイナミクス記憶ネットワークの学習を行い、ダイナミクス記憶ネットワークを更新する。 In FIG. 29, the data learning unit 23 is stored in the network storage unit 174 based on the same time series data extracted by the feature extraction unit 172 as that supplied to the time series data generation unit 193. Learning the dynamics storage network and updating the dynamics storage network.

In this way, the control device 11 can learn the prediction model of the time series predictor and update the dynamics storage network during autonomous control using the sensor information of the slave 13.
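The idea of predicting and learning from the same input can be illustrated with the following minimal sketch. It deliberately substitutes a simple linear predictor trained by the least-mean-squares rule for the dynamics storage network of the embodiment, so it shows only the data flow (predict first, then update with the same features), not the learning method of the specification. The class name `OnlineLinearPredictor` and its parameters are assumptions made for illustration.

```python
from typing import List, Sequence


class OnlineLinearPredictor:
    """Stand-in predictor: predicts, then learns from the same input.

    This is NOT the dynamics storage network of the embodiment; it is a
    linear model updated by the least-mean-squares rule, used only to
    illustrate simultaneous prediction and learning.
    """

    def __init__(self, n_inputs: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_inputs
        self.learning_rate = learning_rate

    def predict(self, features: Sequence[float]) -> float:
        # Weighted sum of the input features.
        return sum(w * x for w, x in zip(self.weights, features))

    def learn(self, features: Sequence[float], target: float) -> None:
        # Move each weight along the prediction error (LMS update).
        error = target - self.predict(features)
        for i, x in enumerate(features):
            self.weights[i] += self.learning_rate * error * x

    def step(self, features: Sequence[float], target: float) -> float:
        # Output the prediction made before learning, then update the
        # model with the same features as teaching data.
        prediction = self.predict(features)
        self.learn(features, target)
        return prediction
```

Calling `step` once per control cycle returns the prediction used for control in that cycle, while the model continues to adapt to the observed data.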

The series of processes described above can be executed by hardware or by software. In the latter case, the system may be configured, for example, as a personal computer as shown in FIG. 30.

In FIG. 30, a CPU (Central Processing Unit) 701 of the personal computer 700 executes various processes according to a program stored in a ROM (Read Only Memory) 702 or a program loaded from a storage unit 713 into a RAM (Random Access Memory) 703. The RAM 703 also stores, as appropriate, data that the CPU 701 needs in order to execute the various processes.

The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output interface 710 is also connected to the bus 704.

Connected to the input/output interface 710 are an input unit 711 including a keyboard, a mouse, and the like; an output unit 712 including a display such as a CRT or an LCD, a speaker, and the like; a storage unit 713 including a hard disk and the like; and a communication unit 714 including a modem and the like. The communication unit 714 performs communication processing via networks including the Internet.

A drive 715 is also connected to the input/output interface 710 as necessary. A removable medium 721 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive as appropriate, and a computer program read from the medium is installed in the storage unit 713 as necessary.

When the series of processes described above is executed by software, the program constituting the software is installed from a network or a recording medium.

As shown in FIG. 30, for example, this recording medium is constituted not only by the removable medium 721 on which the program is recorded and which is distributed separately from the apparatus main body in order to deliver the program to users, such as a magnetic disk (including a flexible disk), an optical disk (including a CD-ROM and a DVD), a magneto-optical disk (including an MD), or a semiconductor memory, but also by the ROM 702 on which the program is recorded, a hard disk included in the storage unit 713, and the like, which are delivered to users in a state of being incorporated in the apparatus main body in advance.

In this specification, the steps describing the program recorded on the recording medium include not only processes performed in time series in the described order but also processes that are executed in parallel or individually and are not necessarily performed in time series.

Further, in this specification, a system refers to an entire apparatus composed of a plurality of devices (apparatuses).

In the above description, a configuration described as one device may be divided and configured as a plurality of devices. Conversely, configurations described above as a plurality of devices may be combined and configured as one device. Configurations other than those described above may of course be added to the configuration of each device. Furthermore, as long as the configuration and operation of the system as a whole are substantially the same, part of the configuration of one device may be included in the configuration of another device. That is, the embodiments of the present invention are not limited to the embodiments described above, and various modifications can be made without departing from the gist of the present invention.

The present invention can be applied to a robot control information teaching system.

FIG. 1 is a graph explaining control delay.
FIG. 2 is a block diagram showing a configuration example of an information processing system to which the present invention is applied.
FIG. 3 is a block diagram showing a configuration example of the information processing system of FIG. 2 as a control system.
FIG. 4 is a block diagram showing a configuration example of the information processing system of FIG. 2 as a recording system.
FIG. 5 is a block diagram showing a configuration example of the information processing system of FIG. 2 as a learning system.
FIG. 6 is a block diagram showing a configuration example of the control device of FIG. 2.
FIG. 7 is a block diagram showing a configuration example of the control device of FIG. 6 as a recording device.
FIG. 8 is a block diagram showing a configuration example of the control device of FIG. 6 as a learning device.
FIG. 9 is a diagram showing an example of a recurrent neural network model.
FIG. 10 is a diagram showing a configuration example of the data learning unit.
FIG. 11 is a diagram showing an example of a dynamics storage network.
FIG. 12 is a diagram showing another example of a dynamics storage network.
FIG. 13 is a diagram explaining a node of a dynamics storage network.
FIG. 14 is a block diagram showing a detailed configuration example of the time series predictor.
FIG. 15 is a perspective view of the master.
FIG. 16 is a diagram explaining the movable joints of the master.
FIG. 17 is a diagram showing an example of the internal configuration of the master.
FIG. 18 is a diagram explaining an example of a mode control commander screen.
FIG. 19 is a diagram explaining an example of a sensor motor viewer screen.
FIG. 20 is a diagram explaining an example of a sensor motor recorder screen.
FIG. 21 is a flowchart explaining an example of the flow of servo gain setting processing.
FIG. 22 is a flowchart explaining an example of the flow of recording processing.
FIG. 23 is a flowchart explaining an example of the flow of reproduction processing.
FIG. 24 is a flowchart explaining an example of the flow of learning processing.
FIG. 25 is a diagram showing the relationship between the distance from the winner node and the learning weight.
FIG. 26 is a diagram explaining a method of updating learning data.
FIG. 27 is a flowchart explaining an example of the flow of control processing.
FIG. 28 is a flowchart explaining an example of the flow of prediction processing.
FIG. 29 is a block diagram showing another configuration example of the time series predictor.
FIG. 30 is a block diagram showing a configuration example of a personal computer to which the present invention is applied.

Explanation of symbols

10 information processing system, 11 control device, 12 master, 13 slave, 14 ball, 21 master-slave control unit, 22 data recording unit, 23 data learning unit, 24 slave control unit, 25 data display control unit, 51 main control unit, 52 input unit, 53 output unit, 61 servo gain setting unit, 62 wireless communication unit, 63 sensor data/command conversion unit, 64 wireless communication unit, 65 data integration unit, 66 reproduction unit, 91 wireless communication unit, 92 data integration unit, 93 time series predictor, 121 dynamics storage network acquisition unit, 122 dynamics storage network holding unit, 123 time series data acquisition unit, 124 learning unit, 125 dynamics storage network supply unit, 162 learning data storage unit, 173 recognition unit, 174 network storage unit, 175 internal state storage unit, 176 generation unit, 181 internal state quantity update unit, 182 score calculation unit, 183 winner node determination unit, 184 recognition result output unit, 191 generation node determination unit, 192 internal state reading unit, 193 time series data generation unit, 194 generation result output unit, 201 robot apparatus

Claims

1. A control system comprising:
an input device operated by a user;
a controlled device to be controlled; and
a control device that controls the controlled device based on information input from the input device,
wherein the input device and the controlled device are devices having the same geometric shape, and
the control device comprises:
acquisition means for acquiring sensor information that is output from a sensor provided in the input device for measuring a surrounding environment and that is supplied from the input device operated by the user;
conversion means for converting the sensor information acquired by the acquisition means into a control command for controlling the controlled device; and
supply means for supplying the control command obtained through conversion by the conversion means to the controlled device.

2. A control system comprising:
a controlled device to be controlled; and
a control device that controls the controlled device,
wherein the control device comprises:
acquisition means for acquiring sensor information output from a sensor provided in the controlled device for measuring a surrounding environment;
prediction means for predicting and generating, based on the sensor information acquired by the acquisition means and a control command for controlling the controlled device, a control command for controlling the controlled device at a time a predetermined period ahead of the sensor information; and
supply means for supplying the new control command predicted and generated by the prediction means to the controlled device.

3. A recording system comprising:
an input device operated by a user; and
a recording device that records information input from the input device,
the recording system further comprising a controlled device to be controlled,
wherein the controller comprises:
acquisition means for acquiring sensor information that is output from a sensor provided in the input device for measuring a surrounding environment and that is supplied from the input device operated by the user;
conversion means for converting the sensor information acquired by the acquisition means into a control command for controlling the controlled device;
recording means for recording the control command obtained through conversion by the conversion means as time-series data at predetermined time intervals; and
supply means for supplying the control command obtained through conversion by the conversion means to the controlled device.

4. An information processing apparatus that controls a controlled device that is a control target, the apparatus comprising:
first acquisition means for acquiring sensor information output from a sensor for measuring a surrounding environment, the sensor being provided in an input device that is operated by a user and has the same geometric shape as the controlled device;
conversion means for converting the sensor information acquired by the first acquisition means into a control command for controlling the controlled device; and
first supply means for supplying the control command obtained through conversion by the conversion means to the controlled device.

5. The information processing apparatus according to claim 4, further comprising gain adjustment means for adjusting input gains of the input units provided in the input device independently of each other.

6. The information processing apparatus according to claim 4, further comprising:
second acquisition means for acquiring sensor information output from a sensor provided in the controlled device for measuring a surrounding environment; and
presentation means for presenting the sensor information acquired by the second acquisition means to the user of the input device.

7. The information processing apparatus according to claim 4, further comprising recording means for recording the control command generated through conversion by the conversion means as time-series data.

8. The information processing apparatus according to claim 7, further comprising reproduction means for reproducing, in time series, the control command recorded in the recording means and outputting the control command to the controlled device.

9. The information processing apparatus according to claim 7, further comprising second acquisition means for acquiring sensor information output from a sensor provided in the controlled device for measuring a surrounding environment,
wherein the recording means records the sensor information acquired by the second acquisition means together with the control command.

10. The information processing apparatus according to claim 9, further comprising:
prediction means for predicting and generating, using a predetermined prediction model and based on the sensor information acquired by the second acquisition means and the past control command, a control command for controlling the controlled device at a time ahead of the sensor information; and
second supply means for supplying the new control command generated by the prediction means to the controlled device.

11. The information processing apparatus according to claim 10, further comprising learning means for learning the prediction model using the sensor information and the control command recorded by the recording means.

12. The information processing apparatus according to claim 11, wherein the prediction model is a recurrent neural network.

13. The information processing apparatus according to claim 12, wherein the learning means learns the recurrent neural network using a self-organizing map technique used for category learning of vector patterns.

14. The information processing apparatus according to claim 4, wherein the input device and the controlled device are robot devices each having a plurality of joints.

15. The information processing apparatus according to claim 4, further comprising:
first wireless communication means for communicating with the input device; and
second wireless communication means for communicating with the controlled device.

16. An information processing method for an information processing apparatus that controls a controlled device that is a control target, the method comprising the steps of:
acquiring sensor information output from a sensor for measuring a surrounding environment, the sensor being provided in an input device that is operated by a user and has the same geometric shape as the controlled device;
converting the acquired sensor information into a control command for controlling the controlled device; and
supplying the control command obtained by the conversion to the controlled device.

17. A program for performing processing to control a controlled device that is a control target, the program causing a computer to execute the steps of:
acquiring sensor information output from a sensor for measuring a surrounding environment, the sensor being provided in an input device that is operated by a user and has the same geometric shape as the controlled device;
converting the acquired sensor information into a control command for controlling the controlled device; and
supplying the control command obtained by the conversion to the controlled device.

18. A recording medium on which the program according to claim 17 is recorded.