JP4441296B2 - Online I/O relation learning method

Info

Publication number: JP4441296B2
Application number: JP2004074204A
Authority: JP
Inventors: 洋川野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-03-16
Filing date: 2004-03-16
Publication date: 2010-03-31
Anticipated expiration: 2024-03-16
Also published as: JP2005265918A

Description

The present invention relates to an online input/output relation learning method.

Feedforward control is often used when the dynamics of the controlled object are nonlinear, or when the requirements on tracking delay and tracking error are strict.
Feedforward control requires a model that accurately describes the input/output relationship of the controlled object, but it is often difficult to describe the dynamics of the controlled object with mathematical equations. Ultrasonic motors and vehicles navigating in a fluid are typical examples.
In such cases, it is considered effective to acquire a motion model of the controlled object by machine learning and use it for feedforward control. For example, feedback error learning, which can acquire the motion model online while control is being performed, is effective (see Non-Patent Document 1).

However, there are many open problems concerning which state space should be used to construct the motion model that is actually acquired by learning. A particular problem is that the complexity of the way the output value can change with respect to the input value in the motion model (called "resolution" here for convenience) depends on the mathematical method chosen to describe the motion model.
For example, when the motion model is described using radial basis functions (functions whose value decreases (or increases) monotonically with distance from a center point and whose contours are hyperspheres; in two dimensions, circles or ellipses), the resolution depends directly on the number of feature points used as the basis. When a neural network is used, the resolution is likewise determined by the number of perceptrons.
That is, if the number of feature points (for radial basis functions) or the number of perceptrons (for a neural network) is made sufficiently large, the resolution increases and a highly accurate motion model can be constructed. In general, however, the time required for learning grows rapidly as the resolution increases. This is the "state space explosion problem" commonly referred to in machine learning research (see Non-Patent Document 2).
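To make this growth concrete, the following minimal sketch counts feature points under the assumption that they are placed on a regular grid. The source does not specify a placement, and the function name `grid_feature_points` is hypothetical.

```python
def grid_feature_points(points_per_axis: int, dimensions: int) -> int:
    """Feature points on a regular grid (an illustrative assumption)."""
    if points_per_axis < 1 or dimensions < 1:
        raise ValueError("both arguments must be positive")
    return points_per_axis ** dimensions


# At a fixed resolution of 10 points per axis, the count grows
# exponentially with the dimension of the state space.
for dims in (1, 2, 3, 4):
    print(dims, grid_feature_points(10, dims))
```

Under this assumption, every extra state dimension multiplies the number of weights to be learned by the per-axis resolution.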

Several methods have been proposed in previous research for finding the minimum resolution needed to faithfully reproduce the input/output characteristics of the plant to be learned (see Non-Patent Document 3), but there is no guarantee that a motion model of the resolution so obtained can be acquired online within a practical learning time.
In addition, when a motion model is learned online while control is being performed, there is no guarantee that the state space can be searched sufficiently to learn an accurate motion model. For example, when the form of the target trajectory suddenly changes to one not seen before, the motion model effectively has to be relearned, so control performance deteriorates markedly. Learning the motion model offline is a conceivable countermeasure, but even offline there is no guarantee that the state space needed for learning the motion model can be searched sufficiently. In other words, it must be possible to build, from limited learning experience, a motion model that can meet the requirements of any control target.
The only research that addresses these problems concerns reinforcement learning based on discrete values for slow plants, and it is not suited to high-speed control problems that require continuous values (see Non-Patent Document 2).
Non-Patent Document 1: Shibata, T. & Schaal, S., "Biomimetic Smooth Pursuit Based on Fast Learning of the Target Dynamics", Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 278-285), Hawaii, 2001.
Non-Patent Document 2: Kawano, H. & Ura, T., "Dynamics Control Algorithm of Autonomous Underwater Vehicle by Reinforcement Learning and Teaching Method Considering Thruster Failure under Severe Disturbance", Proceedings of 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 974-979), Hawaii, 2001.
Non-Patent Document 3: Ueda, N., "Variational Bayesian Learning for Best Model Search", Journal of the Japanese Society for Artificial Intelligence, Vol. 16, No. 2, pp. 299-308, 2001.

In the prior art, there was no guarantee that a high-accuracy model describing the input/output relationship of the plant to be controlled with continuous values could be acquired in a short time while the plant is being controlled.
The present invention solves these problems by using state spaces of different resolutions together, making it possible to greatly reduce the time required for learning while retaining sufficient resolution to build an accurate motion model. Further, in online learning performed simultaneously with control, it realizes rapid learning, based on limited learning experience, when a previously unexperienced target trajectory is commanded.

A first feature of the present invention is that, in an input/output relation learning device that acquires by machine learning an input/output model describing the input/output relationship of a target plant with continuous values, a plurality of data regions constituting the input/output model are held simultaneously, and the data regions differ in the complexity of change of input/output parameter values that each can handle (referred to as resolution for convenience).

A second feature of the present invention is that, in the input/output relation learning device according to the first feature, the plurality of data regions constituting the input/output model are constructed using radial basis functions, and the resolution of each data region is varied by changing the number of feature points of the radial basis functions.

A third feature of the present invention is that, in the input/output relation learning device according to the first feature, the data region with the lowest resolution in the input/output model is constructed using an n-th order function (n: a natural number).

A fourth feature of the present invention is that, in the input/output relation learning device according to the first feature, the plurality of data regions constituting the input/output model are constructed using artificial neural networks, and the resolution of each data region is varied by changing the number of perceptrons of the artificial neural network.

A fifth feature of the present invention is that, in the input/output relation learning device according to any of the first to fourth features, when the input/output model acquired by learning is used, the input value to the input/output model is fed to each of the models that use the data of the respective resolution data regions, and the sum of the output values of these models is used as the output value of the input/output model.

A sixth feature of the present invention is that, in the input/output relation learning device according to the fifth feature, the output value of a model that uses the data of a data region for which learning has not yet started is 0.
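The fifth and sixth features can be sketched as follows. This is a minimal illustration, assuming one-dimensional inputs and Gaussian radial basis functions on evenly spaced centers. The names `RbfLayer`, `model_output` and the `started` flag are hypothetical and do not come from the source.

```python
import math
from typing import List


class RbfLayer:
    """One data region: Gaussian basis functions on evenly spaced centers."""

    def __init__(self, n_centers: int, lo: float = 0.0, hi: float = 1.0) -> None:
        if n_centers == 1:
            self.centers = [(lo + hi) / 2.0]
            self.width = hi - lo
        else:
            step = (hi - lo) / (n_centers - 1)
            self.centers = [lo + i * step for i in range(n_centers)]
            self.width = step
        self.weights = [0.0] * n_centers
        self.started = False  # learning has not begun for this region

    def basis(self, x: float) -> List[float]:
        return [math.exp(-((x - c) / self.width) ** 2) for c in self.centers]

    def output(self, x: float) -> float:
        # A region whose learning has not started contributes exactly 0.
        if not self.started:
            return 0.0
        return sum(w * b for w, b in zip(self.weights, self.basis(x)))


def model_output(layers: List[RbfLayer], x: float) -> float:
    """The model output is the sum of the outputs of all data regions."""
    return sum(layer.output(x) for layer in layers)


# Lowest resolution first: 2, 4 and 8 feature points (assumed sizes).
layers = [RbfLayer(2), RbfLayer(4), RbfLayer(8)]
print(model_output(layers, 0.3))  # nothing has been learned yet
```

Keeping an explicit flag, rather than relying on zero-initialized weights, makes the "outputs 0 until learning starts" rule hold even if a region is given non-zero initial weights.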

A seventh feature of the present invention is that, in the input/output relation learning device according to any of the first to sixth features, learning starts with the data held in the data region with the lowest resolution among the plurality of data regions constituting the input/output model, and thereafter the data held in each data region are learned in order of increasing resolution.

An eighth feature of the present invention is that, in the input/output relation learning device according to the seventh feature, when the input/output model acquired by learning is used after learning of the plurality of data regions has finished, learning of the data in the highest-resolution data region is continued online.

A ninth feature of the present invention is that, in the input/output relation learning device according to the seventh or eighth feature, the learning rule for each resolution data region is supervised learning that determines whether the value output by the input/output model is higher or lower than the appropriate output value, lowers the output value of the input/output model if it is higher, and raises it if it is lower.

A tenth feature of the present invention is that, in the input/output relation learning device according to any of the seventh to ninth features, the learning time or the number of learning iterations is allocated equally among the plurality of data regions constituting the input/output model, each being set to the learning time or number of iterations of the lowest-resolution data region.

An eleventh feature of the present invention is that, in the input/output relation learning device according to the ninth feature, each data region of the input/output model is learned by a supervised learning technique in which the input value of the input/output model is the control target value, and the deviation from the target value of the behavior of the controlled object, observed as a result of feedforward control in which the output value is applied to the controlled object, is used to determine whether the output of the input/output model is too large or too small.

A twelfth feature of the present invention is that, in the eleventh feature, a function is provided for controlling the controlled object by combining feedback control with feedforward control that uses the control target value as the input value of the input/output model and the output value of the model as the input value to the controlled object.
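The combination of feedforward and feedback control can be illustrated with a small simulation. Everything in it is an assumption made for illustration: the plant is a hypothetical first-order lag with unit gain, the feedback controller is a simple proportional term with gain `kp`, and the feedforward value stands in for the output of an imperfect learned model.

```python
def simulate(target: float, feedforward: float, kp: float,
             dt: float = 0.01, steps: int = 2000) -> float:
    """Return the remaining tracking error after `steps` Euler steps."""
    v = 0.0  # plant state, e.g. a motor speed
    for _ in range(steps):
        u = feedforward + kp * (target - v)  # feedforward + feedback
        v += dt * (-v + u)                   # hypothetical plant: dv/dt = -v + u
    return target - v


# Feedback alone leaves a steady-state error; an imperfect feedforward
# term (0.9 instead of the ideal 1.0) removes most of it.
error_feedback_only = simulate(1.0, feedforward=0.0, kp=4.0)
error_combined = simulate(1.0, feedforward=0.9, kp=4.0)
print(error_feedback_only, error_combined)
```

In this toy setting the steady-state error is `(target - feedforward) / (1 + kp)`, so any improvement of the learned feedforward term reduces the error directly.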

A thirteenth feature of the present invention is that, in the learning process of the data regions constituting the input/output model in the input/output relation learning device according to any of the seventh to twelfth features, two or more data regions are never learned at the same time.
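Taken together, the seventh, eighth, tenth and thirteenth features describe a learning schedule. The sketch below is one possible reading, assuming the schedule is counted in discrete learning steps. The function name `active_layer` and the convention that index 0 is the lowest-resolution region are hypothetical.

```python
def active_layer(step: int, steps_per_layer: int, n_layers: int) -> int:
    """Index of the single data region trained at `step` (0 = lowest resolution)."""
    if step < 0 or steps_per_layer < 1 or n_layers < 1:
        raise ValueError("invalid schedule parameters")
    # Equal step budget per region, lowest resolution first; once every
    # region has had its turn, the highest-resolution region keeps learning.
    return min(step // steps_per_layer, n_layers - 1)


for step in (0, 99, 100, 250, 5000):
    print(step, active_layer(step, steps_per_layer=100, n_layers=3))
```

Because the function returns exactly one index per step, two regions are never updated at the same time.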

(Function)
The present invention exploits a finding made experimentally by the inventor: the advantages of two kinds of input/output model can be combined by using them together and learning the low-resolution motion model first. One is an input/output model built on a high-resolution state space, which is highly accurate but takes a long time to learn. The other is an input/output model built on a low-resolution state space, which is less accurate but takes far less time to learn and whose state space is easy to cover by exploration during learning.

With the online input/output relation learning device of the present invention, the time required to learn a highly accurate input/output model of the controlled object is greatly reduced, and the input/output model can be acquired online while feedforward control using that model is being performed. Robustness against sudden changes in the control target trajectory is also greatly improved.

Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 shows the configuration of the input/output model acquired by the present invention. In FIG. 1, the input/output model consists of, for example, three layers: the uppermost layer 1, the intermediate layer 2, and the lowermost layer 3 of the input/output model data region. When this model is used for feedforward control, its input is the control target value and its output is the input value to the plant 4 to be controlled. For example, when applied to motor control, the input is the commanded speed of the motor and the output is the control voltage applied to the motor. In other words, the model computes the appropriate control voltage for bringing the controlled object to a desired rotational speed.
In FIG. 1, the data regions 1, 2, and 3 differ in the complexity of change of input/output parameter values that they can handle (this corresponds to resolution in a discrete system). The higher the layer, the greater the complexity it can handle; the lower the layer, the smaller. The input to the input/output model is fed to each of the data regions 1, 2, and 3, and the sum of the output values computed from the data held in each region is the output value of the input/output model.
In FIG. 1, the data regions 1, 2, and 3 may be constructed using, for example, radial basis functions. In that case the number of basis functions (the number of feature points) determines the resolution of each data region: the uppermost layer 1 has the most feature points and the lowermost layer 3 has the fewest.
As the learning rule for each layer, it is advisable to use supervised learning (a method in which several training examples and a target output for each are given, and weights are adjusted so that the actual output matches the target output) that determines whether the value output by the input/output model is higher or lower than the appropriate output value, lowers the output value of the model if it is higher, and raises it if it is lower.
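The raise-or-lower learning rule can be sketched as a sign-based weight update. This is only one possible realization, assuming non-negative basis values such as Gaussian radial basis functions. The names `direction` and `sign_update` and the learning rate are hypothetical.

```python
from typing import List


def direction(output: float, appropriate: float) -> int:
    """+1 if the output is too low, -1 if it is too high, 0 if equal."""
    return (appropriate > output) - (appropriate < output)


def sign_update(weights: List[float], basis: List[float], output: float,
                appropriate: float, rate: float = 0.05) -> List[float]:
    """Nudge the weights so the output moves toward the appropriate value."""
    d = direction(output, appropriate)
    # With non-negative basis values, adding d * basis raises the output
    # when d = +1 and lowers it when d = -1.
    return [w + rate * d * b for w, b in zip(weights, basis)]


basis = [0.5, 1.0]    # basis activations for one fixed input (assumed)
weights = [0.0, 0.0]
for _ in range(200):
    output = sum(w * b for w, b in zip(weights, basis))
    weights = sign_update(weights, basis, output, appropriate=1.0)
final_output = sum(w * b for w, b in zip(weights, basis))
print(final_output)
```

A sign-only rule ends up oscillating within one step size of the appropriate value. Scaling the update by the size of the error, as in feedback error learning, avoids that at the cost of needing an error magnitude.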

In this embodiment, a fixed learning time is simply assigned to each layer. That is, once a predetermined time has elapsed in learning the lowermost layer, that learning is ended and learning moves on to the intermediate layer.
In the learning process of the input/output model shown in FIG. 1, the lowermost layer is learned first. Only one data region is learned at any time during the learning process. A layer that has not yet been learned outputs 0 (that is, an input value that has no effect on the plant 4 to be controlled). When this input/output model is used for feedforward control, the learning rule may be, for example, an update method based on feedback error learning, a supervised learning technique that learns from the tracking error observed from the plant 4. This is illustrated in FIG. 3.
When each data region is constructed with radial basis functions, as in this embodiment, the learning time of each of the data regions 1, 2, and 3 of the input/output model shown in FIG. 1 is proportional to the number of feature points that the data region contains. It might therefore seem that the total would be longer than the learning time when the uppermost layer is used alone, but this is not the case, as explained below.
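The staged process described above can be sketched end to end. This is a toy illustration under stated assumptions, not the embodiment itself: the plant is a hypothetical static nonlinearity with positive gain, the layers use Gaussian basis functions with 2, 4 and 8 feature points, and the weight update is a simple error-proportional rule in the spirit of feedback error learning. All names and constants are hypothetical.

```python
import math


def make_layer(n_centers, lo, hi):
    """Evenly spaced Gaussian centers (n_centers >= 2); fewer = lower resolution."""
    step = (hi - lo) / (n_centers - 1)
    return {"centers": [lo + i * step for i in range(n_centers)],
            "width": step, "weights": [0.0] * n_centers}


def basis(layer, x):
    return [math.exp(-((x - c) / layer["width"]) ** 2) for c in layer["centers"]]


def layer_output(layer, x):
    return sum(w * b for w, b in zip(layer["weights"], basis(layer, x)))


def plant(u):
    """Hypothetical plant: steady-state response to the input u."""
    return u + 0.3 * math.sin(u)


def mean_abs_error(layers, targets):
    return sum(abs(r - plant(sum(layer_output(lay, r) for lay in layers)))
               for r in targets) / len(targets)


def train(layers, targets, sweeps_per_layer, rate=0.2):
    """Train the lowest resolution first; only one layer is updated at a time."""
    history = [mean_abs_error(layers, targets)]
    for active in layers:
        for _ in range(sweeps_per_layer):
            for r in targets:
                # Untrained layers still have zero weights, so they add 0.
                u = sum(layer_output(lay, r) for lay in layers)
                error = r - plant(u)  # tracking error observed from the plant
                phi = basis(active, r)
                active["weights"] = [w + rate * error * b
                                     for w, b in zip(active["weights"], phi)]
        history.append(mean_abs_error(layers, targets))
    return history


targets = [3.0 * i / 30 for i in range(31)]
layers = [make_layer(n, 0.0, 3.0) for n in (2, 4, 8)]
history = train(layers, targets, sweeps_per_layer=200)
print(history)  # mean tracking error before training and after each stage
```

Each layer gets the same number of sweeps, mirroring the fixed per-layer learning time. The update assumes the plant gain is positive, so that a positive tracking error always calls for a larger input.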

FIG. 2 shows how the output value of the input/output model changes as the learning stages proceed. In the first stage, learning of the lowermost layer 3, the rough shape of the input/output relationship to be learned is acquired, and this stage takes the shortest time of all the layers. However, because the lowermost layer 3 has low resolution, its output value still contains some error. In the next stage, learning of the intermediate layer 2, supervised learning proceeds so as to correct the error contained in the output value of the lowermost layer 3. Because the intermediate layer 2 is thus learned on top of the already learned lowermost layer 3, its learning time is much shorter than when the intermediate layer 2 is learned alone. Moreover, even if some regions have not yet been experienced by the intermediate layer 2 when learning of the lowermost layer 3 ends, those regions are covered by the learning result of the low-resolution lowermost layer 3, so control accuracy does not drop sharply during learning. This also helps keep the model robust when, for example, the target trajectory changes suddenly. The same applies to learning of the uppermost layer 1, the final stage.
In this stacked learning, as shown in FIG. 1, the sum of the outputs of the layers is treated as the output of the whole model. Once the lowermost layer has been learned, it produces an output for a given input value; the sum of this output and the output of the intermediate layer is applied to the controlled object, and the intermediate layer is learned from the result. In effect, the error in the output value of the lowermost layer is corrected by the learning of the intermediate layer.
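The stacked learning just described can be written compactly. The notation is an assumption introduced here, not taken from the source: r is the control target value, y the observed plant output, u*(r) the ideal plant input, f_k the output of layer k (3 = lowermost, 1 = uppermost), phi_{k,i} its basis functions, and eta a learning rate. The error-proportional update is one possible choice of learning rule.

```latex
% Model output: sum of the layer outputs, with fewer basis functions in lower layers
u(r) = f_3(r) + f_2(r) + f_1(r), \qquad
f_k(r) = \sum_{i=1}^{N_k} w_{k,i}\,\phi_{k,i}(r), \qquad N_3 < N_2 < N_1

% Stage for the intermediate layer: f_3 is fixed, f_1 \equiv 0 (not yet learned)
e = r - y, \qquad
w_{2,i} \leftarrow w_{2,i} + \eta\, e\, \phi_{2,i}(r)

% After this stage, f_2 approximates the residual left by the lowermost layer
f_2(r) \approx u^{*}(r) - f_3(r)
```

Because f_2 only has to represent the residual, regions it has not yet visited still receive the coarse estimate f_3(r) rather than nothing.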

Therefore, a search of the entire state space is not required in the learning stages of the upper layers, and the time required for learning is shorter than when the uppermost layer 1 is used alone. Conversely, when a layer is learned alone, an enormous amount of time is needed before no unexperienced regions of the state space remain. For example, if the resolutions of the uppermost, intermediate, and lowermost layers are 4, 2, and 1 and each data region has two dimensions, the learning times required by the layers when learned alone are 16, 4, and 1, respectively. According to the present invention, however, the learning time of each layer is at most a few times that of the lowermost layer; counting one unit per layer, the total is 1 + 1 + 1 = 3, which is far shorter. That is, the present invention makes it possible to learn an input/output model with the same resolution as the uppermost layer more than five times faster (16/3) than when learning with the uppermost layer alone.
In FIG. 1, constructing the data region of the lowermost layer 3 with an n-th order function (n: a natural number) is often effective when the present invention is applied to control.
Further, performing feedback control in addition to feedforward control using the input/output model acquired by the present invention greatly improves control performance.
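The learning-time estimate given above (16, 4, 1 versus 1 + 1 + 1 = 3) can be reproduced with a few lines of arithmetic. The sketch assumes, as that example does, that standalone learning time scales as resolution raised to the number of dimensions and that each staged layer costs one unit of lowermost-layer time. The function name `standalone_time` is hypothetical.

```python
def standalone_time(resolution: int, dimensions: int) -> int:
    """Learning time of a layer trained alone, in lowermost-layer units."""
    return resolution ** dimensions


resolutions = {"uppermost": 4, "intermediate": 2, "lowermost": 1}
standalone = {name: standalone_time(r, 2) for name, r in resolutions.items()}

# Staged learning: one unit of lowermost-layer time per layer.
staged_total = len(resolutions) * standalone["lowermost"]
speedup = standalone["uppermost"] / staged_total

print(standalone, staged_total, speedup)
```

The ratio 16/3 is the "more than five times" figure quoted in the text. If a staged layer took a few units rather than one, the speedup would shrink accordingly.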

FIG. 1 is a hierarchical diagram of an example input/output model acquired by an embodiment of the present invention.
FIG. 2 is a schematic diagram of how learning of the input/output model proceeds according to the present invention.
FIG. 3 is a process diagram of learning when feedforward control is used in the online input/output relation learning device of the present invention.

Explanation of symbols

1 ... Uppermost layer of the input/output model data region
2 ... Intermediate layer of the input/output model data region
3 ... Lowermost layer of the input/output model data region
4 ... Plant (controlled object)

Claims

An online input/output relation learning method for acquiring, by machine learning, an input/output model that is a function describing the input/output relationship of a controlled object, wherein:
the input/output model includes a plurality of data regions that differ in the complexity (referred to as resolution for convenience) of how the output value changes with respect to the input value;
the method includes a procedure of feeding back, to the data region being learned, the amount of deviation from the control target value of the behavior of the controlled object observed when the output value of the input/output model is applied to the controlled object as an input value; and
learning is started, so as to correct the amount of deviation, from the data region having the lowest resolution among the plurality of data regions constituting the input/output model, and thereafter proceeds, so as to correct the amount of deviation, in order of increasing resolution, and in the learning of the data region of each resolution, the control target value of the input/output model is used as the input value to that data region, the sum of the output values from the already learned data regions and the output value from the data region of that resolution is used as the output value of the input/output model, and learning is performed so that the amount of deviation is reduced.

The online input/output relation learning method according to claim 1,
wherein the learning time or the number of learning iterations is allocated equally among the plurality of data regions constituting the input/output model, each being set to the learning time or number of iterations of the data region having the lowest resolution.

The online input/output relation learning method according to claim 1 or 2,
wherein two or more data regions are never learned at the same time in the learning process of the data regions constituting the input/output model.