JP2007164704A

JP2007164704A - Apparatus, method and program using self-organizing map

Info

Publication number: JP2007164704A
Application number: JP2005363602A
Authority: JP
Inventors: Tetsuo Furukawa; 徹生古川
Original assignee: Kyushu Institute of Technology NUC
Current assignee: Kyushu Institute of Technology NUC
Priority date: 2005-12-16
Filing date: 2005-12-16
Publication date: 2007-06-28
Anticipated expiration: 2025-12-16
Also published as: JP4734639B2

Abstract

PROBLEM TO BE SOLVED: To provide an apparatus capable of achieving highly generalizable control from a small number of training cases. SOLUTION: The apparatus contains a plurality of units, each consisting of neural network modules. Among these units, the apparatus selects the unit whose predictor most accurately predicts the state of the controlled object at the next time step, and controls the controlled object using the control signal of the controller belonging to that unit. Consequently, control with high immediacy can be achieved, and a self-organizing map can be formed. COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to an apparatus that uses a self-organizing map, and in particular to an apparatus suited to control that requires immediacy.

As background art for the present invention, there is the agent learning device disclosed in Japanese Patent Laid-Open No. 2000-35956.

The agent learning device of this background art has a plurality of learning modules, each consisting of a pair: a reinforcement learning system that acts on the environment and determines an action output that maximizes the resulting reward, and an environment prediction system that predicts changes in the environment. For each learning module, a responsibility signal is computed that takes a larger value the smaller the prediction error of that module's environment prediction system. The action outputs of the reinforcement learning systems are weighted in proportion to these responsibility signals, and the weighted result is applied to the environment as the action.

According to the agent learning device of this background art, in an environment such as a controlled object or system with nonlinearity or nonstationarity, where no specific teacher signal is given, the device can switch among and combine the actions best suited to various environmental states and operation modes, and can learn behavior flexibly without using prior knowledge.
JP 2000-35956 A
Wolpert, D.M., Kawato, M.: Multiple paired forward and inverse models for motor control. Neural Networks 11, 1317-1329, 1998

The agent learning device of the background art was intended to solve real-world problems in which the supervised learning framework cannot be applied because the correct output is unknown.
However, even this agent learning device may fail to solve a problem quickly and appropriately when there are few training cases. In particular, assume environments 1, 2, and 3, where environment 2 is intermediate between environment 1 and environment 3. If the device is trained sufficiently on environment 1, then sufficiently on environment 3, and only then trained on environment 2, the learning module for environment 1 and the learning module for environment 3 both have to move toward environment 2, so handling environment 2 adequately takes considerable time. In other words, the device cannot use the environments it has already learned to respond immediately to a new environment.

The present invention has been made to solve the above problem, and its object is to provide an apparatus that can achieve highly generalizable control from a small number of training cases.

That is, the goal is an apparatus that responds to sudden changes in the characteristics of the controlled object and acquires generalized control capability from as few samples as possible. To this end, a Self-Organizing Adaptive Controller (SOAC), which incorporates the idea of the self-organizing map, is proposed. The development of the SOAC is related to adaptive control in control engineering, but differs in that adaptive control basically assumes that the characteristics of the controlled object change slowly over time.

The SOAC was devised to construct controllers in a self-organizing manner, and has two main features. The first is that it uses an architecture based on the mnSOM (modular network SOM), which combines the characteristics of the self-organizing map (SOM) and of modular networks. That is, the SOAC consists of many neural network functional modules, which are trained according to the SOM algorithm. The second is that each functional module of the SOAC consists of a controller-predictor pair. That is, the SOAC is an array of many controllers and predictors with differing characteristics (FIG. 1). FIG. 1 shows the basic configuration of the SOAC of the present invention.

First, the mnSOM replaces each vector unit of a conventional SOM with a neural network functional module. For example, modules such as an MLP (Multi-Layer Perceptron) or an RNN (Recurrent Neural Network) can be used in place of the vector units of the SOM. This makes it possible to map sets of data vectors and time-series data, which a conventional SOM cannot handle. If Hebbian learning neurons are chosen as the functional modules, the mnSOM reduces to an ordinary SOM, so the mnSOM can be regarded as a generalization of the SOM. Because the user can design the functional modules freely, the mnSOM greatly widens the range of applications of the SOM. After diligent study, the inventor arrived at an mnSOM whose functional modules are neural network controllers. This is the first feature of the SOAC. That is, the SOAC is a collection of many neural network controllers whose functions are divided and coordinated by the SOM algorithm. Given a controlled object, the module that controls it best is selected as the Best Matching Controller (BMC), and the object is controlled using the BMC module.
If the characteristics of the controlled object change suddenly, the BMC immediately switches to another module, so control adapts to the change (FIG. 2). FIG. 2 illustrates module switching in the present invention.

The second feature of the SOAC is the module structure in which a controller and a predictor are paired. The need for this arises because the control task must be performed in real time. When the characteristics of the controlled object change, the optimal controller, that is, the BMC, must be switched immediately. The SOAC therefore provides a predictor paired with every controller, and takes as the BMC at a given time the controller paired with the predictor that best estimated the state of the controlled object at the next time step. In other words, the predictor acts as a system identifier, and the paired controller is trained in advance to be the optimal controller for the identified system. In this way the BMC can be switched instantly even when the controlled object changes suddenly.
More systematically, the present invention can be described as follows.

(1) An apparatus according to the present invention constructs a self-organizing map realized by competitive learning among units consisting of neural network modules. Each neural network module includes a controller that controls a controlled object and a predictor that predicts the state of the controlled object at the next time step. The controller receives the ideal state and the current state of the controlled object and outputs a control signal. The predictor receives the control signal at the current time and the current state of the controlled object and outputs a predicted state of the controlled object at the next time step. The unit whose predictor made the prediction closest to the current state of the controlled object is identified as the best matching unit, the controlled object is actually controlled with the control signal output from the controller of the best matching unit, and the self-organizing map is updated with the unit whose control signal was adopted as the best matching unit.

Thus, according to the present invention, among the plurality of units consisting of neural network modules in the apparatus, the control signal of the controller belonging to the unit whose predictor most accurately predicted the state of the controlled object at the next time step is adopted to control the controlled object. Control with high immediacy can therefore be achieved, and a self-organizing map can be formed.
As a narrower concept, the "apparatus that constructs a self-organizing map realized by competitive learning among units consisting of neural network modules" may also be an "apparatus that constructs a self-organizing map realized by competitive learning among units consisting of neural network modules and by smoothing with a neighborhood function".

(2) Where necessary, in the apparatus according to the present invention, when the candidate unit, that is, the unit whose predictor made the prediction closest to the current state, differs from the unit that was the previous best matching unit, the previous best matching unit is retained as the best matching unit unless the candidate unit's prediction of the current state is closer to the current state of the controlled object than the previous best matching unit's prediction by at least a predetermined amount.

Thus, according to the present invention, when the best matching unit might shift to another unit, the previous best matching unit continues to be used unless its prediction error (the difference between its predicted state and the current state of the controlled object) exceeds the candidate unit's prediction error by more than a predetermined threshold. The best matching unit therefore does not change frequently, and stable control of the controlled object can be achieved. That is, however accurately the candidate unit predicted the state of the controlled object, switching is suppressed when the current best matching unit can still control the object about as well. This prevents the control system from becoming unstable through excessive unit switching.
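
The switching rule of (2) can be illustrated with a minimal Python sketch. The function name `select_unit`, the use of one scalar prediction error per unit, and the parameter `margin` (standing in for the predetermined threshold) are assumptions made for illustration and do not appear in the original text.

```python
def select_unit(errors, previous_best, margin):
    """Pick the unit to use, keeping the previous one unless a rival is clearly better.

    errors        -- prediction error of each unit (smaller is better)
    previous_best -- index of the unit used at the previous step, or None
    margin        -- hypothetical threshold a candidate must win by
    """
    candidate = min(range(len(errors)), key=lambda k: errors[k])
    if previous_best is None or candidate == previous_best:
        return candidate
    # Switch only when the candidate beats the incumbent by at least `margin`.
    if errors[previous_best] - errors[candidate] >= margin:
        return candidate
    return previous_best
```

With `margin = 0.1`, a candidate whose error is only 0.05 smaller than the incumbent's does not trigger a switch, which is the kind of suppression of frequent switching that the text describes.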

(3) Where necessary, the apparatus according to the present invention further includes a linear feedback controller provided for each controller, which receives the current state of the controlled object, outputs a control signal, and also supplies its output to the controller.
Thus, according to the present invention, because a linear feedback controller is provided for each controller, the linear feedback supplements the control signal so that appropriate control is achieved even when the neural network serving as the controller has not yet learned sufficiently. In addition, the signal output by the linear feedback controller is also input to the controller, so it supplements learning as well.

(4) Where necessary, the apparatus according to the present invention further includes a delay unit provided for each predictor, which holds the predicted state of the controlled object output by the predictor at least until the predicted time arrives.
Thus, according to the present invention, even when the predictor outputs its predicted state at a time other than the predicted time, the delay unit compensates for the difference, and the best matching unit can be identified correctly.
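
One way to picture the delay unit of (4) is as a small buffer that holds each prediction until the time it refers to has arrived. The sketch below is only an illustration under assumptions: the class name `PredictionDelay`, its methods, and the use of integer time steps are hypothetical and are not taken from the original text.

```python
from collections import deque


class PredictionDelay:
    """Holds predictions until the time step they refer to arrives."""

    def __init__(self):
        # Each entry is (target_step, predicted_state), in order of arrival.
        self._pending = deque()

    def push(self, target_step, predicted_state):
        """Store a prediction made for the future time step `target_step`."""
        self._pending.append((target_step, predicted_state))

    def pop_due(self, now):
        """Return the prediction made for step `now`, or None if none is due."""
        # Drop predictions whose target step has already passed.
        while self._pending and self._pending[0][0] < now:
            self._pending.popleft()
        if self._pending and self._pending[0][0] == now:
            return self._pending.popleft()[1]
        return None
```

A prediction pushed for step 1 is not released at step 0; it is released once at step 1, when it can be compared with the actual state.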

(5) Where necessary, in the apparatus according to the present invention, when the unit identified as the best matching unit has previously been identified as the best matching unit, or lies close on the self-organizing map to a unit previously identified as the best matching unit, the conditions of the controlled object currently handled by that unit are estimated from the conditions of the controlled object on that earlier occasion.
Thus, according to the present invention, because identical conditions of the controlled object are placed at the same or a nearby position on the self-organizing map, when a unit becomes the best matching unit, the conditions of the current controlled object can be roughly determined from the conditions of the controlled object at the time a unit at or near that position previously became the best matching unit.

(6) Where necessary, the apparatus according to the present invention further includes means for identifying, when the conditions of the current controlled object are input to the apparatus, the position on the self-organizing map corresponding to those conditions, and controls the current controlled object using the unit corresponding to that position.

Thus, according to the present invention, the position on the self-organizing map corresponding to the input conditions of the controlled object is identified, and control is performed using the controller of the unit at that position. Relatively stable control is therefore possible even when nothing has been learned directly for the input conditions. For example, suppose learning has been performed for condition A and condition B, and a condition C intermediate between A and B is input. By controlling the controlled object with the unit located between the positions of condition A and condition B on the self-organizing map, roughly appropriate control is possible from the outset even though condition C has not been learned.
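
The A, B, C example can be sketched as follows. The original text does not specify how the map position is identified, so this sketch assumes, purely for illustration, a scalar condition value and linear interpolation between the map positions of A and B; the function name `interpolate_map_position` is hypothetical.

```python
def interpolate_map_position(cond_c, cond_a, pos_a, cond_b, pos_b):
    """Estimate the grid position for condition C from learned conditions A and B.

    Assumes a scalar condition and a map laid out linearly between A and B.
    """
    if cond_a == cond_b:
        raise ValueError("conditions A and B must differ")
    frac = (cond_c - cond_a) / (cond_b - cond_a)
    # Interpolate each map coordinate and snap to the nearest grid unit.
    return tuple(round(pa + frac * (pb - pa)) for pa, pb in zip(pos_a, pos_b))
```

For instance, if condition A (value 1.0) sits at grid position (0, 0) and condition B (value 3.0) at (4, 2), a condition C of 2.0 is assigned to the unit at (2, 1), halfway between them.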

(7) A method according to the present invention uses an apparatus that constructs a self-organizing map realized by competitive learning among units consisting of neural network modules, and includes the steps of: a controller, included in the neural network module and controlling a controlled object, receiving the ideal state and the current state of the controlled object and outputting a control signal; a predictor, included in the neural network module and predicting the state of the controlled object at the next time step, receiving the control signal at the current time and the current state of the controlled object and outputting a predicted state of the controlled object at the next time step; identifying, as the best matching unit, the unit whose predictor made the prediction closest to the current state of the controlled object; actually controlling the controlled object with the control signal output from the controller of the best matching unit; and updating the self-organizing map with the unit whose control signal was adopted as the best matching unit. The apparatus described above can thus also be understood as a method.

(8) A program according to the present invention causes a computer to function so as to construct a self-organizing map realized by competitive learning among units consisting of neural network modules. The program causes the computer to function as: a controller, included in the neural network module and controlling a controlled object, that receives the ideal state and the current state of the controlled object and outputs a control signal; a predictor, included in the neural network module and predicting the state of the controlled object at the next time step, that receives the control signal at the current time and the current state of the controlled object and outputs a predicted state of the controlled object at the next time step; means for identifying, as the best matching unit, the unit whose predictor made the prediction closest to the current state of the controlled object; means for actually controlling the controlled object with the control signal output from the controller of the best matching unit; and means for updating the self-organizing map with the unit whose control signal was adopted as the best matching unit. The apparatus described above can thus also be understood as a program.
This summary of the invention does not enumerate all the features essential to the present invention, and sub-combinations of these features may also constitute inventions.

(First embodiment of the present invention)
[1. Basic configuration]
FIG. 1 shows the configuration of the SOAC. The basic architecture of the SOAC is the same as that of the mnSOM, with each functional module of the mnSOM consisting of a predictor block and a controller block.
The controller block of the k-th module takes the current state x(t) of the controlled object and the target state x̂(t) as inputs and outputs a control signal u^k(t). The predictor block of the k-th module takes the current state x(t) of the controlled object and the control signal u(t) as inputs and outputs x̃^k(t+Δt), the predicted state of the controlled object Δt seconds later.
The SOAC has two modes, a learning mode and an execution mode. In the learning mode, the predictors and controllers of all modules are trained according to the mnSOM algorithm. In the execution mode, the trained modules are used to actually control the controlled object.
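
The input-output relations of the controller block and the predictor block can be written compactly as below. The equations of the original publication are not reproduced in this text, so this is a reconstruction from the stated inputs and outputs only; the function symbols ${}^{c}f^k$ and ${}^{p}f^k$ are assumed names.

```latex
u^k(t) = {}^{c}f^k\bigl(x(t),\, \hat{x}(t)\bigr), \qquad
\tilde{x}^k(t+\Delta t) = {}^{p}f^k\bigl(x(t),\, u(t)\bigr)
```

Here $\hat{x}(t)$ is the target state and $\tilde{x}^k(t+\Delta t)$ is the prediction of the k-th module.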

[2. Execution mode]
Before describing the learning mode, the execution mode, in which the SOAC is actually operated, is described first. The error between the behavior of the controlled object and the behavior predicted by the k-th predictor is denoted ^p e^k(t) and is defined as an exponentially decaying average of the prediction error. That is, in the execution mode, a time average of the prediction error is taken from the very recent past up to the present. The length of the averaging window is determined by ε: the smaller ε is, the longer the window, and conversely, when ε = 1, ^p e^k(t) is determined by the prediction error at that instant alone. The module that minimizes ^p e^k(t) is the BMC at time t. The appropriate value of ε depends on the magnitude of the disturbance and noise acting on the controlled object; in general, the larger the disturbance and noise, the smaller ε should be. The index of the BMC is denoted by *, and the output of the BMC becomes the actual control signal that is input to the controlled object.
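
The execution-mode selection can be sketched in Python. The recursive form used here, e^k(t) = (1 − ε) e^k(t − Δt) + ε ‖x(t) − x̃^k(t)‖², is an assumption chosen to match the behavior of ε described in the text (ε = 1 keeps only the instantaneous error; smaller ε averages over a longer window); the function names are hypothetical.

```python
def update_errors(avg_errors, state, predictions, eps):
    """One step of the exponentially decaying average of squared prediction error.

    avg_errors  -- previous averaged error of each module
    state       -- actual state of the controlled object (sequence of floats)
    predictions -- each module's prediction of that state
    eps         -- weight of the newest error, 0 < eps <= 1
    """
    updated = []
    for e_prev, pred in zip(avg_errors, predictions):
        instant = sum((x - p) ** 2 for x, p in zip(state, pred))
        updated.append((1.0 - eps) * e_prev + eps * instant)
    return updated


def best_matching_controller(avg_errors):
    """Index of the module with the smallest averaged prediction error."""
    return min(range(len(avg_errors)), key=lambda k: avg_errors[k])
```

At each time step, the control signal actually applied would be the output of the module returned by `best_matching_controller`.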

[3. Learning mode (predictor block)]
This section describes the training of the predictor block; the training of the controller block is described in the next section. Suppose there are I known controlled objects available in advance for training, together with I controllers that control them. This yields I time series {(x_i(t), u_i(t))} (i = 1, ..., I).

The learning algorithm of the predictor block is the same as the mnSOM algorithm. Like the mnSOM, it therefore consists of four processes: (1) evaluation, (2) competition, (3) cooperation, and (4) adaptation. Here the predictor is assumed to be an MLP module with weight vector ^p w^k.

[3.1 Evaluation process]
First, the average prediction error ^p E^k_i of the k-th predictor is computed against each of the I teacher patterns, where x̃^k_i(t) is the output of the k-th predictor for the i-th teacher pattern and T is the length of the time series.

[3.2 Competition process]
After the prediction errors have been computed, a BMC is determined for every teacher pattern. The BMC for a teacher pattern is the module with the smallest average prediction error on that pattern.
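
The evaluation and competition processes can be sketched together in Python. The error measure used here (squared Euclidean error averaged over the T samples of a sequence) is an assumption, since the defining equation is not reproduced in this text; the function names and the data layout are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Squared error between two state sequences, averaged over their length T."""
    total = 0.0
    for x, x_pred in zip(actual, predicted):
        total += sum((a - b) ** 2 for a, b in zip(x, x_pred))
    return total / len(actual)


def evaluate_and_compete(teachers, predictions):
    """Return (errors, winners).

    teachers[i]       -- i-th teacher sequence of state vectors
    predictions[k][i] -- module k's predicted sequence for teacher i
    errors[i][k]      -- average prediction error of module k on teacher i
    winners[i]        -- index of the best matching module for teacher i
    """
    n_modules = len(predictions)
    errors = [
        [mean_squared_error(teacher, predictions[k][i]) for k in range(n_modules)]
        for i, teacher in enumerate(teachers)
    ]
    winners = [min(range(n_modules), key=lambda k, row=row: row[k]) for row in errors]
    return errors, winners
```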

[3.3 Cooperation process]
The learning distribution rate ψ^k_i is determined using a neighborhood function. The neighborhood function depends on ξ^k and ξ^*_i, the coordinates in map space of the k-th module and of the BMC for the i-th teacher pattern.
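
A minimal sketch of the cooperation process follows. It assumes a Gaussian neighborhood function of width `sigma` and a normalization of the rates across teacher patterns, as is common for mnSOM; neither choice is stated in this text, and the function name is hypothetical.

```python
import math


def learning_rates(unit_coords, winner_coords, sigma):
    """Return psi[i][k], the share of teacher pattern i assigned to module k.

    unit_coords[k]   -- map coordinates of module k
    winner_coords[i] -- map coordinates of the BMC for teacher pattern i
    sigma            -- assumed width of the Gaussian neighborhood
    """
    def neighborhood(xi_k, xi_star):
        dist2 = sum((a - b) ** 2 for a, b in zip(xi_k, xi_star))
        return math.exp(-dist2 / (2.0 * sigma ** 2))

    raw = [[neighborhood(xi_k, xi_star) for xi_k in unit_coords]
           for xi_star in winner_coords]
    n_units = len(unit_coords)
    # Normalize so that, for each module, the rates over all patterns sum to 1.
    totals = [sum(row[k] for row in raw) for k in range(n_units)]
    return [[row[k] / totals[k] for k in range(n_units)] for row in raw]
```

A module lying midway between the BMCs of two patterns receives equal shares of both, so it is trained toward a compromise between them.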

[3.4 Adaptation process]
The weight vector of each predictor is updated using the learning distribution rates {ψ^k_i}.
These four processes are repeated until the network reaches a steady state. As a result, modules with similar properties are placed at nearby positions in map space.
The map is updated according to these learning distribution rates.
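
One plausible form of the adaptation step is a gradient descent update in which each teacher pattern contributes in proportion to its learning distribution rate. The update equation of the original publication is not reproduced in this text, so the form below and the learning-rate symbol $\eta$ are assumptions:

```latex
{}^{p}w^k \leftarrow {}^{p}w^k - \eta \sum_{i=1}^{I} \psi_i^k \,
\frac{\partial\, {}^{p}E_i^k}{\partial\, {}^{p}w^k}
```

Under this form, a module is pulled most strongly toward the teacher patterns whose BMC lies close to it on the map.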

[3.5 Learning mode (controller block)]
The controllers of the SOAC are trained by feedback error learning. Its advantages are that the controller can be trained using a conventional linear feedback controller, so the optimal controller need not be determined in advance, and that additional learning is therefore also possible.

FIG. 3 shows a block diagram of one module of the SOAC. It is a model in which feedback error learning is introduced into a closed-loop adaptive control system. In this model, the controller consists of a conventional linear feedback controller (Conventional Feedback Controller: CFC) and a neural network controller (Neural Network Controller: NNC). Placing the CFC and the NNC in parallel has several advantages: the NNC can be trained by the CFC, the CFC can stabilize the control system, and the NNC can provide nonlinear compensation. Let ^cfc W denote the feedback coefficient matrix of a multi-input multi-output system, and let ^cfc W^k, with a superscript, denote the feedback coefficient matrix of the k-th module. The control law of the SOAC is expressed in terms of these quantities.

The error signal ^nnc E of the NNC is defined using the output of the CFC.

Finally, the feedback coefficient matrix ^cfc W^k and the weight vector ^nnc W^k are updated. The learning distribution rates ψ^k_i used here are the values obtained in training the predictors. That is, each controller learns to correctly control the system identified by its paired predictor.

This concludes the description of the architecture and learning algorithm of the SOAC.
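
Feedback error learning can be illustrated with a deliberately simplified Python sketch for a scalar plant. Everything specific here is an assumption: the NNC is replaced by a linear model over the features (x̂, x) as a stand-in for a neural network, the CFC is a fixed proportional gain `k_cfc`, and the update of the CFC gain mentioned in the text is omitted because its rule is not reproduced in this text.

```python
def control_step(w_nnc, k_cfc, x, x_target, eta, psi=1.0):
    """One control-and-learning step of feedback error learning (scalar plant).

    w_nnc    -- weights of the stand-in NNC, u_nnc = w0 * x_target + w1 * x
    k_cfc    -- proportional gain of the conventional feedback controller
    eta      -- hypothetical learning rate
    psi      -- learning distribution rate of this module
    Returns (u, new_w_nnc).
    """
    features = (x_target, x)
    u_cfc = k_cfc * (x_target - x)                        # feedback term
    u_nnc = sum(w * f for w, f in zip(w_nnc, features))   # learned term
    u = u_cfc + u_nnc                                     # parallel combination
    # The CFC output serves as the error signal that trains the NNC.
    new_w = tuple(w + eta * psi * u_cfc * f for w, f in zip(w_nnc, features))
    return u, new_w
```

As the NNC improves, the CFC has less residual error to correct, so its output, and with it the NNC's training signal, shrinks toward zero.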

[4. Hardware configuration diagram]
FIG. 4 is a hardware configuration diagram of the apparatus according to the present embodiment. A general-purpose computer can be used as the apparatus. The hardware consists of a CPU (Central Processing Unit) 11, a main memory 12 such as a DRAM (Dynamic Random Access Memory), an HD (hard disk) 13 serving as an external storage device, a display 14 serving as a display device, a keyboard 15 and a mouse 16 serving as input devices, a LAN card 17 that is an expansion card for connecting to a network, a CD-ROM drive 18, and the like.
For example, a program stored on a CD-ROM is copied (installed) onto the HD 13, read into the main memory 12 as needed, and executed by the CPU 11, thereby configuring the apparatus.

[5. Operation]
FIG. 5 is an example of a flowchart of the operation of the apparatus according to the present embodiment in the execution mode.
The CPU 11 (predictor 2) predicts the state of the controlled object at the next time step from the control signal at the current time and the current state of the controlled object (step 100). This prediction is performed by the predictors 2 of all units.
The CPU 11 (delay unit 4) holds each prediction until the actual state of the controlled object at the predicted next time step is input (step 200). This waiting is performed by the delay units 4 of all units.
The CPU 11 compares the actual state of the controlled object at that time step with the predicted states and identifies the predictor 2 that made the closest prediction (step 300). The unit containing the identified predictor 2 becomes the best matching unit.

The CPU 11 (controller 1) obtains a control signal from the ideal state and the current state of the control target and outputs it to a drive source such as a motor (step 400).
The CPU 11 updates the self-organizing map based on the optimal unit (step 500).
In this flow, the controller 1 of the optimal unit obtains the control signal after the optimal unit has been identified. Alternatively, the controllers 1 of all units may obtain control signals before the optimal unit is identified.
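The flow of steps 100 to 400 can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the names `Unit`, `predict_all`, `select_optimal_unit`, and `run_step` are hypothetical, the squared-error distance is an assumed measure of "closest", the `plant` callable stands in for the real control target and its sensors, and the map update of step 500 is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

State = Sequence[float]


@dataclass
class Unit:
    """One module: a controller paired with a predictor (hypothetical structure)."""
    # (ideal state, current state) -> control signal
    controller: Callable[[State, State], float]
    # (current state, control signal) -> predicted state at the next time
    predictor: Callable[[State, float], List[float]]


def squared_error(a: State, b: State) -> float:
    """Assumed distance measure between a predicted and an actual state."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_all(units: List[Unit], state: State, control: float) -> List[List[float]]:
    """Step 100: the predictors of all units predict the next state."""
    return [u.predictor(state, control) for u in units]


def select_optimal_unit(predictions: List[List[float]], actual: State) -> int:
    """Step 300: index of the unit whose prediction is closest to the actual state."""
    errors = [squared_error(p, actual) for p in predictions]
    return min(range(len(errors)), key=errors.__getitem__)


def run_step(units, plant, state, control, ideal_state):
    """One pass through steps 100 to 400 (step 500, the map update, is omitted)."""
    predictions = predict_all(units, state, control)             # step 100
    next_state = plant(state, control)                           # step 200: predictions are held until the actual state arrives
    best = select_optimal_unit(predictions, next_state)          # step 300
    next_control = units[best].controller(ideal_state, next_state)  # step 400
    return best, next_state, next_control
```

In this sketch only the controller of the optimal unit is evaluated, which corresponds to the first of the two configurations described above. Evaluating every controller before selection would only move the controller call ahead of `select_optimal_unit`.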

(Other embodiments)
[Estimating the condition of the control target using the self-organizing map]
For example, if controlling a control target under some condition places the optimal unit at the center of the map shown in FIG. 6(b), it can be inferred that the condition of the control target is a pendulum whose distance to the center of gravity is “Long” and whose mass is “Heavy”.
That is, controlling the control target identifies a position on the self-organizing map, and the condition of the current control target can be estimated from the condition of the earlier control target associated with that position.
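As a rough illustration of this estimation, the sketch below looks up the labeled map position nearest to the position of the optimal unit and returns its condition. The function name `estimate_condition`, the grid coordinates, and the example labels are assumptions made for illustration. The source does not specify how map positions are stored or compared.

```python
import math
from typing import Dict, Tuple

Position = Tuple[int, int]
Condition = Tuple[str, str]  # (pendulum length label, pendulum mass label)


def estimate_condition(labels: Dict[Position, Condition],
                       optimal_pos: Position) -> Condition:
    """Return the condition associated with the labeled position nearest to optimal_pos.

    `labels` maps map positions that were associated with known conditions
    during learning to those conditions (hypothetical data layout).
    """
    nearest = min(labels, key=lambda pos: math.dist(pos, optimal_pos))
    return labels[nearest]


# Hypothetical 3x3 map whose center was associated with a long, heavy pendulum.
example_labels = {
    (0, 0): ("Short", "Light"),
    (1, 1): ("Long", "Heavy"),
}
estimated = estimate_condition(example_labels, (1, 2))
```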

[Identifying units using the self-organizing map]
When a condition of a control target is input, even if that exact condition has not been learned, control disturbance at start-up can be suppressed by using a unit associated with a nearby position on the self-organizing map, that is, a position associated with a similar control target condition.

It is also possible to partition the self-organizing map into regions according to the conditions of the control target. Using a unit within the region that best matches a newly input condition likewise suppresses control disturbance at start-up. In addition, restricting the selectable units reduces the effort required to select the unit used for control.
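One way to picture this unit selection is the sketch below, which picks the map position whose associated condition is numerically closest to a newly input condition. Everything here is assumed for illustration: the function name `nearest_labeled_unit`, the numeric (length, mass) encoding of conditions, and the example values do not come from the source, and a real system would probably normalize the two quantities before comparing them.

```python
import math
from typing import Dict, Tuple

Position = Tuple[int, int]
NumericCondition = Tuple[float, float]  # assumed encoding: (length, mass)


def nearest_labeled_unit(labeled_units: Dict[Position, NumericCondition],
                         condition: NumericCondition) -> Position:
    """Return the map position whose associated condition is closest to `condition`."""
    return min(labeled_units,
               key=lambda pos: math.dist(labeled_units[pos], condition))


# Hypothetical conditions learned at three map positions.
learned = {
    (0, 0): (0.5, 0.1),
    (1, 1): (1.0, 0.5),
    (2, 2): (1.5, 1.0),
}
# An unlearned condition close to the one stored at (1, 1).
start_unit = nearest_labeled_unit(learned, (1.1, 0.6))
```

Restricting the search to one region of the map, as described above, would amount to passing only the entries of that region as `labeled_units`.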

[Others]
If the dynamics of the control target change continuously with hidden parameters, a map corresponding to those parameters is expected to form, and the feature map generated in this way can be put to effective use.
An additional-learning function can also be provided that corrects the characteristics of each module in the execution mode while an unlearned target is being controlled.
Although the present invention has been described with reference to the above embodiments, the technical scope of the present invention is not limited to the scope described in the embodiments, and various modifications or improvements can be made to them. Embodiments with such modifications or improvements are also included in the technical scope of the present invention. This is apparent from the claims and from the means for solving the problems.

[Example of applying SOAC to control of an inverted pendulum]
To examine the performance of the SOAC, a simulation experiment was performed using an inverted pendulum system. The parameters of the pendulum used in the experiment are shown in Table 1 (see FIG. 6(a)).

The SOAC modules were the same as those shown in FIG. 3. In the simulation, the length and weight of the pendulum were variable, and nine parameter sets with different values were prepared as learning patterns. As shown in Table 1, the learning patterns were generated by combining three pendulum lengths, “Long”, “Half”, and “Short”, with three pendulum weights, “Heavy”, “Middle”, and “Light”. The feedback coefficient matrix (vector) of the CFC was obtained by the state feedback control method. A three-layer MLP was used for the NNC, and a linear neural network was used for the predictor. Gaussian white noise was applied to the cart as a disturbance.
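The nine learning patterns are simply every pairing of the three length labels with the three weight labels, which can be enumerated as in the sketch below. Only the labels come from the text. The numeric values of Table 1 are not reproduced here, and the names `LENGTHS`, `WEIGHTS`, and `training_patterns` are hypothetical.

```python
from itertools import product

LENGTHS = ("Long", "Half", "Short")     # pendulum length labels from the text
WEIGHTS = ("Heavy", "Middle", "Light")  # pendulum weight labels from the text


def training_patterns():
    """Return all (length, weight) label pairs used as learning patterns."""
    return list(product(LENGTHS, WEIGHTS))


patterns = training_patterns()
```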

After learning was completed, the network was fixed and a control experiment was conducted. FIG. 6(b) shows the map of inverted pendulum control modules obtained by SOAC learning. The map formed three clusters: “Short”, “Middle”, and “Long”. The gray scale in the map indicates the distance between each predictor and its neighbors.

To examine the adaptability of the SOAC, the length and weight of the pendulum were changed at 30-second intervals. For the first 30 seconds, a parameter set given during learning (Half-Middle) was used; after that, unlearned parameter sets were used. Experiments were carried out for four cases: with and without controller switching, each using either only the CFC or only the NNC as the controller. The experimental results are shown in FIG. 7. As the figure shows, for the learned parameters the pendulum was controlled without falling in all cases. However, when the parameters were changed to unlearned values without notice and the controller was not switched (non-adaptive), the pendulum fell at around 75 seconds from the start with both the CFC and the NNC. In contrast, when the controller was switched (adaptive), the pendulum was controlled without falling with both the CFC and the NNC. In particular, the NNC achieved control with less oscillation than the CFC.

In the simulation, the SOAC adaptively switched the BMC in response to changes in the parameters of the inverted pendulum and achieved stable control. This shows that the SOAC has high adaptability. Furthermore, the SOAC not only functioned as an adaptive controller but also acquired a feature map of the control target in a self-organizing manner.

FIG. 1 is a diagram showing the basic configuration of the SOAC of the present invention.
FIG. 2 is an explanatory diagram of module switching in the present invention.
FIG. 3 is a block diagram of one module of the SOAC according to the embodiment of the present invention.
FIG. 4 is a hardware configuration diagram of the apparatus according to the embodiment of the present invention.
FIG. 5 is an example of a flowchart of the operation of the apparatus according to the embodiment of the present invention in the execution mode.
FIG. 6 is an example of the map obtained by SOAC learning according to the example.
FIG. 7 shows the experimental results according to the example.

Explanation of symbols

1 Controller
2 Predictor
3 CFC
4 Delay unit
10 Computer
11 CPU
12 DRAM
13 HD
14 Display
15 Keyboard
16 Mouse
17 LAN card
18 CD-ROM drive

Claims

An apparatus for constructing a self-organizing map realized by competitive learning between units, each unit consisting of a neural network module, wherein
the neural network module includes a controller that controls a control target and a predictor that predicts the state of the control target at the next time,
the controller outputs a control signal when the ideal state of the control target and the current state of the control target are input,
the predictor outputs the predicted state of the control target at the next time when the control signal at the current time and the current state of the control target are input,
the unit having the predictor that made the closest prediction of the current state of the control target is identified as the optimal unit,
the control target is actually controlled by the control signal output from the controller of the optimal unit, and
the self-organizing map is updated with the unit whose control signal is adopted as the optimal unit.

The apparatus of claim 1, wherein, when the candidate unit, which is the unit having the predictor that made the closest prediction of the current state, differs from the unit that was the previous optimal unit,
and the candidate unit's prediction of the current state is not closer to the current state of the control target than the prediction of the current state by the unit that was the previous optimal unit, the unit that was the previous optimal unit is determined to be the optimal unit.

The apparatus according to claim 1, further comprising, for each controller that outputs a control signal, a linear feedback controller that likewise outputs a control signal when the current state of the control target is input.

The apparatus according to claim 1, further comprising a delay unit, provided for each predictor, that holds at least the predicted state of the control target output by the predictor until the predicted time arrives.

The apparatus according to claim 1, wherein, when the target unit identified as the optimal unit was previously identified as the optimal unit, or is close on the self-organizing map to a unit previously identified as the optimal unit, the condition of the control target currently handled by the target unit is estimated based on the condition of the control target at that earlier time.

The apparatus according to claim 1, further comprising means for specifying, when a condition of the current control target is input to the apparatus, the position on the self-organizing map corresponding to that condition,
wherein the current control target is controlled using the unit corresponding to that position on the self-organizing map.

A method of using an apparatus for constructing a self-organizing map realized by competitive learning between units, each unit consisting of a neural network module, the method comprising:
a step in which a controller, included in the neural network module, that controls a control target outputs a control signal when the ideal state of the control target and the current state of the control target are input;
a step in which a predictor, included in the neural network module, that predicts the state of the control target at the next time outputs the predicted state of the control target at the next time when the control signal at the current time and the current state of the control target are input;
a step of identifying the unit having the predictor that made the closest prediction of the current state of the control target as the optimal unit;
a step of actually controlling the control target with the control signal output from the controller of the optimal unit; and
a step of updating the self-organizing map with the unit whose control signal is adopted as the optimal unit.

A program for causing a computer to function so as to construct a self-organizing map realized by competitive learning between units, each unit consisting of a neural network module, the program causing the computer to function as:
a controller, included in the neural network module, that controls a control target and outputs a control signal when the ideal state of the control target and the current state of the control target are input;
a predictor, included in the neural network module, that predicts the state of the control target at the next time and outputs the predicted state of the control target at the next time when the control signal at the current time and the current state of the control target are input;
means for identifying the unit having the predictor that made the closest prediction of the current state of the control target as the optimal unit;
means for actually controlling the control target with the control signal output from the controller of the optimal unit; and
means for updating the self-organizing map with the unit whose control signal is adopted as the optimal unit.