JPH0765168A

JPH0765168A - Device and method for function approximation

Info

Publication number: JPH0765168A
Application number: JP5215492A
Authority: JP
Inventors: Hiroshi Shinjo; 広新庄; Toshihiko Nakano; 利彦中野; Yoshikane Sugimura; 好謙杉村; Hisao Ogata; 日佐男緒方
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1993-08-31
Filing date: 1993-08-31
Publication date: 1995-03-10

Abstract

PURPOSE: To provide a function approximation device and method that can find a precise function approximation expression from learning data and can also derive the reliability of the resulting output. CONSTITUTION: The function approximation device has a learning unit 100, a recording unit 110, and an execution unit 120. The learning unit 100 has a cluster creation unit 102 that classifies input data, an approximation expression creation unit 104 that derives, cluster by cluster, an approximation expression for calculating outputs from inputs, and an approximation expression evaluation unit 106 that evaluates the precision of each approximation expression, so that different approximation expressions can be applied to input data with different features. The execution unit 120 has a cluster membership determination unit 122 that determines which cluster the input data belong to, and an approximate value calculation unit 124 that derives the output using the approximation expression of the selected cluster.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a function approximation device and method with a learning function, applicable to control problems and prediction problems.

【０００２】[0002]

[Prior Art] Representative conventional function approximation methods include multiple regression analysis (reference: Okuno et al. (奥野忠一他), Multivariate Analysis Methods (多変量解析法), pp. 25-158, Nikkagiren (日科技連)) and hierarchical neural networks (reference: Kyuma et al. (久間和生他), Neurocomputer Engineering (ニューロコンピュータ工学), Kogyo Chosakai (工業調査会)). Multiple regression analysis finds the polynomial of the input variables that best approximates the output variable. The polynomial may be first-order (linear) or higher-order (nonlinear). Approximation by a hierarchical neural network learns the relationship between the input and output variables by adjusting the connection weights between its elements.

【０００３】[0003]

[Problems to Be Solved by the Invention] Linear multiple regression analysis has the problem that the approximation error is large for nonlinear targets. Nonlinear multiple regression analysis has the problem that it is difficult to choose an order and a form of approximation expression that approximate the entire range accurately.

[0004] With approximation by a hierarchical neural network, the output value cannot be trusted for an input pattern that was never learned, which makes the method difficult to use for problems such as control.

[0005] Furthermore, when learning data groups are added, learning must be redone from the beginning using all of the existing learning data groups together with the additional ones, so learning takes a long time.

【０００６】[0006]

[Means for Solving the Problems] To solve the above problems, the present invention, when creating function approximation expressions, has (1) a first means for clustering the learning data groups into groups of data with similar characteristics before function approximation is performed, and (2) a second means for obtaining an approximation expression for each cluster thus created.

[0007] Further, to raise the approximation accuracy, it has (3) a third means for evaluating the created approximation expressions, (4) a fourth means for ending learning when the evaluation is good for all clusters and changing the clustering when some cluster has a poor evaluation, and (5) a fifth means for changing the approximation expressions.

[0008] Further, when performing function approximation, it has (6) a sixth means for determining the cluster to which an input data group belongs, and (7) a seventh means for obtaining output data or an output data group from the input data group using the approximation expression of the determined cluster.

[0009] Further, to determine quickly the cluster to which an input data group belongs, it has (8) an eighth means that provides, for each cluster, a data group representing the characteristics of the data groups in that cluster, compares these representative data groups with the input data group, and determines the cluster with the closest one to be the cluster to which the input belongs.

[0010] Further, to obtain a reliability for the output data group, it has (9) a ninth means for deriving the reliability of the obtained output data or output data group by calculating an evaluation value that indicates how reliably the input data group belongs to its cluster.

[0011] Further, so that learning is fast when learning data groups are added, it has (10) a tenth means that relearns only the cluster to which the additional learning data groups were added and its neighboring clusters.

【００１２】[0012]

[Operation] Through (1), (2), (6), and (7) above, approximation is performed with a function approximation expression derived from input data groups with similar tendencies, so accurate approximation is possible.

[0013] Further, through (3), (4), and (5), even when the classification by (1) is insufficient, the data groups within a cluster are reclassified, so accurate approximation is still possible.

[0014] Further, (9) makes it possible to inform the user how reliable the output result is.

[0015] Further, (8) speeds up the determination of the cluster to which an input data group belongs.

[0016] Further, (10) allows learning to remain fast even when learning data groups are added.

【００１７】[0017]

[Embodiments] Embodiments of the present invention are described below. FIG. 2 is an overall configuration diagram of the present invention. The configuration and operation of the device as a whole are described first. The device consists of a control unit (CPU) 200, a CRT 202, a keyboard 204, a main memory 206, and a magnetic disk 208. The programs and databases for function approximation are stored on the magnetic disk 208 and loaded into the main memory 206 as needed. The keyboard 204 is used to give instructions to the program. The CRT 202 is used for displaying results and the like.

[0018] Next, the system configuration and operation of the function approximation device are described with reference to FIG. 1. FIG. 1 is a block diagram of the function approximation device. The device consists of a learning unit 100, a recording unit 110, and an execution unit 120.

[0019] In this embodiment, the learning data groups given as samples are clustered into groups with similar characteristics, and an approximation expression is obtained for each cluster; this gives better accuracy than approximating with a single expression. Moreover, the clustering state can be changed, for example by dividing a cluster whose approximation expression is inaccurate into smaller clusters, and the approximation expressions changed accordingly. Because each expression is then derived from learning data groups with little variation, the approximation accuracy improves.

[0020] The learning unit 100 is described. Using the multiple learning data groups given as samples, the learning unit 100 obtains approximation expressions for calculating an output data group from an input data group. Rather than obtaining a single approximation expression covering all learning data groups, it obtains multiple approximation expressions. At execution time, the most suitable of these expressions is selected according to the characteristics of the input data group. If the accuracy of an approximation expression, once created, is low, the expression is changed.

[0021] The operation of the learning unit 100 and the recording unit 110 is described. The cluster creation unit 102 clusters the learning data groups recorded in the learning data group recording unit 112 into groups of data with similar characteristics. Information about the created clusters is recorded in the cluster information recording unit 114. The approximation expression creation unit 104 obtains an approximation expression for each cluster created by unit 102, using the learning data groups. The approximation expressions created by unit 104 are recorded in the approximation expression recording unit 116. The approximation expression evaluation unit 106 evaluates the accuracy of the approximation expressions created by the approximation expression creation unit 104. If an evaluation result is poor, it issues a command to change the clustering state or to change the approximation expression.

[0022] The execution unit 120 is described. The execution unit 120 selects the most appropriate approximation expression for a given input data group and obtains the output data group.

[0023] The operation of the execution unit 120 and the recording unit 110 is described. For the input data group to be approximated, the cluster membership determination unit 122 determines which of the clusters recorded in the cluster information recording unit 114 the input data group belongs to. The approximate value calculation unit 124 retrieves from the approximation expression recording unit 116 the approximation expression of the cluster to which the input data group was determined to belong, and uses that expression to obtain the output data group from the input data group. This value is the approximate value.

[0024] This embodiment provides: a learning unit 100 comprising three parts, namely the cluster creation unit 102, the approximation expression creation unit 104 that obtains an approximation expression for each cluster, and the approximation expression evaluation unit 106 that evaluates the expressions created by unit 104; a recording unit 110 comprising three parts, namely the learning data group recording unit 112 that records the learning data groups, the cluster information recording unit 114 that records information about the clusters created by unit 102, and the approximation expression recording unit 116 that records the expressions created by unit 104; and an execution unit 120 comprising two parts, namely the cluster membership determination unit 122 and the approximate value calculation unit 124. With this configuration, an input data group can be classified, and the output data group, which is the approximate value, can be obtained using the approximation expression best suited to that input data group. In addition, the cluster state and the approximation expressions can be changed to improve approximation accuracy.
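The division of labor among the recording unit 110 and the execution unit 120 can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than the disclosed implementation: the class and function names are hypothetical, inputs are single numbers, and cluster membership is decided by a simple range test that stands in for whatever rule unit 122 actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class RecordingUnit:
    """Hypothetical stand-in for recording unit 110 and its three stores."""
    # 112: (input, output) learning samples
    learning_data: List[Tuple[float, float]] = field(default_factory=list)
    # 114: cluster name -> [low, high) input range (assumed form of cluster info)
    cluster_info: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    # 116: cluster name -> approximation expression
    expressions: Dict[str, Callable[[float], float]] = field(default_factory=dict)


def determine_cluster(rec: RecordingUnit, x: float) -> str:
    """Role of unit 122: pick the cluster whose range contains x (assumed rule)."""
    for name, (low, high) in rec.cluster_info.items():
        if low <= x < high:
            return name
    raise ValueError("input lies outside every recorded cluster")


def approximate(rec: RecordingUnit, x: float) -> Tuple[str, float]:
    """Role of unit 120: determine the cluster, then apply its expression (unit 124)."""
    name = determine_cluster(rec, x)
    return name, rec.expressions[name](x)


rec = RecordingUnit(
    cluster_info={"C1": (0.0, 5.0), "C2": (5.0, 10.0)},
    expressions={"C1": lambda x: 2.0 * x, "C2": lambda x: 50.0 - x},
)
result = approximate(rec, 9.0)  # cluster C2, so the expression 50 - x is used
```

Keeping the cluster information and the expressions in separate stores mirrors the text: the learning side can replace either one without touching the execution path.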

[0025] Next, the operation of the learning unit 100 is described in detail. The cluster creation unit 102 calculates similarities among the multiple learning data groups and groups those with high similarity into a suitable number of clusters. Clustering can be implemented with conventional methods such as cluster analysis from multivariate analysis (reference: Okuno et al. (奥野忠一他), Multivariate Analysis Methods (多変量解析法), pp. 391-412, Nikkagiren (日科技連)) or neural networks (reference: Sako (酒匂裕), "Optimization of a clustering-type neural network using the AIC information criterion" (ＡＩＣ情報量基準を用いたクラスタリング型ニューロの最適化), Proceedings of the 1992 IEICE Spring Conference, 6-35).

[0026] This embodiment is explained using Sako's clustering method, so that method is described here. In general, clustering involves the problem of deciding how many clusters the data should be divided into. Sako's clustering method can determine the optimal number of clusters. It applies LVQ (Learning Vector Quantization), a type of neural network, to clustering. Because ordinary LVQ cannot determine the optimal number of clusters, the number of clusters is decided using the information criterion AIC. Here AIC serves as an index of how good the clustering is. Specifically, AIC is calculated for each of the results obtained by classifying with different numbers of clusters, and the number of clusters that minimizes AIC is taken as optimal. At execution time, when new data is input, the cluster to which the input data belongs is selected.
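The selection rule (compute AIC for each candidate number of clusters and keep the minimum) can be sketched as follows. The sketch uses the standard definition AIC = 2k - 2 ln L; the likelihood model used in the cited clustering method is not given in this text, so the log-likelihoods and parameter counts below are made-up placeholders, and the function names are hypothetical.

```python
def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * num_params - 2 * log_likelihood


def best_cluster_count(candidates):
    """candidates maps cluster count -> (log-likelihood, number of free parameters)."""
    scores = {k: aic(ll, p) for k, (ll, p) in candidates.items()}
    return min(scores, key=scores.get), scores


# Made-up values for illustration; real ones would come from fitting each clustering.
candidates = {2: (-120.0, 4), 3: (-95.0, 6), 4: (-94.5, 8)}
best, scores = best_cluster_count(candidates)
```

With these placeholder numbers, going from three to four clusters improves the likelihood only slightly, so the parameter penalty makes three clusters the minimum-AIC choice.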

[0027] The approximation expression creation unit 104 creates, for each cluster created by the cluster creation unit 102, an approximation expression for obtaining the output data group from the input data group. The expressions can be created with conventional methods such as multiple regression analysis from multivariate analysis (reference: Okuno et al. (奥野忠一他), Multivariate Analysis Methods (多変量解析法), pp. 25-158, Nikkagiren (日科技連)) or neural networks (reference: Kyuma et al. (久間和生他), Neurocomputer Engineering (ニューロコンピュータ工学), Kogyo Chosakai (工業調査会)).
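As a small illustration of fitting one expression per cluster, the sketch below applies ordinary least squares with a single input variable to each cluster separately. The cluster contents and the helper name `fit_line` are assumptions made for the example; the text permits any regression or neural network method here.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical clusters: the data follow different lines in different regions.
clusters = {
    "C1": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],   # lies on y = x
    "C2": [(3.0, 3.0), (4.0, 1.0), (5.0, -1.0)],  # lies on y = -2x + 9
}
expressions = {name: fit_line(pts) for name, pts in clusters.items()}
```

A single line fitted to all six points could not follow both trends, which is the motivation for deriving a separate expression in each cluster.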

[0028] Next, the operation of the recording unit 110 is described in detail. The learning data group recording unit 112 stores the learning data groups. The learning data groups are used as sample data for clustering and for deriving the approximation expressions.

[0029] The cluster information recording unit 114 records, for each cluster, the information about the clusters created by the cluster creation unit 102. At execution time, it provides the cluster to which the input data group sent from the cluster membership determination unit 122 is determined to belong.

[0030] The approximation expression recording unit 116 records the approximation expressions created by the approximation expression creation unit 104. At execution time, it provides the approximation expression corresponding to the cluster sent from the approximate value calculation unit 124.

[0031] Next, an embodiment of changing the clustering state in the learning unit 100 is described. This embodiment improves accuracy by dividing a poorly evaluated cluster into smaller clusters. Dividing the interior of a cluster into smaller clusters is referred to here as hierarchical clustering.

[0032] FIGS. 4 to 6 are used to describe an example of hierarchical clustering in the learning unit 100. The figures use two-dimensional data so that the clustering is easy to grasp visually, but the data may in practice have any number of dimensions. FIG. 4 shows the result of clustering all of the learning data groups. C1 to C8 are the names of the clusters. FIG. 5 shows an example in which C1, C2, and C4 were evaluated as having low accuracy, so their regions were clustered hierarchically. Clustering within C1 produced clusters C11 to C14. The same applies to C2 and C4. FIG. 6 shows an example in which C12 and C21 were then evaluated as having low accuracy, so their regions were clustered hierarchically. Clustering within C12 produced clusters C121 to C123. The same applies to C21. FIG. 7 shows the hierarchical relationship among the clusters created by the processing of FIGS. 4 to 6.

[0033] Next, using FIG. 7, an embodiment is described in which the execution unit 120 determines, from among hierarchically structured clusters, the cluster to which an input data group belongs. First, it determines which of the top-level clusters C1 to C8 the input belongs to. If the result is a cluster such as C3 or C5 to C8, which has no lower-level clusters, the cluster is settled. If the result is a cluster that has lower-level clusters, such as C1, C2, or C4, the determination continues. This is repeated until a cluster with no lower-level clusters is found.

[0034] The selection process is described concretely below, taking the selection of C212 as an example. First, it is determined which of C1 to C8 the input data group belongs to. If C2 is selected, it is then determined which of C21 to C24 the input belongs to. If C21 is selected, it is further determined which of C211 to C213 it belongs to. When C212 is selected, there are no clusters below it, so cluster selection ends and the input is finally determined to belong to C212.
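The descent to C212 can be traced with a short Python sketch. It is a reduced, hypothetical version of the hierarchy (only a few of the clusters are included, with one-dimensional made-up representative points), and it assumes that membership at each level is decided by the nearest representative point, which is one of the options the text describes elsewhere.

```python
import math


def make_node(name, codebook, children=()):
    """One cluster: its name, a representative point, and its sub-clusters."""
    return {"name": name, "codebook": codebook, "children": list(children)}


def descend(level, x):
    """Pick the nearest cluster at each level until a cluster has no children."""
    path = []
    while level:
        chosen = min(level, key=lambda n: math.dist(n["codebook"], x))
        path.append(chosen["name"])
        level = chosen["children"]
    return path


# Reduced hypothetical hierarchy with made-up coordinates.
tree = [
    make_node("C1", (0.0,)),
    make_node("C2", (10.0,), [
        make_node("C21", (8.0,), [
            make_node("C211", (7.0,)),
            make_node("C212", (8.0,)),
            make_node("C213", (9.0,)),
        ]),
        make_node("C22", (12.0,)),
    ]),
    make_node("C3", (20.0,)),
]
path = descend(tree, (8.2,))
```

Each level only compares the input against the siblings under the cluster chosen so far, so clusters under C1 or C3 are never examined for this input.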

[0035] Next, an embodiment of a method for implementing FIG. 1 as an algorithm is described with reference to FIGS. 22 and 23. In this embodiment, hierarchical clustering is used as the method of changing the clustering.

[0036] FIG. 22 implements the functions of the learning unit 100 and part of the functions of the recording unit 110 in FIG. 1. First, at 2204, the learning data groups are clustered. Next, at 2206, an approximation expression is obtained for each cluster created at 2204. Next, at 2208, the accuracy of each cluster's approximation expression obtained at 2206 is evaluated. At 2210, it is determined whether any cluster has a large error. If none exists, learning ends. If one exists, the entire process 2200 of FIG. 22 is applied to it recursively. This is done for every cluster judged to have a large error (2214).
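The recursive flow of FIG. 22 can be sketched in Python as below. The three helpers are deliberately crude stand-ins chosen to keep the example short and are assumptions, not the disclosed methods: clustering is a split at the median input, the approximation expression is a constant, and the evaluation is the worst absolute error. The parameter `min_size` is likewise a hypothetical guard against splitting very small clusters.

```python
def split_in_two(points):
    """Stand-in for 2204: split at the median input (not the patent's clustering)."""
    pts = sorted(points)
    mid = len(pts) // 2
    return [part for part in (pts[:mid], pts[mid:]) if part]


def fit_constant(points):
    """Stand-in for 2206: approximate the output by its mean."""
    return sum(y for _, y in points) / len(points)


def worst_error(points, c):
    """Stand-in for 2208: largest absolute error of the expression."""
    return max(abs(y - c) for _, y in points)


def learn(points, tol, min_size=2):
    """Process 2200: cluster, fit, evaluate, and recurse into high-error clusters."""
    nodes = []
    for cluster in split_in_two(points):                      # 2204
        expr = fit_constant(cluster)                          # 2206
        node = {"points": cluster, "expr": expr, "children": []}
        too_large = worst_error(cluster, expr) > tol          # 2208, 2210
        if too_large and len(cluster) >= min_size:
            node["children"] = learn(cluster, tol, min_size)  # 2200 again (2214)
        nodes.append(node)
    return nodes


def leaf_expressions(nodes):
    """Collect the expressions of clusters that have no sub-clusters."""
    out = []
    for n in nodes:
        out.extend(leaf_expressions(n["children"]) if n["children"] else [n["expr"]])
    return out


data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
        (4.0, 5.0), (5.0, 5.0), (6.0, 9.0), (7.0, 9.0)]
tree = learn(data, tol=0.5)
```

In this run the left half is already fitted well and stays a single cluster, while the right half exceeds the tolerance and is split once more, giving an uneven hierarchy like the one in FIGS. 5 to 7.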

[0037] FIG. 23 implements part of the functions of the execution unit 120 and the recording unit 110 in FIG. 1. When an input data group is given, the cluster to which it belongs is determined at 2302. Next, at 2304, it is determined whether lower-level clusters exist within the region of the determined cluster. If they do, processing returns to 2302. If not, at 2306 the output data group, which is the approximate value, is obtained from the input data group using the approximation expression of the cluster selected at 2302.

[0038] Next, FIGS. 8 and 9 are used to describe an example of determining the cluster to which an input data group belongs by using codebooks provided in the clusters. FIG. 8 shows the distribution of the learning data groups. FIG. 9 shows the result of classifying the same data into eight clusters. C1 to C8 denote the clusters. Points CB1 to CB8 denote the codebooks of C1 to C8, respectively. One way to determine the cluster of input data group 900 with codebooks is to compute the distance between the input data group and each codebook and select the cluster whose codebook is closest. In the example of FIG. 9, the codebook closest to input data group 900 is CB1 (902), so the input is determined to belong to cluster C1.
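The nearest-codebook rule amounts to one distance calculation per cluster followed by a minimum. A minimal sketch, assuming Euclidean distance and made-up two-dimensional coordinates (the figures' actual coordinates are not given in the text):

```python
import math


def nearest_cluster(codebooks, x):
    """Return the name of the cluster whose codebook is closest to x."""
    return min(codebooks, key=lambda name: math.dist(codebooks[name], x))


# Hypothetical codebooks for three clusters.
codebooks = {"C1": (1.0, 1.0), "C2": (4.0, 1.0), "C3": (2.5, 4.0)}
assigned = nearest_cluster(codebooks, (1.5, 0.5))
```

The number of `math.dist` calls equals the number of codebooks, regardless of how many learning samples each cluster contains.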

[0039] A feature of this embodiment is that providing codebooks allows the cluster of an input data group to be determined quickly. With codebooks, the number of distance calculations needed equals the number of codebooks. Without codebooks, the cluster (C1) containing the data group (904) closest to the input data group (900) is selected, so as many distance calculations as there are data groups are required. Using codebooks therefore reduces the number of distance calculations and speeds up the determination.

[0040] Examples of how to obtain a codebook follow. In the first example, the codebook is the average, element by element, of all data groups in the cluster. In the second example, the codebook is the centroid of the cluster's region. Any other choice is acceptable as long as it represents the data in the cluster.
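The first option, the element-wise average of the cluster's data, can be written in a few lines. The sample points and the function name are hypothetical.

```python
def mean_codebook(data):
    """Element-wise average of all data vectors in one cluster."""
    n = len(data)
    return tuple(sum(column) / n for column in zip(*data))


cluster_c1 = [(1.0, 2.0), (3.0, 2.0), (2.0, 5.0)]  # made-up members of one cluster
cb1 = mean_codebook(cluster_c1)
```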

[0041] Next, an embodiment of the processing of the reliability derivation unit 300 shown in FIG. 3 is described with reference to FIGS. 10 to 12. This embodiment is made possible by clustering the data groups. It assumes that each cluster has a codebook.

[0042] The reliability of an output data group in the present invention can be judged from the following factors:
(1) the distance between the input data group and the codebook;
(2) the number of learning data groups in the cluster.
First, an example of obtaining reliability from (1), the distance between the input data group and the codebook, is described. Suppose the approximation expression is derived by least-squares approximation over all learning data groups in the cluster. In that case, accuracy is generally better for input data groups close to the codebook, so the distance between the input data group and the codebook can serve as the criterion for deriving reliability. However, because the distribution of data within a cluster and the size of the cluster differ from cluster to cluster, distance alone does not allow a straightforward comparison.

[0043] An embodiment of a method that allows distance-based reliability to be compared even between clusters with different distributions is described. First, the distribution of the learning data groups within a cluster is assumed to be a normal distribution centered on the codebook. The reliability criterion can then be the value of the normal distribution's density at the coordinates of the input data group. With the density value as the criterion, input data groups close to the codebook have high reliability and distant ones have low reliability. Also, for the same distance between codebook and input data group, the more densely the cluster's learning data groups are concentrated near the center, the higher the value of the normal distribution.
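A minimal sketch of a density-based reliability follows. It assumes a one-dimensional normal density evaluated on the distance from the codebook, with the spread estimated as the root-mean-square distance of the cluster's learning data from the codebook; the text does not fix these details, and the function names and sample points are hypothetical.

```python
import math


def spread(codebook, data):
    """Root-mean-square distance of the cluster's data from its codebook."""
    return math.sqrt(sum(math.dist(codebook, p) ** 2 for p in data) / len(data))


def reliability(codebook, sigma, x):
    """Normal density at the distance between x and the codebook (assumed form)."""
    r = math.dist(codebook, x)
    return math.exp(-r * r / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


cb = (0.0, 0.0)
tight = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]  # concentrated cluster
wide = [(3.0, 0.0), (-3.0, 0.0), (0.0, 3.0), (0.0, -3.0)]   # spread-out cluster
tight_sigma = spread(cb, tight)
wide_sigma = spread(cb, wide)

near, far = (0.5, 0.0), (4.0, 0.0)
near_in_tight = reliability(cb, tight_sigma, near)
near_in_wide = reliability(cb, wide_sigma, near)
far_in_tight = reliability(cb, tight_sigma, far)
far_in_wide = reliability(cb, wide_sigma, far)
```

Under these assumptions an input near the codebook scores higher in the concentrated cluster, while an input far from the codebook scores higher in the spread-out cluster, because the narrow density falls off much faster with distance.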

[0044] A concrete example of deriving reliability with a normal distribution is shown using FIGS. 10 to 12. FIGS. 10 and 11 show the distribution of the data groups within a cluster. In FIG. 11, the same reference numbers as in FIG. 10 denote the same items. In the figures, 1000 is the codebook, and 1002 and 1004 are input data groups. FIG. 12 shows the density functions obtained when the distributions of the data groups in FIGS. 10 and 11 are assumed to be normal. 1200 represents the distribution of FIG. 10, and 1202 that of FIG. 11. 1204 represents the distance 1006 between data group 1002 and codebook 1000 in FIGS. 10 and 11, and 1206 represents the distance 1008 between data group 1004 and codebook 1000. FIG. 12 shows that input data group 1002 has higher reliability than 1004.

[0045] On the other hand, for input data group 1002, the distance 1006 from the codebook is the same in FIGS. 10 and 11, but the density functions differ, so the reliability is higher under the cluster distribution of FIG. 10. For input data group 1004, the reliability is higher under the distribution of FIG. 11.

[0046] Next, an example of obtaining reliability from (2), the number of learning data groups in the cluster, is described. When the approximation expression is derived so as to minimize the error over all data groups in the cluster, accuracy is generally better when more data is used in the derivation. The number of data groups in the cluster can therefore serve as a criterion for deriving reliability.

【００４７】図１３はクラスタ内のデータ群の分布状態
を示したものである。図１３において、図１０と同じ参
照番号は同じものを指す。図１３では、コードブック
（１０００）、入力データ群（１００２、１００４）、
コードブックとデータ群との距離（１００６、１００
８）、分布密度は図１０と同じであるが、近似式の導出
に用いたデータ群の数が少ない。したがって、図１３よ
り図１０の分布状態の方が信頼度が高いと考えられる。FIG. 13 shows the distribution of the data groups in a cluster. In FIG. 13, the same reference numerals as in FIG. 10 denote the same items. In FIG. 13, the codebook (1000), the input data groups (1002, 1004), the distances between the codebook and the data groups (1006, 1008), and the distribution density are the same as in FIG. 10, but fewer data groups were used to derive the approximate expression. The distribution of FIG. 10 is therefore considered more reliable than that of FIG. 13.

【００４８】ここで述べた（１）と（２）の手法は単独
でも併用しても利用可能である。さらに、他の手法とも
併用できる。具体例として、各手法で求めた信頼度をか
けあわせたり、重みをつけて足しあわせることにより実
現できる。The methods (1) and (2) described here can be used alone or together, and they can also be combined with other methods. For example, the reliabilities obtained by the individual methods can be multiplied together or added together with weights.
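
The two ways of combining reliabilities mentioned above can be sketched as follows. This is only an illustration: the function names and the numeric values are hypothetical, and the text does not prescribe particular weights.

```python
def combine_by_product(reliabilities):
    """Multiply the reliabilities obtained by the individual methods."""
    result = 1.0
    for value in reliabilities:
        result *= value
    return result


def combine_by_weighted_sum(reliabilities, weights):
    """Add the reliabilities together after weighting each one."""
    if len(reliabilities) != len(weights):
        raise ValueError("one weight is required per reliability")
    return sum(w * r for w, r in zip(weights, reliabilities))


# Hypothetical reliabilities from method (1) and method (2).
by_distribution, by_count = 0.5, 0.8
print(combine_by_product([by_distribution, by_count]))
print(combine_by_weighted_sum([by_distribution, by_count], [0.75, 0.25]))
```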

【００４９】次に、図１４を用いて、信頼度が低い出力
を破棄する方法について説明する。１４００はクラスタ
内のデータ群の分布を正規分布と仮定した場合の分布密
度関数である。１４０２は破棄の基準とする。１４０４
と１４０６ははコードブックと入力データ群の距離であ
る。分布密度の価が１４０２を越える１４０４の入力デ
ータ群については得られた出力データ群を採用し、１４
０２を越えない１４０６の入力データ群については得ら
れた出力データ群を破棄する。Next, a method of discarding outputs with low reliability will be described with reference to FIG. 14. Reference numeral 1400 is the distribution density function obtained when the distribution of the data groups in the cluster is assumed to be normal. Reference numeral 1402 is the threshold used for discarding. Reference numerals 1404 and 1406 are distances between the codebook and input data groups. For the input data group at 1404, where the distribution density exceeds 1402, the resulting output data group is adopted. For the input data group at 1406, where the density does not exceed 1402, the resulting output data group is discarded.
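
The discard rule can be sketched as a threshold test on the density value. This is a minimal sketch assuming a one-dimensional normal density around the codebook; the function names, the spread, and the threshold value are hypothetical.

```python
import math


def normal_density(distance: float, sigma: float) -> float:
    """Density of a zero-mean normal distribution at `distance`."""
    return math.exp(-0.5 * (distance / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def accept_output(distance: float, sigma: float, threshold: float) -> bool:
    """Adopt the output only when the density exceeds the discard threshold."""
    return normal_density(distance, sigma) > threshold


# Hypothetical values standing in for distances 1404 and 1406 and threshold 1402.
print(accept_output(distance=1.0, sigma=1.0, threshold=0.1))  # adopted
print(accept_output(distance=2.5, sigma=1.0, threshold=0.1))  # discarded
```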

【００５０】本実施例では、入力データ群とコードブッ
クとの距離を評価して、信頼度の低い出力を破棄できる
という特徴を持つ。したがって、制御問題などに適用し
た場合には、未知の入力に対して予期できない出力をす
るということが無くなる。The present embodiment is characterized in that the distance between the input data group and the codebook can be evaluated and the output with low reliability can be discarded. Therefore, when applied to a control problem or the like, unpredictable output will not occur for unknown inputs.

【００５１】次に、数１と数２に示す関数を用いて、本
発明を簡単な関数値の予測に適用した例を説明する。Next, an example in which the present invention is applied to a simple function value prediction by using the functions shown in Expressions 1 and 2 will be described.

【００５２】[0052]

【数１】 [Equation 1]

【００５３】[0053]

【数２】 [Equation 2]

【００５４】数１では、ｆ（０）＝０とし、ｋの値を０
から１９まで変化させて、ｆ（０）からｆ（１９）まで
の２０データを１セットとして扱っている。数２は数１
に−０．５から０．５までの範囲の乱数を加えたもので
ある。本実施例では、入力データ群、出力データ群、及
び学習用データ群の設定方法、学習方法、実行方法、結
果の表示方法、さらにデータ数の少ないクラスタ出の近
似式の作成方法について具体的に説明する。In Equation 1, f(0) = 0, and the value of k is varied from 0 to 19, so that the 20 data from f(0) to f(19) are treated as one set. Equation 2 is Equation 1 with a random number in the range from −0.5 to 0.5 added. This embodiment specifically describes how the input data groups, output data groups, and learning data groups are set, the learning method, the execution method, the result display method, and the method of creating an approximate expression for a cluster with few data.

【００５５】まず、図１６を用いて、本実施例における
入力データ群、出力データ群、及び学習用データ群の設
定方法について説明する。入力データ群は、連続する５
個のデータ（１６００〜１６０８）である。出力データ
群は、入力データ群の最後のデータの次のデータ（１６
１０）である。学習用データ群は連続する６個のデータ
である。このうち、前から５つのデータが入力データ群
に相当し、最後のデータが出力データ群に相当する。学
習時には、１６００〜１６０８の入力データ群に相当す
るデータを用いてクラスタリングを行なう。近似式を作
成するときには、１６００〜１６０８のから出力データ
群に相当するデータ１６１０を導出する計算式を求め
る。First, the method of setting the input data groups, output data groups, and learning data groups in this embodiment will be described with reference to FIG. 16. An input data group consists of five consecutive data (1600 to 1608). The output data group is the datum that follows the last datum of the input data group (1610). A learning data group consists of six consecutive data, of which the first five correspond to the input data group and the last corresponds to the output data group. During learning, clustering is performed using the data corresponding to the input data group 1600 to 1608. When the approximate expression is created, a formula is obtained that derives the datum 1610, corresponding to the output data group, from 1600 to 1608.

【００５６】本実施例では、１セットのデータ数は２０
個なので、連続する６個のデータから構成される学習用
データ群は１セットにつき１５種類設定できる。入力デ
ータ群、出力データ群も学習用データ群に従い、１セッ
トにつき１５種類設定する。In this embodiment, one set contains 20 data, so 15 learning data groups of six consecutive data can be formed per set. Following the learning data groups, 15 input data groups and 15 output data groups are likewise set per set.
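
The construction of learning data groups from one set can be sketched as a sliding window. The function name and the stand-in series are hypothetical; the window of six data, the five-to-one split, and the count of 15 groups per 20-datum set follow the text.

```python
def make_learning_groups(series, window=6):
    """Split one set into (input data group, output datum) pairs.

    Each learning data group is `window` consecutive data: the first
    `window - 1` are the inputs and the last one is the output.
    """
    groups = []
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        groups.append((chunk[:-1], chunk[-1]))
    return groups


one_set = list(range(20))          # stand-in for f(0) .. f(19)
groups = make_learning_groups(one_set)
print(len(groups))                 # 15 learning data groups per set
print(groups[4])                   # the group whose first datum is number 4
```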

【００５７】本実施例では、数１に示す関数を１セット
と、数２に示す関数を９セットの合計１０セットを用意
する。このうち、学習用データ群は数１に基づく１セッ
トと数２に基づく７セットの合計８セット、すなわち１
５×８の１２０個のデータ群を用いる。学習結果の評価
には学習に用いていない２セットを使う。In this embodiment, one set based on the function of Equation 1 and nine sets based on the function of Equation 2 are prepared, for a total of 10 sets. Of these, one set based on Equation 1 and seven sets based on Equation 2, a total of eight sets, are used as learning data, giving 15 × 8 = 120 learning data groups. The two sets not used for learning are used to evaluate the learning results.

【００５８】図１５に１０セット全てを表示する。１か
ら８セット目のデータは学習に用いたものであり、９、
１０セット目のデータはテストデータである。図１５に
おいては、横軸のａ−ｂという記述は、その軸上の値
が、データ番号ａの学習用データ群の最初のデータであ
り、データ番号ｂの学習用データ群の最後のデータであ
ることを表している。例えば、データ番号４の学習用デ
ータ群は（４−＊）の軸上の値から（９−４）の軸上の
値までの６つのデータから構成されている。図１６のデ
ータ群の例では、（Ｐ１−Ｐ６）と記述される。FIG. 15 shows all 10 sets. Sets 1 to 8 were used for learning, and sets 9 and 10 are test data. In FIG. 15, a label of the form a-b on the horizontal axis means that the value at that position is the first datum of the learning data group with data number a and the last datum of the learning data group with data number b. For example, the learning data group with data number 4 consists of the six data from the value at position (4-*) to the value at position (9-4). In the example data group of FIG. 16, it is written as (P1-P6).

【００５９】次に、本実施例の学習方法を説明する。本
実施例では、クラスタリングには前述の酒匂による手法
を、近似式の導出には線形の重回帰分析を用いるものと
する。クラスタリングには学習用データ群のうち前から
５つのデータのみを用いる。クラスタリング終了後、ク
ラスタ内のすべてのデータを用いて、この５つのデータ
から出力データ群に相当する１つのデータを求める近似
式を導出する。数３に近似式の例を示す。ここで、ａ１
からａ５は入力データ群のデータの値を表し、ｂは出力
データ群のデータの値を表す。Next, the learning method of this embodiment will be described. In this embodiment, the aforementioned method of Sako (酒匂) is used for clustering, and linear multiple regression analysis is used to derive the approximate expressions. Only the first five data of each learning data group are used for clustering. After clustering is complete, all the data in each cluster are used to derive an approximate expression that obtains, from these five data, the one datum corresponding to the output data group. Equation 3 shows an example of such an approximate expression. Here, a1 to a5 represent the values of the data in the input data group, and b represents the value of the datum in the output data group.

【００６０】[0060]

【数３】 [Equation 3]

【００６１】次に、本実施例の実行方法を説明する。本
実施例では、まず、コードブックを用いて入力データ群
の所属するクラスタを判定する。コードブックはクラス
タ内の学習用データ群の５つのデータの値の平均値とす
る。入力データ群とコードブックとの距離が最も近いク
ラスタを、入力データ群の所属するクラスタとして選択
する。クラスタが決まった後、入力データ群の各データ
の値を該当するクラスタの近似式に代入して出力データ
群を求める。Next, the execution method of this embodiment will be described. In the present embodiment, first, the codebook is used to determine the cluster to which the input data group belongs. The codebook is the average value of the values of five data of the learning data group in the cluster. The cluster having the shortest distance between the input data group and the codebook is selected as the cluster to which the input data group belongs. After the cluster is determined, the value of each data in the input data group is substituted into the approximate expression of the corresponding cluster to obtain the output data group.
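
The execution step can be sketched as a nearest-codebook lookup followed by evaluation of that cluster's formula. This sketch assumes the linear form b = c0 + c1·a1 + ... + c5·a5 for the approximate expression and squared Euclidean distance for the comparison; the function names, codebooks, and coefficients are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_cluster(inputs, codebooks):
    """Index of the cluster whose codebook is closest to the input data group."""
    return min(range(len(codebooks)), key=lambda i: squared_distance(inputs, codebooks[i]))


def predict(inputs, codebooks, formulas):
    """Pick the cluster, then evaluate its linear approximate expression."""
    index = select_cluster(inputs, codebooks)
    coefficients, intercept = formulas[index]
    return index, intercept + sum(c * a for c, a in zip(coefficients, inputs))


# Hypothetical two-cluster model: (coefficients for a1..a5, intercept).
codebooks = [[0.0] * 5, [10.0] * 5]
formulas = [([0.0, 0.0, 0.0, 0.0, 1.0], 1.0), ([0.2] * 5, 0.0)]
print(predict([9.0, 10.0, 11.0, 10.0, 10.0], codebooks, formulas))
```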

【００６２】次に、本実施例における階層的クラスタリ
ングについて説明する。本実施例では、クラスタ内の全
ての学習用データ群を用いて近似式を検証し、最大誤差
が０．５を越えた場合に、該当するクラスタの領域内を
細分化するものとする。この結果、各クラスタでほぼ±
０．５以下の誤差で予測ができる。Next, the hierarchical clustering in this embodiment will be described. In this embodiment, the approximate expression is verified using all the learning data groups in the cluster, and if the maximum error exceeds 0.5, the region of that cluster is subdivided. As a result, each cluster can predict with an error of roughly ±0.5 or less.
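
The subdivision test can be sketched as a check on the maximum absolute error over a cluster's learning data groups. The 0.5 limit comes from the text; the function names, the assumed linear form of the approximate expression, and the sample data are hypothetical.

```python
def max_abs_error(groups, coefficients, intercept):
    """Largest absolute prediction error over (inputs, target) pairs."""
    worst = 0.0
    for inputs, target in groups:
        predicted = intercept + sum(c * a for c, a in zip(coefficients, inputs))
        worst = max(worst, abs(target - predicted))
    return worst


def needs_subdivision(groups, coefficients, intercept, limit=0.5):
    """True when the cluster's region should be split further."""
    return max_abs_error(groups, coefficients, intercept) > limit


# Hypothetical cluster: the second group is predicted poorly.
groups = [([1.0] * 5, 1.2), ([2.0] * 5, 2.9)]
print(needs_subdivision(groups, [0.0, 0.0, 0.0, 0.0, 1.0], 0.0))
```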

【００６３】次に、図１７を用いて、本実施例の結果の
表示装置について説明する。図１７は本実施例における
クラスタリング結果の一例を示したものである。本実施
例では、クラスタ間の関係の表示方法として、クラスタ
を木構造を用いて表示する。１７００から１７４４はク
ラスタを表しており、鍵括弧内の数字は、クラスタ内に
存在する学習用データ群の先頭のデータ番号、その下の
数字は、クラスタ内の学習用データ群の数である。先頭
のデータ番号はデータ群の種類を表す。例えば、クラス
タ１７２４では、２で始まる８個のデータ群が全て一つ
のクラスタにまとめられている。本表示装置では、図１
７に示したような階層構造のクラスタの関係を示した図
の全体もしくは一部が、実行結果としてユーザーに表示
される。この表示により、学習用データ群がどのように
分類されたかを認識することができる。Next, the device that displays the results of this embodiment will be described with reference to FIG. 17. FIG. 17 shows an example of the clustering result in this embodiment. In this embodiment, the relationships between clusters are displayed as a tree structure. Reference numerals 1700 to 1744 denote clusters. The number in brackets is the leading data number of the learning data groups in the cluster, and the number below it is the number of learning data groups in the cluster. The leading data number indicates the type of data group. For example, in cluster 1724, all eight data groups beginning with 2 are gathered into a single cluster. In this display device, all or part of a diagram showing the hierarchical relationships between clusters, such as FIG. 17, is displayed to the user as the execution result. This display makes it possible to see how the learning data groups were classified.

【００６４】図１８を用いて、この表示における結果の
強調方法について説明する。図１８において、図１７と
同じ参照番号は同じものをさす。入力データ群が属する
と判定されたクラスタに点滅表示を行えば（１８０
０）、選択されたクラスタを認識することが容易にな
る。この表示によって、入力データ群の全体に対する位
置付けが容易に理解できる。また、点滅表示のほかに、
白黒反転表示などの例も利用できる。A method of highlighting the result in this display will be described with reference to FIG. 18. In FIG. 18, the same reference numerals as in FIG. 17 denote the same items. If the cluster to which the input data group is judged to belong is displayed blinking (1800), the selected cluster becomes easy to recognize. This display makes it easy to understand where the input data group is positioned relative to the whole. Besides blinking, other forms of highlighting such as black-and-white inversion can also be used.

【００６５】図１９を用いて、この表示における付加情
報として、信頼度を表示する方法について説明する。図
１９において、図１７と同じ参照番号は同じものをさ
す。１７００から１７４４のクラスタの表示の他に、信
頼度を表示すれば（１９００）、ユーザーがその値をど
の程度信用すべきかという判断基準を与えることにな
る。A method of displaying the reliability as additional information in this display will be described with reference to FIG. 19. In FIG. 19, the same reference numerals as in FIG. 17 denote the same items. If the reliability is displayed (1900) in addition to the clusters 1700 to 1744, the user is given a basis for judging how far the value should be trusted.

【００６６】図２０を用いて、この表示における付加情
報として、近似式を表示する方法について説明する。図
２０において、図１７と同じ参照番号は同じものをさ
す。１７００から１７４４のクラスタの表示の他に、近
似式を表示すれば（２０００）、入力値がどれだけ出力
に影響を与えているのかという情報をユーザーに示すこ
とができる。A method of displaying the approximate expression as additional information in this display will be described with reference to FIG. 20. In FIG. 20, the same reference numerals as in FIG. 17 denote the same items. If the approximate expression is displayed (2000) in addition to the clusters 1700 to 1744, the user can be shown how strongly each input value affects the output.

【００６７】次に、本実施例において、クラスタごとに
近似式の入力データと出力データの数や種類を変更する
方法の一実施例を説明する。数や種類を変更する方法と
しては、分散の大きなデータを選択する方法がある。こ
れはクラスタ内のデータ群の数が近似式の入力データ数
より少ない場合に、近似式の入力データ数を減らすこと
により近似式を作成できるという特徴を持つ。図１７に
おいて、１７３８のクラスタ内のデータ群の数は４個で
ある。近似式の入力データ数は５個であるので、５個の
うち４個を選択して近似式を作成する。Next, an example of a method for changing the number and kinds of input and output data of the approximate expression for each cluster will be described. One way to change the number and kinds is to select the data with large variance. Its feature is that, when the number of data groups in a cluster is smaller than the number of input data of the approximate expression, the expression can still be created by reducing its number of input data. In FIG. 17, cluster 1738 contains four data groups. Since the approximate expression has five input data, four of the five are selected to create the approximate expression.
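
Selecting inputs by variance can be sketched as follows. This is a minimal sketch assuming population variance is computed per input position across the cluster's data groups and the positions with the largest variance are kept; the function name and the sample data are hypothetical.

```python
from statistics import pvariance


def select_inputs_by_variance(input_groups, keep):
    """Return the positions of the `keep` inputs with the largest variance."""
    n_inputs = len(input_groups[0])
    variances = [pvariance([group[i] for group in input_groups]) for i in range(n_inputs)]
    ranked = sorted(range(n_inputs), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:keep])


# Hypothetical cluster with four data groups of five inputs each.
cluster = [
    [1, 5, 0, 2, 9],
    [1, 6, 4, 2, 1],
    [1, 7, 8, 3, 5],
    [1, 8, 12, 3, 3],
]
print(select_inputs_by_variance(cluster, keep=4))  # position 0 never varies, so it is dropped
```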

【００６８】次に、図２１を用いて、本装置をプラント
などの異常検出に用いる例について説明する。プラント
異常検出装置の構成を図２１に示す。本実施例におい
て、関数近似装置は、プラントの過去の入出力データ群
から次の出力の予測値を求める働きをする。本実施例で
は、異常診断の対象となるプラント２１００の出力と関
数近似装置２１１０による予測値を、異常検出装置２１
１２において比較する。その結果、誤差が大きければ、
予想外の挙動が起こっているものとして、異常であると
判定し、オペレータに警告するものである。Next, an example in which this device is used to detect abnormalities in a plant or the like will be described with reference to FIG. 21. FIG. 21 shows the configuration of the plant abnormality detection device. In this embodiment, the function approximation device obtains a predicted value of the next output from the plant's past input and output data groups. The output of the plant 2100 under diagnosis and the value predicted by the function approximation device 2110 are compared in the abnormality detection device 2112. If the error is large, unexpected behavior is assumed to be occurring, the state is judged abnormal, and the operator is warned.

【００６９】以下に、本装置の動作について説明する。
プラント２１００は、入力２１１４を受けて出力２１１
６を出力するものである。２１０２から２１０６は時間
遅れ素子である。この素子は、出力２１１６を一定時間
蓄え、一時点前の出力２１１８を前処理装置２１０８に
送るものである。前処理装置２１０８は、プラントの入
力２１１４と一時点前の出力２１１８を受け、数時点分
のデータを蓄えて、関数近似装置２１１０の学習用デー
タ群及び入力データ群２１２０を生成する。関数近似装
置２１１０は、前処理部で作成した入力データ群２１２
０を受けて、出力の予測値２１２２を生成する。異常検
出装置２１１２は、出力データ群２１１６と予測値２１
２２とを比較して、差が大きいときにプラントの動作が
異常であるとしてユーザに警告を与える。The operation of this device is described below. The plant 2100 receives the input 2114 and produces the output 2116. Reference numerals 2102 to 2106 are time delay elements. Each element holds the output 2116 for a fixed time and sends the output 2118 from one time step earlier to the preprocessing device 2108. The preprocessing device 2108 receives the plant input 2114 and the earlier output 2118, accumulates data for several time steps, and generates the learning data groups and the input data group 2120 for the function approximation device 2110. The function approximation device 2110 receives the input data group 2120 created by the preprocessing unit and generates the predicted output value 2122. The abnormality detection device 2112 compares the output data group 2116 with the predicted value 2122 and, when the difference is large, warns the user that the plant is operating abnormally.
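
The comparison performed by the abnormality detection device can be sketched as follows. This is a simplified sketch: it predicts from past outputs only, whereas the device described above also uses the plant input, and the predictor shown is a hypothetical stand-in for the function approximation device. The window length and tolerance are assumed values.

```python
def detect_anomalies(outputs, predictor, window=5, tolerance=0.5):
    """Return the time indices where the output deviates from its prediction."""
    alarms = []
    for t in range(window, len(outputs)):
        predicted = predictor(outputs[t - window:t])
        if abs(outputs[t] - predicted) > tolerance:
            alarms.append(t)
    return alarms


def linear_extrapolation(history):
    """Hypothetical stand-in predictor: continue the most recent step."""
    return history[-1] + (history[-1] - history[-2])


plant_outputs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0]
print(detect_anomalies(plant_outputs, linear_extrapolation))
```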

【００７０】次に、本発明を画像修復に適用して、空間
パターンの予測に用いる例について説明する。学習時に
は、学習用データ群として画像データを蓄えておき、そ
れらをクラスタ作成部１０６でクラスタリングする。実
行時には、一部が欠けた画像を入力し、その画像がどの
クラスタに属するかを判定する。さらに、クラスタ内の
画像から、入力画像の欠けた部分を予測する。簡単な具
体例を示す。データベースには人、猿、犬などの動物の
顔の画像を蓄えておく。実行時には、顔の一部の欠けた
画像を入力する。まず、人、猿、犬のうちどの動物であ
るかを判定する。階層的にクラスタリングがされている
場合には、各動物について似たもの同士がクラスタリン
グされているので、入力画像に似た顔のクラスタを選択
する。次に、クラスタ内の画像と入力画像から欠けた部
分を修復する。Next, an example in which the present invention is applied to image restoration and used to predict spatial patterns will be described. During learning, image data are stored as learning data groups and clustered by the cluster creating unit 106. During execution, an image with a missing part is input, and the cluster to which it belongs is determined. The missing part of the input image is then predicted from the images in that cluster. A simple concrete example follows. Images of the faces of animals such as humans, monkeys, and dogs are stored in a database. During execution, a face image with a missing part is input. First, it is determined whether the animal is a human, a monkey, or a dog. When the clustering is hierarchical, similar faces of each animal have been clustered together, so the cluster of faces resembling the input image is selected. Finally, the missing part is restored from the images in the cluster and the input image.

【００７１】次に、本発明による関数近似装置における
追加学習方法について説明する。本実施例は、本発明に
おいて階層的なクラスタリングを行う場合に適用するも
のとする。本実施例では、一度学習を終了した後、新た
に学習用データ群を追加する場合に、それまでの学習結
果を部分的に改良して対応するものである。したがっ
て、全ての学習用データ群を用いて最初から学習しなお
す場合に比べて、学習が高速であるという特徴がある。
さらに、近似対象の特性が変化しても、変化後のデータ
群を追加学習するだけで、本関数近似装置がこの特性変
化に追従できるという特徴を持つ。Next, the additional learning method in the function approximation device according to the present invention will be described. This embodiment applies when hierarchical clustering is performed. When learning data groups are newly added after learning has once been completed, the existing learning results are partially revised to accommodate them. Learning is therefore faster than re-learning from the beginning with all the learning data groups. Furthermore, even if the characteristics of the approximation target change, the function approximation device can follow the change simply by additionally learning the data groups obtained after the change.

【００７２】なお、追加学習方法には次の４つの適用方
法が考えられる。（１）追加学習用データ群を加えるごとに適用する。（２）追加学習用データ群を従来の近似式で評価し、誤
差が基準を越えたときのみ適用する。（３）追加学習用データ群の数があらかじめ設定したあ
る個数に達したときのみ適用する。（４）追加学習用データ群を従来の近似式で評価し、誤
差が基準を越えた回数があらかじめ設定したある回数に
達したときのみ適用する。The additional learning method can be applied in the following four ways.
(1) It is applied every time an additional learning data group is added.
(2) The additional learning data group is evaluated with the existing approximate expression, and the method is applied only when the error exceeds a threshold.
(3) It is applied only when the number of additional learning data groups reaches a preset number.
(4) The additional learning data group is evaluated with the existing approximate expression, and the method is applied only when the number of times the error has exceeded the threshold reaches a preset count.
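
Policy (4) can be sketched as a small counter. The class name, the method name, and the choice to reset the counter after triggering are assumptions made for illustration.

```python
class RelearnTrigger:
    """Request additional learning after a preset number of large errors."""

    def __init__(self, error_limit: float, max_violations: int) -> None:
        self.error_limit = error_limit
        self.max_violations = max_violations
        self.violations = 0

    def observe(self, target: float, predicted: float) -> bool:
        """Record one evaluation; return True when additional learning is due."""
        if abs(target - predicted) > self.error_limit:
            self.violations += 1
        if self.violations >= self.max_violations:
            self.violations = 0  # assumption: start counting afresh afterwards
            return True
        return False


trigger = RelearnTrigger(error_limit=0.5, max_violations=2)
print([trigger.observe(1.0, p) for p in (1.2, 2.0, 0.0, 1.0)])
```

Policies (2) and (3) follow the same pattern with `max_violations=1` or with a counter of added data groups in place of the error test.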

【００７３】まず、図２４を用いて、追加学習の一実施
例である追加学習方法１について説明する。本追加学習
方法は、追加学習用データ群が所属し、かつ上から第ｍ
階層目のクラスタＸｍの領域内についてのみ学習を行う
ものである。このｍの値は任意に決めることができる。First, additional learning method 1, one embodiment of additional learning, will be described with reference to FIG. 24. This method performs learning only within the region of the cluster Xm, which is the m-th layer cluster from the top to which the additional learning data group belongs. The value of m can be chosen arbitrarily.

【００７４】以下、本実施例の処理手順について説明す
る。まず、２４０２において、第ｍ階層目のクラスタの
うち、追加学習用データ群が属するクラスタＸｍを選択
する。もし第ｍ階層目に追加学習用データ群が属するク
ラスタが存在しない場合には、第ｍ階層より上位の階層
のクラスタを選択する。次に、２４０４において、この
クラスタＸｍの領域内についてのみ、図２２における２
２００の処理を行い、クラスタと近似式を求めなおすこ
とにより追加学習を終了する。なお、これ以降、あるク
ラスタの領域内で２２００の処理を施すことを再学習と
呼ぶ。The processing procedure of this embodiment is described below. First, in 2402, the cluster Xm to which the additional learning data group belongs is selected from among the m-th layer clusters. If no cluster containing the additional learning data group exists in the m-th layer, a cluster in a layer above the m-th layer is selected. Next, in 2404, the processing of 2200 in FIG. 22 is performed only within the region of this cluster Xm, and additional learning ends once the clusters and approximate expressions have been recomputed. Hereinafter, performing the processing of 2200 within the region of a cluster is called re-learning.

【００７５】本実施例は、追加学習用データ群が所属す
る第ｍ階層目のクラスタの領域内についてのみ再学習を
行い、その他の部分は再学習を行わないため、新規に学
習するより高速に学習できるという特徴がある。In this embodiment, re-learning is performed only within the region of the m-th layer cluster to which the additional learning data group belongs, and the other parts are not re-learned, so learning is faster than learning anew.

【００７６】図２７から図２９を用いて、本実施例を具
体的に説明する。図２７から図２９においてＣとはクラ
スタを示すものとする。本実施例では、再学習する階層
の数ｍを２とする。以下、本実施例の処理手順について
説明する。まず、追加学習用データ群が属するクラスタ
を求める。追加学習用データ群の属するクラスタが、第
１階層目ではＣ２、第２階層目ではＣ２１、第３階層目
ではＣ２１２であると判定された場合を考える。ｍの値
が２であるので、再学習するクラスタにはＣ２１が選択
される。そこで、図２８に示すように、一端Ｃ２１１と
Ｃ２１２を破棄して、Ｃ２１の領域内の全ての学習用デ
ータ群と追加学習用データ群を用いて再学習する。図２
９の２９００は追加学習により新たに作成されたクラス
タである。この例では、クラスタＣ２１４の近似精度が
悪いため、第４階層目のクラスタＣ２１４１とＣ２１４
２が作成されている。This embodiment will be described concretely with reference to FIGS. 27 to 29. In FIGS. 27 to 29, C denotes a cluster. In this embodiment, the layer number m to be re-learned is 2. The processing procedure is as follows. First, the clusters to which the additional learning data group belongs are determined. Consider the case where the additional learning data group is judged to belong to C2 in the first layer, C21 in the second layer, and C212 in the third layer. Since m is 2, C21 is selected as the cluster to be re-learned. Then, as shown in FIG. 28, C211 and C212 are first discarded, and re-learning is performed using all the learning data groups in the region of C21 together with the additional learning data group. Reference numeral 2900 in FIG. 29 denotes the clusters newly created by additional learning. In this example, the approximation accuracy of cluster C214 is poor, so the fourth-layer clusters C2141 and C2142 have been created.
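
The choice of the cluster to re-learn in additional learning method 1 can be sketched as picking the m-th entry on the path of clusters containing the new data. The function name is hypothetical, and falling back to the deepest existing cluster when the path is shorter than m is one possible reading of "a cluster in a layer above the m-th layer".

```python
def cluster_to_relearn(path, m):
    """Pick the m-th layer cluster on the path (top layer first).

    Assumption: if the path has fewer than m layers, the deepest
    existing cluster, which lies above layer m, is used instead.
    """
    if not path:
        raise ValueError("the data must belong to at least one cluster")
    return path[min(m, len(path)) - 1]


# Clusters containing the additional learning data group, as in the example.
path = ["C2", "C21", "C212"]
print(cluster_to_relearn(path, m=2))  # C21
```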

【００７７】次に、図２５を用いて、追加学習のもう一
つの実施例である追加学習方法２について説明する。図
２５において図２４と同じ参照番号は同じものを指す。
本実施例は、以下の２種類のクラスタについて再学習を
行う。Next, additional learning method 2, another embodiment of additional learning, will be described with reference to FIG. 25. In FIG. 25, the same reference numerals as in FIG. 24 denote the same items. In this embodiment, re-learning is performed on the following two kinds of clusters.

【００７８】（１）追加学習用データ群が所属し、かつ
上から第ｍ階層目のクラスタ（２）（１）のクラスタに隣接し、かつ上から第ｎ階層
目のクラスタこのｍとｎの値はあらかじめ任意に決めておく。なお、
第ｍ階層目に追加学習用データ群が所属するクラスタが
存在しない場合には、第ｍ階層より上位の階層のクラス
タを選択する。ｎについても同様である。(1) The m-th layer cluster from the top to which the additional learning data group belongs.
(2) The n-th layer clusters from the top that are adjacent to the cluster of (1).
The values of m and n are chosen arbitrarily in advance. If no cluster containing the additional learning data group exists in the m-th layer, a cluster in a layer above the m-th layer is selected. The same applies to n.

【００７９】本実施例は、追加学習用データ群が所属す
る第ｍ階層目のクラスタとその周辺にあるクラスタの領
域内についてのみ、学習を行い、その他の部分は学習を
行わないため、新規に学習するより高速に学習できると
いう特徴がある。さらに、追加学習用データ群の属する
クラスタの再学習の影響を受けるクラスタについても再
学習を行うため、前記の追加学習方法より精度が良いと
いう特徴がある。In this embodiment, learning is performed only within the regions of the m-th layer cluster to which the additional learning data group belongs and of the clusters surrounding it, and the other parts are not re-learned, so learning is faster than learning anew. Furthermore, clusters affected by the re-learning of the cluster containing the additional learning data group are also re-learned, so the accuracy is better than with the preceding additional learning method.

【００８０】以下、本実施例の処理手順について説明す
る。まず、２４０２において、追加学習用データ群が所
属し、かつ第ｍ階層目のクラスタＸｍを選択する。次
に、２４０４において、このクラスタＸｍの領域内につ
いてのみ、図２２における２２００の処理を行い、再学
習する。次に、２５０２において、クラスタＸｍに隣接
するクラスタのうち、第ｎ階層目のクラスタＹｎを選択
する。次に、２５０４において、クラスタＹｎの領域内
を再学習するかどうかを判定するための評価量を求め、
２５０６において、クラスタＹｎの領域内を再学習する
必要があるかどうかを判定する。再学習の必要がなけれ
ば２５１０の処理へ進む。再学習の必要があれば、クラ
スタＹｎの領域内について図２２における処理２２５０
を行う。この処理は、クラスタＹｎのクラスタリング状
態は変更せず、Ｙｎの領域内の全ての学習用データ群を
用いて近似式を求め、その近似式の精度が悪ければ、図
２２における処理２２００を行うものである。最後に、
２５１０おいて、クラスタＸに隣接する第ｎ階層目の全
てのクラスタについて処理２５０６を行ったかどうかを
判定し、全てのクラスタについて行ったところで追加学
習を終了する。なお、２５０８の処理において、２２５
０のかわりに２２００の処理を行なうことも可能であ
る。これは、クラスタＹｎのクラスタリング状態を変更
した後に近似式も変更するものである。The processing procedure of this embodiment is described below. First, in 2402, the m-th layer cluster Xm to which the additional learning data group belongs is selected. Next, in 2404, the processing of 2200 in FIG. 22 is performed only within the region of this cluster Xm to re-learn it. Next, in 2502, the n-th layer cluster Yn is selected from among the clusters adjacent to cluster Xm. Next, in 2504, an evaluation amount for deciding whether to re-learn the region of cluster Yn is computed, and in 2506 it is determined whether the region of cluster Yn needs to be re-learned. If re-learning is unnecessary, the procedure advances to 2510. If re-learning is necessary, process 2250 in FIG. 22 is performed on the region of cluster Yn. This process leaves the clustering state of cluster Yn unchanged, derives an approximate expression using all the learning data groups in the region of Yn, and performs process 2200 in FIG. 22 if the accuracy of that expression is poor. Finally, in 2510, it is determined whether process 2506 has been performed for all n-th layer clusters adjacent to cluster X, and additional learning ends once all of them have been processed. In the processing of 2508, process 2200 may be performed instead of 2250. In that case the clustering state of cluster Yn is changed first, and then the approximate expression is changed as well.

【００８１】図３０から図３２を用いて、本実施例につ
いて具体的に説明する。まず、ｍとｎが１、すなわち、
追加学習用データ群の属するクラスタも、隣接するクラ
スタも第１階層目のクラスタについて学習しなおす例を
示す。なお、図３０から図３３において、ＣＢとはコー
ドブックを表す。コードブックはクラスタ内の全学習用
データ群の平均であると仮定する。また、クラスタ間の
境界は、各クラスタを代表するコードブックの間の垂直
２等分面（線）であるとする。なお、以下の記述におい
て、ＣＢｎ（ｎは整数）をコードブックとするクラスタ
を「ＣＢｎのクラスタ」と呼ぶことにする。This embodiment will be described concretely with reference to FIGS. 30 to 32. First, an example is shown in which m and n are both 1, that is, both the cluster to which the additional learning data group belongs and the adjacent clusters are re-learned at the first layer. In FIGS. 30 to 33, CB denotes a codebook. The codebook is assumed to be the mean of all the learning data groups in the cluster. The boundary between clusters is assumed to be the perpendicular bisecting plane (line) between the codebooks representing each cluster. In the following description, the cluster whose codebook is CBn (n is an integer) is called "the cluster of CBn".

【００８２】図３０は、追加学習を行う前のクラスタの
状態を表したものである。FIG. 30 shows the state of the cluster before the additional learning.

【００８３】まず、図３１を用いて追加学習用データ群
の所属するクラスタの再学習の一実施例を説明する。図
３１は、ＣＢ１（３１０２）のクラスタの領域内に追加
学習用データ群（３１００）が加わった例である。コー
ドブックは全学習用データ群の平均であるので、追加学
習用データ群の影響によりコードブックの位置がＣＢ
１’（３１０４）に移動する。ＣＢ１の移動に伴い、Ｃ
Ｂ間の垂直二等分面であるクラスタ間の境界も移動す
る。ＣＢ１がＣＢ１’に移動した後の状態で、ＣＢ１’
のクラスタの領域内を再学習する。First, an example of re-learning the cluster to which the additional learning data group belongs will be described with reference to FIG. 31. FIG. 31 shows a case in which an additional learning data group (3100) has been added within the region of the cluster of CB1 (3102). Because the codebook is the mean of all the learning data groups, the additional learning data group shifts the codebook to CB1' (3104). As CB1 moves, the cluster boundaries, which are the perpendicular bisecting planes between the codebooks, also move. The region of the cluster of CB1' is re-learned in the state after CB1 has moved to CB1'.
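
Because the codebook is taken to be the mean of the learning data groups, its shift from CB1 to CB1' can be computed incrementally. The sketch below assumes exactly that definition; the function name and the numbers are hypothetical.

```python
def updated_codebook(codebook, count, new_group):
    """Mean of `count` existing data groups after one more group is added."""
    return [(c * count + x) / (count + 1) for c, x in zip(codebook, new_group)]


# Hypothetical cluster of three data groups with mean (1, 2); one group is added.
cb1 = [1.0, 2.0]
cb1_moved = updated_codebook(cb1, count=3, new_group=[5.0, 6.0])
print(cb1_moved)  # [2.0, 3.0]
```

The further the added data group lies from the current codebook, the further the codebook moves, and with it the boundaries shared with neighboring clusters.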

【００８４】次に、追加学習用データ群の所属するクラ
スタに隣接するクラスタの再学習の一実施例を説明す
る。３１０８はＣＢ１の移動に伴って、ＣＢ４とＣＢ５
のクラスタからＣＢ１のクラスタの領域に加わった領域
である。ここで、ＣＢ４とＣＢ５のクラスタを再学習す
る必要はない。これらのクラスタの領域は減少するが、
領域内に残った全てのデータ群は、従来通りのＣＢ４や
ＣＢ５の近似式で精度良く近似できるからである。ただ
し、これらのクラスタも再学習することも可能である。Next, an example of re-learning the clusters adjacent to the cluster to which the additional learning data group belongs will be described. Reference numeral 3108 denotes the region that moved from the clusters of CB4 and CB5 into the region of the cluster of CB1 as CB1 moved. The clusters of CB4 and CB5 do not need to be re-learned. Although their regions shrink, all the data groups remaining in them can still be approximated accurately by the existing approximate expressions of CB4 and CB5. These clusters may nevertheless be re-learned as well.

【００８５】さらに、追加学習用データ群の所属するク
ラスタに隣接するクラスタの再学習の一実施例を説明す
る。３１０６は、コードブックの移動によりＣＢ１のク
ラスタの領域から外れて、ＣＢ２やＣＢ３、ＣＢ６のク
ラスタに加わった領域である。ここで、ＣＢ２やＣＢ
３、ＣＢ６のクラスタは再学習する必要がある。追加学
習以前には、３１００の領域はＣＢ１の領域であったの
で、３１０６の領域内のデータ群の近似はＣＢ１の式で
行われていたものである。したがって、３１０６の領域
内のデータ群はＣＢ２やＣＢ３、ＣＢ６のクラスタの近
似式では精度良く近似できる保証はないためである。A further example of re-learning the clusters adjacent to the cluster to which the additional learning data group belongs will be described. Reference numeral 3106 denotes the region that left the region of the cluster of CB1 because of the codebook movement and joined the clusters of CB2, CB3, and CB6. The clusters of CB2, CB3, and CB6 need to be re-learned. Before the additional learning, the region 3100 belonged to CB1, so the data groups in the region 3106 were approximated by the expression of CB1. There is therefore no guarantee that the data groups in the region 3106 can be approximated accurately by the approximate expressions of the clusters of CB2, CB3, and CB6.

【００８６】なお、これらのクラスタを再学習する際
に、コードブックの扱い方が２種類考えられる。コード
ブックを移動させずに近似式のみを変更する場合と、コ
ードブックを移動させた後に近似式を変更する場合であ
る。前者は図２９の２５０８に相当し、後者は２５０８
において処理２２５０のかわりに処理２２００を行なう
ものに相当する。この区別は、隣接するクラスタのコー
ドブックを移動させた場合、さらにこれらのクラスタに
隣接するクラスタとの境界が変更されるために、再学習
の影響が次々に伝播するという問題にどう対処するかと
いうことに起因する。コードブックを移動させない場合
は境界が移動しないので、これ以上他のクラスタを再学
習する必要はない。コードブックを移動させる場合は、
境界の移動による影響がなくなるまで、境界が変更にな
るクラスタを再学習し続ける必要がある。When re-learning these clusters, the codebook can be handled in two ways: changing only the approximate expression without moving the codebook, or moving the codebook and then changing the approximate expression. The former corresponds to 2508 in FIG. 29, and the latter corresponds to performing process 2200 instead of process 2250 in 2508. The distinction arises from how to deal with the following problem: when the codebooks of adjacent clusters are moved, the boundaries between those clusters and their own neighbors also change, so the effect of re-learning propagates from cluster to cluster. If the codebook is not moved, the boundaries do not move, so no further clusters need to be re-learned. If the codebook is moved, clusters whose boundaries change must continue to be re-learned until the boundary movement no longer has any effect.

【００８７】次に、再学習するクラスタの判定方法を評
価する方法の一実施例を説明する。これは図２５の２５
０４に相当する。３１０６の領域内に学習用データ群が
存在しない場合には、ＣＢ２やＣＢ３、ＣＢ６のクラス
タにおいて近似式を導出するための学習用データ群に変
化がないので、再学習する必要はない。したがって、３
１０６の領域内に学習用データが存在するかどうを判定
して最終的に再学習するかどうかを決める。Next, an example of how to evaluate whether a cluster should be re-learned will be described. This corresponds to 2504 in FIG. 25. If no learning data group exists in the region 3106, the learning data groups used to derive the approximate expressions of the clusters of CB2, CB3, and CB6 are unchanged, so there is no need to re-learn them. The final decision on whether to re-learn is therefore made by determining whether learning data exist in the region 3106.
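
The test for learning data in the region that left the moved cluster can be sketched by reassigning the cluster's former members to their nearest codebook after the move. The function names, the cluster labels used as dictionary keys, and the coordinates are hypothetical; nearest-codebook assignment follows the perpendicular-bisector boundaries assumed in the text.

```python
def nearest_codebook(point, codebooks):
    """Name of the codebook closest to `point` (squared Euclidean distance)."""
    return min(
        codebooks,
        key=lambda name: sum((p - c) ** 2 for p, c in zip(point, codebooks[name])),
    )


def neighbors_to_relearn(former_members, codebooks_after_move, moved_name):
    """Adjacent clusters that received learning data from the moved cluster."""
    affected = set()
    for point in former_members:
        owner = nearest_codebook(point, codebooks_after_move)
        if owner != moved_name:
            affected.add(owner)
    return affected


# Hypothetical layout after CB1 has shifted to the right.
codebooks = {"CB1'": [2.0, 0.0], "CB2": [-4.0, 0.0], "CB4": [8.0, 0.0]}
members_of_old_cb1 = [[-1.5, 0.0], [1.0, 0.0]]
print(neighbors_to_relearn(members_of_old_cb1, codebooks, "CB1'"))  # {'CB2'}
```

An empty result corresponds to the case where no learning data group lies in the transferred region, so no adjacent cluster needs re-learning.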

【００８８】次に、図３２を用いて、ｎが２、すなわ
ち、隣接するクラスタの第２階層目のクラスタについて
再学習する例を示す。第２階層目で、追加学習用データ
群が所属するクラスタ（ＣＢ１’）に隣接し、図３１を
用いて説明した条件を満たす領域が３２００である。し
たがって、この領域について再学習する。Next, with reference to FIG. 32, an example will be shown in which n is 2, that is, relearning is performed for the cluster of the second layer of the adjacent cluster. In the second layer, 3200 is an area adjacent to the cluster (CB1 ′) to which the additional learning data group belongs and that satisfies the condition described with reference to FIG. Therefore, this area is relearned.

【００８９】次に、図２６を用いて、追加学習のもう一
つの実施例である追加学習方法３について説明する。本
実施例は、追加学習用データ群が所属し、かつ第ｋ階層
目のクラスタＸｋの領域内についてのみ学習を行うもの
である。ただし、ｋの値は、個々の追加学習用データ群
に応じて適当なもを決めることができる。Next, additional learning method 3, another embodiment of additional learning, will be described with reference to FIG. 26. This embodiment performs learning only within the region of the cluster Xk, which is the k-th layer cluster to which the additional learning data group belongs. The value of k, however, can be chosen as appropriate for each individual additional learning data group.

【００９０】本実施例は、追加学習用データ群の周辺の
領域内についてのみ学習を行い、その他の部分は学習を
行わないため、新規に学習するより高速に学習できると
いう特徴がある。さらに、個々の追加学習用データ群に
応じて、最も適当な再学習の階層ｋを求めることができ
るため、必要以上に大きな領域を再学習することがな
く、高速である。The present embodiment is characterized in that learning is performed only in the area around the additional learning data group, and learning is not performed in the other portions, so that learning can be performed faster than new learning. Further, since the most appropriate re-learning hierarchy k can be obtained according to each additional learning data group, it is fast without re-learning an area larger than necessary.

【００９１】以下、本実施例の処理手順について説明す
る。まず、２６０２において、各階層において追加学習
用データ群が属するクラスタの評価量を計算する。次
に、２６０４において、評価量が基準以下のクラスタの
うち、下からｎ番目の階層のクラスタＸｋを選択する。
次に、２６０６において、このクラスタＸｋの領域内に
ついてのみ、図２２における２２００の処理を行い、ク
ラスタリング状態と近似式を変更することにより追加学
習を終了する。The processing procedure of this embodiment will be described below. First, in 2602, the evaluation amount of the cluster to which the additional learning data group belongs in each layer is calculated. Next, in 2604, the cluster Xk of the nth hierarchy from the bottom is selected from the clusters whose evaluation amount is equal to or less than the reference.
Next, in 2606, the processing of 2200 in FIG. 22 is performed only within the region of the cluster Xk, and the additional learning is ended by changing the clustering state and the approximate expression.

【００９２】ここで、２６０２と２６０４における評価
量の例について説明する。この評価量は、個々の追加学
習用データ群に応じて再学習するクラスタの階層を決め
るものである。再学習するクラスタを以下のように判定
する。（１）追加学習用データ群がクラスタ間の境界付近に存
在する場合には、その階層より上の階層で再学習を行う
（再学習する領域を広げる）。（２）追加学習用データ群がコードブック付近に存在す
る場合には、そのクラスタの領域内において再学習を行
う。Here, an example of the evaluation amount in 2602 and 2604 will be described. This evaluation amount determines the hierarchy of the cluster to be re-learned according to each additional learning data group. The cluster to be relearned is determined as follows. (1) When the additional learning data group exists near the boundary between the clusters, re-learning is performed in a layer above that layer (the region for re-learning is expanded). (2) When the additional learning data group exists near the codebook, relearning is performed within the area of the cluster.

【００９３】この根拠は以下のとおりである。（ａ）クラスタ間の境界に付近に追加学習用データ群が
存在する場合には、コードブックの移動量も大きくな
り、クラスタ間の境界と近似式が大きく変化する可能性
がある。（ｂ）クラスタ間の境界に付近に追加学習用データ群が
存在する場合には、クラスタの中心からの距離が大きい
ので、近似式の精度が悪い可能性がある。したがって、判定基準を「コードブックと追加学習用デ
ータ群との距離が、あらかじめ決められた基準以下であ
るクラスタのクラスタのうち、最下層のクラスタ」と決
めることができる。ただし、最下層である必要はなく、
下から２番目、３番目など任意の階層が可能である。The grounds for this are as follows. (a) When the additional learning data group lies near a boundary between clusters, the codebook moves by a large amount, and the cluster boundaries and the approximate expressions may change significantly. (b) When the additional learning data group lies near a boundary between clusters, it is far from the center of the cluster, so the accuracy of the approximate expression may be poor. The criterion can therefore be set as "the lowest-layer cluster among the clusters for which the distance between the codebook and the additional learning data group is equal to or less than a predetermined reference". The cluster need not be in the lowest layer, however; any layer, such as the second or third from the bottom, may be used.

【００９４】また、判定のための距離の基準は各クラス
タによって変えることができる。例えば、クラスタ内の
全学習用データ群についてのコードブックからの距離の
平均値や標準偏差に、あらかじめ決めておいた定数αを
かけた値や、隣接するクラスタ間の距離に、あらかじめ
決めておいた定数αをかけた値などが考えられる。The distance criterion used for the judgment can be changed for each cluster. For example, it may be the mean or the standard deviation of the distances from the codebook over all learning data groups in the cluster, multiplied by a predetermined constant α, or the distance between adjacent clusters multiplied by a predetermined constant α.
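
The per-cluster distance criterion described above can be sketched as follows. This is only an illustration, not the implementation of the embodiment: the function names `distance_threshold` and `neighbour_threshold`, the use of Euclidean distance, and the choice of the population standard deviation are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def distance_threshold(codebook, members, alpha, statistic="mean"):
    """Criterion = alpha * (mean or std of member-to-codebook distances).

    codebook : coordinates of the cluster's codebook (assumed to be a point)
    members  : learning data points that belong to the cluster
    alpha    : predetermined constant (may differ per cluster or per layer)
    """
    # Euclidean distance is an assumption; the text only says "distance".
    dists = [math.dist(codebook, x) for x in members]
    base = mean(dists) if statistic == "mean" else pstdev(dists)
    return alpha * base


def neighbour_threshold(codebook, neighbour_codebook, alpha):
    """Alternative criterion = alpha * distance to an adjacent cluster."""
    return alpha * math.dist(codebook, neighbour_codebook)


if __name__ == "__main__":
    cb = (0.0, 0.0)
    members = [(3.0, 4.0), (0.0, 5.0), (5.0, 0.0)]
    print(distance_threshold(cb, members, alpha=1.5))
    print(neighbour_threshold(cb, (6.0, 8.0), alpha=0.5))
```

Using a different `alpha` for each layer gives the layer-dependent criterion mentioned in the following paragraph.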

【００９５】さらに、各階層ごとにαを異なる値にする
ことにより、階層ごとに基準を変えることができる。例
えば、上位のクラスタほど基準値を大きくすれば、同じ
基準値を用いる場合に比べて、より下の階層で再学習で
きる。これは、下位の階層のクラスタほど再学習する領
域が狭くなるため、追加学習の時間が短くなるという特
徴がある。Furthermore, the criterion can be varied from layer to layer by using a different value of α for each layer. For example, if the reference value is made larger for higher-level clusters, re-learning can take place in a lower layer than when the same reference value is used throughout. Because the region to be re-learned is narrower for clusters in lower layers, this shortens the time required for additional learning.

【００９６】図３３と図３４を用いて、本実施例を具体
的に説明する。図３３はクラスタリングの様子と追加学
習用データ群を示したものである。ＣＢ１からＣＢ６は
第１階層のクラスタのコードブックである。ＣＢ１１か
らＣＢ１３は第２階層のクラスタのコードブックであ
る。３３００と３３０２、３３０４は追加学習用データ
群である。３３１１から３３１６はＣＢ１からＣＢ６に
ついて、コードブックと追加学習用データ群との距離が
評価基準以内の領域である。３３２１から３３２３はＣ
Ｂ１１からＣＢ１３について、コードブックと追加学習
用データ群との距離が評価基準以内の領域である。この
評価基準以内の領域に追加学習用データ群が追加されれ
ば、該当するクラスタの領域内の全てのデータ群を用い
て再学習を行う。This embodiment will be described concretely with reference to FIGS. 33 and 34. FIG. 33 shows the state of clustering and the additional learning data groups. CB1 to CB6 are the codebooks of the clusters in the first layer. CB11 to CB13 are the codebooks of the clusters in the second layer. 3300, 3302, and 3304 are additional learning data groups. 3311 to 3316 are the regions, for CB1 to CB6 respectively, in which the distance between the codebook and the additional learning data group is within the evaluation criterion. 3321 to 3323 are the corresponding regions for CB11 to CB13. When an additional learning data group is added inside one of these regions, re-learning is performed using all the data groups in the region of the corresponding cluster.

【００９７】本実施例では、図２６におけるｋの値を１
とする。すなわち、コードブックと追加学習用データ群
との距離が、基準値以下になるクラスタのうち、最も下
位の階層のクラスタを再学習するものとする。In this embodiment, the value of k in FIG. 26 is set to 1. That is, among the clusters for which the distance between the codebook and the additional learning data group is equal to or less than the reference value, the cluster in the lowest layer is re-learned.

【００９８】以下に、３３００、３３０２、３３０４の
３つの追加学習用データ群の再学習の階層の判定につい
て説明する。まず、追加学習用データ群３３００が追加
された場合を説明する。このデータ群はＣＢ１１の評価
基準内の領域内にあるので、ＣＢ１１のクラスタの領域
内で再学習を行う。The determination of the re-learning hierarchy of the three additional learning data groups 3300, 3302 and 3304 will be described below. First, the case where the additional learning data group 3300 is added will be described. Since this data group is within the area within the evaluation criteria of CB11, re-learning is performed within the area of the cluster of CB11.

【００９９】次に、追加学習用データ群３３０２が追加
された場合を説明する。このデータ群はＣＢ１２の評価
基準内の領域外にあるので、ＣＢ１２のクラスタの領域
内では再学習を行わない。そこで、ＣＢ１２より上の階
層のクラスタであるＣＢ１のクラスタについて評価す
る。ＣＢ１では、評価基準内の領域内にあるので、ＣＢ
１のクラスタの領域内で再学習を行う。Next, the case where the additional learning data group 3302 is added will be described. This data group lies outside the region that satisfies the evaluation criterion of CB12, so re-learning is not performed within the region of the CB12 cluster. The CB1 cluster, which is in the layer above CB12, is therefore evaluated. The data group lies inside the region that satisfies the evaluation criterion of CB1, so re-learning is performed within the region of the CB1 cluster.

【０１００】最後に、追加学習用データ群３３０４が追
加された場合を説明する。このデータ群はＣＢ１３につ
いてもＣＢ１についても評価基準内の領域外にあるの
で、全てのデータ群を用いて再学習を行う。Finally, the case where the additional learning data group 3304 is added will be described. This data group lies outside the regions that satisfy the evaluation criteria of both CB13 and CB1, so re-learning is performed using all data groups.
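
The layer selection walked through above for the three additional learning data groups can be sketched as follows. This is a minimal illustration under stated assumptions: the `Cluster` class, the function `select_relearn_cluster`, the representation of a data group by its centroid, the Euclidean distance, and all coordinates and thresholds are hypothetical and do not come from FIG. 33.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Cluster:
    name: str
    codebook: Point
    threshold: float  # distance criterion for this cluster


def centroid(points: Sequence[Point]) -> Point:
    """Represent a data group by its centroid (an assumption of this sketch)."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def select_relearn_cluster(chain: Sequence[Cluster],
                           new_data: Sequence[Point],
                           k: int = 1) -> Optional[Cluster]:
    """Pick the cluster to re-learn for an additional learning data group.

    chain    : clusters the data group belongs to, lowest layer first
    new_data : points of the additional learning data group
    k        : take the k-th qualifying cluster counted from the bottom
    Returns None when no cluster qualifies, i.e. re-learn with all data.
    """
    rep = centroid(new_data)
    qualifying = [c for c in chain
                  if math.dist(c.codebook, rep) <= c.threshold]
    return qualifying[k - 1] if len(qualifying) >= k else None


if __name__ == "__main__":
    # Hypothetical two-layer chain: second-layer cluster, then its parent.
    chain = [Cluster("CB11", (1.0, 1.0), 0.5), Cluster("CB1", (2.0, 2.0), 2.0)]
    for group in ([(1.1, 1.0), (0.9, 1.0)], [(3.0, 3.0)], [(10.0, 10.0)]):
        chosen = select_relearn_cluster(chain, group)
        print(chosen.name if chosen else "re-learn with all data groups")
```

Returning `None` corresponds to the last case in the text, where no cluster in the chain satisfies its criterion and every data group is used for re-learning.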

【０１０１】図３４は、図３３のクラスタを階層的に示
したものである。３４００は、追加学習用データ群３３
００によって再学習される領域である。ＣＢ１１の領域
内の全データ群が再学習に用いられる。３４０２は、追
加学習用データ群３３０２によって再学習される領域で
ある。ＣＢ１１からＣＢ１３のクラスタは破棄され、Ｃ
Ｂ１の領域内の全データ群が再学習に用いられる。３４
０４は、追加学習用データ群３３０４によって再学習さ
れる領域である。全てのクラスタは破棄され、全データ
群が再学習に用いられる。FIG. 34 shows the clusters of FIG. 33 hierarchically. 3400 is the region re-learned when the additional learning data group 3300 is added; all data groups in the region of CB11 are used for re-learning. 3402 is the region re-learned when the additional learning data group 3302 is added; the clusters CB11 to CB13 are discarded, and all data groups in the region of CB1 are used for re-learning. 3404 is the region re-learned when the additional learning data group 3304 is added; all clusters are discarded, and all data groups are used for re-learning.

【０１０２】なお、上記３種類の追加学習方法を併用す
ることも可能である。例えば、通常は図２９に示す追加
学習方法３を用いて、上の階層であると判定された場合
にのみ、追加学習方法２もしくは１を利用するというこ
とができる。追加学習方法３のみでは、上の階層で再学
習すると判定された場合に、追加学習に要するデータ群
の数が多くなるという欠点がある。複数の方式の併用に
より、この欠点を防ぐことができるという特徴がある。The three additional learning methods described above can also be used in combination. For example, additional learning method 3 shown in FIG. 29 is normally used, and additional learning method 2 or 1 is used only when re-learning is judged to be needed in an upper layer. Additional learning method 3 alone has the drawback that a large number of data groups is needed for additional learning when re-learning is judged to be needed in an upper layer. Combining the methods avoids this drawback.

【０１０３】次に、上記の追加学習方法において、記録
されていた学習用データ群を破棄する方法の一実施例を
説明する。破棄の方法としては、古いデータ群から破棄
する方法や、予め決められた評価基準を越えるか満たさ
ないデータ群を破棄する方法がある。破棄したデータ群
は、以降の追加学習において、クラスタリングや近似式
の作成には用いない。本実施例は、学習用データ群の増
加により、関数近似装置の記憶容量が足りなくなること
や、処理速度が遅くなることを回避することができると
いう特徴を持つ。Next, one embodiment of a method of discarding the recorded learning data group in the above additional learning method will be described. As a method of discarding, there are a method of discarding an old data group and a method of discarding a data group that exceeds or does not meet a predetermined evaluation standard. The discarded data group is not used for clustering or creation of an approximate expression in the subsequent additional learning. The present embodiment is characterized in that it is possible to prevent the storage capacity of the function approximating apparatus from becoming insufficient and the processing speed from becoming slow due to the increase of the learning data group.
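
Discarding the oldest recorded learning data groups can be sketched as follows. This is only an illustration of the "discard from the oldest" policy: the class name `LearningDataStore` and the fixed capacity `max_groups` are assumptions of the example, and the evaluation-based discarding policy mentioned in the text is not shown.

```python
from collections import deque


class LearningDataStore:
    """Keeps at most max_groups learning data groups, dropping the oldest."""

    def __init__(self, max_groups):
        # A deque with maxlen silently discards items from the opposite end
        # when a new item is appended to a full deque.
        self._groups = deque(maxlen=max_groups)

    def add(self, group):
        """Record a new learning data group (the oldest is dropped if full)."""
        self._groups.append(group)

    def groups(self):
        """Groups still available for clustering and approximate expressions."""
        return list(self._groups)


if __name__ == "__main__":
    store = LearningDataStore(max_groups=2)
    for g in ("group_a", "group_b", "group_c"):
        store.add(g)
    print(store.groups())
```

Because a discarded group is no longer held by the store, it cannot be used in later clustering or in the creation of approximate expressions.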

【０１０４】[0104]

【発明の効果】本発明によれば、関数近似をする前にデ
ータのクラスタリングを行うことにより、入力データ群
に最もふさわしい近似式を用いて近似を実行することが
できるため、精度のよい近似が可能になる。さらに、デ
ータ群をクラスタリングし、統計的な処理を行なうこと
により、近似値の信頼度を得ることができる。さらに、
追加学習機能を持つことにより、学習用データ群を追加
したときも高速に学習できる。さらに、追加学習機能を
持つことにより、特性が変化する近似対象に対しても、
変化後のデータ群を追加学習するだけで、本関数近似装
置が特性の変化に追従できる。According to the present invention, data are clustered before function approximation is performed, so that the approximation can be executed using the approximate expression best suited to the input data group, which enables accurate approximation. Furthermore, by clustering the data group and performing statistical processing, the reliability of the approximate value can be obtained. Furthermore, the additional learning function allows fast learning even when learning data groups are added. Furthermore, with the additional learning function, even for an approximation target whose characteristics change, the function approximating apparatus can follow the change merely by additionally learning the data groups obtained after the change.

[Brief description of drawings]

【図１】本発明の一実施例を示す関数近似装置の処理を
示すブロック図である。FIG. 1 is a block diagram showing processing of a function approximating apparatus showing an embodiment of the present invention.

【図２】本発明の一実施例を示す関数近似装置の全体構
成図である。FIG. 2 is an overall configuration diagram of a function approximating apparatus showing an embodiment of the present invention.

【図３】出力データ群の信頼度を導出する関数近似装置
の処理を示すブロック図である。FIG. 3 is a block diagram showing a process of a function approximating device for deriving the reliability of an output data group.

【図４】クラスタリングの一例である。FIG. 4 is an example of clustering.

【図５】図４の一部のクラスタを階層的にクラスタリン
グした例である。FIG. 5 is an example in which some of the clusters of FIG. 4 are hierarchically clustered.

【図６】図５の一部のクラスタをさらに階層的にクラス
タリングした例である。FIG. 6 is an example in which some clusters in FIG. 5 are further hierarchically clustered.

【図７】図４から図６のクラスタリングによって得られ
たクラスタの階層構造である。FIG. 7 is a hierarchical structure of clusters obtained by the clustering of FIGS. 4 to 6;

【図８】学習用データ群の分布の一例である。FIG. 8 is an example of a distribution of a learning data group.

【図９】図８の学習用データ群をクラスタリングし、コ
ードブックを配置し、入力データ群を加えた例である。FIG. 9 is an example in which the learning data group of FIG. 8 is clustered, codebooks are arranged, and an input data group is added.

【図１０】クラスタの中心付近において、学習用データ
群が密に分布しているクラスタの例である。FIG. 10 is an example of a cluster in which a learning data group is densely distributed near the center of the cluster.

【図１１】図１０とはデータ数は同じであるが、分布状
態が異なるクラスタの例である。FIG. 11 is an example of a cluster that has the same number of data as FIG. 10 but a different distribution state.

【図１２】図１０と図１１の分布状態を示す密度関数で
ある。FIG. 12 is a density function showing the distribution state of FIGS. 10 and 11.

【図１３】図１０とは分布状態の密度関数は同じである
が、データ数が異なるクラスタの例である。FIG. 13 is an example of a cluster having the same distribution density function as that of FIG. 10, but having a different number of data.

【図１４】図１４の分布状態を示す密度関数である。FIG. 14 is a density function showing the distribution state of FIG. 14.

【図１５】本発明の一実施例に用いた関数である。FIG. 15 is a function used in an embodiment of the present invention.

【図１６】本発明の入力データ群、出力データ群の設定
方法の例である。FIG. 16 is an example of a method for setting an input data group and an output data group according to the present invention.

【図１７】図１５の関数に対して学習した階層的クラス
タリング結果である。FIG. 17 is the hierarchical clustering result learned for the function of FIG. 15.

【図１８】図１７において、点滅表示をした例である。FIG. 18 is an example in which blinking display is applied to FIG. 17.

【図１９】出力の信頼度を表示した例である。FIG. 19 is an example in which the reliability of output is displayed.

【図２０】近似に用いた近似式を表示した例である。FIG. 20 is an example in which an approximate expression used for approximation is displayed.

【図２１】本発明の一実施例を示す異常検出装置のブロ
ック図である。FIG. 21 is a block diagram of an abnormality detection device showing an embodiment of the present invention.

【図２２】本発明の学習方法の一実施例を示すフローチ
ャートである。FIG. 22 is a flowchart showing an embodiment of the learning method of the present invention.

【図２３】本発明の実行方法の一実施例を示すフローチ
ャートである。FIG. 23 is a flowchart showing an embodiment of an execution method of the present invention.

【図２４】追加学習の一実施例を示すフローチャートで
ある。FIG. 24 is a flowchart showing an example of additional learning.

【図２５】追加学習の一実施例を示すフローチャートで
ある。FIG. 25 is a flowchart showing an example of additional learning.

【図２６】追加学習の一実施例を示すフローチャートで
ある。FIG. 26 is a flowchart showing an example of additional learning.

【図２７】追加学習前のクラスタの階層構造である。FIG. 27 is a hierarchical structure of clusters before additional learning.

【図２８】追加学習によって再学習するクラスタを判定
したときのクラスタの階層構造である。FIG. 28 is a hierarchical structure of clusters when a cluster to be relearned by additional learning is determined.

【図２９】追加学習後のクラスタの階層構造である。FIG. 29 is a hierarchical structure of clusters after additional learning.

【図３０】追加学習前のクラスタの状態の例である。FIG. 30 is an example of a state of a cluster before additional learning.

【図３１】追加学習用データ群に基づいてコードブック
を移動させた状態の例である。FIG. 31 is an example of a state in which the codebook is moved based on the additional learning data group.

【図３２】隣接するクラスタに第２番目の階層のクラス
タを選んだ場合に、再学習する可能性のある領域であ
る。FIG. 32 is an area in which there is a possibility of re-learning when a cluster in the second hierarchy is selected as an adjacent cluster.

【図３３】追加学習前のクラスタの状態と、追加学習の
クラスタの階層を決めるための評価基準を示したもので
ある。FIG. 33 shows a state of a cluster before additional learning and an evaluation criterion for determining a hierarchy of the cluster for additional learning.

【図３４】図３３のクラスタの階層構造である。FIG. 34 is the hierarchical structure of the clusters of FIG. 33.

[Explanation of symbols]

１００…学習部、１０２…クラスタ作成部、１０４…近
似式作成部、１０６…近似式評価部、１１０…記録部、
１１２…学習用データ群記録部、１１４…クラスタ情報
記録部、１１６…近似式記録部、１２０…実行部、１２
２…所属クラスタ判定部、１２４…近似値計算部。100 ... Learning unit, 102 ... Cluster creation unit, 104 ... Approximate expression creation unit, 106 ... Approximate expression evaluation unit, 110 ... Recording unit, 112 ... Learning data group recording unit, 114 ... Cluster information recording unit, 116 ... Approximate expression recording unit, 120 ... Execution unit, 122 ... Affiliated cluster determination unit, 124 ... Approximate value calculation unit.

フロントページの続き (72)発明者 緒方 日佐男 東京都国分寺市東恋ケ窪１丁目280番地 株式会社日立製作所中央研究所内 Continuation of the front page: (72) Inventor Hisao Ogata, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji, Tokyo

Claims

1. A function approximating apparatus that obtains output data, or an output data group that is a set of output data, from an input data group that is a set of input data, the apparatus comprising (1) function approximation expression learning means for obtaining a calculation formula for function approximation (hereinafter referred to as an approximate expression) for obtaining the output data or output data group from the input data group, the learning means having (a) means for clustering a learning data group, which is an input data group used for learning, into data groups having similar characteristics, (b) means for obtaining an approximate expression for each created cluster, (c) means for evaluating the created approximate expressions, (d) means for ending the learning when the evaluation of (c) is good for all clusters and changing the clustering when there is a cluster with a poor evaluation, and (e) means for changing the approximate expression according to the evaluation of (c).

2. A function approximating apparatus that obtains output data, or an output data group that is a set of output data, from an input data group that is a set of input data, the apparatus comprising: (1) function approximation expression learning means for obtaining a calculation formula for function approximation (hereinafter referred to as an approximate expression) for obtaining the output data or output data group from the input data group, the learning means having (a) means for clustering a learning data group, which is an input data group used for learning, into data groups having similar characteristics, (b) means for obtaining an approximate expression for each created cluster, (c) means for evaluating the created approximate expressions, (d) means for ending the learning when the evaluation of (c) is good for all clusters and changing the clustering when there is a cluster with a poor evaluation, and (e) means for changing the approximate expression according to the evaluation of (c); (2) recording means having (a) means for recording the learning data group so that the clustering means of (1)(a) can use it, (b) means for recording information on the clusters created by the clustering means of (1)(a), and (c) means for recording the approximate expressions created by the means of (1)(b) for obtaining the approximate expression; and (3) execution means used when output data or an output data group is obtained from an input data group, having (a) means for determining the cluster to which the input data group belongs and (b) means for obtaining the output data or output data group from the input data group using the approximate expression of the determined cluster.

3. The function approximating apparatus according to claim 2, wherein a codebook, which is a data group representing the characteristics of the data groups existing in a cluster, is set for each created cluster, and when the cluster to which an input data group belongs is selected, the distance between the input data group and each codebook is calculated and the cluster to which the codebook closest to the input data group belongs is selected.

4. The function approximating apparatus according to claim 2, wherein, when the approximate expression of a cluster with a poor evaluation is changed by the means of (1)(d) for changing the clustering, the apparatus has (f) means for performing hierarchical clustering by clustering the region of that cluster into smaller clusters, and (g) means for creating an approximate expression for each cluster created in (f).

5. The function approximating apparatus according to claim 2, comprising means for deriving the reliability of the obtained output data or output data group by calculating an evaluation amount of the input data group with respect to the cluster to which the input data group is determined to belong.

6. The function approximating apparatus according to claim 5, comprising means for discarding the output data or output data group when, for the cluster to which the input data group is determined to belong, the reliability does not exceed a reference.

7. The function approximating apparatus according to claim 2, further comprising means for displaying the classification relation between the created clusters and displaying the relation between the cluster to which the input data group belongs and the other clusters.

8. The function approximating apparatus according to claim 2, further comprising means for displaying the reliability of the derived output data group.

9. The function approximating apparatus according to claim 2, further comprising means for displaying a function approximating expression derived for each cluster.

10. The function approximating apparatus according to claim 2, wherein, when time series data are handled, one datum or a set of a plurality of data taken from time series data obtained at a plurality of times is used as the output data group and a set of the data other than the output data group is used as the input data group, the function approximation expression learning means has means for obtaining, for each cluster, a calculation formula for deriving the output data group from the input data group, and an approximate value of the output data group is derived using the obtained calculation formula.

11. The function approximating apparatus according to claim 2, wherein, of the time series data (P1, P2, ..., Pt), (P1, P2, ..., Pt-1) is used as the input data group and Pt as the output data group, the apparatus comprising means for determining, when the error between the output data group and the actual measured value Rt at time t exceeds a reference value, that the behavior was not predicted and thereby detecting that the target system is behaving abnormally, and means for displaying the detected abnormal behavior.

12. The function approximating apparatus according to claim 2, wherein the function approximating means (1) is used to obtain a calculation formula for deriving an output data group related to another location from an input data group measured at a plurality of locations. A function approximating device, characterized in that the function approximating means has means for deriving a predicted value or an interpolated value of an output data group.

13. The function approximating apparatus according to claim 2, comprising means for changing, for each cluster, the number and type of data in the input data group and the output data group of the approximate expression (that is, the input variables and output variables of the approximate expression) when an approximate expression is created.

14. The function approximating apparatus according to claim 13, wherein, when an approximate expression cannot be created because the number of data in a cluster is small, the number of input variables and output variables of the approximate expression is reduced for that cluster.

15. The function approximating apparatus according to claim 4, wherein when hierarchical clustering is performed, an additional learning data group, which is a new learning data group, is added in addition to the existing learning data group. When learning, the value of m is arbitrarily determined in advance, and among the clusters created by the learning performed in advance, the cluster Xm to which the additional learning data group belongs and which is in the mth hierarchy from the top is selected. A function approximating apparatus characterized by having means for performing re-learning by selecting and using the learning data group and additional learning data group in the cluster Xm.

16. The function approximating apparatus according to claim 4, wherein, when hierarchical clustering has been performed and an additional learning data group, which is a new learning data group, is learned in addition to the existing learning data group, the values of m and n are arbitrarily determined in advance, the cluster Xm to which the additional learning data group belongs and which is in the m-th layer from the top is selected from among the clusters created by the learning performed in advance, learning is performed using the learning data group and the additional learning data group in the cluster Xm, and a cluster Yn that is adjacent to the cluster Xm and is in the n-th layer from the top is selected, the apparatus having means for determining whether or not to re-learn the cluster Yn and means for re-learning using the learning data group in the cluster so determined, thereby performing additional learning.

17. The function approximating apparatus according to claim 4, wherein, when hierarchical clustering has been performed and an additional learning data group, which is a new learning data group, is learned in addition to the existing learning data group, the value of k is arbitrarily determined in advance, and an evaluation amount is calculated in each layer for the cluster to which the additional learning data group belongs, the apparatus having means for selecting a cluster Xk from among the clusters whose evaluation amount is equal to or less than the reference and performing learning on the learning data group and the additional learning data group in the cluster Xk, thereby performing additional learning.

18. The function approximating apparatus according to claim 15, 16, or 17, wherein, when the number or volume of learning data groups exceeds a reference because learning data groups have been added, old learning data groups are deleted from the learning data group recording unit, and the deleted learning data groups are not used in subsequent additional learning.

19. A function approximation method for obtaining output data or an output data group from an input data group, comprising: (1) performing learning by (a) clustering a learning data group into data groups having similar characteristics, (b) obtaining an approximate expression for each created cluster, (c) evaluating the created approximate expressions, (d) ending the learning when the evaluation of (c) is good for all clusters and changing the clustering when there is a cluster with a poor evaluation, and (e) changing the approximate expression according to the evaluation of (c); and (2) performing execution by (a) determining the cluster to which the input data group belongs and (b) obtaining the output data or output data group from the input data group using the approximate expression of the determined cluster.