JPH0540829A

JPH0540829A - Data clustering method

Info

Publication number: JPH0540829A
Application number: JP3197734A
Authority: JP
Inventors: Mitsuhiro Inazumi; 満広稲積
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1991-08-07
Filing date: 1991-08-07
Publication date: 1993-02-19

Abstract

PURPOSE: To obtain a result equivalent to clustering performed subjectively by an operator who views the entire data set, by converting the data with a neural network as preprocessing for a clustering means.

CONSTITUTION: The method includes a data input means 1, a data conversion neural network 2 that receives the data from the data input means 1, and a data inverse conversion neural network 3 that receives the output of the data conversion neural network 2. It also includes a data comparison/learning control means 4 that receives both the data from the data input means 1 and the output of the data inverse conversion neural network 3, and that controls the learning of network 2 and network 3. Data conversion by the neural network is performed as preprocessing for the clustering means. As a result, a result equivalent to an operator's subjective clustering based on the structure of the entire data can be obtained easily.

Description

Detailed Description of the Invention

[0001]

Field of the Invention: The present invention relates to a data clustering method used in information compression, pattern recognition, and similar applications.

[0002]

Description of the Related Art: As an example of a conventional method for clustering data, consider the LBG algorithm. This method sets a number of cluster center points, treats the distance between each center and each data point as an error, and corrects the center points so as to reduce the total error, thereby clustering the entire data set.
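The center-correction loop described in [0002] can be sketched as below. This is a minimal sketch, assuming squared Euclidean distance as the error, and it shows only the iterative correction step (the generalized Lloyd iteration); the codebook-splitting initialization used by the full LBG algorithm is omitted, and the names `nearest`, `lloyd`, and `total_error` are hypothetical.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))

def lloyd(points, centers, iters=20):
    """Assign each point to its nearest center, then move every center to the
    mean of its assigned points; neither step can increase the total error."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def total_error(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
```

The outcome depends on the initial centers, which is one reason LBG adds a splitting step to choose them; the assignment itself, however, always rests on the distance between a single point and a single center.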

[0003] Although this is a very effective method, it fundamentally uses only information about the relation between two points at a time.

[0004] To see the problem with such methods, consider, for example, the case shown in FIG. 3. Assume the data lie on the semicircle labeled 9 and on the semicircle labeled 10 in the figure, and that the task is to divide them into two clusters.

[0005] For a human, splitting these data in two is very easy: one simply separates semicircle 9 from semicircle 10.

[0006] For conventional methods based on the distance between two points, however, the problem is not so simple. For example, if the cluster center of semicircle 10 is found by a conventional method and its distances to point 11 and point 12 are compared, point 11 turns out to be closer to the semicircle-10 cluster than point 12 is. Consequently, the clusters obtained by the conventional method are completely different from the clusters a human would form, as described above.
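The effect described in [0006] can be reproduced numerically. The coordinates of FIG. 3 are not given in the text, so this sketch builds a hypothetical pair of interlocking semicircles: `arc9` is the upper half of a unit circle, and `arc10` is a lower half-circle shifted so that the two arcs hook into each other. One end of `arc9` stands in for point 11, the far end of `arc10` stands in for point 12, and each arc's centroid stands in for a conventionally computed cluster center. All names and coordinates are assumptions.

```python
import math

def semicircles(n=17):
    """Hypothetical stand-in for FIG. 3: two interlocking arcs of n points each."""
    ts = [math.pi * k / (n - 1) for k in range(n)]
    arc9 = [(math.cos(t), math.sin(t)) for t in ts]             # upper arc, centered (0, 0)
    arc10 = [(1 - math.cos(t), 0.5 - math.sin(t)) for t in ts]  # lower arc, centered (1, 0.5)
    return arc9, arc10

def centroid(points):
    """Mean of the points, used here as the conventional cluster center."""
    return tuple(sum(c) / len(points) for c in zip(*points))

arc9, arc10 = semicircles()
c9, c10 = centroid(arc9), centroid(arc10)
p11 = arc9[0]    # end of arc 9 that reaches into the hollow of arc 10
p12 = arc10[-1]  # far end of arc 10
print("p11 -> center of arc 10:", math.dist(p11, c10))
print("p12 -> center of arc 10:", math.dist(p12, c10))
print("p11 -> center of arc 9: ", math.dist(p11, c9))
```

With these arcs, the end of arc 9 is much closer to the center of arc 10 than arc 10's own far end is, and closer to that center than to the center of its own arc, so any purely distance-based assignment pulls it into the wrong cluster.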

[0007]

Problem to Be Solved by the Invention: The problem addressed by the present invention is that conventional clustering, which considers only the distance between two points, produces clusters completely different from those a human forms subjectively after viewing the whole data set from many angles. The object of the present invention is to provide a clustering method whose results are closer to human subjective judgment.

[0008]

Means for Solving the Problem: FIG. 1 is a schematic view of the configuration of the method of the present invention. With reference to this figure, the present invention is a data clustering method characterized by including in its configuration:
1) a data input means 1;
2) a data conversion neural network 2 that receives the data from the data input means 1;
3) a data inverse conversion neural network 3 that receives the output of the data conversion neural network 2; and
4) a data comparison/learning control means 4 that receives the data from the data input means 1 and the output of the data inverse conversion neural network 3, and controls the learning of the data conversion neural network 2 and the data inverse conversion neural network 3.
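The dataflow among means 1 to 4 in [0008] can be expressed as a small harness. This is a sketch under stated assumptions: the two networks are treated as plain callables, and the comparison means is assumed to use mean squared error against a fixed tolerance to decide whether learning may stop. The weight updates themselves are left out, and `reconstruction_error` and `run_once` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

def reconstruction_error(x: Vector, x_hat: Vector) -> float:
    """Mean squared difference between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def run_once(data: List[Vector],
             convert: Callable[[Vector], Vector],
             inverse: Callable[[Vector], Vector],
             tol: float = 1e-6) -> Tuple[float, bool]:
    """Route every input through means 1 -> 2 -> 3 -> 4 once.

    convert plays the data conversion network (2) and inverse the data
    inverse conversion network (3). The comparison step (4) returns the mean
    error and whether it is small enough for learning to stop."""
    errors = [reconstruction_error(x, inverse(convert(x))) for x in data]
    mean_err = sum(errors) / len(errors)
    return mean_err, mean_err <= tol
```

For example, a converter that doubles every coordinate paired with an inverse that halves it reconstructs the input exactly, so the comparison step would report zero error and allow learning to stop.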

[0009]

Embodiment: The present invention will be described using the example of FIG. 3.

[0010] FIG. 1 is an overall schematic view of the present invention. FIG. 2 shows concretely the data conversion neural network and data inverse conversion neural network portions of FIG. 1, as configured to process the example of FIG. 3.

[0011] As an instance of the data shown in FIG. 3, consider inputting to this network the 34 points listed as input data in Table 1.

[0012]

[Table 1]

[0013] First, suitable initial values are set in each network. The input data are fed to the data conversion neural network 2, and its conversion output is computed. That conversion output is then fed as input to the data inverse conversion neural network, and its inverse conversion output is computed.
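The forward computation in [0013] (input data, then conversion output, then inverse conversion output) might look like the following. The text does not give the layer sizes of FIG. 2, so the one-hidden-layer tanh networks with 8 hidden units, the uniform(-0.5, 0.5) initial weights, and the names `make_net` and `forward` are all assumptions.

```python
import math
import random

def make_net(n_in, n_hidden, n_out, rng):
    """Hypothetical network: tanh hidden layer, linear output layer.
    Small random weights stand in for the 'suitable initial values'."""
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)],
        "b2": [0.0] * n_out,
    }

def forward(net, x):
    """Compute the network output for one input vector x."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["w1"], net["b1"])]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(net["w2"], net["b2"])]

rng = random.Random(0)
conversion = make_net(2, 8, 2, rng)  # data conversion neural network 2
inverse = make_net(2, 8, 2, rng)     # data inverse conversion neural network 3

x = [1.0, 0.0]                       # one input point (2-D, as in FIG. 3)
z = forward(conversion, x)           # conversion output
x_hat = forward(inverse, z)          # inverse conversion output
```

Before training, `x_hat` generally differs from `x`. The learning step described next is what drives the two outputs together.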

[0014] Data are supplied one after another in this way, and each neural network is trained with a suitable learning algorithm until the inverse conversion output of the data inverse conversion neural network matches the input data.
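Training both networks together so that the inverse conversion output reproduces the input is what is now usually called an autoencoder. The sketch below trains the pair jointly by error backpropagation, as [0015] states. It assumes half the squared error as the error measure, because the text names the McClellan error without giving its formula. It also uses plain per-sample gradient descent, and the layer sizes, learning rate, stopping tolerance, and all function names are hypothetical.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Small random weights, standing in for 'suitable initial values'."""
    return {"W": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            "b": [0.0] * n_out}

def layer_out(layer, x, use_tanh):
    a = [sum(w * xi for w, xi in zip(row, x)) + b
         for row, b in zip(layer["W"], layer["b"])]
    return [math.tanh(v) for v in a] if use_tanh else a

def make_autoencoder(n_in, n_hidden, n_code, rng):
    # Layers 0-1: data conversion network    (n_in -> n_hidden -> n_code)
    # Layers 2-3: inverse conversion network (n_code -> n_hidden -> n_in)
    return [(init_layer(n_in, n_hidden, rng), True),
            (init_layer(n_hidden, n_code, rng), False),
            (init_layer(n_code, n_hidden, rng), True),
            (init_layer(n_hidden, n_in, rng), False)]

def train_step(layers, x, lr):
    """One backpropagation update on E = 0.5 * ||inverse(convert(x)) - x||^2."""
    acts = [x]
    for layer, use_tanh in layers:
        acts.append(layer_out(layer, acts[-1], use_tanh))
    grad = [y - t for y, t in zip(acts[-1], x)]  # dE/d(final output)
    loss = 0.5 * sum(g * g for g in grad)
    for i in reversed(range(len(layers))):
        layer, use_tanh = layers[i]
        inp, out = acts[i], acts[i + 1]
        if use_tanh:  # chain rule through tanh: d tanh(a)/da = 1 - tanh(a)^2
            grad = [g * (1.0 - o * o) for g, o in zip(grad, out)]
        # Gradient for the previous layer, taken before this layer's weights change.
        grad_in = [sum(layer["W"][r][c] * grad[r] for r in range(len(grad)))
                   for c in range(len(inp))]
        for r, g in enumerate(grad):
            for c, v in enumerate(inp):
                layer["W"][r][c] -= lr * g * v
            layer["b"][r] -= lr * g
        grad = grad_in
    return loss

def train(data, n_hidden=8, epochs=2000, lr=0.05, tol=1e-4, seed=0):
    """Feed the data repeatedly until the average reconstruction error falls
    below tol (standing in for 'until the outputs match') or epochs run out."""
    rng = random.Random(seed)
    n = len(data[0])
    # Conversion output has the same dimension as the input, as in the patent's example.
    layers = make_autoencoder(n, n_hidden, n, rng)
    history = []
    for _ in range(epochs):
        history.append(sum(train_step(layers, x, lr) for x in data) / len(data))
        if history[-1] < tol:
            break
    return layers, history
```

Using a tolerance rather than exact equality reflects [0015], which reports that the inverse conversion outputs "almost" match the inputs after training.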

[0015] Table 1 shows, for each data point, the conversion output and the inverse conversion output after the neural networks have been trained. In this example, the McClellan error was used as the error measure and the error backpropagation algorithm as the learning algorithm. As Table 1 shows, the input data and the inverse conversion outputs almost coincide. FIG. 4 is a schematic plot of the conversion outputs in Table 1.

[0016] In FIG. 4, reference numeral 13 corresponds to semicircle 9 of FIG. 3, 14 to semicircle 10, point 15 to point 11, and point 16 to point 12.

[0017] As FIG. 4 shows, when the data converted in this way are used, the result is equivalent to clustering by a human, even if the algorithm that clusters them is a conventional one.
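[0017] requires only that a conventional clustering algorithm be run on the conversion outputs instead of on the raw data. Below is a minimal sketch of that preprocessing pipeline. It assumes the trained conversion network is available as any Python callable `convert`, and it uses a plain Lloyd-style center correction as the conventional algorithm. The names `cluster_converted` and `lloyd` are hypothetical.

```python
import math

def lloyd(points, centers, iters=20):
    """Conventional distance-based clustering: assign each point to its
    nearest center, then move each center to the mean of its points."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def cluster_converted(data, convert, init_centers, iters=20):
    """Preprocess every point with the conversion network, then label the
    conversion outputs with the conventional algorithm above."""
    codes = [list(convert(p)) for p in data]
    centers = lloyd(codes, init_centers, iters)
    return [min(range(len(centers)), key=lambda k: math.dist(c, centers[k]))
            for c in codes]
```

Passing `convert=lambda p: p` reduces this to clustering the raw data, which is the conventional baseline. The initial centers must be given in the converted space, not in the input space.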

[0018] In this example, the input data and the conversion output have the same dimension. Depending on the structure of the data, however, the data may be represented more naturally when the conversion output has fewer dimensions than the input, or conversely more.

[0019]

Effects of the Invention: According to the present invention, a result equivalent to a human subjectively clustering the data on the basis of the structure of the whole data set can be obtained easily.

[Brief description of drawings]

FIG. 1 is an overall schematic diagram of the clustering method according to the present invention.

FIG. 2 is a configuration diagram of the neural networks in one embodiment of the present invention.

FIG. 3 is a diagram showing example data used to explain the present invention.

FIG. 4 is a diagram showing the data of FIG. 3 after conversion according to the present invention.

[Explanation of symbols]

1: Data input means
2: Data conversion neural network
3: Data inverse conversion neural network
4: Data comparison/learning control means
5: Clustering means
6: Data conversion network
7: Data inverse conversion network
8: Clustering means
9: Semicircular data
10: Semicircular data
11: End point of the data
12: End point of the data
13: Data corresponding to 9
14: Data corresponding to 10
15: Data corresponding to 11
16: Data corresponding to 12

Claims

[Claims]

1. A data clustering method characterized by including in its configuration: 1) a data input means; 2) a data conversion neural network that receives the data from the data input means 1; 3) a data inverse conversion neural network that receives the output of the data conversion neural network 2; and 4) a data comparison/learning control means that receives the data from the data input means 1 and the output of the data inverse conversion neural network 3, and controls the learning of the data conversion neural network 2 and the data inverse conversion neural network 3.