JPH0836557A

JPH0836557A - Cluster classifying device

Info

Publication number: JPH0836557A
Application number: JP6172442A
Authority: JP
Inventors: Mikihiko Terajima; 寺島幹彦
Original assignee: Olympus Optical Co Ltd
Current assignee: Olympus Corp
Priority date: 1994-07-25
Filing date: 1994-07-25
Publication date: 1996-02-06

Abstract

PURPOSE: To classify clusters properly without any prior knowledge of the number, positions, distribution shape, etc., of the clusters and without depending on the processing procedure; to make the progress and result of the processing visible while keeping the result easy to process computationally; and to obtain the hierarchical structure of the clusters. CONSTITUTION: The device consists of a map creation unit 11, which creates a map consisting of a group of prototypes for the input data by using one-dimensional self-organizing feature mapping; a hierarchical structure creation unit 12, which creates the hierarchical structure of the clusters from the map; and a labeling unit 13, which classifies the input data according to the obtained map and hierarchical structure. The hierarchical structure creation unit 12 consists of a map analysis unit 121, which calculates from the obtained map a quantity indicating the degree of cluster concentration and creates a data sequence, and a data sequence merging unit 122, which creates the hierarchical structure of the clusters from the obtained data sequence.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a cluster classification device, and more particularly to a device that classifies a plurality of data items into a plurality of clusters by grouping them according to their similarity.

[0002]

[Prior Art] A representative method for classifying a plurality of data items into a plurality of clusters according to their similarity is the maximum likelihood estimation method. This method can be used when the number of clusters is known and the rough position of each cluster is known. First, the distribution of the data within each cluster is assumed to be, for example, a normal distribution, and parameters such as the mean and variance are computed approximately. Next, a discriminant function is defined from the probability that a given data item belongs to each cluster (here, the normal distribution). The data are then classified by assigning each data item to a cluster according to the magnitude of the discriminant function obtained from the parameters.
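
The maximum likelihood procedure described above can be sketched as follows for one-dimensional data. This is only an illustration under stated assumptions: the function names (`fit_gaussian`, `log_likelihood`, `classify`) and the sample values are hypothetical, and the logarithm of the normal density is used as the discriminant function.

```python
import math

def fit_gaussian(samples):
    """Estimate the mean and variance of one cluster from its samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var

def log_likelihood(x, mean, var):
    """Discriminant function: log of the normal density at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def classify(x, params):
    """Assign x to the cluster whose discriminant value is largest."""
    scores = [log_likelihood(x, m, v) for m, v in params]
    return scores.index(max(scores))

# Hypothetical data: the number of clusters (2) and their rough members are known.
cluster_a = [0.9, 1.0, 1.1, 1.2, 0.8]
cluster_b = [4.8, 5.0, 5.2, 5.1, 4.9]
params = [fit_gaussian(cluster_a), fit_gaussian(cluster_b)]
labels = [classify(x, params) for x in (1.05, 4.7, 2.0)]
```

Note that the sketch only works because the number of clusters and a rough membership are supplied in advance, which is exactly the prior knowledge this method depends on.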

[0003] Methods that require a known number of clusters but do not assume the shape of the distribution include the K-means method and the LBG method. These methods define an evaluation criterion for the quality of the classification and optimize it by repeating two operations in turn: 1) selecting a representative point for each cluster, and 2) classifying the data into clusters based on those representative points. They are called non-hierarchical methods.
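
The alternation of operations 1) and 2) can be sketched as a minimal K-means loop on one-dimensional data. The function name `k_means`, the fixed iteration count, and the sample values are assumptions made for illustration; the number of clusters is fixed by the number of initial centers.

```python
def k_means(data, centers, iterations=20):
    """Alternate between classifying points and re-selecting representative points."""
    centers = list(centers)
    assignment = []
    for _ in range(iterations):
        # Operation 2): classify each point by its nearest representative point.
        assignment = [min(range(len(centers)), key=lambda c: abs(x - centers[c]))
                      for x in data]
        # Operation 1): re-select each representative point as the mean of its members.
        for c in range(len(centers)):
            members = [x for x, a in zip(data, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = sum(members) / len(members)
    return centers, assignment

# Hypothetical data with two well-separated groups and two initial centers.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
centers, assignment = k_means(data, centers=[0.0, 6.0])
```

The result depends on the number and initial positions of the centers passed in, which is the dependence on assumptions discussed in the text.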

[0004] When the number of clusters is unknown and the shape of the distribution cannot be assumed, that is, when there is no prior knowledge about the data at all, hierarchical methods are available. These define some distance between data items and between clusters, and classify the data by successively merging or splitting them based on that distance.

[0005] In addition, a method has been proposed in which data are input to a self-organizing feature mapping neural network, the data are assigned to elements on a two-dimensional map, and clusters are formed from the number of data items corresponding to each element (Xuegong Zhang, Yanda Li, "SELF-ORGANIZING MAP AS A NEW METHOD FOR CLUSTERING AND DATA ANALYSIS", Proceedings of the International Joint Conference on Neural Networks, vol. 3, pp. 2448-2451, 1993).

[0006]

[Problems to Be Solved by the Invention] As described above, most conventional methods for clustering data assume the number and positions of the clusters and the shape of the distribution. In practice, however, the number of clusters and the shape of the distribution are often unknown before classification. For example, when feature vectors are to be clustered in order to segment an image into regions, the number of clusters and the shape of the distribution are unknown beforehand.

[0007] The maximum likelihood estimation method, the K-means method, and the LBG method mentioned above all operate under assumptions about the number, positions, and distribution shape of the clusters. If these assumptions are wrong, or if the initial values are chosen poorly, a proper result cannot be obtained: data that actually form separate clusters may fail to be separated (over-integration), what should be a single cluster may be split into several clusters (over-division), or data may fail to be assigned to the cluster to which they belong (misclassification). Japanese Patent Laid-Open No. 5-205058 discloses a method that varies the number of clusters step by step and examines each case, but the classification process must then be repeated as many times as the number of clusters examined, which complicates the algorithm. Moreover, even if the number of clusters is estimated correctly, wrong assumptions about their positions or distributions still cause misclassification, and a proper classification cannot be obtained.

[0008] Conventional hierarchical methods, which do not assume the number of clusters or the shape of the distribution, have the following problems.
A-1) The result changes greatly depending on the procedure used for splitting and merging and on the initial state of the algorithm.
A-2) Data that are never merged (that is, never assigned to a cluster) may remain.
A-3) It is difficult to represent the progress and result of the processing, and it cannot be clearly determined when to stop, so over-integration and over-division are likely to occur.

[0009] The method described above, in which data are input to a self-organizing feature mapping neural network, assigned to elements on a two-dimensional map, and clustered from the number of data items corresponding to each element, can display the progress and result of the processing. However, because this method uses a two-dimensional map, the result can be displayed visually, but finding the clusters from that result computationally rather than visually requires a large amount of work and a complicated algorithm.

[0010] Summarizing the above problems, the conditions required of the cluster classification device of the present invention are as follows.
B-1) Proper cluster classification, free of over-integration and over-division, is possible without prior knowledge of the number, positions, distribution shape, etc., of the clusters.
B-2) Cluster classification is possible without depending on the processing procedure.
B-3) The progress and result of the processing can be viewed visually, and the result is easy to process computationally.

[0011] In cluster classification, depending on the application, particular data may need to be further divided or merged after classification. If the hierarchical structure of the clusters has been obtained, such re-merging and re-division are easy. The following condition B-4) is therefore added to conditions B-1), B-2), and B-3) above.
B-4) The hierarchical structure of the clusters can be obtained.

[0012] The present invention has been made in view of these circumstances. Its object is to provide a cluster classification device that satisfies conditions B-1), B-2), B-3), and B-4) above: one that classifies clusters properly, without over-integration or over-division and without prior knowledge of the number, positions, distribution shape, etc., of the clusters; that classifies without depending on the processing procedure; that allows the progress and result of the processing to be viewed visually while keeping the result easy to process computationally; and that can obtain the hierarchical structure of the clusters.

[0013]

[Means for Solving the Problems] The cluster classification device of the present invention that achieves the above object is characterized by comprising: a map creation unit that creates, using one-dimensional self-organizing feature mapping, a map consisting of a group of prototypes for the input data; a hierarchical structure creation unit that creates a hierarchical structure of clusters from the map; and a labeling unit that classifies the input data according to the obtained map and hierarchical structure.

[0014] In this case, the hierarchical structure creation unit may take one of three forms: it may consist of a map analysis unit, which calculates from the obtained map a quantity representing the degree of cluster concentration and creates a data sequence, and a data sequence merging unit, which creates the hierarchical structure of the clusters from that data sequence; it may consist of a concentration calculation unit, which calculates from the obtained map a quantity representing the degree of cluster concentration, and a prototype merging unit, which creates the hierarchical structure of the clusters; or it may consist only of a prototype merging unit, which creates the hierarchical structure of the clusters from the obtained map.

[0015]

[Operation] The reasons for adopting the above configuration and its operation are described below. First, an outline of the configuration of the present invention and its operation is given with reference to the block diagram of FIG. 1 and to FIGS. 2 to 6, which briefly illustrate the cluster classification process. As shown in FIG. 1, the device comprises a map creation unit 11, which receives input data and creates a map; a hierarchical structure creation unit 12, which creates a hierarchical structure of clusters; and a labeling unit 13, which labels the input data using the input data and the map labeled according to the hierarchical structure.

[0016] As one example, the hierarchical structure creation unit 12 consists of a map analysis unit 121, which calculates from the map a quantity related to the degree of cluster concentration and creates a data sequence, and a data sequence merging unit 122, which creates the hierarchical structure of the clusters based on that data sequence. Other examples of the hierarchical structure creation unit 12 are described later.

[0017] As an example illustrating the operation of a cluster classification device with this configuration, consider classifying two-dimensional data into three clusters, one of which further consists of two sub-clusters. Here, the hierarchical structure is also to be obtained. Of course, before classification, the number of clusters and the shape of the distribution are unknown.

[0018] First, the map creation unit 11 is described. It consists of a data input unit 111 and a map unit 112. The data input unit 111 receives the input data group 21. The input data group 21 consists of two-dimensional vectors as shown in FIG. 2, which broadly form three clusters 21A, 21B, and 21C; one of them, 21A, consists of two sub-clusters 21A1 and 21A2. Until the hierarchical structure creation unit 12 is described, however, the fact that 21A consists of the two sub-clusters 21A1 and 21A2 is ignored.

[0019] Next, the map unit 112 uses the input data group 21 to create the map 31 of FIG. 3. The map 31 is made up of an element group 32 containing a plurality of elements (k elements). Each data item in the input data group 21 is made to correspond to one of the elements in the element group 32. The specific procedure is as follows. First, a prototype group 33 for the input data group 21 is created, containing as many prototypes as there are elements (k). One prototype from the prototype group 33 is then assigned to each element. Each data item in the input data group 21 is then associated with the element whose prototype is most similar to it. The prototypes are assigned to the elements so that similar data items in the input data group 21 correspond to elements that are close together on the map 31, and dissimilar data items correspond to elements that are far apart on the map 31. In other words, the topological information of the data in the input data group 21 is reflected in the map 31.
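
The correspondence between a data item and the element with the most similar prototype can be sketched as follows. The function name `winner`, the use of squared Euclidean distance as the similarity measure, and the sample prototypes are assumptions made for illustration.

```python
def winner(x, prototypes):
    """Index of the element whose prototype is closest to x (squared Euclidean distance)."""
    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(x, p))
    return min(range(len(prototypes)), key=lambda i: dist2(prototypes[i]))

# Hypothetical map of k = 4 elements with 2-D prototypes, listed in map order.
prototypes = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
data = [(0.1, -0.1), (0.9, 0.2), (5.2, 4.9), (6.3, 5.1)]
assigned = [winner(x, prototypes) for x in data]
```

In this toy map, nearby data items land on neighboring elements (0 and 1, or 2 and 3), which is the topology-preserving property the paragraph describes.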

[0020] In this way, a map 31 consisting of element groups 32A to 32C, which correspond to the vectors belonging to clusters 21A to 21C respectively, is created from the input data group 21.

[0021] Note that the symbols for clusters 21A to 21C are used only for convenience of explanation; the input data group 21 carries no labels at all before classification. If some of the input data were labeled before classification, the unlabeled data could easily be classified after the map is created, as follows.
1) Select the elements on the map 31 that correspond to the data belonging to the cluster 21A with a given label (for example, A), and give those elements the label A.
2) Perform operation 1) for the data belonging to clusters 21B and 21C as well, so that each element of the element group on the map 31 receives one of the labels A to C.
3) For each unlabeled data item in the input data group 21, find the corresponding element on the map and use that element's label as the label of the data item.
Performing operations 1) to 3) labels all of the input data and completes the classification. Looking at the map 31, it may appear that operation 2) has already been completed, but because the input data group 21 is entirely unlabeled, it is still unknown where on the map 31 the clusters lie. The map must therefore be analyzed to find where on the map 31 the clusters lie.
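
Operations 1) to 3) for partially labeled data can be sketched as follows, using one-dimensional data for brevity. The names `winner` and `label_via_map` and the sample values are hypothetical. Elements that no labeled data item maps to remain unlabeled in this sketch and yield `None`.

```python
def winner(x, prototypes):
    """Index of the element whose prototype is closest to the scalar x."""
    return min(range(len(prototypes)), key=lambda i: abs(x - prototypes[i]))

def label_via_map(labeled, unlabeled, prototypes):
    """Pass labels from labeled data to map elements, then on to unlabeled data."""
    element_label = {}
    for x, lab in labeled:
        # Operations 1) and 2): the element matched by a labeled item takes its label.
        element_label[winner(x, prototypes)] = lab
    # Operation 3): each unlabeled item takes the label of its matching element.
    return [element_label.get(winner(x, prototypes)) for x in unlabeled]

# Hypothetical prototypes on a 1-D map and a few pre-labeled data items.
prototypes = [0.0, 1.0, 5.0, 6.0]
labeled = [(0.1, "A"), (0.9, "A"), (5.1, "B"), (6.2, "B")]
result = label_via_map(labeled, [0.3, 5.8, 1.2], prototypes)
```

When no data item is labeled in advance, as in the example of the text, this shortcut is unavailable and the map itself has to be analyzed.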

[0022] To find where the clusters lie on the map, the map 31 created by the map creation unit 11 is analyzed by the map analysis unit 121, which is described below. The map analysis unit 121 consists of a concentration calculation unit 121A, which calculates for each element a quantity related to the degree of cluster concentration, and a data sequence creation unit 121B, which creates a data sequence from the result. The following quantities can be used to indicate the degree of cluster concentration.

[0023] C-1) The number of input data items corresponding to each element of the element group 32 on the map 31.

[0024] C-2) The similarity between the prototype assigned to a given element on the map 31 and the prototype assigned to an element adjacent to it on the map 31.
A cluster is a collection of data items that are similar in the data space. This property is used below to explain why quantities C-1) and C-2) indicate the degree of cluster concentration.

[0025] Because data are more numerous inside a cluster than outside it, comparing the number of input data items corresponding to each element shows that an element corresponding to data near a cluster center has many corresponding data items, whereas an element corresponding to data away from a cluster center has few. Using quantity C-1) therefore yields a histogram like FIG. 4(a), in which each peak indicates a cluster. This quantity is hereinafter also called the win count V.
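
The win count V of quantity C-1) can be sketched as a simple tally over the map elements. The function name `win_counts`, the one-dimensional prototypes, and the sample data are assumptions made for illustration.

```python
def win_counts(data, prototypes):
    """Count, for each map element, how many data items it wins (quantity C-1)."""
    counts = [0] * len(prototypes)
    for x in data:
        w = min(range(len(prototypes)), key=lambda i: abs(x - prototypes[i]))
        counts[w] += 1
    return counts

# Hypothetical 1-D map of five elements and data concentrated near 1.0 and 4.0.
prototypes = [0.0, 1.0, 2.0, 3.0, 4.0]
data = [0.9, 1.0, 1.1, 1.05, 2.1, 3.9, 4.0, 4.1]
V = win_counts(data, prototypes)
```

Plotting `V` against the element index gives a histogram of the kind described for FIG. 4(a), with peaks at the elements near the two concentrations of data.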

[0026] Next, quantity C-2) is described. As noted above, the prototypes of elements adjacent on the map are also similar in the input data space. In addition, because data within a cluster are similar to one another, the similarity between such prototypes is high inside a cluster and low outside it. From these two facts, comparing the prototypes of adjacent elements on the map makes it possible to tell from their similarity whether the input data corresponding to an element lie near a cluster center or outside a cluster. Specifically, if the prototypes of adjacent elements on the map are highly similar, the element corresponds to data near a cluster center; conversely, if they are dissimilar, the element corresponds to data away from a cluster center. If, for two-dimensional vector data for example, the Euclidean distance is chosen as the measure of similarity, a large distance means low similarity and a small distance means high similarity. A histogram created from quantity C-2) then looks like FIG. 4(b), in which the span from one peak to the next represents a cluster. This quantity is hereinafter also called the similarity dM between adjacent elements.
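
Quantity C-2) can be sketched as follows, taking the Euclidean distance as the similarity measure as the text suggests. The function name `adjacent_distances`, the sample prototypes, and the convention that entry i holds the distance between elements i and i+1 are assumptions made for illustration.

```python
import math

def adjacent_distances(prototypes):
    """Euclidean distance between the prototypes of each pair of adjacent map elements."""
    return [math.dist(p, q) for p, q in zip(prototypes, prototypes[1:])]

# Hypothetical 1-D map of five elements with 2-D prototypes: two tight groups.
prototypes = [(0.0, 0.0), (0.3, 0.4), (0.6, 0.8), (3.6, 4.8), (3.9, 5.2)]
dM = adjacent_distances(prototypes)
boundary = dM.index(max(dM))  # largest distance: between elements 2 and 3
```

Small entries of `dM` fall inside a cluster and the large entry marks the gap between clusters, so in a histogram of `dM` the clusters appear as the low regions between peaks.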

[0027] From the definitions of the win count V and the similarity dM between adjacent elements, the quantity V/dM also represents the degree of cluster concentration. In this case, each peak lying between two valleys represents a cluster.
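
The combined quantity V/dM can be sketched as follows. The text does not state how dM is attached to an individual element, so this sketch assumes that the dM of an element is the mean distance from its prototype to the prototypes of its map neighbors. That choice, the function name `concentration`, and the sample values are all hypothetical; prototypes are assumed to be distinct so that dM is never zero.

```python
def concentration(V, prototypes):
    """V_i / dM_i per element, with dM_i taken as the mean distance to map neighbors."""
    k = len(prototypes)
    ratios = []
    for i in range(k):
        neighbors = [abs(prototypes[i] - prototypes[j])
                     for j in (i - 1, i + 1) if 0 <= j < k]
        dM_i = sum(neighbors) / len(neighbors)  # assumes distinct prototypes, k >= 2
        ratios.append(V[i] / dM_i)
    return ratios

# Hypothetical 1-D prototypes and win counts for a five-element map.
prototypes = [0.0, 0.5, 1.0, 4.0, 4.5]
V = [3, 5, 1, 1, 4]
ratios = concentration(V, prototypes)
```

Elements with many wins and close neighbors get large values, and elements in the sparse gap get small ones, so the ratio sharpens the peak-and-valley structure.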

[0028] Given these properties of the histograms, separating the peaks in the case of FIG. 4(a), or the valleys in the case of FIG. 4(b), corresponds to dividing the map into clusters. The number of peaks or valleys in the histogram therefore corresponds to the number of clusters. Because the elements on the map that correspond to each peak or valley correspond to the prototypes of a cluster, a classification into an appropriate number of clusters (three in this case) has already been achieved at this point. To obtain the hierarchical structure, however, the following operations are performed.

[0029] First, the data sequence creation unit 121B creates a data sequence from the quantities in the histograms. This data sequence is described here. Let V_i and dM_i be the win count and the similarity between adjacent elements for element i, respectively. A data sequence {X_k} as given by equation (1) is then created.

[0030] For histograms like those of FIG. 4, FIG. 5 shows the data sequence {X_k} plotted on a number line. As can be seen, the data sequence {X_k} forms clusters on the number line at the positions corresponding to the peaks in FIG. 4(a) and the valleys in FIG. 4(b). Qualitatively, {X_k} can be understood as follows. First, the coordinates of the weight vectors of the elements are connected by a polyline in n-dimensional space. X_k is then the coordinate of element k along that polyline when the polyline is stretched out into a straight line. The i-th point of the data sequence {X_k} thus corresponds to the i-th element of the map 31. In other words, n-dimensional vector input data consisting of several clusters can be regarded as having been converted into a one-dimensional data set from which the clusters are easy to extract.
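
The sketch below follows only the qualitative description above, in which the polyline through the weight vectors is stretched into a straight line, and takes X_k to be the cumulative Euclidean length of the polyline up to element k. This is an assumption made for illustration and is not claimed to be equation (1), which may combine V_i and dM_i differently. The function name `data_sequence` and the sample prototypes are hypothetical.

```python
import math
from itertools import accumulate

def data_sequence(prototypes):
    """Position of each element along the straightened polyline through the prototypes."""
    steps = [math.dist(p, q) for p, q in zip(prototypes, prototypes[1:])]
    # The first element sits at 0; each later element adds the distance from its predecessor.
    return [0.0] + list(accumulate(steps))

# Hypothetical 2-D prototypes in map order.
prototypes = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0), (9.0, 13.0)]
X = data_sequence(prototypes)
```

Under this reading, elements whose prototypes are close together end up close together on the number line, and a large step between neighbors shows up as a gap.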

[0031] Furthermore, because V_i is the number of input data items corresponding to element i on the map 31, adding to the data sequence {X_k} of FIG. 5 the information that each point X_k occurs V_k times makes the clusters even easier to extract.

[0032] Next, the data sequence merging unit 122 creates the hierarchical structure using the data sequence {X_k} created by the data sequence creation unit 121B. The hierarchical structure is created by successively merging nearby values in the data sequence {X_k} until they finally become one, and by displaying that process. In this example, the hierarchical structure is as shown in FIG. 5. The merging process is described in detail later.
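
The idea of successively merging nearby values until one group remains can be sketched as follows. The merging rule used here, which always joins the two adjacent groups separated by the smallest gap, is an assumption for illustration and is not the detailed procedure the patent describes later. The function name `merge_sequence` and the sample values are hypothetical.

```python
def merge_sequence(X):
    """Merge adjacent groups of a sorted 1-D sequence, smallest gap first.

    Returns the merge history as (gap, left_group, right_group) tuples,
    where groups are lists of element indices.
    """
    groups = [[i] for i in range(len(X))]  # X is assumed sorted in ascending order
    history = []
    while len(groups) > 1:
        # Gap between the last point of each group and the first point of the next.
        gaps = [X[groups[j + 1][0]] - X[groups[j][-1]] for j in range(len(groups) - 1)]
        j = gaps.index(min(gaps))
        history.append((gaps[j], groups[j], groups[j + 1]))
        groups[j:j + 2] = [groups[j] + groups[j + 1]]
    return history

# Hypothetical data sequence with two groups of points separated by a wide gap.
X = [0.0, 1.0, 2.5, 10.0, 11.0]
history = merge_sequence(X)
```

Reading the history from last to first gives a top-down view of the hierarchy: the final merge joins the two coarsest clusters, and earlier merges reveal their sub-structure.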

[0033] As noted above, dividing the histogram had already shown that there are broadly three clusters; the hierarchical structure is now seen to be as shown in FIG. 5.

[0034] Based on the hierarchical structure created by the data sequence merging unit 122, the labeling unit 13 labels the input data. The labeling unit 13 consists of a map label unit 131, which labels the map based on the hierarchical structure; a data input unit 132, which receives the data to be labeled; and a data label unit 133, which labels that input data.

[0035] The map label unit 131 labels the map based on the hierarchical structure. For example, as shown in FIG. 6, the map is given the labels A, B, and C to produce the map 61. Next, the input data group 21 is input again through the data input unit 132 and is labeled, using the input data group 21 and the labeled map 61. Specifically, for each data item in the input data group 21, the corresponding element on the map 61 is found and its label is used as the label of the data item. Once all data in the data group 21 have been labeled, the input data group 21 has been classified into the three clusters A, B, and C, as shown in FIG. 6. In FIG. 6, the data belonging to each of the clusters A, B, and C are circled. These circles are drawn only for convenience, to enclose the regions where data are present, and do not represent exact separating boundaries. As stated earlier, the input data of FIG. 2 are not labeled in advance; they are labeled for the first time by the labeling unit 13. For convenience, the labels in FIG. 2 and FIG. 6 have been made to match.

[0036] If particular parts are to be further subdivided or merged, this can be done based on the hierarchical structure. The hierarchical structure of FIG. 5 shows that A and B may be re-merged into one cluster, as in map 61a), and that A can be further reclassified into A1 and A2, as in map 61b). This is shown in a later embodiment.

[0037] The above is an outline of the operation of the cluster classification device of the present invention: the data group 21 of FIG. 2 has been classified broadly into the three clusters A, B, and C as in FIG. 6, and the hierarchical structure has been obtained as in FIG. 5. This operation clearly requires no prior knowledge of the number, positions, or distribution shape of the clusters, and thus satisfies condition B-1) required of the cluster classification device of the present invention.

[0038] Next, it is shown that the present invention satisfies condition B-2), that cluster classification independent of the processing procedure is possible, and condition B-3), that the progress and result of the processing can be viewed visually and the result is easy to process computationally. For this purpose, the map creation unit 11 is described in more detail.

[0039] As described above, the map creation unit 11 creates prototypes of the data group and assigns them to the elements of the map so as to reflect the topology of the input data. Prototypes can be created using vector quantization, but vector quantization cannot assign them to map elements so as to reflect the topology of the input data. A method that creates the prototypes and at the same time assigns them so as to reflect the topology of the input data is Kohonen's self-organizing feature mapping (hereinafter, SOM) algorithm (T. Kohonen, "Self-Organization and Associative Memory", Third Edition, Springer-Verlag, Berlin, 1989). The SOM is described below.

[0040] As shown schematically in FIG. 7, the SOM consists of a layer ML of elements arranged in two dimensions (hereinafter, the map layer ML) and an input layer IP through which data are input. Although FIG. 7 shows the elements of the map layer ML arranged in two dimensions, elements arranged in one dimension may also be used. The input layer IP is connected to every element of the map layer ML, so that input data can be presented to all of them. The input data may be scalars or vectors; here they are taken in general to be n-dimensional vectors x. Every element i of the map layer ML (where i is the position on the map and the total number of elements is k) has an n-dimensional weight vector m_i. The SOM algorithm is divided into <similarity matching>, which determines the weight vectors to be updated from the similarity between the input vector x and the weight vector m_i of each element, and <update>, which moves those weight vectors m_i toward the input vector x. Repeating both operations produces weight vectors m_i (1 ≤ i ≤ k) that reflect the distribution of the input vectors x. The specific expressions for <similarity matching> and <update> are given below.

[0041] <Similarity matching>
$$ |x - m_C| = \min_i |x - m_i| $$
<Update>
$$ m_i(t+1) = m_i(t) + \alpha(t)\,\{x(t) - m_i(t)\}, \quad i \in N_c $$
$$ m_i(t+1) = m_i(t), \quad \text{otherwise} \qquad (3) $$
Here, |x − m_i| is the Euclidean distance between x and m_i, C is the element for which this distance is smallest (the winning element), N_c is the neighborhood of the winning element C on the map layer ML, α(t) is a positive constant, and t denotes time. As the updates are repeated, the sizes of N_c and α(t) are gradually reduced. α(t) can also be chosen so that it becomes smaller with distance from the winning element C.
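The two operations above can be illustrated with a minimal Python sketch of a one-dimensional, string-shaped SOM. It is an illustration under stated assumptions, not the implementation of the device: the function name `train_som_1d`, the linear decay of α(t) and of the neighborhood radius, the starting values `alpha0` and `radius0`, and the initialization of the weights from randomly chosen inputs are all hypothetical choices that the text does not specify.

```python
import math
import random


def train_som_1d(data, k, steps, alpha0=0.5, radius0=None, seed=0):
    """Train a 1-D (string-shaped) SOM and return its k weight vectors."""
    rng = random.Random(seed)
    # Assumption: initial weights are copies of randomly chosen input vectors.
    weights = [list(rng.choice(data)) for _ in range(k)]
    if radius0 is None:
        radius0 = k / 2.0
    for t in range(steps):
        frac = 1.0 - t / steps
        alpha = alpha0 * frac          # alpha(t) is gradually reduced
        radius = int(radius0 * frac)   # the neighbourhood N_c is gradually reduced
        x = rng.choice(data)           # pick an input vector at random
        # <similarity matching>: winner C minimises the Euclidean distance |x - m_i|
        c = min(range(k), key=lambda i: math.dist(x, weights[i]))
        # <update>: move m_i toward x for every element i in N_c
        for i in range(max(0, c - radius), min(k, c + radius + 1)):
            weights[i] = [m + alpha * (xj - m) for m, xj in zip(weights[i], x)]
    return weights
```

Because each update is a convex combination of a weight vector and an input vector, the prototypes in this sketch always stay inside the bounding box of the data.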

[0042] By selecting x at random from the set of input vectors, inputting them one after another, and repeating the update of the weight vectors m_i, weight vectors m_i (1 ≤ i ≤ k) that reflect the distribution of the input vectors x are generated. In other words, the weight vectors m_i (1 ≤ i ≤ k) become prototypes of the distribution of the input vectors x. Moreover, when the weight vector of an element is updated toward the input vector, the elements near it on the map are updated in the same way, so elements adjacent on the map come to correspond to vectors that are also close to each other in the input vector space. The SOM algorithm can therefore create a set of prototypes that reflects the topology of the input data space. The SOM algorithm has the following features.

[0043] D-1) An appropriate map can be created regardless of the initial state of the weight vectors m_i (1 ≤ i ≤ k).
D-2) An appropriate map can be created regardless of the order in which the input vectors x are presented.
D-3) Because the map is one- or two-dimensional, the topology of the input data can be seen visually.
D-4) The algorithm is simple, being a repetition of the simple operations <similarity matching> and <update>.

[0044] Here, an appropriate map is one whose set of prototypes reflects the topology of the input data well. Features D-1) and D-2) satisfy condition B-2) required of the cluster classification device of the present invention, namely that cluster classification can be performed independently of the processing procedure. Feature D-3) contributes to condition B-3), namely that the progress and results of the processing can be viewed visually and the results are easy to process computationally.

[0045] When the map is two-dimensional, however, it can be viewed visually but the result is not easy to process computationally. If the map is made one-dimensional, computation becomes considerably easier, because a one-dimensional histogram is much easier to process than a histogram of two or more dimensions. Condition B-3) can therefore be satisfied.

[0046] Given this effectiveness of the SOM algorithm, the map creation unit 11 adopts the one-dimensional SOM algorithm. That is, the data input unit 111 of the map creation unit 11 serves as the input layer IP of the SOM, and the map unit 112 serves as the map layer ML of the SOM. With this configuration, a set of prototypes reflecting the topology of the input data is created, and a one-dimensional map consisting of the elements that hold those prototypes is produced. As described above, the cluster classification device of the present invention equipped with this map creation unit 11 can satisfy conditions B-2) and B-3).

[0047] In the SOM algorithm performed by the map creation unit 11, creation of the map 31 ends when the peaks and valleys of the histogram have become distinct. Creation may end at that point even if not all of the data in the input data group 21 have been input. If the peaks and valleys of the histogram are not yet distinct when all of the data in the input data group 21 have been input, the input data group 21 is input again, and map creation ends once the peaks and valleys become distinct. Whether the peaks and valleys are distinct is easy to judge visually, but the judgment can also be made automatically by providing an evaluation criterion that expresses it (the smoothness of the graph, the ratio of the maximum value to the minimum value, and so on). Here too, a one-dimensional histogram is clearly easier to process than a two-dimensional one. Alternatively, without using the histogram, one can exploit the fact that the difference between the input data group 21 and the corresponding prototypes gradually decreases as the SOM algorithm proceeds, and end creation of the map 31 when that value or its rate of change falls below a threshold.
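The last stopping rule, based on the difference between the inputs and their prototypes, might be sketched as follows. This is a minimal sketch under assumptions: the text does not say how the difference is measured, so the mean distance to the closest prototype is used here, and the function names and the thresholds `abs_tol` and `rel_tol` are hypothetical.

```python
import math


def quantization_error(data, weights):
    """Mean Euclidean distance between each input vector and its closest prototype."""
    return sum(min(math.dist(x, m) for m in weights) for x in data) / len(data)


def should_stop(prev_error, error, abs_tol=1e-2, rel_tol=1e-3):
    """Stop when the error itself, or its relative change, falls below a threshold."""
    if error < abs_tol:
        return True
    if prev_error is None:  # first evaluation: no rate of change yet
        return False
    return abs(prev_error - error) / max(prev_error, 1e-12) < rel_tol
```

In use, `quantization_error` would be evaluated periodically during map creation and the two most recent values passed to `should_stop`.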

[0048] Next, it is shown that the present invention satisfies condition B-4), that a hierarchical structure of clusters can be obtained. To that end, the hierarchical structure creation unit 12 is described in more detail. As described above, the hierarchical structure creation unit 12 converts the n-dimensional vector input data, which consist of a plurality of clusters, into a one-dimensional data set such as that of FIG. 5 so that clusters are easy to extract, and then obtains the hierarchical structure of that data.

[0049] One technique for obtaining such a hierarchical structure is the melting algorithm (Kenneth Rose et al., "Statistical Mechanics and Phase Transition in Clustering", Phys. Rev. Lett. 65, pp. 945-948 (1990)). This algorithm defines an energy function from a set of vector data and a set of prototypes for it, and exploits the fact that the local minima of this energy (where a solution is a set of prototypes) represent clusters. The shape of the energy function changes with a temperature parameter: in general, as the temperature rises it becomes smoother and the number of local minima decreases. In other words, the number of prototypes decreases as the temperature rises. Since a prototype can be regarded as representing a cluster, the hierarchical structure becomes apparent when the relationship between prototype coordinates and temperature is displayed.

[0050] The update rule of the melting algorithm is explained using equations. Let x denote the data and y the prototypes, and let the partition function Z and the free energy F be expressed by the following equations. Here, the subscripts under Σ and Π indicate the index over which the sum and the product, respectively, are taken. Substituting Z into F and solving ∂F/∂y = 0 to find the minima of F gives the following expression for the prototype y.
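For reference, the quantities named in this paragraph can be written as follows in the notation commonly used for the cited Rose et al. formulation. This is a hedged sketch and may differ in notation from the original equations: the inverse temperature β = 1/T, the squared Euclidean distance, and the explicit form of the association probability P(x ∈ y) are assumptions.

```latex
\begin{align}
Z &= \prod_{x} \sum_{y} \exp\!\left(-\beta\,|x - y|^{2}\right), \qquad \beta = 1/T \\
F &= -\frac{1}{\beta}\ln Z
   = -\frac{1}{\beta}\sum_{x}\ln\sum_{y}\exp\!\left(-\beta\,|x - y|^{2}\right) \\
\frac{\partial F}{\partial y} = 0
  \;&\Longrightarrow\;
  y = \frac{\sum_{x} x\,P(x \in y)}{\sum_{x} P(x \in y)},
  \qquad
  P(x \in y) = \frac{\exp\!\left(-\beta\,|x - y|^{2}\right)}
                    {\sum_{y'} \exp\!\left(-\beta\,|x - y'|^{2}\right)}
\end{align}
```

The last line follows because ∂F/∂y = 2 Σ_x P(x ∈ y)(y − x), which vanishes exactly when y is the P-weighted mean of the data.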

[0051] This expression is the update rule for the prototype y in the melting algorithm. At a given temperature T, the update rule is applied to obtain y. The temperature T is then raised and the update rule is applied again to obtain y. As T rises, the number of local minima of F gradually decreases until only one remains. The local minima of F correspond to cluster prototypes. As T is raised from low to high, neighboring cluster prototypes fuse with one another. Plotting the relationship between the temperature T and the cluster prototypes y therefore yields the hierarchical structure of the clusters. The prototypes may be displayed directly by their y coordinates, or in correspondence with the one-dimensionally arranged SOM prototypes.
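The procedure of fixing T, iterating the update rule, and then raising T can be sketched in Python for one-dimensional data. This is a sketch under assumptions, not the device's implementation: the update is taken to be the P-weighted mean y ← Σ x P(x ∈ y) / Σ P(x ∈ y) with β = 1/T, as in the cited Rose et al. formulation; the function names, the fixed iteration count `iters`, and the fusion tolerance `tol` are hypothetical.

```python
import math


def melt_step(data, protos, temperature, iters=100):
    """Apply y <- sum_x x P(x in y) / sum_x P(x in y) repeatedly at one temperature."""
    beta = 1.0 / temperature  # assumption: inverse temperature beta = 1/T
    for _ in range(iters):
        num = [0.0] * len(protos)
        den = [0.0] * len(protos)
        for x in data:
            # Association probabilities P(x in y_j); the largest exponent is
            # subtracted first so that not every term underflows to zero.
            expo = [-beta * (x - y) ** 2 for y in protos]
            top = max(expo)
            w = [math.exp(e - top) for e in expo]
            z = sum(w)
            for j, wj in enumerate(w):
                num[j] += x * wj / z
                den[j] += wj / z
        # A prototype with no associated data stays where it is.
        protos = [n / d if d > 0.0 else y for n, d, y in zip(num, den, protos)]
    return protos


def merge_close(protos, tol=1e-3):
    """Treat prototypes closer than tol as fused and keep one of them."""
    merged = []
    for y in sorted(protos):
        if not merged or y - merged[-1] >= tol:
            merged.append(y)
    return merged


def melting_hierarchy(data, temperatures, tol=1e-3):
    """Return [(T, prototypes)] for an ascending list of temperatures."""
    protos = list(data)  # start cold: every data point is its own prototype
    history = []
    for temperature in temperatures:
        protos = merge_close(melt_step(data, protos, temperature), tol)
        history.append((temperature, protos))
    return history
```

Plotting the prototypes in each `history` entry against its temperature gives the kind of tree described above, with branches joining as T rises.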

[0052] The melting algorithm has drawbacks: when the data x have two or more dimensions the prototypes become difficult to display, and isolated points are difficult to fuse. However, the data set created by the map analysis unit 121 is one-dimensional, and because it is generated from the group of prototypes of the n-dimensional vector data created by the map creation unit 11, isolated points are unlikely to arise. In addition, although the amount of computation in the melting algorithm increases with the dimension of the input data, the data sequence in this case is one-dimensional regardless of the dimension of the input data to be classified by the device, so the amount of computation is constant. The drawbacks of the melting algorithm can thus be overcome.

[0053] The hierarchical structure creation unit 12 therefore uses this melting algorithm to obtain the hierarchical structure of the data from the one-dimensional data set. The present invention, which uses the melting algorithm in the hierarchical structure creation unit 12, thus satisfies condition B-4), that a hierarchical structure of clusters can be obtained.

[0054] An appropriate number of clusters can be found by analyzing the histogram in the integration degree calculation unit 121A, but it can also be obtained in the data sequence fusion unit 122. The more appropriate a cluster is, the wider the temperature range over which its prototype exists in the melting algorithm. The number of clusters present over a wide temperature range in which the prototypes remain unfused can therefore be taken as the appropriate number of clusters. When noise or the like makes the peaks and valleys of the histogram indistinct and analysis is difficult, the number of clusters can be obtained in the data sequence fusion unit 122.
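The rule of choosing the cluster count that survives over the widest temperature range can be sketched as below. This is a sketch under assumptions: the input format (a list of `(temperature, prototypes)` pairs in ascending temperature order), the function name, and the use of a linear rather than logarithmic temperature scale are all hypothetical.

```python
def persistent_cluster_count(history):
    """Return the prototype count that persists over the widest temperature range.

    history: list of (temperature, prototypes) pairs in ascending temperature order.
    """
    spans = {}
    for (t0, protos), (t1, _) in zip(history, history[1:]):
        n = len(protos)
        # Credit the interval [t0, t1) to the number of prototypes present at t0.
        spans[n] = spans.get(n, 0.0) + (t1 - t0)
    return max(spans, key=spans.get)
```

The final entry of `history` contributes no interval of its own, so the single prototype that always remains at the highest temperature does not dominate the result.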

[0055] The configuration and operation of the cluster classification device of the present invention have been described above. The cluster classification device of the present invention is the following device, which satisfies conditions B-1), B-2), B-3), and B-4) below.

[0056] That is, it is a cluster classification device characterized by comprising a map creation unit that creates a map consisting of a group of prototypes for the input data using one-dimensional self-organizing feature mapping, a hierarchical structure creation unit that creates a hierarchical structure of clusters from the map, and a labeling unit that classifies the input data according to the obtained map and hierarchical structure.

[0057] In this case, three configurations of the hierarchical structure creation unit are conceivable. In the first, as in FIG. 1, it consists of a map analysis unit that calculates a quantity representing the degree of integration of the clusters from the obtained map and creates a data sequence, and a data sequence fusion unit that creates the hierarchical structure of the clusters from the obtained data sequence. In the second, as in FIG. 11(b) described later, it consists of an integration degree calculation unit that calculates a quantity representing the degree of integration of the clusters from the obtained map, and a prototype fusion unit that creates the hierarchical structure of the clusters. In the third, as in FIG. 11(a) described later, it consists only of a prototype fusion unit that creates the hierarchical structure of the clusters from the obtained map.

[0058] B-1) Appropriate cluster classification, free of over-merging and over-splitting, can be performed without prior knowledge of the number, positions, distribution shapes, etc. of the clusters.
B-2) Cluster classification can be performed independently of the processing procedure.
B-3) The progress and results of the processing can be viewed visually, and the results are easy to process computationally.

[0059] B-4) A hierarchical structure of clusters can be obtained.

【００６０】[0060]

[Embodiments] Embodiments of the cluster classification device of the present invention are described below. To keep the display of the cluster classification simple, the embodiments again divide two-dimensional vector data such as those of FIG. 2 into clusters and obtain their hierarchical structure. First, as a first embodiment of the present invention, the case in which the same two-dimensional vector data as in FIG. 2 are input is shown. In this embodiment, the map 31 of the map unit 112 has 30 elements.

[0061] First, the histogram corresponding to FIG. 4(a) is shown in FIG. 8. In FIG. 8, three peaks are formed, making it visually clear that the input data group 21 consists of three clusters. Next, the hierarchical structure obtained by the hierarchical structure creation unit 12 is shown in FIG. 9. FIG. 10 shows the result of dividing the prototypes into the three classifications (a), (b), and (c) read from this figure and partitioning the input data group 21 with the maps 61a), 61, and 61b).

[0062] The one-dimensional map of the map unit 112 may be ring-shaped, with the elements at the two ends of the map connected, or string-shaped, with the ends left unconnected. The two differ in the notion of neighborhood used when updating the element weights: the ring shape corresponds to treating the two ends of the map as neighbors, and the string shape to not treating them as neighbors. The ring shape can eliminate the <Border Effects> (T. Kohonen, "Things You Haven't Heard about the Self-Organizing Map", Proc. IEEE Int. Conf. on Neural Network, vol. 3, pp. 1147-1156, 1993), in which the reflection of the topological relationships of the input data is distorted at both ends of the map. The string shape always has two open ends, which is convenient for drawing histograms and hierarchical structures. In that case, the border effects can be removed by, for example, excluding the elements at both ends from the input data of the melting algorithm or from the horizontal axis of the histogram.
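The difference between the two notions of neighborhood can be made concrete with a short Python sketch. The function name `neighborhood` and the use of an integer `radius` to describe the extent of N_c are assumptions made for illustration.

```python
def neighborhood(c, radius, k, ring=False):
    """Indices of the map elements within `radius` of winner c on a 1-D map of k elements."""
    if ring:
        # Ring: indices wrap around, so the two ends of the map are neighbours.
        return sorted({(c + d) % k for d in range(-radius, radius + 1)})
    # String: the neighbourhood is simply cut off at the two ends of the map.
    return list(range(max(0, c - radius), min(k, c + radius + 1)))
```

For a winner at the end of a 10-element map, `neighborhood(0, 2, 10)` gives `[0, 1, 2]`, whereas `neighborhood(0, 2, 10, ring=True)` gives `[0, 1, 2, 8, 9]`; only in the string case do the end elements have fewer neighbors than interior ones.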

[0063] In the melting algorithm, y was obtained by substituting the data {X_k} for x in equation (6). Here, the additional information that the k-th data value occurs V_k times can be incorporated into the update formula. The update formula then becomes equation (7).

[0064] When equation (7) is used as the update formula instead of equation (6), the heights of the peaks and valleys of the histogram of V contribute to determining the hierarchical structure of the clusters. Isolated points with small V therefore become less likely to be prototypes, which solves the aforementioned drawback of the melting algorithm that isolated points tend to become clusters. Because of the way the data sequence is created, isolated points are already unlikely to arise, so either equation (6) or equation (7) may be used, particularly when the distribution of the input data is smooth with little noise. When the distribution of the input data is noisy and isolated points are likely to arise in the data sequence, equation (7) is effective. Equation (7) was used in this embodiment.
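One natural reading of this weighting is that each data value X_k enters the sums V_k times. The Python sketch below illustrates that reading for one-dimensional data. It is an assumption rather than a reproduction of equation (7): the form y ← Σ_k V_k X_k P(X_k ∈ y) / Σ_k V_k P(X_k ∈ y), the choice β = 1/T, and the function name are hypothetical.

```python
import math


def weighted_melt_step(xs, counts, protos, temperature):
    """One update of all prototypes, where data value xs[k] occurs counts[k] times."""
    beta = 1.0 / temperature  # assumption: inverse temperature beta = 1/T
    num = [0.0] * len(protos)
    den = [0.0] * len(protos)
    for x, v in zip(xs, counts):
        # Association probabilities P(x in y_j), computed stably.
        expo = [-beta * (x - y) ** 2 for y in protos]
        top = max(expo)
        w = [math.exp(e - top) for e in expo]
        z = sum(w)
        for j, wj in enumerate(w):
            p = wj / z
            num[j] += v * x * p  # the count V_k weights the numerator ...
            den[j] += v * p      # ... and the denominator
    return [n / d if d > 0.0 else y for n, d, y in zip(num, den, protos)]
```

With a single prototype the update reduces to the V-weighted mean, so a value that occurs once pulls the prototype far less than a value that occurs nine times, which is the effect on isolated points described above.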

[0065] Furthermore, substituting P(x∈y) into y in equation (6), ignoring its dependence on x, and cancelling the denominator of P(x∈y) yields equation (8) (Yui-fai Wong, "Clustering Data by Melting", Neural Computation, 5, 89-104 (1993)).
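The step described here can be sketched as follows. The sketch assumes that equation (6) has the P-weighted-mean form used in the cited Rose et al. paper, with β = 1/T and squared Euclidean distance; the symbol Z_x for the denominator of P(x ∈ y) is introduced only for this illustration.

```latex
\begin{align}
P(x \in y) &= \frac{\exp\!\left(-\beta\,|x - y|^{2}\right)}{Z_x},
\qquad Z_x = \sum_{y'} \exp\!\left(-\beta\,|x - y'|^{2}\right) \\
y &= \frac{\sum_{x} x\,\exp\!\left(-\beta\,|x - y|^{2}\right)/Z_x}
          {\sum_{x} \exp\!\left(-\beta\,|x - y|^{2}\right)/Z_x}
\;\approx\;
\frac{\sum_{x} x\,\exp\!\left(-\beta\,|x - y|^{2}\right)}
     {\sum_{x} \exp\!\left(-\beta\,|x - y|^{2}\right)}
\qquad (Z_x \text{ treated as independent of } x)
\end{align}
```

Once Z_x is dropped, each prototype is updated from its own exponentials alone, without the sum over all other prototypes.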

[0066] Compared with equation (6), equation (8) requires fewer sums of exponential functions, so the amount of computation is smaller and the algorithm can be made faster. The hierarchical structure creation unit 12 may therefore use the melting algorithm of equation (8). A V term may also be added to equation (8) in the same way as in equation (7). As in FIG. 1, the hierarchical structure creation unit 12 of the above embodiment first creates a data sequence and then creates the hierarchical structure by fusing those data.

[0067] Alternatively, as in FIG. 11(a), the hierarchical structure creation unit 12 may create the hierarchical structure directly from the n-dimensional prototype vectors, exploiting the fact that the SOM prototypes created by the map unit 112 are arranged in one dimension. In this case, the n-dimensional prototype vectors of the map layer are input directly to the melting algorithm of the prototype fusion unit 151, and the prototypes obtained by fusion are displayed. The prototypes are displayed in correspondence with the one-dimensionally arranged SOM prototypes. FIG. 11(a) is the case in which equation (6) or equation (8) is used as the update of the melting algorithm. When equation (7), which uses V, is employed, the hierarchical structure creation unit 12 is provided with an integration degree calculation unit 152 in front of the prototype fusion unit 151, as in FIG. 11(b).

[0068] The dimension of the melting algorithm differs between the configuration of FIG. 1 and those of FIGS. 11(a) and 11(b), but since the essence of the algorithm is the same, the fusion process gives similar results. The results of this embodiment are therefore shown only for the configuration of FIG. 1. In the case of FIG. 1, the melting algorithm can be one-dimensional regardless of the dimension of the input data to be classified by the device. In the cases of FIGS. 11(a) and 11(b), the prototypes are fused as they are, so the data sequence creation unit 121B can be omitted.

[0069] The input data in the above embodiment were all two-dimensional vectors, but multidimensional vectors or scalars can also be handled by changing the number of dimensions of the data input units 111 and 132 and of the prototypes 33 of the map unit 112.

[0070] As an example with a different dimension, FIG. 12 shows the histogram and the resulting hierarchical structure for three-dimensional vectors forming five clusters. As an example in which both the dimension and the histogram quantity are changed, FIG. 13 shows the histograms of V, dM, and V/dM and the resulting hierarchical structure for four-dimensional vectors forming four clusters. The histograms of FIG. 13 show that using dM or V/dM makes the analysis easier when looking for an appropriate number of clusters. This is due to the border effects mentioned above and to another property of the SOM, equiprobability: as the algorithm proceeds, the numbers of wins become equalized.
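The three histogram quantities can be computed from a trained one-dimensional map as sketched below. This is a sketch under loudly stated assumptions: V is taken to be the number of inputs won by each element, and dM for an element is assumed here to be the mean Euclidean distance between its prototype and those of its map neighbors. That definition of dM and the function name `map_statistics` are hypothetical and may differ from the definition used by the integration degree calculation unit.

```python
import math


def map_statistics(data, weights):
    """Return (V, dM, V/dM) per element of a string-shaped 1-D map with k >= 2 elements."""
    k = len(weights)
    wins = [0] * k  # V: number of inputs for which element i is the winner
    for x in data:
        c = min(range(k), key=lambda i: math.dist(x, weights[i]))
        wins[c] += 1
    d_m = []  # dM (assumed): mean distance to the prototypes of the map neighbours
    for i in range(k):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < k]
        d_m.append(sum(math.dist(weights[i], weights[j]) for j in nbrs) / len(nbrs))
    ratio = [v / d if d > 0 else float("inf") for v, d in zip(wins, d_m)]
    return wins, d_m, ratio
```

Under this reading, elements inside a cluster have closely spaced prototypes (small dM, large V/dM), while elements that bridge two clusters have widely spaced prototypes, so the valleys remain visible even when the win counts V have become nearly uniform.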

[0071] With multidimensional vectors it is difficult to plot the data directly on coordinate axes, so the ability of the present invention to reveal clusters visually and to show their hierarchical structure is valuable.

[0072] Scalars or vectors of any magnitude may be chosen as the input data of the present invention. That is, because the SOM algorithm has feature D-1), that an appropriate map can be created regardless of the initial state of the weight vectors m_i (1 ≤ i ≤ k), there is no need to normalize the data beforehand or to know the characteristics of the data (number of clusters, cluster positions, etc.). Cluster classification is therefore possible for all kinds of input data, such as image information, audio information, communication symbols, and time-series data.

【００７３】[0073]

[Effects of the Invention] As described above, the present invention provides a cluster classification device that satisfies the following conditions.

[0074] B-1) Appropriate cluster classification, free of over-merging and over-splitting, can be performed without prior knowledge of the number, positions, distribution shapes, etc. of the clusters.
B-2) Cluster classification can be performed independently of the processing procedure.
B-3) The progress and results of the processing can be viewed visually, and the results are easy to process computationally.

[0075] B-4) A hierarchical structure of clusters can be obtained.

[Brief description of drawings]

FIG. 1 is a diagram showing an outline of the basic configuration of the present invention.

FIG. 2 is a diagram showing an example of data to be classified into clusters by the present invention.

FIG. 3 is a diagram showing the map created by the map creation unit of FIG. 1.

FIG. 4 is a distribution diagram of the number of wins and the similarity between adjacent elements calculated by the integration degree calculation unit of FIG. 1.

FIG. 5 is a diagram showing the data sequence created by the data sequence creation unit of FIG. 1 and the hierarchical structure of clusters created by the data sequence fusion unit.

FIG. 6 is a diagram showing the map and the data labeled by the labeling unit.

FIG. 7 is a diagram showing the structure of self-organizing feature mapping.

FIG. 8 is a diagram showing the histogram of the number of wins in one embodiment of the present invention.

FIG. 9 is a diagram showing the hierarchical structure of clusters in one embodiment of the present invention.

FIG. 10 is a diagram showing the labeled data in one embodiment of the present invention.

FIG. 11 is a diagram showing other configuration examples of the hierarchical structure creation unit of the present invention.

FIG. 12 is a diagram showing the histogram of the number of wins and the hierarchical structure of clusters for three-dimensional vectors forming five clusters.

FIG. 13 is a diagram showing the histograms of the number of wins, the similarity between adjacent elements, and their ratio, and the hierarchical structure of clusters, for four-dimensional vectors forming four clusters.

[Explanation of symbols]

11 ... Map creation unit
12 ... Hierarchical structure creation unit
13 ... Labeling unit
21 ... Input data group
21A, 21B, 21C ... Clusters
21A1, 21A2 ... Sub-clusters
31 ... Map
32 ... Element group
32A, 32B, 32C ... Element groups
33 ... Prototype group
61, 61a), 61b) ... Maps
111 ... Data input unit
112 ... Map unit
121 ... Map analysis unit
121A ... Integration degree calculation unit
121B ... Data sequence creation unit
122 ... Data sequence fusion unit
131 ... Map labeling unit
132 ... Data input unit
133 ... Data labeling unit
151 ... Prototype fusion unit
152 ... Integration degree calculation unit
V ... Number of wins
dM ... Similarity between adjacent elements
ML ... Map layer
IP ... Input layer

Claims

[Claims]

1. A cluster classification device comprising: a map creation unit that creates a map consisting of a group of prototypes for input data using one-dimensional self-organizing feature mapping; a hierarchical structure creation unit that creates a hierarchical structure of clusters from the map; and a labeling unit that classifies the input data according to the obtained map and hierarchical structure.

2. The cluster classification device according to claim 1, wherein the hierarchical structure creation unit comprises a map analysis unit that calculates a quantity representing the degree of integration of the clusters from the obtained map and creates a data sequence, and a data sequence fusion unit that creates the hierarchical structure of the clusters from the obtained data sequence.

3. The cluster classification device according to claim 1, wherein the hierarchical structure creation unit comprises an integration degree calculation unit that calculates a quantity representing the degree of integration of the clusters from the obtained map, and a prototype fusion unit that creates the hierarchical structure of the clusters.

4. The cluster classification device according to claim 1, wherein the hierarchical structure creation unit comprises only a prototype fusion unit that creates the hierarchical structure of the clusters from the obtained map.