JP2003330583A - Cluster analysis system

Info

Publication number: JP2003330583A
Application number: JP2002143169A
Authority: JP
Inventors: Shigeru Tago; 滋多胡; Noriyuki Yamamoto; 宣之山本
Original assignee: Hitachi Software Engineering Co Ltd
Current assignee: Hitachi Software Engineering Co Ltd
Priority date: 2002-05-17
Filing date: 2002-05-17
Publication date: 2003-11-21

Abstract

PROBLEM TO BE SOLVED: To provide a cluster analysis system that efficiently extracts, from classification results obtained through brief trial and error, those clusters that represent an appropriate classification.

SOLUTION: The cluster analysis system comprises: a first means for inputting a plurality of sets of data to be classified; a second means for accumulating the partial data groups of the classification results obtained by classifying each of the plurality of sets of data input by the first means with a plurality of kinds of similarity calculations that differ in their parameters; and a third means for checking to which partial data group, of which kind of similarity calculation result among the partial data groups accumulated by the second means, a specific set among the plurality of sets of data input by the first means belongs, and for displaying the result of the check in a visually recognizable form.

COPYRIGHT: (C)2004,JPO

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a cluster analysis system that classifies a large number of data items according to their similarity, and in particular to a cluster analysis system suited to classifying data whose classification result varies with the similarity calculation formula or the parameter values used when the classification is executed.

[0002]

[Prior Art] Conventionally, there are cluster analysis systems that take a plurality of data items to be classified as input and divide them into partial data groups, that is, cluster them, according to their mutual similarity. In such a system, it is known that the contents of the resulting clusters, that is, which data belongs to which cluster, change depending on the calculation method applied during clustering and on the initial parameters set for that method before execution. The resulting clusters often include ones that are inappropriate classifications when judged by the inherent nature of the underlying data. The analyst must therefore select a calculation method, and the initial parameters to set before executing it, that yield a classification that appears appropriate in light of the inherent nature of the data being classified.

[0003] However, selecting an appropriate calculation method and appropriate initial parameters of this kind is very difficult. The calculation methods themselves were not developed as methods guaranteed always to classify particular data appropriately. At most, they are known to produce empirically better classifications than other methods in some cases, and they may output greatly differing classification results under the influence of subtle properties, errors, or tendencies in the data actually input.

[0004]

[Problems to Be Solved by the Invention] In response, when classification is executed on a particular data set, the approach taken has been to select different initial parameters of different calculation methods, either at random or by empirical judgment, and to repeat this by trial and error until a classification result is obtained that appears appropriate in light of the inherent nature of the input data.

[0005] However, a relatively short period of trial and error does not always produce an appropriate classification result, and practical constraints such as time often force the trial and error to stop before an appropriate result has been obtained. In that case, all the classification results obtained so far are treated as inappropriate, so not only is the expected result not obtained, but a great deal of working time is wasted. It is therefore necessary to improve working efficiency by extracting, from the classification results obtained through brief trial and error, at least those clusters in each result that represent an appropriate classification, and by regarding an appropriate classification result as having been obtained for at least those clusters.

[0006] Conventionally, however, no consideration has been given to a means for efficiently extracting those clusters that represent an appropriate classification result.

[0007] An object of the present invention is to provide a cluster analysis system that can efficiently extract, from classification results obtained through brief trial and error, those clusters that represent an appropriate classification result.

[0008]

[Means for Solving the Problems] To achieve the above object, the present invention is characterized by a configuration that displays the classification results of multiple trial-and-error runs so that they can be compared with one another visually and easily. Specifically, the invention comprises: a first means for inputting a plurality of sets of data to be classified; a second means for accumulating the partial data groups of the classification results obtained by classifying each of the plurality of sets of data input by the first means with a plurality of kinds of similarity calculations that differ in their parameters; and a third means for checking to which partial data group, of which kind of similarity calculation result among the partial data groups accumulated by the second means, a specific set among the plurality of sets of data input by the first means belongs, and for displaying the result in a visually recognizable form. The third means further comprises means for displaying, on one axis of a two-dimensional table shown on the screen, the plurality of sets of data input by the first means in an individually distinguishable form; displaying, on the other axis, the partial data groups of the classification results in an individually distinguishable form; and displaying, at each grid point whose two-dimensional coordinates are the display positions on the two axes, whether each set of data belongs to the corresponding partial data group, in a visually recognizable form. The third means further comprises means for displaying the partial data groups on one axis of the two-dimensional table in a form that visually distinguishes which kind of similarity calculation produced each of them. The invention further comprises means for selecting the sets of data and the similarity calculation execution units to be checked by the third means.

[0009] With this configuration, when a particular calculation method and its initial parameters yield greatly differing classification results under the influence of subtle properties, errors, or tendencies in the input data, classification is executed with a plurality of calculation methods and initial parameters and the results are compared. This makes it possible to efficiently extract, from the clusters contained in each result, those that represent an appropriate classification, and thereby helps reduce wasted analysis time.

[0010]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a system configuration diagram showing an embodiment of the cluster analysis system according to the present invention. In FIG. 1, 101 is a display device capable of drawing characters and figures; 102 is a mouse device capable of pointing to a single point on the screen of the display device 101; and 103 and 104 are auxiliary storage devices capable of reading and writing files that store numeric and character data. The auxiliary storage device 103 stores the data to be classified, and the auxiliary storage device 104 stores the cluster data resulting from classification. 105 is an arithmetic unit capable of reading numbers and characters from files on the auxiliary storage devices 103 and 104 and of instructing that characters or figures be drawn at any location on the screen of the display device 101. The arithmetic unit 105 performs a plurality of kinds of similarity calculations with differing parameters on the plurality of sets of data to be classified, which are input from an attached input means and stored in the auxiliary storage device 103, and accumulates the resulting partial data groups in the auxiliary storage device 104. Known similarity calculations can be used, so a detailed description is omitted.

[0011] FIG. 2 shows the structure of the input data file of data to be classified, which is stored in the auxiliary storage device 103. In FIG. 2, 201 is a data ID that is unique within the file. 202 is a data name that is unique within the file; here, the names Data1 to Data9 are assigned to the 10 sets of data to be classified. 203 to 205 are the elements of each set of data to be classified. In this example, each set of data consists of three numeric values.
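The record layout of FIG. 2 can be pictured as a small data structure. The sketch below is illustrative only: the comma-separated line layout, the `DataRecord` class, the `parse_input` function, and the sample values are all assumptions made for this example. The specification states only that each record carries a unique data ID (201), a unique data name (202), and three numeric elements (203 to 205).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRecord:
    data_id: int                 # 201: unique within the file
    name: str                    # 202: unique within the file
    elements: tuple[float, ...]  # 203-205: three numeric values per set


def parse_input(text: str) -> list[DataRecord]:
    """Parse 'id,name,v1,v2,v3' lines (a hypothetical layout) into records."""
    records: list[DataRecord] = []
    seen_ids: set[int] = set()
    seen_names: set[str] = set()
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
        data_id, name = int(fields[0]), fields[1]
        # Both the ID and the name must be unique within the file.
        if data_id in seen_ids or name in seen_names:
            raise ValueError(f"duplicate ID or name: {line!r}")
        seen_ids.add(data_id)
        seen_names.add(name)
        records.append(
            DataRecord(data_id, name, tuple(float(v) for v in fields[2:]))
        )
    return records


# Invented sample values, for illustration only.
SAMPLE = """
1,Data1,0.1,0.2,0.3
2,Data2,5.0,5.1,4.9
3,Data3,9.8,9.9,10.0
"""
records = parse_input(SAMPLE)
print(records[0])
```

Rejecting duplicate IDs and names at load time mirrors the uniqueness requirement stated for 201 and 202; a real file format might enforce this differently.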

[0012] FIG. 3 shows the structure of the cluster file stored in the auxiliary storage device 104. In this embodiment, a plurality of kinds of similarity calculations with differing parameters are executed multiple times, and the cluster data groups (partial data groups) generated by each execution unit are written to a single file, so that all of them can be read in one operation. Alternatively, the result of each classification run could be saved in a separate file and the files read together or in sequence. 301 is a classification execution unit ID that is unique within the file. 302 is a classification execution unit name that is unique within the file. 303 is the ID, unique within the file, of a cluster data group generated by running each execution unit. 304 is the cluster name corresponding to the cluster ID 303, which is unique within the file. 305 is the data ID of each item of partial data belonging to the cluster data group.
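The hierarchy described for FIG. 3 (execution unit, then cluster, then member data IDs) can be sketched as nested records. This is a minimal illustration, not the file format of the specification: the `Run` and `Cluster` classes, the `validate` helper, and every membership value below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    cluster_id: int        # 303: unique within the file
    name: str              # 304: name corresponding to cluster_id
    member_ids: set[int]   # 305: data IDs belonging to this cluster


@dataclass
class Run:
    run_id: int            # 301: classification execution unit ID
    name: str              # 302: classification execution unit name
    clusters: list[Cluster] = field(default_factory=list)


def validate(runs: list[Run]) -> None:
    """Check that run IDs and cluster IDs are each unique within the file."""
    run_ids = [r.run_id for r in runs]
    cluster_ids = [c.cluster_id for r in runs for c in r.clusters]
    if len(set(run_ids)) != len(run_ids):
        raise ValueError("duplicate classification execution unit ID")
    if len(set(cluster_ids)) != len(cluster_ids):
        raise ValueError("duplicate cluster ID")


# Memberships are invented for illustration only.
runs = [
    Run(1, "Run1", [Cluster(11, "Cluster11", {1, 4, 7}),
                    Cluster(12, "Cluster12", {2, 5, 8}),
                    Cluster(13, "Cluster13", {3, 6, 9})]),
    Run(2, "Run2", [Cluster(21, "Cluster21", {1, 3, 6}),
                    Cluster(22, "Cluster22", {2, 5, 8}),
                    Cluster(23, "Cluster23", {4, 7, 9})]),
]
validate(runs)
print([c.name for r in runs for c in r.clusters])
```

Keeping all runs in one list corresponds to the single-file approach of this embodiment; the separate-file alternative would simply build the same list from several sources.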

[0013] FIG. 4 shows an example of the menus displayed on the display device 101. 401 is a menu listing the data names of the selectable sets of data. 402 is a menu listing the names of the selectable classification execution units. 403 is an OK button that signals the end of menu selection.

[0014] FIG. 5 shows an example of the cluster comparison screen. In FIG. 5, 501 is an area that lists the data names of the sets of data selected in the menu 401 of FIG. 4. 502 is an area that lists the names of the classification execution units selected in the menu 402 of FIG. 4. 503 is an area that lists the cluster names of the clusters belonging to the classification execution units shown in the area 502. 504 is a cell area that indicates whether the data with a particular data name belongs to the cluster with a particular cluster name. When the data with a data name in the area 501 is found to belong to the cluster with one of the cluster names in the area 503, the cell at the intersection of that data name's row and that cluster name's column is filled with a specific color (for example, black).

[0015] FIG. 6 is a flowchart showing the processing of this embodiment in detail, and the processing is described below following the flowchart. First, in step 601, the arithmetic unit 105 reads the data to be classified (FIG. 2) from the input data file on the auxiliary storage device 103. Next, in step 602, a plurality of kinds of similarity calculations with differing parameters are executed multiple times on the data to be classified, and a cluster data file in which the data has been classified, as shown in FIG. 3, is stored in the auxiliary storage device 104. Next, in step 603, the arithmetic unit 105 reads the cluster data file from the auxiliary storage device 104. Next, in step 604, the arithmetic unit 105 refers to the data IDs 201 and the data names 202 and generates a list of them.
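Step 602 leaves the choice of similarity calculation to known methods. As one hypothetical stand-in, the sketch below groups data items whose Euclidean distance is at or below a threshold (connected components, equivalent to cutting single-linkage clustering at that distance) and repeats the grouping with several thresholds to imitate runs with differing parameters. The function name `threshold_clusters`, the thresholds, and the sample points are assumptions, not part of the specification.

```python
import math


def threshold_clusters(points: dict[int, tuple[float, ...]],
                       threshold: float) -> list[set[int]]:
    """Group data IDs into connected components of the graph that links
    any two points whose Euclidean distance is <= threshold."""
    parent = {i: i for i in points}

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    ids = sorted(points)
    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if math.dist(points[a], points[b]) <= threshold:
                parent[find(a)] = find(b)

    groups: dict[int, set[int]] = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    # Order clusters by their smallest member ID for stable output.
    return sorted(groups.values(), key=min)


# Invented three-element data sets keyed by data ID.
points = {
    1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (2.5, 0.0, 0.0),
    4: (10.0, 0.0, 0.0), 5: (11.0, 0.0, 0.0),
}
# One run per parameter value; each run yields its own set of clusters.
runs = {f"Run{n}": threshold_clusters(points, t)
        for n, t in enumerate([1.0, 2.0, 20.0], start=1)}
for name, clusters in runs.items():
    print(name, clusters)
```

With these invented points the three thresholds give three, two, and one cluster respectively, which illustrates how the same input can be partitioned differently as the parameter changes.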

[0016] Next, in step 605, the arithmetic unit 105 refers to the classification execution unit IDs 301 and the classification execution unit names 302 and generates a list of them. Next, in step 606, based on the lists generated in steps 604 and 605, the data names and classification execution unit names are listed on the screen of the display device 101 in the form shown in FIG. 4. Next, in step 607, the user selects sets of data to be classified and classification execution unit names of the cluster data by pointing with the mouse device 102 at one or more items in each of the menus 401 and 402. In the example of FIG. 4, as indicated by the shading, three data items (Data1, Data2, and Data4) and two classification execution unit names (Run2 and Run3) are selected. Next, in step 608, the user points at the OK button 403 with the mouse device 102 to tell the arithmetic unit 105 that menu selection is complete.

[0017] Next, in step 609, the data names of all the data selected in step 607 are displayed in the area 501 of FIG. 5. The example of FIG. 5 assumes that 10 sets of data were selected in the menu 401 of FIG. 4, and the area 501 lists the 10 data names Data1 to Data9 vertically as rows of equal height. Next, in step 610, the cluster names of all clusters belonging to all the classification execution units selected in step 607 are displayed in the area 503 of FIG. 5, with the clusters of each execution unit occupying a contiguous region. In the example of FIG. 5, the area 503 lists the cluster names Cluster11 to Cluster33 horizontally as columns of equal width. The names of all the classification execution units are then displayed in the area 502. In the example of FIG. 5, the names Run1 to Run3 are each displayed as a column whose width equals the total width of the columns in the area 503 for the clusters belonging to that execution unit.
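The column layout of areas 502 and 503 amounts to giving each execution unit a header that spans exactly the columns of its own clusters. A minimal sketch follows, assuming equal-width columns counted in cells; the `header_spans` function is hypothetical, and the cluster names between Cluster11 and Cluster33 are inferred from the naming pattern rather than listed in the text.

```python
def header_spans(runs: list[tuple[str, list[str]]]) -> list[tuple[str, int, int]]:
    """Return (run name, first column index, number of columns) per run,
    so that each run header spans the columns of its own clusters."""
    spans = []
    col = 0
    for run_name, cluster_names in runs:
        spans.append((run_name, col, len(cluster_names)))
        col += len(cluster_names)  # the next run starts where this one ends
    return spans


layout = [
    ("Run1", ["Cluster11", "Cluster12", "Cluster13"]),
    ("Run2", ["Cluster21", "Cluster22", "Cluster23"]),
    ("Run3", ["Cluster31", "Cluster32", "Cluster33"]),
]
spans = header_spans(layout)
print(spans)
```

Because the clusters of one run are kept contiguous, a single start index and width per run is enough to place its header.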

[0018] Next, in step 611, for each of the data items selected in the menu 401, the process of checking which cluster data groups the item belongs to is repeated. If the check finds, for example, that the data Data1 belongs to the cluster data group Cluster1 of the execution unit Run1, the cell at the intersection of the row for the data name Data1 and the column for the cluster name Cluster1 is filled with a specific color (for example, black). If a data item belongs to several different cluster data groups in the cluster data file shown in FIG. 3, the corresponding cell is filled for each of those groups. Once step 611 has been executed for all the data, the user refers to the screen shown in FIG. 5 and judges visually whether each cluster is appropriate.
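The check in step 611 can be sketched as building a boolean grid with one row per selected data item and one column per cluster, where a true cell corresponds to a filled cell 504. The text rendering below is only a stand-in for the graphical screen; the function names and the memberships are invented for illustration.

```python
def membership_grid(data_names: list[str],
                    clusters: list[tuple[str, set[str]]]) -> list[list[bool]]:
    """One row per data item, one column per cluster; True means the
    cell at that row/column intersection would be filled in."""
    return [[name in members for _, members in clusters]
            for name in data_names]


def render(data_names: list[str],
           clusters: list[tuple[str, set[str]]],
           grid: list[list[bool]]) -> str:
    """Plain-text stand-in for the comparison screen: '#' filled, '.' empty."""
    header = " " * 6 + " ".join(cname for cname, _ in clusters)
    rows = [
        name.ljust(6) + " ".join(
            ("#" if hit else ".").ljust(len(cname))
            for hit, (cname, _) in zip(row, clusters))
        for name, row in zip(data_names, grid)
    ]
    return "\n".join([header] + rows)


data_names = ["Data1", "Data2", "Data4"]
clusters = [  # memberships invented for illustration
    ("Cluster11", {"Data1", "Data4"}),
    ("Cluster12", {"Data2"}),
    ("Cluster21", {"Data1"}),
    ("Cluster22", {"Data2", "Data4"}),
]
grid = membership_grid(data_names, clusters)
print(render(data_names, clusters, grid))
```

A data item that appears in several clusters simply produces several true cells in its row, which matches the handling of multiple memberships described above.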

[0019] The example of FIG. 5 shows the result of classifying nine sets of data three times, with each run producing three cluster groups. For example, the three data items Data1, Data4, and Data7 are classified into the same cluster data group in the execution unit Run1 but into different cluster data groups in the other execution units Run2 and Run3. By contrast, Data2, Data5, and Data8 are classified into the same cluster data group in both Run1 and Run2. Because two different classification execution units placed Data2, Data5, and Data8 in the same cluster data group, Cluster12 or Cluster22 can be judged to be a comparatively appropriate cluster.
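The specification leaves this judgment to the user's visual inspection of the screen. Purely as an illustration of the same reasoning, the sketch below flags member sets that recur as a whole cluster in at least two runs. Exact set equality as the criterion, the `stable_clusters` function, and all memberships other than the Data2, Data5, Data8 grouping described above are assumptions.

```python
from collections import defaultdict


def stable_clusters(runs: dict[str, dict[str, set[int]]],
                    min_runs: int = 2) -> dict[frozenset, list[str]]:
    """Return member sets that appear as a whole cluster in at least
    `min_runs` distinct runs, with the names of the matching clusters."""
    seen: dict[frozenset, list[tuple[str, str]]] = defaultdict(list)
    for run_name, clusters in runs.items():
        for cluster_name, members in clusters.items():
            seen[frozenset(members)].append((run_name, cluster_name))
    return {
        members: [cluster for _, cluster in hits]
        for members, hits in seen.items()
        if len({run for run, _ in hits}) >= min_runs
    }


# Data IDs 1-9 stand for Data1-Data9; memberships are hypothetical except
# that IDs 2, 5, 8 share a cluster in Run1 and Run2, as in the text.
runs = {
    "Run1": {"Cluster11": {1, 4, 7}, "Cluster12": {2, 5, 8}, "Cluster13": {3, 6, 9}},
    "Run2": {"Cluster21": {1, 3, 6}, "Cluster22": {2, 5, 8}, "Cluster23": {4, 7, 9}},
    "Run3": {"Cluster31": {1, 2, 3}, "Cluster32": {4, 5, 6}, "Cluster33": {7, 8, 9}},
}
stable = stable_clusters(runs)
print(stable)
```

With this invented data, only the grouping of IDs 2, 5, and 8 recurs, under the names Cluster12 and Cluster22. A looser criterion, such as overlap above some ratio, would be an equally plausible design choice.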

[0020]

[Effects of the Invention] As described above, according to the present invention, when a particular calculation method and its initial parameters yield greatly differing classification results under the influence of subtle properties, errors, or tendencies in the input data, classification is executed with a plurality of calculation methods and initial parameters and the results are compared. This makes it possible to efficiently extract, from the clusters contained in each result, those that represent an appropriate classification, and thereby helps reduce wasted analysis time.

[Brief description of drawings]

FIG. 1 is a system configuration diagram of an embodiment of the present invention.

FIG. 2 is a configuration diagram of the input data.

FIG. 3 is a configuration diagram of the cluster data.

FIG. 4 is a diagram showing an example of a menu screen for selecting the data to be classified and the classification execution units.

FIG. 5 is a diagram showing an example of the cluster comparison screen.

FIG. 6 is a flowchart showing the overall processing.

[Explanation of symbols]

101: display device; 102: mouse device; 103, 104: auxiliary storage devices; 105: arithmetic unit.

Continuation of front page: (72) Inventor: Nobuyuki Yamamoto, c/o Hitachi Software Engineering Co., Ltd., 6-81 Onoe-cho, Naka-ku, Yokohama-shi, Kanagawa. F-terms (reference): 5E501 AA01 AC18 BA03 FA01 FA24 FA47

Claims

[Claims]

1. A cluster analysis system comprising: a first means for inputting a plurality of sets of data to be classified; a second means for accumulating partial data groups of classification results obtained by classifying each of the plurality of sets of data input by the first means with a plurality of kinds of similarity calculations that differ in their parameters; and a third means for checking to which partial data group, of which kind of similarity calculation result among the partial data groups of the classification results accumulated by the second means, a specific set among the plurality of sets of data input by the first means belongs, and for displaying the result in a visually recognizable form.

2. The cluster analysis system according to claim 1, wherein the third means comprises means for displaying, on one axis of a two-dimensional table displayed on a screen, the plurality of sets of data input by the first means in an individually distinguishable form; displaying, on the other axis, the partial data groups of the classification results in an individually distinguishable form; and displaying, at each grid point whose two-dimensional coordinates are the display positions on the two axes, whether each set of data belongs to the corresponding partial data group, in a visually recognizable form.

3. The cluster analysis system according to claim 1 or 2, wherein the third means further comprises means for displaying the partial data groups of the classification results on one axis of the two-dimensional table in a form that visually distinguishes which kind of similarity calculation produced each of them.

4. The cluster analysis system according to claim 1, further comprising means for selecting the sets of data and the similarity calculation execution units to be checked by the third means.