JP2001175640A

JP2001175640A - Cluster classification device and method, and recording medium recorded with program for cluster classification

Info

Publication number: JP2001175640A
Application number: JP35970899A
Authority: JP
Inventors: Nobukatsu Kitajima; 伸克北島
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1999-12-17
Filing date: 1999-12-17
Publication date: 2001-06-29
Anticipated expiration: 2019-12-17
Also published as: JP3636016B2

Abstract

PROBLEM TO BE SOLVED: To provide a cluster classification device with improved performance that classifies and analyzes data by extracting subsets (regions) of characteristic data from real attribute data in which different classes overlap, such as data used for credit screening and marketing. SOLUTION: Past attribute data are stored in a past attribute data storage unit 21. A cluster number determining unit 31 receives the attribute data stored in the storage unit 21 and determines the number of clusters corresponding to the received data; a clustering unit 32 receives the number of clusters and the attribute data from the determining unit 31 and performs clustering; and a result analysis unit 33 receives the clustering result from the clustering unit 32 and analyzes it.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001] [Technical Field of the Invention] The present invention relates to a cluster classification device, a cluster classification method, and a recording medium on which a cluster classification program is recorded. More particularly, it relates to a cluster classification device, method, and recording medium whose purpose is to classify and analyze target data by extracting subsets (data regions) of characteristic data from a data set (data space) that contains at least two kinds of data with different properties.

[0002] [Prior Art] When credit screening, marketing, and the like are carried out using large volumes of attribute data such as customer data and product data, the target data must be classified.

[0003] For example, credit screening requires classifying customer data in order to determine whether a new customer will be able to repay.

[0004] In marketing, very useful information is obtained if customers can be classified by purchasing tendency or by the products likely to interest them.

[0005] Data classification is described below using credit screening as an example. Various methods and devices have been proposed for this kind of data classification, but most of them perform class classification, which constructs a classification boundary surface in the target data space.

[0006] In classifying real attribute data, however, the overlap between different classes is large. With methods that classify by constructing a boundary surface in the data space, classification performance frequently reaches a limit no matter how the boundary surface is constructed. For example, in credit screening, some of the data of customers who could not repay their loans (defective data) may have the same attributes as the data of customers who repaid normally without delay (normal data).

[0007] Moreover, classification methods that force a boundary surface often recognize data with meaningful characteristics merely as correctly classified or misclassified. For example, in credit screening, a customer who happened to repay a loan in a past case, but whose attributes make repayment improbable, should be recognized as a customer unlikely to repay. Such recognition is difficult with methods that construct a classification boundary surface.

[0008] A known way to solve these problems is to extract characteristic regions from the data space rather than separating the data space with a boundary surface.

[0009] For example, extracting characteristic regions such as "a region with many normal customers", "a region with many defective customers", and "a region where normal and defective customers are mixed" from the customer data space for loan credit enables detailed analysis of past customer data.

[0010] Furthermore, for a new customer, it becomes possible not only to judge "normal" or "defective" but also to present a judgment such as "likely to be normal" together with a probability value based on past cases.

[0011] It is also possible to extract from past cases a list of customers whose attributes are close to those of the new customer.

[0012] A technique that meets these requirements is better described as clustering than as class classification. That is, characteristic regions such as the "region with many normal customers", "region with many defective customers", and "region where normal and defective customers are mixed" in the example above are extracted as clusters and then classified and analyzed.

[0013] Representative clustering methods include maximum likelihood estimation, the K-means method, the LBG method, and hierarchical methods.

[0014] Of these, maximum likelihood estimation, the K-means method, and the LBG method perform classification and analysis by assuming the number of clusters, their positions, and the distribution shape of the data. With real data, however, the number of clusters, their positions, and the distribution shape are almost always unknown before classification, so the assumptions may be wrong or suitable initial values may not be set. In such cases, these clustering methods (maximum likelihood estimation, the K-means method, the LBG method, hierarchical methods, and so on) cannot be expected to perform well.

[0015] Hierarchical methods cluster by successively merging and splitting data based on user-defined distances between data items and between clusters. Their results, however, vary greatly depending on the order of the splitting and merging operations and on the initial state of the algorithm.

[0016] The self-organizing feature map (Self-Organizing Map, hereinafter "SOM") is known as a method for solving the above problems.

[0017] An SOM can perform clustering without any need to assume the number of clusters, their positions, or the distribution shape of the data before classification.

[0018] The SOM is described below. FIG. 10 is a diagram for explaining the SOM. Referring to FIG. 10, the SOM is a two-layer neural network consisting of an input layer 51 and a competitive layer 52. The number of units in the input layer 51 equals the number of dimensions of the target vector data. In the competitive layer 52, the competitive layer units are arranged in two dimensions, and each competitive layer unit is fully connected to the input layer units.

[0019] When clustering is performed using an SOM, one or more competitive layer units ultimately correspond to one cluster.

[0020] That is, the past data belonging to each cluster are assigned to one of the competitive layer units corresponding to that cluster.

[0021] The clustering process of the SOM is as follows. The SOM performs clustering in two stages: 1) competition among the competitive layer units for the input data, and 2) updating of the connection weight vectors of the winning competitive layer unit and of the units that neighbor it on the competitive layer.

[0022] When there are M competitive layer units, each unit is assigned a number i = 1, ..., M. Let w_i denote the weight vector of the connections between the i-th competitive layer unit and the input layer.

[0023] When past data x is input, the competitive layer weight vector w_c that is nearest to x in the data space is found.

[0024] That is,

$$\lVert x - w_c \rVert = \min_{i} \lVert x - w_i \rVert$$

[0025] holds. The unit with this w_c is the competitive layer unit that won the competition.

[0026] At this time, the weight vector of each competitive layer unit is updated as follows.

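[0027]

$$w_i(t+1) = \begin{cases} w_i(t) + \alpha(t)\,\bigl[x - w_i(t)\bigr] & (i \in N_c) \\ w_i(t) & (i \notin N_c) \end{cases}$$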

[0028] Here, N_c is the neighborhood of the c-th competitive layer unit, α(t) is the learning coefficient, and t is time.

[0029] N_c and α(t) are gradually reduced as the updates are repeated.

[0030] Through this processing, the connection weight vectors of the competitive layer units become vectors that represent the target data space. The connection weight vector of each competitive layer unit becomes the representative data of the subspace whose points have that weight vector as their nearest one.

[0031] Furthermore, the past customer data nearest to each weight vector are assigned to the competitive layer unit corresponding to that weight vector, which realizes the clustering.
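
The two stages in paragraphs [0021] to [0031] (competition, then updating the winner and its neighbors) can be sketched as follows. This is a minimal illustration rather than the implementation used in the publication: the linear decay schedules, the initialization from random data points, the circular neighborhood on the grid, and all function and parameter names (`train_som`, `winner`, `assign`, `alpha0`, `steps`) are assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def winner(weights, x):
    """Index c of the competitive-layer unit whose weight vector is nearest to x."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))


def train_som(data, rows, cols, steps=2000, alpha0=0.5, seed=0):
    """Train a rows x cols SOM and return one weight vector per unit."""
    rng = random.Random(seed)
    # Assumption: each weight vector starts as a copy of a random data point.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2.0
    for t in range(steps):
        frac = t / steps
        alpha = alpha0 * (1.0 - frac)    # learning coefficient alpha(t) decays
        radius = radius0 * (1.0 - frac)  # neighbourhood N_c shrinks over time
        x = rng.choice(data)
        c = winner(weights, x)           # stage 1: competition
        c_row, c_col = divmod(c, cols)
        for i, w in enumerate(weights):  # stage 2: update units inside N_c
            row, col = divmod(i, cols)
            if (row - c_row) ** 2 + (col - c_col) ** 2 <= radius ** 2:
                for k in range(len(w)):
                    w[k] += alpha * (x[k] - w[k])
    return weights


def assign(weights, data):
    """Assign every data item to its nearest competitive-layer unit."""
    return [winner(weights, x) for x in data]
```

Calling `assign(train_som(data, rows, cols), data)` gives, for each past data item, the index of the competitive layer unit it is allotted to, which corresponds to the cluster assignment of paragraph [0031].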

[0032] The SOM is described in detail in, for example, T. Kohonen, "Self-Organizing Map", Proc. IEEE, Vol. 78, No. 9, pp. 1464-1480 (1990) (hereinafter "Reference 1").

[0033] [Problems to Be Solved by the Invention] Methods that use the above properties of the SOM for clustering have been proposed, but to extract the characteristic regions described above effectively, the number of competitive layer units must be suited to the data.

[0034] The methods described in, for example, JP-A-10-283336, JP-A-8-36557, JP-A-7-64948, and JP-A-2-211576 simply use the SOM with the number of competitive layer units given at the outset. They therefore cannot necessarily extract appropriate characteristic regions from a data set in which different classes overlap.

[0035] The clustering method described in, for example, JP-A-5-205058 seeks an optimal clustering by changing the number of competitive layer units step by step and examining each case. Regardless of the target data, however, it mechanically repeats clustering from one cluster up to a maximum number of clusters that is fixed in advance without prior knowledge of the data, so much of the processing is wasted.

[0036] Furthermore, the evaluation performed after each number of clusters has been examined is based on the average distance between each data item and the representative data of its cluster and on the average of the data within each cluster. It is therefore also inherently unsuited to handling data in which different classes overlap.

[0037] The devices described in the above publications merely use the original SOM algorithm as is. Because of a property peculiar to the SOM algorithm, data may concentrate in only some of the clusters, so that the tendency of each cluster cannot be clearly extracted.

[0038] Thus, although the attribute data classified in credit screening, marketing, and the like often overlap between classes, many of the class classification and clustering methods proposed so far simply force a classification without considering the overlap between classes, and do not handle it properly.

[0039] The present invention was therefore devised in view of the above problems. Its object is to provide a device, a method, and a recording medium that improve classification accuracy and performance by extracting subsets (regions) of characteristic data from real attribute data in which different classes overlap, such as the data used in credit screening and marketing, and by classifying and analyzing them. Other objects, advantages, and features of the present invention will be readily apparent to those skilled in the art from the following description.

[0040] [Means for Solving the Problems] A device according to the present invention that achieves the above object comprises: past attribute data storage means for storing past attribute data; cluster number determining means for receiving the attribute data stored in the past attribute data storage means and determining a number of clusters corresponding to that attribute data; clustering means for receiving the number of clusters and the attribute data from the cluster number determining means and performing clustering; and result analysis means for receiving the clustering result from the clustering means and analyzing the result.

[0041] The device according to the present invention further comprises output means for outputting results and input means for specifying the number of clusters, so that the user can specify the optimal number of clusters through the cluster number determining means while viewing the analysis results of the result analysis means.

[0042] In the device according to the present invention, the cluster number determining means determines the number of clusters based on the class ratio of the target data. Also, in the device according to the present invention, the result analysis means analyzes the class ratio of each cluster, the representative data, and the similarity among the data within each cluster. The attribute data are customer data.

[0043] A method according to the present invention includes the steps of: storing past attribute data in storage means; determining a number of clusters corresponding to the attribute data stored in the storage means; receiving the determined number of clusters and the attribute data and performing clustering; and analyzing the clustering result.

[0044] In the method according to the present invention, the user specifies the optimal number of clusters based on the analysis result, which enables analysis of the target data. In the step of determining the number of clusters, the number of clusters is determined based on the class ratio of the target data. Further, in the step of analyzing the clustering result, the class ratio of each cluster, the representative data, and the similarity among the data within each cluster are analyzed.

[0045] The present invention is well suited to business uses such as classifying customer data in credit screening to determine whether a new customer will be able to repay, and classifying data in marketing to group customers by purchasing tendency or by the products likely to interest them.

[0046] [Embodiments of the Invention] Embodiments of the present invention are described below. In a preferred embodiment of the cluster classification device of the present invention, past attribute data are stored in past attribute data storage means (21); cluster number determining means (31) receives the attribute data stored in the past attribute data storage means (21) and determines a number of clusters corresponding to the received attribute data; clustering means (32) receives the number of clusters and the attribute data from the cluster number determining means (31) and performs clustering; and result analysis means (33) receives the clustering result from the clustering means (32) and analyzes it.

[0047] The clustering means (32) extracts characteristic regions in the data space by clustering, without any need to assume the data distribution or the like before classification.

[0048] The result analysis means (33) analyzes each cluster, and based on the analysis result, the cluster number determining means (31) can be used to determine a number of clusters suited to the target data.

[0049] Furthermore, the processing of the cluster number determining means (31), the clustering means (32), and the result analysis means (33) is repeated until regions where classes overlap and regions where they do not can be extracted. Characteristic data regions are thereby extracted, and analyzing each characteristic data set makes highly accurate data classification and classification of new data possible.

[0050] In a preferred embodiment, the method of the present invention includes the steps of: (a) storing past attribute data in storage means; (b) determining a number of clusters corresponding to the attribute data stored in the storage means; (c) receiving the determined number of clusters and the attribute data and performing clustering; and (d) analyzing the clustering result.

[0051] According to the present invention, the user specifies the optimal number of clusters based on the analysis result, which enables analysis of the target data.

[0052] In the present invention, the number of clusters is determined based on the class ratio of the target data.

[0053] In the present invention, the result analysis analyzes the class ratio of each cluster, the representative data, and the similarity among the data within each cluster.

[0054] In the present invention, the program executed by a data processing device such as a computer includes: (a) a process of storing past attribute data in a past attribute data storage unit; (b) a cluster number determining process of receiving the attribute data stored in the past attribute data storage unit and determining a number of clusters corresponding to the attribute data; (c) a clustering process of receiving the number of clusters and the attribute data from the cluster number determining process and performing clustering; and (d) a result analysis process of receiving the clustering result from the clustering process and analyzing the result. The present invention can be carried out by reading the program for causing a computer to execute processes (a) to (d) into the data processing device, either from a recording medium on which the program is recorded or from a wireless or wired communication medium that carries the program, and executing it.

[0055] [Examples] To describe the above embodiment of the present invention in more detail, an example of the present invention is described below with reference to the drawings.

[0056] FIG. 1 is a diagram showing the configuration of one example of the present invention. Referring to FIG. 1, the example comprises an input device 1 such as a keyboard, a storage device 2 that stores information, a data processing device 3 that operates under program control, and an output device 4 such as a display device or a printer.

[0057] The data processing device 3 comprises a cluster number determining unit 31, a clustering unit 32, and a result analysis unit 33.

[0058] The storage device 2 comprises a past attribute data storage unit 21 that stores past attribute data entered from the input device 1.

[0059] The cluster number determining unit 31 reads the past attribute data stored in the past attribute data storage unit 21, shapes the data into a form suitable for clustering, and tallies it.

[0060] The cluster number determining unit 31 sets the initial number of clusters from the result of tallying the attribute data stored in the past attribute data storage unit 21.

[0061] The clustering unit 32 receives the attribute data that the cluster number determining unit 31 has shaped into a clusterable form and tallied, and performs clustering according to the SOM algorithm. For the SOM algorithm, see, for example, Reference 1 by T. Kohonen cited above.

[0062] When the original SOM algorithm is used in the clustering unit 32, a property peculiar to the SOM algorithm may cause competition wins to concentrate in only some of the clusters. The competitive layer as a whole then no longer captures the tendency of the entire target data, and the tendency of each cluster may not be clearly extracted.

[0063] In such cases, this situation is avoided by applying a correction using a conscience mechanism algorithm. The conscience mechanism algorithm corrects the competition based on the history of wins during the clustering process, which allows the competitive layer as a whole to capture the tendency of the entire target data. For the above problem that can occur with the original SOM algorithm, and for the conscience mechanism algorithm, see, for example, D. DeSieno, "Adding a Conscience to Competitive Learning," Proc. Int. Conf. on Neural Networks, I, pp. 117-124 (1988) (hereinafter "Reference 2").
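
One way to picture the correction in paragraph [0063] is a winner selection that penalizes units that have won too often. The sketch below follows the general shape of a conscience mechanism (a running win frequency per unit and a bias subtracted from the squared distance). The function name `conscience_winner` and the parameter values `bias_gain` and `freq_rate` are assumptions for illustration, not values taken from this publication.

```python
def conscience_winner(weights, x, win_freq, bias_gain=10.0, freq_rate=0.01):
    """Pick a winner with a conscience bias and update the win history.

    weights  : list of weight vectors, one per competitive-layer unit
    x        : input vector
    win_freq : list of running win frequencies, modified in place
    """
    n = len(weights)

    def biased_distance(i):
        d2 = sum((w - v) ** 2 for w, v in zip(weights[i], x))
        # Units that win more often than 1/n get a negative bias (handicap);
        # units that win less often get a positive bias (head start).
        bias = bias_gain * (1.0 / n - win_freq[i])
        return d2 - bias

    c = min(range(n), key=biased_distance)
    # Move every win frequency toward 1 for the winner and 0 for the others.
    for i in range(n):
        target = 1.0 if i == c else 0.0
        win_freq[i] += freq_rate * (target - win_freq[i])
    return c
```

With equal win frequencies the bias is the same for every unit, so the nearest unit wins as usual; the correction only takes effect once the win history becomes lopsided.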

[0064] The result analysis unit 33 receives the clustering result from the clustering unit 32 and analyzes it in order to decide whether to display the result on the output device 4 and end the processing, or to change the number of clusters and perform clustering again.

[0065] If the analysis shows that the tendency of each cluster could not be made clear, the number of clusters is changed and the clustering process and result analysis described above are repeated until the tendency becomes clear.
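
The repeat-until-clear control flow of paragraphs [0064] and [0065] can be sketched as a small loop. The publication leaves the decision to the user and does not say in which direction the number of clusters is changed, so the callbacks `cluster_fn` and `is_clear`, the increment by one, and the `max_k` cap are all hypothetical choices made for this example.

```python
def refine_cluster_count(data, labels, initial_k, cluster_fn, is_clear, max_k=50):
    """Repeat clustering and analysis until the cluster tendencies are clear.

    cluster_fn(data, k)            -> list of cluster indices, one per data item
    is_clear(assignments, labels)  -> True when the tendencies are judged clear
                                      (stands in for the user's judgment)
    """
    k = initial_k
    while True:
        assignments = cluster_fn(data, k)
        if is_clear(assignments, labels) or k >= max_k:
            return k, assignments
        k += 1  # assumption: try a finer clustering on the next pass
```

In an interactive setting `is_clear` would simply ask the user after the analysis result has been displayed.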

[0066] Through the above processing, characteristic data regions in the target data space can be extracted as clusters, and the result can be used to classify even data in which classes overlap. For example, when a data item is assigned to a cluster to which many data items of a certain class belong, that item is likely to belong to that class.

[0067] When a data item is assigned to a cluster to which data of several classes belong, the probability that the item definitely belongs to a specific class is lower. That probability can also be estimated by tallying the past data.
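
The estimate mentioned in paragraph [0067] amounts to counting the classes of the past data that fell into the same cluster. A minimal sketch, assuming cluster assignments and class labels are available as parallel lists (the function name `class_probabilities` and the label strings are illustrative):

```python
from collections import Counter


def class_probabilities(cluster_id, assignments, labels):
    """Estimate class probabilities for a cluster from past data.

    assignments : cluster index of each past data item
    labels      : class label of each past data item (e.g. "normal", "defective")
    Returns a dict mapping class label to relative frequency in the cluster.
    """
    counts = Counter(y for c, y in zip(assignments, labels) if c == cluster_id)
    total = sum(counts.values())
    if total == 0:
        return {}  # no past data in this cluster, so no estimate
    return {label: n / total for label, n in counts.items()}
```

A new data item would first be mapped to its nearest cluster and then given the probabilities of that cluster.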

[0068] FIG. 2 is a flowchart for explaining the operation of one example of the present invention. The operation is described below for the case of credit screening at a financial institution using the customer data shown in FIG. 3 (attributes, age, gender, annual income, occupation, industry, marital status, repayment status, and so on).

[0069] When operation starts (step 101 in FIG. 2), the cluster number determining unit 31 shapes and tallies the attribute data used for clustering and sets the initial number of clusters (step 102 in FIG. 2).

[0070] The attribute data are shaped using, for example, a table such as the one in FIG. 4. FIG. 5 is a diagram showing an example in which the attribute data of the customer data shown in FIG. 3 have been shaped.
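
The shaping step turns each customer record into a numeric vector that the clustering can consume. The contents of the table in FIG. 4 are not reproduced in this text, so the sketch below uses an entirely hypothetical conversion table: the field names, value ranges, and category codes are assumptions chosen only to show the idea.

```python
# Hypothetical conversion tables; the real table of FIG. 4 is not available here.
NUMERIC_RANGES = {"age": (18.0, 80.0), "income": (0.0, 20_000_000.0)}
CATEGORY_CODES = {
    "gender": {"male": 0.0, "female": 1.0},
    "married": {"no": 0.0, "yes": 1.0},
}


def shape_record(record):
    """Convert one customer record (a dict) into a numeric feature vector."""
    vector = []
    for name, (low, high) in NUMERIC_RANGES.items():
        scaled = (record[name] - low) / (high - low)
        vector.append(min(max(scaled, 0.0), 1.0))  # clamp to [0, 1]
    for name, codes in CATEGORY_CODES.items():
        vector.append(codes[record[name]])
    return vector
```

Scaling every attribute to a comparable range keeps a large-valued field such as annual income from dominating the distance used in the competition.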

[0071] The initial number of clusters in step 102 of FIG. 2 is set, for example, as follows. The ratio between the number of data items for customers who repaid normally without delay (normal data) and the number of data items for customers with an irregularity such as a late repayment (defective data) is examined. In many cases the special data (defective data in this example) are few in number.

[0072] Assuming that the conscience mechanism algorithm is used and the same number of data items is assigned to each cluster, the ideal outcome is a set of clusters containing only defective data and clusters containing only normal data.

[0073] Accordingly, one example of the initial number of clusters is an integer close to the total number of data items divided by the number of defective data items. For example, when clustering data consisting of 900 normal items and 100 defective items, the initial number of clusters is set to 10.
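
The rule in paragraph [0073] is a single division followed by rounding. The sketch below reproduces the worked example; the function name and the fallback of one cluster when there are no defective items are assumptions, since the text does not cover that case.

```python
def initial_cluster_count(n_normal: int, n_defective: int) -> int:
    """Initial number of clusters: total data count divided by defective count."""
    if n_defective == 0:
        return 1  # assumption: nothing to separate, start from a single cluster
    total = n_normal + n_defective
    return max(1, round(total / n_defective))


print(initial_cluster_count(900, 100))  # prints 10, as in the example above
```

With equally populated clusters, 10 clusters of 100 items each would allow one cluster to hold all 100 defective items.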

【００７４】クラスタ数決定部３１は、クラスタリング
用のデータと初期クラスタ数をクラスタリング部３２に
渡す。The number-of-clusters determining unit 31 passes the data for clustering and the initial number of clusters to the clustering unit 32.

[0075] Next, the clustering unit 32 performs clustering in accordance with the SOM algorithm, using the initial number of clusters received from the cluster number determination unit 31 (step 103 in FIG. 2).

[0076] For the SOM algorithm, see, for example, the description in Document 1 by T. Kohonen cited above.
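For orientation only, the following is a minimal sketch of a one-dimensional SOM with a simple rectangular ("bubble") neighborhood, in which each competitive layer unit plays the role of one cluster. It is not the implementation described in Document 1 or used by the clustering unit 32: the learning-rate schedule, neighborhood schedule, function names, and parameters are all assumptions, and the conscience mechanism is omitted.

```python
import random


def nearest_unit(weights, x):
    """Index of the competitive layer unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], x)),
    )


def train_som(data, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM and return one weight vector per unit (cluster)."""
    rng = random.Random(seed)
    # Initialize each unit's weights from a randomly chosen data item.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    radius0 = max(1.0, n_units / 2)
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays towards 0 but stays positive
        lr = lr0 * frac                      # learning rate for this epoch
        radius = max(radius0 * frac, 0.5)    # shrinking neighborhood radius
        for x in rng.sample(data, len(data)):
            winner = nearest_unit(weights, x)
            for j, w in enumerate(weights):
                # Update the winner and its neighbors on the 1-D unit line.
                if abs(j - winner) <= radius:
                    for k in range(len(w)):
                        w[k] += lr * (x[k] - w[k])
    return weights
```

After training, `nearest_unit(weights, x)` gives the cluster to which a data item `x` is assigned.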

[0077] The clustering unit 32 sends the clustering result to the result analysis unit 33.

[0078] The result analysis unit 33 receives the clustering result from the clustering unit 32 and analyzes it (step 104 in FIG. 2).

[0079] The result analysis unit 33 passes the analysis result to the output device 4, and the output device 4 displays the received result (step 105 in FIG. 2).

[0080] As an example of the analysis performed by the result analysis unit 33, the normal/defective ratio of the data belonging to each cluster is obtained from the clustering result. If examining these ratios reveals, for example, "clusters in which most of the data is normal data", "clusters in which most of the data is defective data", and "clusters in which roughly half of the data is normal and roughly half is defective", and the user can thereby grasp the tendencies in the customer data, clustering is ended at the user's discretion (step 106 and step 108 in FIG. 2).
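The per-cluster ratio analysis described above might look like the following sketch. The label strings, the function names, and the 0.8 threshold used to call a cluster "mostly" one class are illustrative assumptions; the text does not specify any threshold.

```python
from collections import Counter


def defective_ratios(assignments, labels):
    """Return {cluster id: fraction of its data items labeled 'defective'}."""
    totals, defective = Counter(), Counter()
    for cluster, label in zip(assignments, labels):
        totals[cluster] += 1
        if label == "defective":
            defective[cluster] += 1
    return {cluster: defective[cluster] / totals[cluster] for cluster in totals}


def describe_cluster(ratio, threshold=0.8):
    """Name the kind of cluster; the threshold is an assumed cut-off."""
    if ratio >= threshold:
        return "mostly defective"
    if ratio <= 1.0 - threshold:
        return "mostly normal"
    return "mixed"


# Toy data: cluster ids per data item and the known class of each item.
assignments = [1, 1, 1, 1, 4, 4, 4, 4, 2, 2]
labels = ["normal"] * 4 + ["defective"] * 4 + ["normal", "defective"]
for cluster, ratio in sorted(defective_ratios(assignments, labels).items()):
    print(cluster, describe_cluster(ratio))
```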

[0081] When the result analysis unit 33 sends the above normal/defective ratios to the output device 4, it can further aid the user's understanding by also sending graphical data such as that shown in FIG. 3.

[0082] FIG. 6 is a schematic representation of the competitive layer of the SOM; as described above, each competitive layer unit corresponds to a separate cluster.

[0083] FIG. 6 displays the normal/defective ratio of the customer data belonging to each cluster as pie charts. In the example shown in FIG. 6, it can be seen that competitive layer unit 1 is a "cluster in which most of the data is normal data", competitive layer unit 4 is a "cluster in which most of the data is defective data", and competitive layer units 2 and 3 are "clusters in which normal and defective data are mixed".

[0084] In addition, clicking a competitive layer unit (cluster) displays the customer data belonging to that cluster, as shown in FIG. 7, so that the customer data belonging to each cluster can be analyzed in detail.

[0085] In FIG. 7, "representative data" refers to the attribute values that most closely correspond to the values of the weight vector of the competitive layer unit.

[0086] Here, the data belonging to each cluster can also be sorted in ascending order of distance, such as Euclidean distance, from the representative data.
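A sketch of the ordering just described, assuming the data items and the representative data are numeric vectors of equal length; the function name is hypothetical.

```python
import math


def sort_by_distance(members, representative):
    """Sort a cluster's data items by Euclidean distance to the representative data."""
    return sorted(members, key=lambda item: math.dist(item, representative))


# Items at distances 5, 1 and about 1.41 from the origin.
print(sort_by_distance([[3, 4], [0, 1], [1, 1]], [0, 0]))
```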

[0087] If the user decides to analyze further, for example because the tendencies are not yet clear (step 106 in FIG. 2), the number of clusters is changed in the cluster number determination unit 31 in accordance with the user's instruction (step 107 in FIG. 2), clustering is performed again (step 103 in FIG. 2), and the above processing is repeated until the user obtains a satisfactory result.

[0088] The number of clusters is changed, for example, as follows. If, as in the example above, no tendency became clear with 10 clusters, dividing the data more finely may reveal one. The number of clusters is therefore increased to 20, on the assumption that 50 data items are assigned to each cluster (1,000 data items / 50 items per cluster).

[0089] Once a clustering result that satisfies the user has been obtained, new customers can be judged. The method is described below.

[0090] When new customer data is input from the input device 1, the clustering unit 32 finds the competitive layer unit, that is, the cluster, whose weights are closest to the new customer data, and passes the result to the result analysis unit 33.

[0091] The new customer data can be judged by aggregating the classes of the data belonging to the cluster determined to be nearest.

[0092] The result analysis unit 33 passes the analysis result to the output device 4, and the output device 4 displays it. FIG. 8 shows an example of the display. In FIG. 8, the competitive layer unit to which the newly input customer data belongs, that is, the cluster corresponding to competitive layer unit 4, lights up on the screen. When the lit cluster is clicked with a mouse or the like, the newly input data and the past customer data close to it are displayed in sorted order.

[0093] In the example shown in FIG. 8, the cluster nearest to the new customer is one to which many defective data items belong, so the new customer can be judged likely to experience repayment delays or similar problems. Naturally, the probability that the new data is defective can also be calculated from the number of past data items belonging to the cluster.
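The judgment of a new customer described in the preceding paragraphs can be sketched as follows. The function name, the label strings, and the choice of Euclidean distance for "closest weights" are assumptions; the probability is simply the share of defective items among the past data in the nearest cluster.

```python
import math


def judge_new_record(new_vector, weights, assignments, labels):
    """Find the nearest cluster and estimate the probability of 'defective'.

    weights:     one weight vector per competitive layer unit (cluster)
    assignments: cluster index of each past data item
    labels:      'normal' or 'defective' for each past data item
    Returns (cluster index, probability or None if the cluster is empty).
    """
    nearest = min(
        range(len(weights)),
        key=lambda j: math.dist(weights[j], new_vector),
    )
    members = [lab for c, lab in zip(assignments, labels) if c == nearest]
    if not members:
        return nearest, None
    probability = sum(lab == "defective" for lab in members) / len(members)
    return nearest, probability


weights = [[0.0, 0.0], [10.0, 10.0]]
assignments = [0, 0, 0, 1, 1, 1, 1]
labels = ["normal", "normal", "normal",
          "defective", "defective", "defective", "normal"]
print(judge_new_record([9.0, 8.0], weights, assignments, labels))  # (1, 0.75)
```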

[0094] In the above embodiment, a writable and readable storage device such as a magnetic disk device or a semiconductor memory storage device is used as the past attribute data storage unit 21.

[0095] The program-controlled data processing device 3, which includes the cluster number determination unit 31, the clustering unit 32, and the result analysis unit 33, is an information processing device such as a personal computer or a workstation.

[0096] Further, the input device 1 is an input device such as a keyboard or a mouse, and the output device 4 is a CRT display, a liquid crystal display, or the like.

[0097] Next, a second embodiment of the present invention will be described. FIG. 9 is a diagram showing the configuration of the second embodiment of the present invention.

[0098] FIG. 9 is a block diagram showing the configuration of a cluster classification device according to the second embodiment of the present invention. In FIG. 9, elements identical or equivalent to those in FIG. 1 are given the same reference numerals. Referring to FIG. 9, the second embodiment of the present invention includes, in addition to the configuration of the preceding embodiment, a recording medium 5 on which a cluster classification program is recorded, the recording medium being a magnetic disk, a semiconductor memory, or another recording medium. The cluster classification program implements, on the data processing device 3, the functions and processing of the cluster number determination unit 31, the clustering unit 32, and the result analysis unit 33 described above. The program is read from the recording medium 5 into the data processing device 3 via a reading device (not shown), the program in executable form is loaded into the main memory of the data processing device 3, and the CPU of the data processing device 3 executes it, whereby the present invention can be carried out.

[0099]

[Effects of the Invention] As described above, according to the present invention, when classifying data it is possible to extract the data regions in which data of each class mainly resides and the data regions in which data of multiple classes is mixed.

[0100] The reason is that, in order to deal with the overlap between different classes, which is a property generally exhibited by the attribute data to be classified in, for example, credit screening at financial institutions and in marketing, the present invention is configured to repeat clustering by SOM while analyzing the target data.

[0101] According to the present invention, using the data regions obtained as a result of classification makes it possible not only to classify data with high performance but also to analyze the properties of the target data in detail.

[Brief description of the drawings]

FIG. 1 is a diagram showing the configuration of a cluster classification device according to one embodiment of the present invention.

FIG. 2 is a flowchart for explaining the operation of one embodiment of the present invention.

FIG. 3 is a diagram for specifically describing one embodiment of the present invention, showing an example of attribute data.

FIG. 4 is a diagram showing an example of the conversion of attribute data into numerical values in one embodiment of the present invention.

FIG. 5 is a diagram showing an example of attribute data converted into numerical values in one embodiment of the present invention.

FIG. 6 is a diagram schematically showing a display example of a clustering result in one embodiment of the present invention.

FIG. 7 is a diagram schematically showing a detailed display example of a clustering result in one embodiment of the present invention.

FIG. 8 is a diagram schematically showing a display example of the judgment result when new customer data is applied in one embodiment of the present invention.

FIG. 9 is a diagram showing the configuration of another embodiment of the present invention.

FIG. 10 is a diagram schematically showing the configuration of an SOM.

[Explanation of symbols]

1 input device
2 storage device
3 data processing device
4 output device
5 recording medium
21 past attribute data storage unit
31 cluster number determination unit
32 clustering unit
33 result analysis unit

Claims

[Claims]

1. A cluster classification device comprising: storage means for storing past attribute data; cluster number determination means for determining, from the attribute data stored in the storage means, a number of clusters corresponding to the attribute data; clustering means for receiving the determined number of clusters and the attribute data and performing clustering; and result analysis means for receiving the clustering result output from the clustering means and analyzing the clustering result.

2. The cluster classification device according to claim 1, further comprising output means for outputting a result and input means for designating the number of clusters, wherein an optimum number of clusters can be set in the cluster number determination means based on the analysis result of the result analysis means.

3. The cluster classification apparatus according to claim 1, wherein said cluster number determination means determines the number of clusters based on a class ratio of the target data.

4. The cluster classification device according to claim 1, wherein the result analysis means analyzes the class ratio of each cluster, the representative data, and the similarity between data within a cluster.

5. The cluster classification device according to claim 1, wherein the attribute data is customer data.

6. A cluster classification device for credit screening, comprising the cluster classification device according to claim 5 as a customer data classification device that classifies customer data in order to judge, in credit screening, whether a new customer is able to repay.

7. A cluster classification device for marketing, comprising the cluster classification device according to claim 5 as a customer data classification device that classifies customers, in marketing, according to the products they are likely to purchase or be interested in.

8. A cluster classification method comprising the steps of: storing past attribute data in storage means; determining a number of clusters corresponding to the attribute data stored in the storage means; receiving the determined number of clusters and the attribute data and performing clustering; and analyzing the clustering result.

9. The cluster classification method according to claim 8, wherein a user designates an optimal number of clusters based on the analysis result, thereby enabling analysis of the target data.

10. The cluster classification method according to claim 8, wherein in the step of determining the number of clusters, the number of clusters is determined based on a class ratio of the target data.

11. The cluster classification method according to any one of claims 8 to 10, wherein, in the step of analyzing the clustering result, the class ratio of each cluster, the representative data, and the similarity between data within a cluster are analyzed.

12. The cluster classification method according to any one of claims 8 to 11, wherein the attribute data is customer data.

13. A cluster classification method for credit screening, wherein the cluster classification method according to claim 12 is used as a method of classifying customer data in order to judge, in credit screening, whether a new customer is able to repay.

14. A cluster classification method for marketing, wherein the cluster classification method according to claim 12 is used as a method of classifying customer data in order to classify customers, in marketing, according to the products they are likely to purchase or be interested in.

15. A recording medium on which is recorded a program for causing a computer to execute each of the following processes (a) to (d): (a) a process of storing past attribute data in a storage unit; (b) a cluster number determination process of receiving the attribute data stored in the storage unit and determining a number of clusters corresponding to the attribute data; (c) a clustering process of receiving the number of clusters and the attribute data from the cluster number determination process and performing clustering; and (d) a result analysis process of receiving the clustering result from the clustering process and analyzing the result.

16. The recording medium according to claim 15, on which is recorded a program for further causing the computer to execute: (f) an output process of outputting the analysis result so that the user can specify an optimum number of clusters while observing the analysis result and analyze the target data; and (g) an input process for designating the number of clusters.

17. The recording medium according to claim 15, on which is recorded a program for causing the computer to execute the cluster number determination process such that the number of clusters is determined based on the class ratio of the target data.

18. The recording medium according to claim 15, on which is recorded a program for causing the computer to execute the result analysis process such that the class ratio of each cluster, the representative data, and the similarity between data within a cluster are analyzed.

19. The recording medium according to claim 15, wherein said attribute data is customer data.

20. A cluster classification method comprising the steps of: (a) for attribute data stored in a storage unit, shaping and aggregating the attribute data used for clustering to determine a number of clusters; (b) clustering the attribute data with the determined number of clusters; (c) analyzing the clustering result and displaying the analysis result on a display device; (d) ending clustering when, from the displayed result, the tendency of the data belonging to each cluster has been grasped and classification is complete; and (e) when further analysis is to be performed because the tendency of the data is not clear from the displayed result, changing the number of clusters, returning to step (b), and performing clustering again with the changed number of clusters.

21. The cluster classification method according to claim 20, wherein, in determining the number of clusters in step (a), the number of clusters is determined based on the class ratio of the target data.

22. The cluster classification method according to claim 20, wherein, in the analysis of step (c), the ratio of normal to defective data belonging to each cluster is obtained.

23. The cluster classification method according to claim 20, wherein, in step (d), based on the analysis result, clusters consisting mostly of normal data, clusters consisting mostly of defective data, and clusters in which normal and defective data are mixed are displayed on the display device in a display format that the user can understand at a glance.

24. The cluster classification method according to claim 20, wherein, in the clustering step (b), clustering is performed using the SOM (Self-Organizing Map) algorithm, or the SOM algorithm together with a conscience algorithm.

25. The cluster classification method according to claim 20, wherein, in the clustering step (b), clustering is performed using the SOM algorithm, or the SOM algorithm together with a conscience algorithm, and wherein, in step (d), for each competitive layer unit of the SOM, clusters consisting mostly of normal data, clusters consisting mostly of defective data, and clusters in which normal and defective data are mixed are displayed on the display device in a display format that the user can understand at a glance, and the attribute data belonging to a cluster designated by a click operation is displayed on the display device.

26. The cluster classification method according to claim 20, wherein in the step (c), a class ratio of each cluster, representative data, and similarity between data in the cluster are analyzed.

27. The cluster classification method according to claim 20, wherein, when new attribute data is input, the cluster whose weights are closest to the new attribute data is detected, and the new attribute data is classified and analyzed by aggregating the classes of the data belonging to the cluster determined to be closest.

28. A cluster classification device comprising: a storage unit for storing attribute data; means for reading the attribute data from the storage unit, shaping and aggregating the attribute data used for clustering, and determining a number of clusters; means for clustering the attribute data with the determined number of clusters; means for analyzing the clustering result and displaying the result; and means for performing control such that, when further analysis is to be performed in view of the displayed result, the number of clusters is changed and clustering is performed again with the changed number of clusters.

29. The cluster classification device according to claim 28, wherein, when new attribute data that is not stored in the storage unit is input, the cluster whose weights are closest to the new attribute data is detected, and the new attribute data is classified and analyzed by aggregating the classes of the data belonging to the cluster determined to be closest.