JP2946449B2

JP2946449B2 - Clustering processor

Info

Publication number: JP2946449B2
Application number: JP5086858A
Authority: JP
Inventors: 啓之安藤; 友彦佐藤
Original assignee: Azbil Corp
Current assignee: Azbil Corp
Priority date: 1993-03-23
Filing date: 1993-03-23
Publication date: 1999-09-06
Anticipated expiration: 2014-09-06
Also published as: JPH06274635A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a clustering processing apparatus that generates clusters in accordance with the distribution of a data group, and in particular to a clustering processing apparatus that automatically sets the initial cluster spread width used during clustering.

[0002]

2. Description of the Related Art In a system that acquires information processing behavior from observed cases (data), the input data span a very wide range and are very numerous compared with the amount of data that can be acquired. If the input data are classified and then acquired in classified form, the amount of data to be acquired can be reduced. Clustering is one method of classifying input data in this way: similar items in the input data are merged into several groups, called clusters. Classification by clustering measures and quantifies the degree of similarity between the data forming a set and collects highly similar data into the same group (cluster).

[0003] FIG. 6 is a block diagram of a conventional clustering processing apparatus that clusters input data as described above. In the figure, 1 is a data input unit; 2 is a cluster generation determination unit that determines whether a new cluster should be generated for the input data; 3 is a cluster generation unit that generates a new cluster for the input data; 4 is a cluster adjustment unit that adjusts clusters based on the input data; 5 is a cluster fusion determination unit that determines, based on the input data, whether clusters should be fused; and 6 is a cluster fusion unit that fuses clusters. Reference numeral 7 denotes a storage unit that stores the parameters and clustering information needed to generate new clusters and to adjust or fuse clusters; it consists of a clustering parameter storage unit 8 and a clustering information storage unit 9.

[0004] FIG. 7 illustrates how such a clustering processing apparatus operates. For example, if the input data group up to some point in time is as shown in FIG. 7(a), the clustering result is as shown in FIG. 7(c). If the characteristics of the input data group then change gradually over time to the situation shown in FIG. 7(b), the clusters of FIG. 7(c) are adjusted and fused, or new clusters are generated in addition to them, and the data are clustered as shown in FIG. 7(e). In other words, the apparatus generates clusters that track an input data group that grows over time, and it can generate, adjust, and fuse clusters as data arrive sequentially.

[0005] In the clustering processing apparatus shown in FIG. 6, the cluster generation determination unit 2 first determines, for the data (signal) input via the data input unit 1, whether to generate a new cluster or to assign the data to an already generated cluster. The measures generally used to compute the degree of membership of the input data in each cluster are the Euclidean distance given by equation (1) and the Mahalanobis distance given by equation (2).

[0006]

(Equation 1)

[0007]

(Equation 2)

[0008] In equation (2), σx² and σy² are variances, given by equations (3) and (4), respectively, where n is the number of data and μ is the mean.

[0009]

(Equation 3)

[0010]

(Equation 4)
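The equation images are not reproduced in this text. For orientation only, the textbook forms of these measures for a two-dimensional point (x, y) and a cluster with mean (μx, μy) are shown below. This is an assumption about notation: the Mahalanobis form assumes uncorrelated axes (only σx² and σy² are mentioned), the variances use the 1/n convention, and the patent's equations (1) to (4) may differ in detail, for example in whether the square root is taken.

```latex
\begin{aligned}
d_E &= \sqrt{(x - \mu_x)^2 + (y - \mu_y)^2} \\
d_M &= \sqrt{\frac{(x - \mu_x)^2}{\sigma_x^2} + \frac{(y - \mu_y)^2}{\sigma_y^2}} \\
\sigma_x^2 &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_x)^2, \qquad
\sigma_y^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \mu_y)^2
\end{aligned}
```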

[0011] The Mahalanobis distance is used when a normal distribution is fitted, as in probability and statistics, so its computation is complex and slow. It is therefore unsuitable for online processing, which demands fast response. To make online processing possible, this clustering processing apparatus computes the degree of membership of an input signal in a cluster with the Euclidean distance, which needs only the four basic arithmetic operations; this shortens computation time and limits the growth of memory usage.

[0012] As the distribution that defines the degree of membership of the input data in each cluster, a triangular distribution, often used as a membership function in fuzzy control, is employed. As shown in FIG. 8, the one-dimensional triangular distribution has the spread α as its parameter, and the degree of membership is defined to be 1 at the center coordinate μ. In the multidimensional case, a similar distribution is assigned independently along each axis. Next, the one-dimensional clustering algorithm is outlined using the triangular distribution of FIG. 8.

[0013] In this triangular distribution, the region occupied by an already generated cluster is divided into two regions, and the area surrounding it into two more, giving four regions in all (FIG. 8). Here X is the input data, μ is the center position of the cluster, and α is the spread from the cluster center. K1 is a parameter for the range over which the cluster is enlarged and K2 a parameter for the range over which it is reduced; both are stored in the clustering parameter storage unit 8 (FIG. 6). If the data X input at the data input unit 1 falls in the outer surrounding region, the cluster generation unit 3 generates a new cluster. If X falls in the surrounding region next to the cluster, the cluster adjustment unit 4 enlarges the spread of the cluster. If X falls in the outer part of the cluster region, the spread of the cluster is not changed. If X falls in the central region of the cluster, the cluster adjustment unit 4 reduces the spread of the cluster. In each of the last three cases, the center position of the cluster is also changed.

[0014] Next, the cluster generation algorithm of the cluster generation unit 3 is described. As noted above, when the data X input at the data input unit 1 falls in the outer surrounding region and there is no cluster to which it belongs, or when no cluster has been generated yet, a cluster is generated with the input data as its center μ and with the membership distribution shown in FIG. 9, and it is stored in the clustering information storage unit 9. The degree of membership F(x) is given by equation (5), and its maximum value is 1. Here α is the parameter that gives the initial spread of the cluster, and it is stored in the clustering parameter storage unit 8.

[0015]

(Equation 5)
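Equation (5) itself is not reproduced. A common way to write a triangular membership function with peak 1 at μ and spread α is F(x) = max(0, 1 − |x − μ|/α); the sketch below assumes that form and, following [0012], gives each axis its own independent membership in the multidimensional case. Function names are illustrative.

```python
from typing import List, Sequence


def triangular(x: float, mu: float, alpha: float) -> float:
    """Assumed triangular membership: 1 at mu, falling linearly to 0 at mu +/- alpha."""
    if alpha <= 0:
        raise ValueError("spread alpha must be positive")
    return max(0.0, 1.0 - abs(x - mu) / alpha)


def axis_memberships(
    x: Sequence[float], mu: Sequence[float], alpha: Sequence[float]
) -> List[float]:
    """Multidimensional case: one independent triangular membership per axis."""
    return [triangular(xi, mi, ai) for xi, mi, ai in zip(x, mu, alpha)]
```

For example, `triangular(6, 5, 2)` is 0.5 (halfway down the slope), and any point at least α away from μ has membership 0.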

[0016] Next, the adjustment algorithm of the cluster adjustment unit 4 is described with reference to FIG. 10. As shown in FIG. 10(a), when the input data X falls in the surrounding region next to the cluster, the cluster adjustment unit 4 enlarges the spread of the cluster as shown in FIG. 10(b). In this case, the cluster adjustment unit 4 first computes the new cluster spread αnew from equation (6).

[0017]

(Equation 6)

[0018] Next, the new center position μnew of the enlarged cluster is computed from equations (7) and (8), and the result is stored as clustering information in the clustering information storage unit 9.

[0019]

(Equation 7)

[0020]

(Equation 8)

[0021] FIGS. 10(c) and 10(d) illustrate reduction of a cluster. As described above, when the input data X falls in the central region of the cluster, the cluster adjustment unit 4 reduces the cluster as shown in FIG. 10(d). It first computes the center position μnew from equation (9). In equation (9), Nold is the number of data input to the cluster up to this point, and n is the reciprocal of the number of clusters to which the input data X should belong.

[0022]

(Equation 9)
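Equation (9) is not reproduced, but its named variables suggest a weighted running mean in which the new datum counts with weight n, so that data shared among several clusters are split between them. The sketch below assumes μnew = (Nold·μold + n·X) / (Nold + n); this is a plausible reading, not the patent's confirmed formula, and `update_center` is a hypothetical name.

```python
def update_center(mu_old: float, n_old: float, x: float, n: float) -> float:
    """Assumed form of equation (9): weighted running mean of the cluster center.

    mu_old: current center; n_old: number of data credited to the cluster so far;
    x: new input; n: weight of x, i.e. 1 / (number of clusters x belongs to).
    """
    return (n_old * mu_old + n * x) / (n_old + n)
```

With a datum that belongs to one cluster only (n = 1), this reduces to the ordinary incremental mean: `update_center(0.0, 3, 4.0, 1.0)` gives 1.0.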

[0023] Next, with K3 denoting the reduction parameter stored in the clustering parameter storage unit 8, the cluster spread αnew is computed from equation (10).

[0024]

(Equation 10)

[0025] When the input data X falls in the outer part of the cluster region, the spread of the cluster is not changed, and only the center position is updated according to equation (9). Next, the fusion algorithm of the cluster fusion unit 6 is described. When there are several clusters to which the data should belong, the clusters are taken in pairs, and for each pair the minimum degree of membership at the intersection points of the pair's overlapping part on each axis is taken as the degree of membership of that pair. If the cluster fusion determination unit 5 determines that the largest of these pair memberships exceeds a preset threshold TH, the cluster fusion unit 6 fuses the pair of clusters selected by the following algorithm. In this case, the spread of the cluster is redefined so that the new spread αnew encompasses the full spreads αold of the original clusters. Left at that, however, cluster regions would only ever grow, so the edges of a cluster are assumed to be relatively unimportant and the number of data N(new) is computed by equation (11) below. The center position μ(new) and the cluster spread α(new) are then computed independently for each axis by equations (12) and (13).

[0026]

(Equation 11)

[0027]

(Equation 12)

[0028]

(Equation 13)

[0029] FIG. 11 shows the fusion of two clusters A and B. As shown in FIG. 11(a), if the input data X lies in the overlap of the regions of clusters A and B, and the minimum degree of membership at the intersection points of this overlap on each axis is judged to exceed the threshold TH, clusters A and B are fused into a cluster C, as shown in FIG. 11(b). If the numbers of data of clusters A and B are NA and NB, the number of data NC of the fused cluster C is computed from equation (14).

[0030]

(Equation 14)

[0031] The center position μC and the spread αC of the fused cluster C are computed from equations (15) and (16), respectively.

[0032]

(Equation 15)

[0033]

(Equation 16)

[0034] Even if the input data X lies in the overlap of the regions of clusters A and B, clusters A and B are not fused if, as in FIG. 11(c), the minimum degree of membership at the intersection points of the overlap on each axis does not exceed the threshold TH.
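The fusion test of [0025] to [0034] can be sketched for symmetric triangular memberships. On one axis, two triangles with centers μA, μB and spreads αA, αB cross between their centers at membership 1 − |μA − μB| / (αA + αB) (clipped at 0); the pair's membership is the minimum over all axes, and the best pair is fused only if it exceeds TH. The triangular shape, the `Cluster` tuple layout and the function names are assumptions of this sketch; the caller is expected to pass only the clusters the current input belongs to.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Cluster = Tuple[Sequence[float], Sequence[float]]  # (center per axis, spread per axis)


def crossing_membership(mu_a: float, alpha_a: float, mu_b: float, alpha_b: float) -> float:
    """Membership where two symmetric triangular functions cross between their centers."""
    return max(0.0, 1.0 - abs(mu_a - mu_b) / (alpha_a + alpha_b))


def pair_membership(a: Cluster, b: Cluster) -> float:
    """Degree of membership of a cluster pair: minimum crossing membership over all axes."""
    return min(
        crossing_membership(ma, aa, mb, ab)
        for ma, aa, mb, ab in zip(a[0], a[1], b[0], b[1])
    )


def pair_to_fuse(clusters: List[Cluster], th: float) -> Optional[Tuple[int, int]]:
    """Indices of the pair with the largest pair membership, if it exceeds th."""
    best, best_score = None, th
    for i, j in combinations(range(len(clusters)), 2):
        score = pair_membership(clusters[i], clusters[j])
        if score > best_score:  # strictly greater: "exceeds" the threshold
            best, best_score = (i, j), score
    return best
```

For two clusters with centers 0 and 2 and spread 2 on one axis, the triangles cross at membership 0.5, so they are fused when TH is 0.4 but not when TH is 0.6.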

[0035] As an example, the following describes applying this clustering processing apparatus to a color image processing apparatus. The color image processing apparatus converts the RGB signals obtained from, for example, a color video camera capturing an image into the Irg space, and forms the input data of the color image as a distribution of vectors, given by r and g, on the rg plane of this space. The Irg space consists of the intensity I, which is the sum of the R, G, and B signals; r, which is the R (red) signal divided by I; and g, which is the G (green) signal divided by I. The clustering processing apparatus clusters the distribution of color data of a reference input image formed on the rg plane by this color image processing apparatus, as shown in FIG. 12, to form a reference cluster; this makes it possible to judge whether a given image is the same as the reference image.
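The Irg conversion defined above is a few lines of arithmetic. In this minimal sketch the function name is illustrative, and raising an error for a black pixel (I = 0, where r and g are undefined) is an assumed policy that the text does not address.

```python
from typing import Tuple


def rgb_to_irg(r_sig: float, g_sig: float, b_sig: float) -> Tuple[float, float, float]:
    """Convert one RGB sample to (I, r, g): I = R + G + B, r = R / I, g = G / I."""
    intensity = r_sig + g_sig + b_sig
    if intensity == 0:
        # Assumed handling: chromaticity is undefined for a black pixel.
        raise ValueError("I = 0: r and g are undefined")
    return intensity, r_sig / intensity, g_sig / intensity
```

Only the (r, g) pair is placed on the rg plane for clustering; for example, `rgb_to_irg(100, 50, 50)` gives I = 200, r = 0.5, g = 0.25.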

[0036] Specifically, the reference image is first captured by the camera, and the distribution of the reference image data converted by the color image processing apparatus is clustered by the clustering processing apparatus and set as the reference cluster. Then, when an image to be recognized is captured by the camera, the degree to which it matches the reference image can be obtained from how closely the distribution of the recognition image data converted by the color image processing apparatus agrees with the reference cluster.

[0037]

PROBLEMS TO BE SOLVED BY THE INVENTION The conventional configuration described above has the following problems. First, when clustering the data distribution of a reference image obtained by the color image processing apparatus described above, a person must inspect the distribution to decide the initial value α of the cluster spread width, which requires effort. Moreover, when the data to be clustered have more than four dimensions, it is practically impossible for a person to determine the initial value α of the cluster spread width.

[0038] In addition, the spread of a generated cluster is determined by a single parameter (α, or αnew after adjustment), so the initial cluster is a square, and depending on the distribution of the data group to be clustered, the initial cluster given is not appropriate. For example, if the data distribution of the reference image obtained by the color image processing apparatus is as shown in FIG. 12(a), the data are distributed roughly uniformly in both the r and g directions of the rg plane, so a square initial cluster causes little trouble. If, however, the distribution is as shown in FIG. 12(b), where the data of the reference image spread more widely along the r axis, a square initial cluster means that it takes a long time to generate clusters that fit this distribution.

[0039] Furthermore, because the initial cluster was square in the prior art, the data of the color image processing apparatus described above had to be normalized along the horizontal and vertical axes of the distribution plane, that is, along each dimension of the input data.

[0040] The present invention was made to solve the above problems, and its object is to generate initial clusters automatically in accordance with the distribution of the input data.

[0041]

SUMMARY OF THE INVENTION The clustering processing apparatus of the present invention comprises initial cluster spread determination means that sequentially and arbitrarily extracts three or more data items from the data group to be clustered, computes the absolute difference between each extracted item and the item extracted after it, arranges these absolute differences in descending order, computes the absolute differences between neighboring values in that order, and uses the largest of the values so computed to determine the size of a newly generated cluster.

[0042]

OPERATION When a new cluster is generated from the acquired data, its size matches the distribution of the data group being clustered.

[0043]

DESCRIPTION OF THE PREFERRED EMBODIMENTS An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the clustering processing apparatus of the present invention. In the figure, 10 is an initial cluster spread width determination unit that automatically determines, from the state of the input data, the spread width of the initial cluster generated when there is no cluster to which the data input at the data input unit 1 belongs or when no cluster has been generated yet. The other components are the same as in FIG. 6.

[0044] Next, the operation of the initial cluster spread width determination unit 10 of this clustering apparatus is described. First, ten data items are arbitrarily extracted, via the data input unit 1, from the data group to be clustered, and the absolute differences between these items are computed in extraction order. Consider the case where the extracted items, in extraction order, are 3, 25, 5, 27, 7, 1, 4, 32, 34, and 8.

[0045] Taking differences in extraction order, the absolute differences are 22, 20, 22, 20, 6, 3, 28, 2, and 26. Sorting these in descending order gives 28, 26, 22, 22, 20, 20, 6, 3, and 2. Taking, again in this order, the absolute differences between neighboring values gives 2, 4, 0, 2, 0, 14, 3, and 1; half of the largest of these, 14, is used as the initial cluster spread width, so in this case the initial cluster spread width α is 7. The initial cluster spread width α obtained in this way is stored in the clustering parameter storage unit 8, and the cluster generation unit 3 uses this value to generate initial clusters.
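A minimal Python sketch of the procedure of [0044] and [0045] for one axis, reproducing the worked example. The function name `initial_spread` is illustrative, not taken from the patent, and the random "arbitrary extraction" step is left to the caller (the samples are passed in extraction order).

```python
from typing import Sequence


def initial_spread(samples: Sequence[float]) -> float:
    """Initial cluster spread width for one axis (sketch of [0044]-[0045])."""
    if len(samples) < 3:
        raise ValueError("at least three samples are needed")
    # 1. Absolute differences between consecutive samples, in extraction order.
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    # 2. Sort the differences in descending order.
    diffs.sort(reverse=True)
    # 3. Differences between neighbors in the sorted list (non-negative after sorting).
    gaps = [a - b for a, b in zip(diffs, diffs[1:])]
    # 4. Half of the largest neighbor difference is the initial spread width.
    return max(gaps) / 2


print(initial_spread([3, 25, 5, 27, 7, 1, 4, 32, 34, 8]))  # 7.0
```

The largest neighbor difference marks the biggest jump in the sorted list of consecutive differences, which tends to separate "within-group" steps from "between-group" steps.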

[0046] When the initial cluster spread width determination unit 10 determines the initial cluster spread width α for clustering the data that represent the color distribution of the reference input image formed on the rg plane by the color image processing apparatus described above, it determines the initial cluster spread width separately for r and for g, as described above. As a result, in the case of FIG. 2(a), where the data are distributed roughly uniformly in both the r and g directions of the rg plane, a square initial cluster 21 is generated as in the prior art, whereas in the case of FIG. 2(b), where the data of the reference image spread more widely along the r axis, a horizontally elongated rectangular initial cluster 22 is generated, unlike in the prior art. The clustering processing apparatus of this invention thus generates initial clusters that fit the actual distribution, which speeds up the clustering process.
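Determining the width separately per axis can be sketched as below. Modelling the "arbitrary extraction" as random sampling without replacement, the default of ten samples, and the names `axis_spread` and `initial_spreads` are assumptions of this sketch; k must be at least 3 and at most the number of points.

```python
import random
from typing import List, Sequence


def axis_spread(values: Sequence[float]) -> float:
    """Half the largest gap between neighboring, descending-sorted consecutive differences."""
    diffs = sorted((abs(b - a) for a, b in zip(values, values[1:])), reverse=True)
    return max(a - b for a, b in zip(diffs, diffs[1:])) / 2


def initial_spreads(
    points: Sequence[Sequence[float]], k: int = 10, seed: int = 0
) -> List[float]:
    """One initial spread width per axis, from k points drawn at random (assumed)."""
    sample = random.Random(seed).sample(list(points), k)  # draw order = extraction order
    return [axis_spread([p[d] for p in sample]) for d in range(len(sample[0]))]
```

Because each axis gets its own width, data that are ten times more dispersed along r than along g yield an r-width ten times the g-width, that is, an elongated rectangle like initial cluster 22 rather than a square.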

[0047] Now consider a data group to be clustered that is distributed over two separate regions, as shown in FIG. 3. The one-dimensional case is described here. If data are arbitrarily extracted from the data group as described above, data from region 31 and data from region 32 should be extracted in proportions roughly reflecting this distribution. If the differences between the extracted data are taken in extraction order, some of their absolute values can be expected to be close to the distances a, b, c, and d shown in FIG. 3. Consequently, if these differences are arranged in descending order and the absolute differences between neighbors are taken in that order, the largest of them, D, is close to the maximum value L obtained by taking, in order, the absolute differences between the distances sorted in descending order: d, c, a, b.

[0048] If, as in FIG. 3, regions 31 and 32 are separated by a distance sufficiently larger than the size of either region, the difference between distance c and distance a is the maximum value L. If half of L is used as the initial cluster width α, an initial cluster 33 generated at the maximum data value X in region 31 does not extend into region 32. Accordingly, if half of D, which is smaller than L, is used as the initial cluster width α, an initial cluster of this width generated at the maximum data value in region 31 likewise does not extend into region 32.
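The argument can be checked numerically on one hypothetical one-dimensional sample. The data below (region 31 spanning 2 to 8, region 32 spanning 101 to 110, in one possible extraction order) and the threshold of 50 used to tell the regions apart are invented for illustration; they are not from FIG. 3.

```python
from typing import Sequence


def initial_spread(samples: Sequence[float]) -> float:
    """Half the largest gap between neighboring, descending-sorted consecutive differences."""
    diffs = sorted((abs(b - a) for a, b in zip(samples, samples[1:])), reverse=True)
    return max(a - b for a, b in zip(diffs, diffs[1:])) / 2


# Hypothetical sample: region 31 spans [2, 8], region 32 spans [101, 110].
sample = [2, 8, 104, 101, 5, 109, 3, 106, 110, 7]
alpha = initial_spread(sample)

region31_max = max(v for v in sample if v < 50)
region32_min = min(v for v in sample if v >= 50)
# A cluster centered on the largest value of region 31 reaches region31_max + alpha.
print(alpha, region31_max + alpha < region32_min)  # 45.0 True
```

Here the sorted differences jump from 96 (between regions) to 6 (within region 31), so D = 90 and α = 45; a cluster centered at 8 reaches only 53, well short of region 32.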

[0049] Embodiment 2. The above embodiment describes the case where the data to be clustered do not change over time, but the clustering apparatus of this invention is not limited to that case. If the initial cluster spread width determination unit 10 regenerates the initial cluster spread width α periodically at a predetermined time interval, initial clusters that fit the actual distribution can be generated even for data that change over time.

[0050] When data are generated continually, the distribution of the data changes over time, as shown in FIGS. 4 and 5. FIG. 4 shows an actual distribution 41 at some time changing over time into a more dispersed distribution 42, and FIG. 5 shows an actual distribution 51 changing over time into a more concentrated distribution.

[0051] When clustering data whose distribution changes as in FIG. 4, if the initial cluster 43 is fixed as in the prior art, many small clusters are generated for distribution 42, as shown in FIG. 4(b). Even when clusters like those of FIG. 4(b) are generated, repeated fusion, as described in the related-art section, may eventually yield clusters like the one in FIG. 4(a), but the clustering process then takes a long time. Moreover, if the size of the initial cluster differs greatly from the actual distribution, many small clusters remain. By contrast, if the initial cluster spread width α is set according to the current data distribution, as in the clustering processing apparatus of Embodiment 2, an appropriate cluster 44 is generated for distribution 42, as shown in FIG. 4(a).

[0052] Next, when clustering data whose initial distribution 51 changes into distribution 52, as in FIG. 5, if the initial cluster 53 is fixed as in the prior art, large clusters are generated for distribution 52, as shown in FIG. 5(b). As a result, a single cluster may be generated even though distribution 52 consists of three regions. By contrast, if the initial cluster spread width α is set according to the current data distribution, as in the clustering processing apparatus of Embodiment 2, an appropriate cluster 54 is generated for distribution 52, as shown in FIG. 5(a). Thus, when clustering a data group whose distribution changes over time, the prior art may generate clusters that do not fit the actual distribution, whereas the clustering processing apparatus of Embodiment 2 performs clustering that reflects the actual distribution of the data.

[0053]

EFFECTS OF THE INVENTION As described above, according to the present invention, the size of a newly generated cluster (the initial cluster) can be determined without relying on human judgment, even when the data group to be clustered has many dimensions. In the prior art it was impossible to determine an initial cluster size that fits the distribution of the data group once the data had four or more dimensions; the present invention makes this possible. In addition, the input data need not be normalized, however many dimensions they have.

【００５４】[0054]

In addition, initial clusters can be generated with a size suited to the distribution of the data group being clustered. The result of the clustering therefore accurately reflects the distribution of the data group, and even when the distribution of the data group changes over time, clustering that accurately reflects the current distribution remains possible.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of a clustering processing apparatus according to an embodiment of the present invention.

FIG. 2 is a distribution diagram showing the distribution of reference-image data obtained by a color image processing apparatus whose clustering is performed by the clustering processing apparatus of FIG. 1, together with the initial clusters that are generated.

FIG. 3 is an explanatory diagram illustrating the basic concept of the operation of the initial cluster spread width determination unit 10.

FIG. 4 is a distribution diagram showing a data group whose distribution changes over time.

FIG. 5 is a distribution diagram showing a data group whose distribution changes over time.

FIG. 6 is a configuration diagram showing an example of a conventional clustering processing apparatus.

FIG. 7 is a diagram showing the state of the clustering process performed by the clustering apparatus of FIG. 6.

FIG. 8 is a distribution diagram outlining the clustering algorithm of the clustering apparatus of FIG. 6.

FIG. 9 is an explanatory diagram of the cluster generation algorithm of the clustering apparatus of FIG. 6.

FIG. 10 is an explanatory diagram of the cluster enlargement and reduction adjustment algorithms of the clustering apparatus of FIG. 6.

FIG. 11 is an explanatory diagram of the cluster fusion algorithm of the clustering apparatus of FIG. 6.

FIG. 12 is a distribution diagram showing the distribution of reference-image data obtained by a color image processing apparatus whose clustering is performed by the clustering apparatus of FIG. 6.

[Explanation of Reference Numerals]

1 Data input unit
2 Cluster generation determination unit
3 Cluster generation unit
4 Cluster adjustment unit
5 Cluster fusion determination unit
6 Cluster fusion unit
7 Storage unit
8 Clustering parameter storage unit
9 Clustering information storage unit
10 Initial cluster spread width determination unit

(Continued from front page) (58) Fields searched (Int. Cl.⁶, DB name): G06T 7/00; G10L 3/00

Claims

(57) [Claims]

1. A clustering processing apparatus that generates a new cluster when obtained data does not fall into any existing cluster and performs clustering according to the characteristics of sequentially obtained data, the apparatus comprising an initial cluster spread width determination unit that sequentially extracts the obtained data in arbitrary order, calculates the absolute value of the difference between each extracted datum and the datum extracted next, arranges the absolute values of the differences in descending order, calculates a value, and determines the size of the newly generated cluster using the maximum of the calculated values.
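The claim's steps (arbitrary-order extraction, absolute differences between consecutive extractions, descending sort, then a maximum) can be sketched in Python as follows. The claim as translated does not say which "value" is computed from the sorted list, so this sketch fills that step with a loudly labeled assumption: the value is the drop between neighbouring entries of the descending list, and the spread width is the entry just below the largest drop. The function name `initial_spread_widths` and the per-dimension treatment are likewise hypothetical choices for illustration, not the patented implementation.

```python
import random


def initial_spread_widths(points, seed=0):
    """Hypothetical sketch of an initial cluster spread width determination unit.

    points: list of equal-length tuples, one tuple per obtained datum.
    Returns one width per dimension, so each dimension keeps its own scale
    and no prior normalization is applied.
    """
    if len(points) < 3:
        raise ValueError("need at least three data points")
    order = list(points)
    random.Random(seed).shuffle(order)  # extract the data one by one in arbitrary order
    widths = []
    for d in range(len(points[0])):
        # Absolute difference between each extracted datum and the one extracted next.
        diffs = [abs(b[d] - a[d]) for a, b in zip(order, order[1:])]
        diffs.sort(reverse=True)  # arrange the absolute differences in descending order
        # ASSUMPTION: the computed "value" is the drop between neighbouring entries.
        drops = [diffs[i] - diffs[i + 1] for i in range(len(diffs) - 1)]
        k = drops.index(max(drops))  # position of the maximum drop
        # ASSUMPTION: the width is the first entry below that maximum drop.
        widths.append(diffs[k + 1])
    return widths


# Two well-separated regions whose second dimension is 1000x coarser than the first.
region_a = [(0.01 * i, 10.0 * i) for i in range(20)]
region_b = [(5 + 0.01 * i, 100000 + 10.0 * i) for i in range(20)]
w0, w1 = initial_spread_widths(region_a + region_b)
```

Under these assumptions, the large between-region differences sit at the top of the sorted list and the small within-region differences below them, so the largest drop separates the two groups and each width tracks the within-region spread of its own dimension (here `w1` is about 1000 times `w0`).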