CN107330442A

CN107330442A - An incremental fuzzy clustering method combining intra-class compactness and inter-class separation

Info

Publication number: CN107330442A
Application number: CN201710387502.8A
Authority: CN
Inventors: 刘永利; 段天毅; 刘静; 晁浩; 陈敬丽
Original assignee: Henan University of Technology
Current assignee: Henan University of Technology
Priority date: 2017-05-25
Filing date: 2017-05-25
Publication date: 2017-11-07

Abstract

The present invention proposes an incremental fuzzy clustering method that combines intra-class compactness and inter-class separation. The method splits the data into consecutive data blocks and processes them in order, so it can handle large-scale data and data streams. Once the FCS algorithm has been weighted, a single-pass incremental method can be applied to it. With this method, processing speed improves significantly without reducing clustering accuracy.

Description

An incremental fuzzy clustering method combining intra-class compactness and inter-class separation

Technical field

The present invention relates to a clustering method, and in particular to an incremental fuzzy clustering method combining intra-class compactness and inter-class separation. It belongs to the field of data mining.

Background technology

A clustering algorithm groups highly similar data objects into one cluster and places highly dissimilar data objects in different clusters. Research on clustering algorithms has been fruitful. According to the rule by which data objects are assigned to clusters, these algorithms can be divided into hard clustering and fuzzy clustering. In hard clustering, each data object belongs entirely to exactly one cluster, whereas fuzzy clustering lets each data object belong to multiple clusters with different degrees of membership. Each class of algorithm has its own strengths: hard clustering algorithms are simple and efficient, while fuzzy clustering algorithms better match how people perceive the objective world.

Whether hard or fuzzy, most clustering algorithms consider only intra-class compactness and ignore inter-class separation. The FCS (Fuzzy Compactness and Separation) algorithm was proposed for this reason. FCS makes the intra-class scatter as small as possible while making the inter-class separation as large as possible, and it has characteristics of both hard clustering and fuzzy clustering, so it can effectively improve clustering accuracy and clustering efficiency.

However, the FCS algorithm cannot effectively handle large-scale data or streaming data. The present invention therefore proposes an incremental fuzzy clustering method combining intra-class compactness and inter-class separation. By splitting the data into consecutive data blocks and processing them in order, the method can handle large-scale data and data streams.

Summary of the invention

To solve the problems in the prior art, the present invention proposes an incremental fuzzy clustering method combining intra-class compactness and inter-class separation.

1. An incremental fuzzy clustering method combining intra-class compactness and inter-class separation, characterized in that the method comprises the following steps:

(1) Divide the whole data set into D blocks and assign weight 1 to every data point in each block;

(2) Cluster each weighted data block;

(3) Cluster the 1st data block to obtain the clustering result [U_11, U_12, ..., U_1t, ..., U_1c] and the cluster centroids [a_11, a_12, ..., a_1t, ..., a_1c], where 0 < t ≤ c, U_1c denotes the c-th class of the 1st data block, and a_1c denotes the c-th centroid of the 1st data block;

(4) After the (i-1)-th data block has been processed, where 1 < i ≤ D, assign a weight w_t to each centroid in the centroids [a_(i-1)1, a_(i-1)2, ..., a_(i-1)t, ..., a_(i-1)c] of the (i-1)-th data block, where w_t is the sum of the memberships of the data points in that data block to cluster U_(i-1)t. Assign weight 1 to every data point in the newly obtained i-th data block. Combine the weighted centroids of the (i-1)-th data block with the data points of the i-th data block into a new data block, and cluster this new data block to obtain the clustering result [U_i1, U_i2, ..., U_it, ..., U_ic] and the cluster centroids [a_i1, a_i2, ..., a_it, ..., a_ic]. Then, for each centroid obtained in the (i-1)-th clustering in turn, find the cluster U_it that contains it; all data points of class U_(i-1)t then belong to class U_it;

(5) Repeat step (4) until the last data block has been processed, which yields the final centroids and clustering result.

2. The incremental fuzzy clustering method combining intra-class compactness and inter-class separation according to claim 1, characterized in that the clustering in step (2) comprises the following specific steps:

1) Initialize β, the maximum error ε, the maximum number of iterations τ_max, and the membership matrix u_cj; assign η a random number between 0 and 1; set τ = 1;

2) Update a_c according to η_c, u_cj, and the update formula for a_c (formula (10));

3) Update u_cj according to η_c, a_c, and the update formula for u_cj (formula (8));

4) Update η_c according to β, a_c, and the formula for η_c (formula (5));

5) Set τ = τ + 1;

6) If max(|u_cj(τ) - u_cj(τ-1)|) ≤ ε or τ = τ_max, terminate the iteration; otherwise return to step 2).

Here C is the number of classes and N is the number of data points. η_c is the parameter that keeps the class of the c-th centroid from overlapping the classes of the other centroids. u_cj is the membership of the j-th data point in the c-th class, subject to the constraints Σ_c u_cj = 1 (sum over c = 1, ..., C) and u_cj ∈ [0,1]. m is the fuzzy factor, with m > 1. w_j is the weight of the j-th data point, a_c is the centroid of the c-th class, x_j is the j-th data point, and x̄ is the mean of the data. ||x_j - a_c||² is the squared Euclidean distance from the j-th data point to the c-th centroid, 0 ≤ β ≤ 1.0, and k = 1, ..., C.
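Steps 1) to 6) can be sketched in Python as below. This is a minimal illustration, not the patented implementation: the function name `weighted_fcs` and its parameters are hypothetical, and the three update rules are assumed to be those of the standard FCS algorithm with each data point's contribution multiplied by its weight w_j, because the formulas themselves are not reproduced in this text. Points for which the adjusted distance is non-positive are assigned wholly to one class, which is the usual hard-clustering behaviour of FCS. The default β = 0.5 keeps 1 - η_c away from zero.

```python
import numpy as np

def weighted_fcs(X, w, C, m=2.0, beta=0.5, eps=1e-5, tau_max=100, seed=0):
    """Hypothetical weighted FCS loop. X: (N, d), w: (N,), C >= 2 clusters."""
    rng = np.random.default_rng(seed)
    tiny = 1e-12
    x_bar = np.average(X, axis=0, weights=w)       # weighted data mean (assumption)
    # Step 1: random memberships (columns sum to 1) and eta in [0, 1).
    U = rng.random((C, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    eta = rng.random(C)
    for tau in range(1, tau_max + 1):
        # Step 2: update centroids a_c from eta_c and u_cj.
        G = w * U**m                                # shape (C, N)
        s = np.maximum(G.sum(axis=1), tiny)         # guard against empty clusters
        A = (G @ X - (eta * s)[:, None] * x_bar) / ((1.0 - eta) * s)[:, None]
        # Step 3: update memberships from eta_c and a_c.
        d_bar = ((A - x_bar) ** 2).sum(axis=1)      # ||a_c - x_bar||^2
        D = ((X[None] - A[:, None]) ** 2).sum(axis=2) - (eta * d_bar)[:, None]
        P = np.where(D > 0, D, 1.0) ** (-1.0 / (m - 1.0))
        U_new = P / P.sum(axis=0, keepdims=True)
        hard = (D <= 0).any(axis=0)                 # points inside a crisp kernel
        U_new[:, hard] = 0.0
        U_new[D.argmin(axis=0)[hard], np.flatnonzero(hard)] = 1.0
        # Step 4: update eta_c from beta and a_c.
        cc = ((A[:, None] - A[None]) ** 2).sum(axis=2)
        np.fill_diagonal(cc, np.inf)                # ignore distance to itself
        eta = (beta / 4.0) * cc.min(axis=1) / max(d_bar.max(), tiny)
        # Steps 5-6: stop when memberships settle or tau reaches tau_max.
        delta = np.abs(U_new - U).max()
        U = U_new
        if delta <= eps:
            break
    return U, A, eta
```

A call such as `U, A, eta = weighted_fcs(X, np.ones(len(X)), C=7)` corresponds to clustering one block whose points all carry weight 1.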

The present invention is proposed in order to handle large-scale data and data streams. The method significantly improves processing speed without reducing clustering accuracy. Compared with existing methods, the newly proposed method handles large-scale data and data streams faster and more accurately.

Embodiment

To use the single-pass incremental method, the FCS algorithm must first be weighted. First, define the weighted intra-class scatter matrix S_IFW and the inter-class scatter matrix S_IFB.

Here C is the number of classes, N is the number of data points, and w_j is the weight. u_cj is the membership of the j-th data point in the c-th class, subject to the constraints Σ_c u_cj = 1 (sum over c = 1, ..., C) and u_cj ∈ [0,1]. m is the fuzzy factor, with m > 1. x_j is the j-th data point, x̄ is the sample mean, and a_c is the centroid of the c-th class. ||x_j - a_c||² is the squared Euclidean distance from the j-th data point to the c-th centroid.
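The two scatter-matrix formulas are not reproduced in this text. A plausible reconstruction, assuming they follow the fuzzy scatter matrices of the original FCS algorithm with each data point's term multiplied by its weight w_j, is sketched below; the exact form used in the patent may differ.

```latex
\begin{align}
S_{IFW} &= \sum_{c=1}^{C}\sum_{j=1}^{N} w_j\,u_{cj}^{m}\,(x_j-a_c)(x_j-a_c)^{T} \\
S_{IFB} &= \sum_{c=1}^{C}\sum_{j=1}^{N} w_j\,u_{cj}^{m}\,(a_c-\bar{x})(a_c-\bar{x})^{T}
\end{align}
```

Under this assumption the trace of S_IFW equals the weighted sum of w_j u_cj^m ||x_j - a_c||², which is the squared-distance term named in the definitions above.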

From these two formulas, the objective function of the incremental FCS algorithm is obtained,

subject to the constraint Σ_c u_cj = 1 (sum over c = 1, ..., C),

where η_c is given by formula (5), with

0 ≤ β ≤ 1.0 and k = 1, ..., C.
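The objective function and the definition of η_c are not reproduced in this text. The sketch below assumes the standard FCS objective (intra-class term minus an η_c-scaled inter-class term) with the weights w_j inserted, and the standard FCS definition of η_c, which is consistent with the stated ranges of β and k. The tag (5) follows the formula number cited in the update steps; everything else is an assumption.

```latex
\begin{gather}
J_{IFCS} = \sum_{c=1}^{C}\sum_{j=1}^{N} w_j\,u_{cj}^{m}\,\lVert x_j-a_c\rVert^{2}
         \;-\; \sum_{c=1}^{C}\sum_{j=1}^{N} \eta_c\,w_j\,u_{cj}^{m}\,\lVert a_c-\bar{x}\rVert^{2} \notag \\
\text{subject to}\quad \sum_{c=1}^{C} u_{cj} = 1,\qquad j = 1,\dots,N \notag \\
\eta_c = \frac{\beta}{4}\,
         \frac{\min_{c'\neq c}\lVert a_c-a_{c'}\rVert^{2}}
              {\max_{k}\lVert a_k-\bar{x}\rVert^{2}},
         \qquad 0\le\beta\le 1.0,\quad k=1,\dots,C \tag{5}
\end{gather}
```

With this definition η_c never exceeds β, because the squared distance between two centroids is at most four times the largest squared distance from a centroid to the mean.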

Applying the method of Lagrange multipliers to the objective function under this constraint gives a new objective function, from which the necessary conditions for the objective function to reach its minimum can be derived.

Taking the partial derivative of the new objective function with respect to U and setting it to 0 gives a stationarity condition.

Combining this condition with the constraint yields the update formula for u_cj (formula (8)).

Likewise, taking the partial derivative of the new objective function with respect to A and setting it to 0 gives a second stationarity condition,

from which the update formula for a_c (formula (10)) is obtained.
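The derivation can be written out as follows. This is a reconstruction under stated assumptions, not the patent's own formulas: it assumes the weighted FCS objective (intra-class term minus an η_c-scaled inter-class term, each weighted by w_j), treats η_c as a constant while differentiating, and introduces one multiplier λ_j per data point. The tags (8) and (10) follow the formula numbers cited in the update steps.

```latex
\begin{gather}
L = \sum_{c=1}^{C}\sum_{j=1}^{N} w_j\,u_{cj}^{m}
    \bigl(\lVert x_j-a_c\rVert^{2}-\eta_c\lVert a_c-\bar{x}\rVert^{2}\bigr)
    -\sum_{j=1}^{N}\lambda_j\Bigl(\sum_{c=1}^{C}u_{cj}-1\Bigr) \notag \\
\frac{\partial L}{\partial u_{cj}}
  = m\,w_j\,u_{cj}^{m-1}
    \bigl(\lVert x_j-a_c\rVert^{2}-\eta_c\lVert a_c-\bar{x}\rVert^{2}\bigr)-\lambda_j = 0 \notag \\
u_{cj} = \frac{\bigl(\lVert x_j-a_c\rVert^{2}-\eta_c\lVert a_c-\bar{x}\rVert^{2}\bigr)^{-\frac{1}{m-1}}}
              {\sum_{k=1}^{C}\bigl(\lVert x_j-a_k\rVert^{2}-\eta_k\lVert a_k-\bar{x}\rVert^{2}\bigr)^{-\frac{1}{m-1}}} \tag{8} \\
\frac{\partial L}{\partial a_c}
  = -2\sum_{j=1}^{N} w_j\,u_{cj}^{m}(x_j-a_c)
    -2\,\eta_c\sum_{j=1}^{N} w_j\,u_{cj}^{m}(a_c-\bar{x}) = 0 \notag \\
a_c = \frac{\sum_{j=1}^{N} w_j\,u_{cj}^{m}\,(x_j-\eta_c\,\bar{x})}
           {(1-\eta_c)\sum_{j=1}^{N} w_j\,u_{cj}^{m}} \tag{10}
\end{gather}
```

In this reconstruction the weight w_j cancels out of the membership update, since it multiplies every term for the same data point, but it remains in the centroid update, which is where a weighted centroid carried over from an earlier block exerts its influence.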

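The method comprises steps (1) to (5) as set out above: the data set is divided into D blocks whose data points all have weight 1, the 1st block is clustered, and each subsequent block is clustered together with the weighted centroids of the preceding block.

(5) Repeat step (4) until the last data block has been processed, which yields the final centroids and clustering result.

The clustering in step (2) comprises the following specific steps:

1) Initialize β, the maximum error ε, the maximum number of iterations τ_max, and the membership matrix u_cj; assign η a random number between 0 and 1; set τ = 1;

2) Update a_c according to η_c, u_cj, and formula (10);

3) Update u_cj according to η_c, a_c, and formula (8);

4) Update η_c according to β, a_c, and formula (5);

5) Set τ = τ + 1;

6) If max(|u_cj(τ) - u_cj(τ-1)|) ≤ ε or τ = τ_max, terminate the iteration; otherwise return to step 2).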

Take the Statlog Segmentation data set as an example. The data set has 2310 data points with 19 attributes, divided into 7 classes. The clustering method is applied to the data set as follows:

The data set is divided into 10 blocks of 231 data points each, and weight 1 is assigned to every data point;

The 1st data block is clustered, yielding 7 classes and their 7 corresponding centroids. The weight assigned to each centroid is the sum of the memberships of the data points in that class. These 7 centroids, as representatives of the 7 classes, are added to the 2nd data block to form a new data block, which is clustered again. The process continues in the same way until all 10 data blocks have been clustered, which gives the final result.
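The block-by-block procedure of this example can be sketched as follows. The names `single_pass` and `weighted_fcm` are hypothetical. To keep the sketch short and self-contained, the per-block clusterer is plain weighted fuzzy c-means used as a stand-in, not the weighted FCS algorithm of the invention; any function with the same signature can be passed as `cluster_fn`. Carrying the weights over as the weighted sum of memberships is an assumption; for the first block, where all weights are 1, it equals the sum of memberships described above.

```python
import numpy as np

def weighted_fcm(X, w, C, m=2.0, eps=1e-5, tau_max=100, seed=0):
    """Stand-in clusterer: weighted fuzzy c-means (not FCS). Returns (U, A)."""
    rng = np.random.default_rng(seed)
    U = rng.random((C, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(tau_max):
        G = w * U**m
        A = (G @ X) / G.sum(axis=1, keepdims=True)
        D = ((X[None] - A[:, None]) ** 2).sum(axis=2) + 1e-12
        P = D ** (-1.0 / (m - 1.0))
        U_new = P / P.sum(axis=0, keepdims=True)
        delta = np.abs(U_new - U).max()
        U = U_new
        if delta <= eps:
            break
    return U, A

def single_pass(X, n_blocks, C, cluster_fn=weighted_fcm):
    """Cluster X block by block; returns (labels, centroids, centroid_weights)."""
    labels = np.empty(X.shape[0], dtype=int)
    prev_A, prev_w, start = None, None, 0
    for block in np.array_split(X, n_blocks):       # split into blocks
        n = block.shape[0]
        if prev_A is None:                          # first block: weights all 1
            data, w = block, np.ones(n)
        else:                                       # weighted centroids + new block
            data = np.vstack([prev_A, block])
            w = np.concatenate([prev_w, np.ones(n)])
        U, A = cluster_fn(data, w, C)
        hard = U.argmax(axis=0)
        if prev_A is not None:
            # Old class t follows its centroid into the new class that holds it.
            labels[:start] = hard[:C][labels[:start]]
            hard = hard[C:]
        labels[start:start + n] = hard
        prev_A, prev_w = A, (w * U).sum(axis=1)     # weight = sum of memberships
        start += n
    return labels, prev_A, prev_w
```

For data shaped like the example, `single_pass(X, n_blocks=10, C=7)` on a 2310 x 19 array processes ten blocks of 231 points. The centroid weights returned at the end sum to the number of points processed, because every point's memberships sum to 1.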

The method processes the data set block by block, which reduces both the amount of data handled at once and the number of clustering iterations, and therefore improves clustering efficiency. Experiments show that the F-Measure of the SPFCS algorithm is 3.9% and 21.4% higher than that of the traditional SPFCM and SPHFCM algorithms, respectively.

Claims

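1. An incremental fuzzy clustering method combining intra-class compactness and inter-class separation, characterized in that the method comprises the following steps:

(1) dividing the whole data set into D blocks and assigning weight 1 to every data point in each block;

(2) clustering each weighted data block;

(3) clustering the 1st data block to obtain the clustering result [U_11, U_12, ..., U_1t, ..., U_1c] and the cluster centroids [a_11, a_12, ..., a_1t, ..., a_1c], where 0 < t ≤ c, U_1c denotes the c-th class of the 1st data block, and a_1c denotes the c-th centroid of the 1st data block;

(4) after the (i-1)-th data block has been processed, where 1 < i ≤ D, assigning a weight w_t to each centroid in the centroids [a_(i-1)1, a_(i-1)2, ..., a_(i-1)t, ..., a_(i-1)c] of the (i-1)-th data block, where w_t is the sum of the memberships of the data points in that data block to cluster U_(i-1)t; assigning weight 1 to every data point in the newly obtained i-th data block; combining the weighted centroids of the (i-1)-th data block with the data points of the i-th data block into a new data block, and clustering the new data block to obtain the clustering result [U_i1, U_i2, ..., U_it, ..., U_ic] and the cluster centroids [a_i1, a_i2, ..., a_it, ..., a_ic]; for each centroid obtained in the (i-1)-th clustering in turn, finding the cluster U_it that contains it, all data points of class U_(i-1)t then belonging to class U_it;

(5) repeating step (4) until the last data block has been processed, which yields the final centroids and clustering result.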

2. The incremental fuzzy clustering method combining intra-class compactness and inter-class separation according to claim 1, characterized in that the clustering in step (2) comprises the following specific steps:

1) initializing β, the maximum error ε, the maximum number of iterations τ_max, and the membership matrix u_cj; assigning η a random number between 0 and 1; setting τ = 1;

2) updating a_c according to η_c, u_cj, and the update formula for a_c;

3) updating u_cj according to η_c, a_c, and the update formula for u_cj;

4) updating η_c according to β, a_c, and the formula for η_c;

5) setting τ = τ + 1;

6) if max(|u_cj(τ) - u_cj(τ-1)|) ≤ ε or τ = τ_max, terminating the iteration, and otherwise returning to step 2);

where C is the number of classes; N is the number of data points; η_c is the parameter that keeps the class of the c-th centroid from overlapping the classes of the other centroids; u_cj is the membership of the j-th data point in the c-th class, subject to the constraints Σ_c u_cj = 1 (sum over c = 1, ..., C) and u_cj ∈ [0,1]; m is the fuzzy factor, with m > 1; w_j is the weight of the j-th data point; a_c is the centroid of the c-th class; x_j is the j-th data point; x̄ is the mean of the data; ||x_j - a_c||² is the squared Euclidean distance from the j-th data point to the c-th centroid; 0 ≤ β ≤ 1.0; and k = 1, ..., C.