CN103995821A

CN103995821A - Selective clustering integration method based on spectral clustering algorithm

Info

Publication number: CN103995821A
Application number: CN201410096258.6A
Authority: CN
Inventors: 徐森; 李先锋; 曹瑞; 花小朋; 徐静; 陈荣
Original assignee: Yancheng Institute of Technology
Current assignee: Yancheng Institute of Technology
Priority date: 2014-03-14
Filing date: 2014-03-14
Publication date: 2014-08-20
Anticipated expiration: 2034-03-14
Also published as: CN103995821B

Abstract

The invention discloses a selective clustering ensemble method based on spectral clustering. The method comprises the following steps: generating cluster members; selecting representative members based on spectral clustering; integrating the representative members; and ending. The method is simple to implement and can effectively improve the clustering ensemble result.

Description

A selective clustering ensemble method based on spectral clustering

Technical field

The present invention relates to a selective clustering ensemble method based on spectral clustering, and belongs to the technical field of data mining.

Background art

Cluster analysis has a research history of more than forty years and plays an extremely important role in fields such as machine learning, data mining, information retrieval, pattern recognition, and bioinformatics. New clustering algorithms keep appearing, yet no single algorithm can effectively identify clusters that differ in size, shape, and density and that may even contain noise. Compared with traditional clustering algorithms, clustering ensemble techniques offer advantages such as robustness, novelty, and stability, and have become one of the research hotspots in machine learning. Existing clustering ensemble methods nevertheless have many problems and shortcomings: for example, they impose a particular structure on cluster shape, place strong constraints on cluster size, have high computational complexity, or yield only locally optimal solutions.

Summary of the invention

Purpose of the invention: in view of the problems and shortcomings of the prior art, the present invention provides a selective clustering ensemble method based on spectral clustering that can effectively improve the clustering ensemble result.

Technical solution: a selective clustering ensemble method based on spectral clustering, comprising the following steps:

1. generate the cluster members; 2. select representative members based on spectral clustering; 3. integrate the representative members; 4. end.

Beneficial effects: compared with the prior art, the selective clustering ensemble method based on spectral clustering provided by the present invention is simple to implement and can effectively improve the clustering ensemble result.

Brief description of the drawings

Fig. 1 is the flowchart of the method of the present invention;

Fig. 2 is the flowchart of cluster member generation;

Fig. 3 is the flowchart of selecting representative members based on spectral clustering;

Fig. 4 is the flowchart of integrating the representative members;

Fig. 5 is the flowchart of clustering the cluster members by spectral clustering;

Fig. 6 is the flowchart of clustering the data points by spectral clustering.

Detailed description of the embodiments

The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments only illustrate the invention and do not limit its scope. After reading the present disclosure, modifications of the invention in various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of this application.

The method of the present invention is shown in Fig. 1. Step 0 is the start. Step 1 generates the cluster members and is described in detail below with reference to Fig. 2. Step 2 selects representative members based on spectral clustering and is described in detail below with reference to Fig. 3. Step 3 integrates the representative members and is described in detail below with reference to Fig. 4. Step 4 is the end state of Fig. 1.

Fig. 2 details step 1 of Fig. 1, whose purpose is to generate multiple cluster members. Step 10 is the start. Step 11 obtains the number of cluster members l (an integer greater than 1) and the number of clusters k (k is generally set to the true number of classes in the data set). Step 12 sets the control parameter i to the initial value 1. Step 13 checks whether i is less than or equal to l; if so, the process goes to step 14, otherwise to step 17. Step 14 randomly generates k mean vectors as the initial centroids of the K-means algorithm and uses K-means to partition the data set. Step 15 obtains the clustering result P^{(i)} = {C_1^{(i)}, ..., C_k^{(i)}}. Step 16 increments the control parameter i by 1 and returns to step 13. Step 17 builds the set of cluster members P = {P^{(1)}, ..., P^{(l)}}. Step 18 is the end state of Fig. 2.
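As an illustration of steps 11 to 17, the loop can be sketched in Python as follows. This is a minimal sketch, not the reference implementation of the invention. The function names `kmeans` and `generate_members`, the `seed` and `max_iter` parameters, and the choice of drawing the k initial mean vectors from the data points themselves are assumptions made for the example; the text only states that the initial mean vectors are generated at random.

```python
import random


def kmeans(points, k, rng, max_iter=100):
    """Plain K-means on a list of numeric tuples; returns one label per point."""
    # Assumption: the k random initial mean vectors are k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute the mean of every non-empty cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def generate_members(points, l, k, seed=0):
    """Steps 12 to 17: run K-means l times, each with a fresh random start."""
    rng = random.Random(seed)
    members = []
    i = 1                                        # step 12
    while i <= l:                                # step 13
        members.append(kmeans(points, k, rng))   # steps 14 and 15
        i += 1                                   # step 16
    return members                               # step 17


# Toy data set with two well-separated groups (illustrative values only).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
members = generate_members(points, l=5, k=2)
```

Each entry of `members` is a label vector, which is one convenient way to store a partition P^{(i)}; the later steps only need to know which objects share a cluster.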

Fig. 3 details step 2 of Fig. 1, whose purpose is to select representative members based on spectral clustering for the subsequent integration. Step 20 is the start. Step 21 computes the similarity between cluster members, namely the NMI (Normalized Mutual Information) value between them. The larger the NMI value, the better two clustering results match and the more similar the two cluster members are. It is computed as follows. Let X and Y be the random variables represented by cluster members P^{(a)} and P^{(b)}, which have k^{a} and k^{b} clusters respectively. Let n_{h}^{a} be the number of objects in cluster C_h of P^{(a)}, n_{l}^{b} the number of objects in cluster C_l of P^{(b)}, n_{h,l} the number of objects shared by C_h and C_l, and n the total number of objects in the data set. The NMI value between P^{(a)} and P^{(b)} is:

NMI (P^{(a)}, P^{(b)}) = \frac{\sum_{h = 1}^{k^{a}} \sum_{l = 1}^{k^{b}} n_{h, l} \log \left( \frac{n \cdot n_{h, l}}{n_{h}^{a} n_{l}^{b}} \right)}{\sqrt{\left( \sum_{h = 1}^{k^{a}} n_{h}^{a} \log \frac{n_{h}^{a}}{n} \right) \left( \sum_{l = 1}^{k^{b}} n_{l}^{b} \log \frac{n_{l}^{b}}{n} \right)}}
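The NMI formula above can be evaluated directly from two label vectors. The following is a minimal sketch under stated assumptions: the function name `nmi` is hypothetical, partitions are stored as label lists, terms with n_{h,l} = 0 are skipped (the usual 0 log 0 = 0 convention), and the value 0.0 returned when the denominator vanishes (one partition has a single cluster) is a convention chosen for this example, not something the text specifies.

```python
from collections import Counter
from math import log, sqrt


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions of the same objects."""
    n = len(labels_a)
    count_a = Counter(labels_a)                 # n_h^a for every cluster h
    count_b = Counter(labels_b)                 # n_l^b for every cluster l
    joint = Counter(zip(labels_a, labels_b))    # n_{h,l}, non-zero entries only

    numerator = sum(
        n_hl * log(n * n_hl / (count_a[h] * count_b[l]))
        for (h, l), n_hl in joint.items()
    )
    # Both sums are <= 0, so their product is >= 0 and the square root is real.
    sum_a = sum(c * log(c / n) for c in count_a.values())
    sum_b = sum(c * log(c / n) for c in count_b.values())
    denominator = sqrt(sum_a * sum_b)

    # Assumption: return 0.0 when a partition consists of one single cluster.
    return numerator / denominator if denominator > 0 else 0.0


same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

In this sketch two partitions that differ only in cluster naming score 1, and two partitions whose clusters cut across each other evenly score 0, which matches the statement that a larger NMI means a better match.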

Step 22 of Fig. 3 clusters the cluster members by spectral clustering, using the similarities computed in step 21; this step is described in detail below with reference to Fig. 5. Step 23 selects the representative members as follows. Based on the clustering result of step 22, from each group of cluster members, the member whose NMI values with all other members of that group have the largest sum is selected as the representative member. Suppose a group of cluster members G = {P^{(1)}, ..., P^{(m)}} contains m cluster members; the selected representative member P^{*} satisfies:

P^{*} = \arg \max_{P^{(j)} \in G} \sum_{i = 1}^{m} NMI (P^{(i)}, P^{(j)}) .

Step 24 is the end state of Fig. 3.
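The selection rule of step 23 amounts to one arg-max per group. The sketch below assumes a precomputed NMI matrix `sim` (a list of lists) and groups given as lists of member indices; the function name `select_representatives` and the toy values are hypothetical. The sum includes the self-similarity term, which is the same for every candidate and therefore does not change the arg-max.

```python
def select_representatives(sim, groups):
    """Pick, in each group, the member with the largest summed similarity."""
    representatives = []
    for group in groups:
        # Column sum restricted to the group; ties go to the first candidate.
        best = max(group, key=lambda j: sum(sim[i][j] for i in group))
        representatives.append(best)
    return representatives


# Illustrative NMI matrix for four cluster members (values are made up).
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.1],
    [0.2, 0.8, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
groups = [[0, 1, 2], [3]]
reps = select_representatives(sim, groups)
```

Here member 1 is chosen for the first group because it is the most similar to its peers, and a singleton group simply returns its only member.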

Fig. 4 is the flowchart for integrating the representative members. Step 30 is the start. Step 31 computes the similarity between data points; the similarity of data points d_i and d_j is S_{ij} = (number of representative members in which d_i and d_j belong to the same cluster) / r, where r is the number of representative members. Step 32 clusters the data points by spectral clustering and is described in detail below with reference to Fig. 6. Step 33 is the end state of Fig. 4.
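The similarity of step 31 can be sketched as a simple counting loop. The function name `co_association` is hypothetical, and the representative members are assumed to be stored as label lists over the same n data points.

```python
def co_association(rep_members):
    """S[i][j] = fraction of representative members that put i and j together."""
    r = len(rep_members)
    n = len(rep_members[0])
    counts = [[0] * n for _ in range(n)]
    for labels in rep_members:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Divide once at the end so that the entries are exact multiples of 1/r.
    return [[counts[i][j] / r for j in range(n)] for i in range(n)]


# Two representative members over three data points (illustrative labels).
S = co_association([[0, 0, 1], [0, 1, 1]])
```

In this toy case points 0 and 1 share a cluster in one of the two members, so their similarity is 0.5, while points 0 and 2 never share a cluster and get 0. The diagonal is always 1, which keeps every row sum positive in the later normalization.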

Fig. 5 is the flowchart for clustering the cluster members by spectral clustering. Step 220 is the start. Step 221 obtains the number r_0 of representative members to be selected. Step 222 builds the transition probability matrix P^{l} of the random walk on the graph, computed as P^{l} = (D^{l})^{-1} S^{l}, where S^{l} is the similarity matrix between cluster members, whose elements are obtained in step 21 of Fig. 3, and D^{l} is a diagonal matrix whose diagonal elements are the row sums of S^{l}, D^{l}_{ii} = \sum_{j} S^{l}_{ij}, so that each row of P^{l} sums to 1. Step 223 computes the eigenvalues λ_1 >= ... >= λ_l of P^{l}; if there exists an index i such that λ_i is strictly greater than λ_{i+1}, then r = i; otherwise r = r_0. Step 224 arranges the eigenvectors corresponding to the r largest eigenvalues of P^{l} as columns to build the matrix U_r = [u_1, ..., u_r]. Step 225 uses the K-means algorithm to cluster the rows of U_r into r groups of cluster members G_1, ..., G_r. Step 226 is the end state of Fig. 5.
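Steps 222 and 223 can be illustrated with NumPy as follows. This is a sketch under assumptions, not the definitive procedure: the diagonal of D is taken to be the row sums of S (which is what makes D^{-1} S a transition probability matrix), the eigenvalues are computed from the symmetric matrix D^{-1/2} S D^{-1/2}, which has the same eigenvalues as D^{-1} S, and the tolerance `gap_tol` is a hypothetical parameter added because "strictly greater" is not meaningful for floating-point eigenvalues without one. The function names are also hypothetical.

```python
import numpy as np


def transition_matrix(S):
    """P = D^{-1} S with D_ii = sum_j S_ij (assumed degree definition)."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)          # positive whenever the self-similarity is 1
    return S / d[:, None]


def sorted_eigenvalues(S):
    """Eigenvalues of D^{-1} S in descending order."""
    S = np.asarray(S, dtype=float)
    inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    # D^{-1/2} S D^{-1/2} is symmetric and similar to D^{-1} S.
    sym = S * inv_sqrt[:, None] * inv_sqrt[None, :]
    return np.linalg.eigvalsh(sym)[::-1]


def choose_r(eigvals, r0, gap_tol=1e-8):
    """Step 223: r = first index i (1-based) with lambda_i > lambda_{i+1}, else r0."""
    for i in range(len(eigvals) - 1):
        if eigvals[i] - eigvals[i + 1] > gap_tol:
            return i + 1
    return r0


# Illustrative similarity matrix with two disconnected pairs of cluster members.
S_members = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
])
P_members = transition_matrix(S_members)
r = choose_r(sorted_eigenvalues(S_members), r0=3)
```

With two disconnected blocks the eigenvalue 1 appears twice, so the first gap sits after the second eigenvalue and the sketch returns r = 2. When the similarities are all positive, as NMI values typically are, the first gap usually follows λ_1, so a practical implementation may prefer a larger tolerance or fall back to r_0.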

Fig. 6 is the flowchart for clustering the data points by spectral clustering. Step 320 is the start. Step 321 builds the transition probability matrix P of the random walk on the graph, computed as P = D^{-1} S, where S is the similarity matrix between data points, whose elements are obtained in step 31 of Fig. 4, and D is a diagonal matrix whose diagonal elements are the row sums of S, D_{ii} = \sum_{j} S_{ij}. Step 322 computes the eigenvectors corresponding to the k largest eigenvalues of P and arranges them as columns to build the matrix V_k = [v_1, ..., v_k]. Step 323 uses the K-means algorithm to cluster the rows of V_k into k clusters D_1, ..., D_k. Step 324 is the end state of Fig. 6.
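Steps 321 to 323 can be sketched with NumPy as shown below. The names `spectral_embedding` and `kmeans_rows`, and the `init_idx`, `seed`, and `max_iter` parameters, are hypothetical and exist only for this example. The sketch assumes D holds the row sums of S and obtains the eigenvectors of P = D^{-1} S as u = D^{-1/2} v, where v is an eigenvector of the symmetric matrix D^{-1/2} S D^{-1/2}.

```python
import numpy as np


def spectral_embedding(S, k):
    """Columns are eigenvectors of P = D^{-1} S for its k largest eigenvalues."""
    S = np.asarray(S, dtype=float)
    inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    sym = S * inv_sqrt[:, None] * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(sym)            # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :k]               # keep the k largest
    return top * inv_sqrt[:, None]           # map back: u = D^{-1/2} v


def kmeans_rows(V, k, init_idx=None, seed=0, max_iter=100):
    """K-means on the rows of V; init_idx optionally fixes the starting rows."""
    rng = np.random.default_rng(seed)
    if init_idx is None:
        init_idx = rng.choice(len(V), size=k, replace=False)
    centroids = V[np.asarray(init_idx)].copy()
    labels = np.full(len(V), -1)
    for _ in range(max_iter):
        dist = ((V[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):          # leave empty clusters unchanged
                centroids[c] = V[labels == c].mean(axis=0)
    return labels


# Illustrative co-association matrix: points 0,1 and points 2,3 always co-occur.
S_points = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
V_k = spectral_embedding(S_points, k=2)
labels = kmeans_rows(V_k, k=2, init_idx=[0, 3])
```

Rows of V_k that belong to the same block coincide, so K-means on the rows recovers the two groups. The fixed `init_idx` is used here only to make the toy run deterministic; with random starts, several restarts may be needed.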

Claims

1. A selective clustering ensemble method based on spectral clustering, characterized in that it comprises the following steps:

(1) generating cluster members;

(2) selecting representative members based on spectral clustering;

(3) integrating the representative members;

(4) ending.

2. The selective clustering ensemble method based on spectral clustering according to claim 1, characterized in that the step of generating cluster members is:

(1) step 11 obtains the number of cluster members l and the number of clusters k, where l is an integer greater than 1 and k is set to the true number of classes in the data set;

(2) step 12 sets the control parameter i to the initial value 1;

(3) step 13 checks whether the control parameter i is less than or equal to the number of cluster members l; if so, step 14 is performed, otherwise the process goes to step 17;

(4) step 14 randomly generates k mean vectors as the initial centroids of the K-means algorithm and uses the K-means algorithm to partition the data set;

(5) step 15 obtains the clustering result P^{(i)} = {C_1^{(i)}, ..., C_k^{(i)}};

(6) step 16 increments the control parameter i by 1 and then returns to step 13;

(7) step 17 builds the set of cluster members P = {P^{(1)}, ..., P^{(l)}};

(8) end.

3. The selective clustering ensemble method based on spectral clustering according to claim 1, characterized in that the step of selecting representative members based on spectral clustering is:

(1) step 21 computes the similarity between cluster members;

(2) step 22 clusters the cluster members by spectral clustering, using the similarities computed in step 21;

(3) step 23, based on the clustering result of step 22, selects from each group of cluster members the member whose NMI values with all other members of that group have the largest sum as the representative member;

(4) end.

4. The selective clustering ensemble method based on spectral clustering according to claim 1, characterized in that the step of integrating the representative members is:

(1) step 31 computes the similarity between data points, the similarity of data points d_i and d_j being S_{ij} = (number of times d_i and d_j belong to the same cluster) / r;

(2) step 32 clusters the data points by spectral clustering;

(3) end.

5. The selective clustering ensemble method based on spectral clustering according to claim 3, characterized in that, in selecting representative members based on spectral clustering, the step of clustering the cluster members by spectral clustering is:

(1) step 221 obtains the number r_0 of representative members to be selected;

(2) step 222 builds the transition probability matrix P^{l} of the random walk on the graph, computed as P^{l} = (D^{l})^{-1} S^{l}, where S^{l} is the similarity matrix between cluster members, whose elements are obtained by step 21 in claim 3, and D^{l} is a diagonal matrix whose diagonal elements are the row sums of S^{l}, D^{l}_{ii} = \sum_{j} S^{l}_{ij};

(3) step 223 computes the eigenvalues λ_1 >= ... >= λ_l of P^{l}; if there exists an index i such that λ_i is strictly greater than λ_{i+1}, then r = i; otherwise r = r_0;

(4) step 224 arranges the eigenvectors corresponding to the r largest eigenvalues of P^{l} as columns to build the matrix U_r = [u_1, ..., u_r];

(5) step 225 uses the K-means algorithm to cluster the rows of U_r into r groups of cluster members G_1, ..., G_r;

(6) end.

6. The selective clustering ensemble method based on spectral clustering according to claim 4, characterized in that, in integrating the representative members, the step of clustering the data points by spectral clustering is:

(1) step 321 builds the transition probability matrix P of the random walk on the graph, computed as P = D^{-1} S, where S is the similarity matrix between data points, whose elements are obtained by step 31, and D is a diagonal matrix whose diagonal elements are the row sums of S, D_{ii} = \sum_{j} S_{ij};

(2) step 322 computes the eigenvectors corresponding to the k largest eigenvalues of P and arranges them as columns to build the matrix V_k = [v_1, ..., v_k];

(3) step 323 uses the K-means algorithm to cluster the rows of V_k into k clusters D_1, ..., D_k;

(4) end.