CN109002858A - A clustering ensemble method based on evidential reasoning for user behavior analysis

Info

Publication number: CN109002858A
Application number: CN201810814178.8A
Authority: CN
Inventors: 褚燕; 王刚; 张峰; 陈刚
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2018-07-23
Filing date: 2018-07-23
Publication date: 2018-12-14
Anticipated expiration: 2038-07-23
Also published as: CN109002858B

Abstract

The present invention provides a clustering ensemble method based on evidential reasoning for user behavior analysis. The method takes full account of the temporal characteristics of user data and the reliability of the base clusterers, and uses evidential reasoning to address both the weak robustness and stability of a single clusterer and the poor applicability of existing clustering ensemble methods, thereby improving the clustering of user behavior data. The beneficial effects of the invention are as follows: it overcomes the failure of traditional clustering algorithms caused by the high dimensionality of user behavior data; it addresses both the weak robustness and stability of a single clusterer and the poor applicability of existing clustering ensemble methods, thereby improving the clustering of user behavior data; and it can be used to cluster user behavior data, in particular user behavior data with high-dimensional features, as well as stream data, so it has a wide range of applications.

Description

A clustering ensemble method based on evidential reasoning for user behavior analysis

Technical field

The present invention relates to the technical field of clustering methods, and more particularly to a clustering ensemble method based on evidential reasoning for user behavior analysis.

Background art

Commonly used clustering methods currently fall into five classes: partition-based, hierarchy-based, density-based, model-based, and grid-based clustering methods. A representative partition-based method is k-means clustering, whose idea is to assign each object to the cluster whose center is nearest. Hierarchy-based methods cluster a given set of data objects by creating a hierarchical decomposition of it. A representative density-based method is the DBSCAN algorithm, which assumes that cluster structure can be determined by how tightly the samples are distributed. Model-based methods include the EM algorithm, which can be used for maximum likelihood estimation or maximum a posteriori estimation of the parameters of probabilistic models with hidden (latent) variables. Grid-based methods quantize the object space into a finite number of cells that form a grid structure, and all clustering is carried out on that grid.

In general, these single clustering methods can mine user behavior data and analyze behavioral characteristics to effectively identify user behavior patterns and evaluate demand response potential, thereby providing a basis for decisions when formulating marketing programs. However, as user behavior data is continuously updated, data volumes grow rapidly, and the users from whom data is collected are highly dispersed, a series of challenges has emerged. Because existing methods rely on a single clustering model, their stability and accuracy are highly susceptible to changes in the data, and their generalization ability and adaptability are weak, so they cannot analyze the electricity consumption behavior of different types of users deeply, quickly, and accurately. The root cause is the inherent ambiguity of the notion of a natural grouping in a data set. Another difficulty is the diversity of clusters: clusters can have different shapes, densities, and sizes, and they often overlap. Because single clustering algorithms suffer from these problems, much research on clustering ensemble algorithms has appeared in recent years. The idea of a clustering ensemble is to generate a collection of clusterings, that is, multiple available clustering results, and then combine these results to obtain a better clustering. The problem of combining the member clusterings of the collection is also called the consensus function problem, or the integration problem. Existing clustering ensemble methods include co-occurrence-based methods and median-partition-based methods. Co-occurrence-based methods include relabeling and voting, co-association matrix methods, and graph-based methods; median-partition-based methods include genetic algorithms, non-negative matrix factorization, and kernel methods. In recent years, research on clustering ensembles has attracted the attention of many researchers, and evidential reasoning, as an effective information fusion method, has been applied in many fields. However, no prior art yet incorporates the evidential reasoning rule into the clustering ensemble process.

Summary of the invention

To remedy the above technical deficiencies of the prior art, the present invention provides a clustering ensemble method based on evidential reasoning for user behavior analysis. The method takes full account of the temporal characteristics of user data and the reliability of the base clusterers, and uses evidential reasoning to address both the weak robustness and stability of a single clusterer and the poor adaptability of existing clustering ensemble methods, thereby improving the clustering of user behavior data.

The present invention is achieved by the following technical solutions:

A clustering ensemble method based on evidential reasoning for user behavior analysis, suitable for stream data sets with temporal characteristics, comprises the following steps:

Step 1: for the user behavior data sets $\{D^1, D^2, \ldots, D^k, \ldots, D^K\}$ of different time periods, use fuzzy C-means (FCM) algorithms with different parameters to generate $K$ membership matrices $\{U^1, U^2, \ldots, U^k, \ldots, U^K\}$, where $D^k$ denotes the data of the $k$-th time period and $U^k$ denotes the $k$-th membership matrix. The user behavior data sets are obtained by splitting the original stream data, which has temporal characteristics, by time window.

Step 2: convert the $K$ membership matrices $\{U^1, U^2, \ldots, U^k, \ldots, U^K\}$ obtained in step 1 into $K$ similarity matrices $\{SM^1, SM^2, \ldots, SM^k, \ldots, SM^K\}$, convert the $K$ similarity matrices into similarity vectors $\{SV^1, SV^2, \ldots, SV^k, \ldots, SV^K\}$, and normalize them, where a similarity vector is denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$.

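Step 3: let the power set of $\Omega$ be given by formula (7):

$$P(\Omega) = 2^{\Omega} = \{\emptyset, \{H_1\}, \ldots, \{H_M\}, \{H_1, H_2\}, \ldots, \{H_1, H_M\}, \ldots, \{H_1, \ldots, H_{M-1}\}, \Omega\} \tag{7}$$

Then, according to the evidential reasoning rule, the similarity vectors $\{SV^1, SV^2, \ldots, SV^K\}$ are merged by an iterative algorithm into the integrated similarity vector $SV^* = E(K) = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$, where $p_{H,E(K)}$ denotes the belief degree that the evidence $E(K)$ assigns to $H$.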

Step 4: based on the integrated similarity vector $SV^*$ obtained by evidential reasoning, use the AGNES algorithm, a hierarchical clustering method, to determine the final clustering ensemble result $\{C_1, C_2, \ldots, C_t, \ldots, C_T\}$, where $C_t$ is a cluster and $T$ is the final number of clusters.

Compared with the prior art, the beneficial effects of the present invention are as follows:

First, the invention splits user behavior data by time span, clusters the user behavior data of the different time periods with fuzzy C-means algorithms, and integrates the clusterings with the method based on evidential reasoning, which overcomes the failure of traditional clustering algorithms caused by the high dimensionality of user behavior data.

Second, the invention takes full account of the temporal characteristics of user data and the reliability of the base clusterers, and uses evidential reasoning to address both the weak robustness and stability of a single clusterer and the poor applicability of existing clustering ensemble methods, thereby improving the clustering of user behavior data.

Third, the proposed method can be used to cluster user behavior data, in particular user behavior data with high-dimensional features, and can also be used to cluster stream data, so it has a wide range of applications.

Brief description of the drawings

Fig. 1 is the overall flow chart of the clustering ensemble method based on evidential reasoning for user behavior analysis.

Fig. 2 shows the analysis results for the sum of squared errors (SSE).

Fig. 3 shows the analysis results for the consistency index (C-index).

Fig. 4 shows the analysis results for the silhouette coefficient (SC).

Specific embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

Embodiment 1:

As shown in Fig. 1, a clustering ensemble method based on evidential reasoning for user behavior analysis, suitable for stream data sets with temporal characteristics, comprises the following steps:

Step 1: split the user behavior data into the data sets $\{D^1, D^2, \ldots, D^k, \ldots, D^K\}$ of different time periods, using a time window of a year, a month, or a day chosen according to the characteristics of the data, and use fuzzy C-means (FCM) algorithms with different parameters to generate $K$ membership matrices $\{U^1, U^2, \ldots, U^k, \ldots, U^K\}$, where $D^k$ denotes the data of the $k$-th time period and $U^k$ denotes the $k$-th membership matrix. The user behavior data sets are obtained by splitting the original data by time window. For example, the experiment uses 7 years of user electricity consumption data; if the time window is set to one year, the original data is split by year into seven panel data sets.
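
The time-window split of step 1 can be illustrated with a minimal Python sketch. The record layout `(date, user_id, value)` and the function name `split_by_window` are assumptions made for this illustration and do not come from the method itself.

```python
from collections import defaultdict
from datetime import date


def split_by_window(records, window="year"):
    """Group (date, user_id, value) records into time slices D^1..D^K.

    `window` selects the granularity: "year", "month" or "day".
    The record layout is a hypothetical one chosen for this sketch.
    """
    key_functions = {
        "year": lambda d: (d.year,),
        "month": lambda d: (d.year, d.month),
        "day": lambda d: (d.year, d.month, d.day),
    }
    key = key_functions[window]
    slices = defaultdict(list)
    for when, user_id, value in records:
        slices[key(when)].append((user_id, value))
    # Sort the window keys so that slice k precedes slice k+1 in time.
    return [slices[k] for k in sorted(slices)]
```

With yearly windows, seven years of records would yield a list of seven slices, one per year, in chronological order.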

Specifically, step 1 further comprises the following steps:

Step 1.1: initialize the membership matrix $U$ with random numbers in the interval (0, 1), such that $U$ satisfies the constraint of formula (1):

$$\sum_{i=1}^{C} u_{ij} = 1, \quad j = 1, \ldots, N \tag{1}$$

In formula (1), $u_{ij}$ denotes the probability, in the membership matrix $U$, that the $j$-th sample point belongs to the $i$-th cluster center; $C$ denotes the number of clusters of the $k$-th fuzzy C-means algorithm; and $N$ is the number of samples.

Step 1.2: construct the objective function of the fuzzy C-means algorithm using formula (2):

$$J(U, c_1, \ldots, c_C) = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m} d_{ij}^{2} \tag{2}$$

In formula (2), the exponent $m$ in $u_{ij}^{m}$ is the weighting exponent of the membership $u_{ij}$ and generally takes the value 2; $d_{ij} = \lVert c_i - x_j \rVert$ denotes the Euclidean distance between the $i$-th cluster center $c_i$ and the $j$-th data point $x_j$. Given a threshold $\delta$ or a maximum number of iterations Max_iteration, if the value of formula (2) is less than the threshold $\delta$ or the maximum number of iterations is reached, go directly to step 1.5; otherwise go to step 1.3.

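Step 1.3: update the cluster centers $c_i$ and the elements $u_{ij}$ of the membership matrix $U$ using formula (3) and formula (4):

$$c_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} x_j}{\sum_{j=1}^{N} u_{ij}^{m}} \tag{3}$$

$$u_{ij} = \frac{1}{\sum_{l=1}^{C} \left( d_{ij} / d_{lj} \right)^{2/(m-1)}} \tag{4}$$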

Step 1.4: return to step 1.2 and substitute the cluster centers $c_i$ and the elements $u_{ij}$ updated in step 1.3 into formula (2).

Step 1.5: return to step 1.1 and repeat steps 1.1 to 1.4 $K$ times to obtain the $K$ membership matrices $\{U^1, U^2, \ldots, U^k, \ldots, U^K\}$.
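
Steps 1.1 to 1.4 can be sketched in NumPy as follows. This is a minimal illustration of the textbook fuzzy C-means iteration, not a definitive implementation of the method. Two points are assumptions of the sketch: the loop stops when the change in the objective falls below `delta` (a common variant of the threshold test in step 1.2), and the function name and default values are hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, delta=1e-5, max_iteration=100, seed=0):
    """Textbook FCM sketch. X has shape (N, d); returns (U, centers).

    U has shape (c, N): u_ij is the membership of sample j in cluster i,
    and every column of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    # Step 1.1: random memberships, rescaled so each column sums to 1.
    U = rng.random((c, n_samples))
    U /= U.sum(axis=0, keepdims=True)
    prev_objective = np.inf
    for _ in range(max_iteration):
        # Cluster centers as membership-weighted means of the samples.
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Euclidean distance d_ij between center i and sample j.
        d = np.linalg.norm(centers[:, None, :] - X[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        # Step 1.2: objective value for the current U and centers.
        objective = float((Um * d ** 2).sum())
        if abs(prev_objective - objective) < delta:
            break
        prev_objective = objective
        # Step 1.3: membership update from the relative distances.
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
    return U, centers
```

Running the function once per time slice $D^k$, with a different `c`, `m`, or `seed` each time, would give the $K$ membership matrices of step 1.5.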

Step 2: convert the $K$ membership matrices $\{U^1, U^2, \ldots, U^k, \ldots, U^K\}$ obtained in step 1 into $K$ similarity matrices $\{SM^1, SM^2, \ldots, SM^k, \ldots, SM^K\}$, convert the $K$ similarity matrices into similarity vectors $\{SV^1, SV^2, \ldots, SV^k, \ldots, SV^K\}$, and normalize them, where a similarity vector is denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$.

Specifically, step 2 further comprises the following steps:

Step 2.1: based on the membership matrix $U^k$ obtained in step 1, compute the similarity matrix $SM^k$ of the $k$-th clustering result according to formula (5):

$$SM^k = U^k (U^k)^T \tag{5}$$

In formula (5), the element $SM^k_{ij}$ of the similarity matrix $SM^k$ denotes the joint membership of sample $x_i$ and sample $x_j$ in the same cluster center in the $k$-th clustering result.

Step 2.2: let the similarity vector be denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$; its values are the elements above the diagonal of the similarity matrix $SM$, and the number of elements is $M = N(N-1)/2$.

Step 2.3: normalize the elements of the similarity vector $SV^k$ using formula (6), which gives:

$$p_{H_m,k} = \frac{SV^k_m}{\sum_{m=1}^{M} SV^k_m} \tag{6}$$

In formula (6), $SV^k_m$ is the $m$-th element of the similarity vector $SV^k$, which has $M = N(N-1)/2$ elements in total; $\sum_{m=1}^{M} SV^k_m$ is the sum of all elements; and $p_{H_m,k}$ is the value of the element $SV^k_m$ after normalization.
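
Step 2 can be sketched in a few lines of NumPy. The sketch assumes that the membership matrix is stored with one row per sample (shape N × c), so that the product in formula (5) is an N × N sample-by-sample matrix; the function name `similarity_vector` is hypothetical.

```python
import numpy as np


def similarity_vector(U):
    """Turn a membership matrix into a normalised similarity vector.

    U is assumed to have shape (N, c), one row per sample.
    """
    SM = U @ U.T                              # formula (5): joint memberships
    upper = np.triu_indices(SM.shape[0], k=1)
    SV = SM[upper]                            # the N(N-1)/2 entries above the diagonal
    return SV / SV.sum()                      # normalise so the entries sum to 1
```

For three samples the vector holds the pairs (1,2), (1,3), (2,3) in that order, which is the row-major order of the upper triangle.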

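Step 3: let the power set of $\Omega$ be given by formula (7):

$$P(\Omega) = 2^{\Omega} = \{\emptyset, \{H_1\}, \ldots, \{H_M\}, \{H_1, H_2\}, \ldots, \{H_1, H_M\}, \ldots, \{H_1, \ldots, H_{M-1}\}, \Omega\} \tag{7}$$

Then, according to the evidential reasoning rule, the similarity vectors $\{SV^1, SV^2, \ldots, SV^K\}$ are merged by an iterative algorithm into the integrated similarity vector $SV^* = E(K) = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$, where $p_{H,E(K)}$ denotes the belief degree that the evidence $E(K)$ assigns to $H$.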

Specifically, step 3 further comprises the following steps:

Step 3.1: let $w_k$ ($0 \le w_k \le 1$) and $R_k$ ($0 \le R_k \le 1$) denote the weight of the user behavior data $D^k$ and the reliability of the similarity vector $SV^k$, respectively, where $w_k = 0$ means "least important" and $w_k = 1$ means "most important", and $R_k = 0$ means "completely unreliable" and $R_k = 1$ means "completely reliable". Combine the weight $w_k$ and the reliability $R_k$ to obtain the hybrid weight of the $k$-th similarity vector according to formula (8):

In formula (8), $w_k$ denotes the weight of the user behavior data $D^k$; the earlier the data $D^k$ was generated, the smaller $w_k$ is.

$R_k$ is the reliability of the similarity vector $SV^k$; it is measured by the silhouette coefficient, a clustering evaluation index, and is computed according to formula (9):

Here $a(i)$ denotes the average distance between sample $x_i$ and the other samples in the same cluster, computed by formula (10):

and $b(i)$ denotes the minimum, over the other clusters, of the average distance between sample $x_i$ and the samples of that cluster, computed according to formula (11):

In formula (10) and formula (11), $d(i, A)$ and $d(i, B)$ are computed as Euclidean distances, $A$ denotes the set of samples in the same cluster as $x_i$, and $B$ denotes a set of samples in a different cluster from $x_i$.
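
The quantities $a(i)$ and $b(i)$ described above are those of the standard silhouette coefficient, which can be sketched as follows. The sketch computes the usual per-sample score $(b(i) - a(i)) / \max\{a(i), b(i)\}$ and averages it; how that average is mapped onto the reliability $R_k$ in [0, 1] is not shown here and would be an assumption. The function name and the score of 0 for singleton clusters are conventions of this sketch.

```python
import numpy as np


def mean_silhouette(X, labels):
    """Average silhouette score of a hard clustering (needs >= 2 clusters)."""
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all samples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    cluster_ids = set(labels.tolist())
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                       # exclude the sample itself
        if not same.any():
            scores.append(0.0)                # singleton cluster: score 0 by convention
            continue
        a = D[i, same].mean()                 # a(i): mean distance within own cluster
        b = min(D[i, labels == other].mean()  # b(i): nearest other cluster on average
                for other in cluster_ids if other != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

A value close to 1 indicates compact, well-separated clusters, and a negative value indicates that many samples sit closer to another cluster than to their own.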

Step 3.2: compute the support of the evidence $E(2)$ for $H$ using formula (12):

In formula (12), the two weight terms denote the hybrid weights of the similarity vectors $SV^1$ and $SV^2$, and $p_{H,1}$ and $p_{H,2}$ denote the elements of the similarity vectors $SV^1$ and $SV^2$, respectively.

Step 3.3: normalize all the supports obtained with formula (12) using formula (13) to obtain the belief degree of the evidence $E(2)$ in $H$:

In formula (13), the belief degree $p_{H,E(2)}$ is the value of the support after normalization, and the summation term is the sum of those supports.

Step 3.4: compute the residual support of the evidence $E(2)$ using formula (14):

Step 3.5: let the fusion result of the first $k$ similarity vectors be denoted $E(k)$, and compute the support of $E(k)$ for $H$ according to formula (15):

In formula (15), $m_{H,E(k-1)}$ denotes the normalized support, obtained iteratively by substituting the initial value and combining with formula (16); $m_{P(\Omega),E(k-1)}$ denotes the normalized residual support, obtained iteratively by substituting the initial value into formula (18) and then substituting formula (18) and formula (15) into formula (17).

Step 3.6: normalize the support according to formula (19) to obtain $p_{H,E(k)}$:

In formula (19), the summation term denotes the sum of all supports, and the normalized belief degrees satisfy the accompanying constraint. Through the above iterative evidential reasoning steps, the fusion result $SV^* = E(K)$ of the similarity vectors $\{SV^1, SV^2, \ldots, SV^K\}$ is finally obtained.
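
The iterative fusion of steps 3.2 to 3.6 can be sketched under explicit assumptions, since formulas (8) and (12) to (19) are not legible in this text. The sketch takes the hybrid weight as $w / (1 + w - R)$, the form used in the evidential reasoning rule literature, places support only on the single hypotheses $H_m$ and on the power set as a whole, and combines two pieces of evidence by the orthogonal sum followed by normalization. The function names are hypothetical, and the formulas of the method itself may differ in detail.

```python
import numpy as np


def hybrid_weight(w, r):
    # Assumed form taken from the ER-rule literature: w / (1 + w - r).
    return w / (1.0 + w - r)


def er_fuse(P, w, r):
    """Fuse K normalised similarity vectors (rows of P) one at a time.

    w[k] and r[k] are the weight and reliability of the k-th vector.
    Returns the fused belief degrees, which sum to 1.
    """
    P = np.asarray(P, dtype=float)
    wt = [hybrid_weight(wk, rk) for wk, rk in zip(w, r)]
    support = wt[0] * P[0]        # support for each H after the first evidence
    residual = 1.0 - wt[0]        # residual support left on the power set
    for k in range(1, len(P)):
        new_support, new_residual = wt[k] * P[k], 1.0 - wt[k]
        # Orthogonal sum restricted to single hypotheses plus the residual term.
        combined = (support * new_support
                    + support * new_residual
                    + residual * new_support)
        combined_residual = residual * new_residual
        total = combined.sum() + combined_residual
        support, residual = combined / total, combined_residual / total
    # Final belief degrees: supports renormalised over the hypotheses only.
    return support / support.sum()
```

When every weight and reliability equals 1, the residual term vanishes and the fusion reduces to the normalized elementwise product of the input vectors.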

Specifically, step 4 further comprises the following steps:

Step 4.1: treat each sample as its own cluster, so that at this point $T = N$, where $T$ is the number of clusters and $N$ is the number of samples; the similarity between samples is given by the integrated similarity vector $SV^*$ resulting from the evidential reasoning above.

Step 4.2: find the largest element $max\_SV^*$ of the integrated similarity vector $SV^*$, and merge the samples $x_i$ and $x_j$ that $max\_SV^*$ represents into one cluster, denoted $C_t$.

Step 4.3: compute the similarity between this cluster and the other clusters using formula (20):

In formula (20), $sim(x, x')$ denotes the similarity between a sample $x$ from cluster $C_s$ and a sample $x'$ from cluster $C_t$, and corresponds one-to-one to the value of an element of the similarity vector $SV^*$; $|C_s|$ and $|C_t|$ denote the numbers of samples in clusters $C_s$ and $C_t$, respectively.

Step 4.4: if the number of clusters now equals $T$, stop; otherwise repeat step 4.2 and step 4.3 until the final number of clusters reaches $T$.
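
Steps 4.1 to 4.4 can be sketched as a small agglomerative loop. The sketch assumes that the similarity between two clusters is the average of the pairwise similarities $sim(x, x')$ over all cross-cluster pairs, which is what the terms $|C_s|$ and $|C_t|$ described for formula (20) suggest; it also assumes that the similarity vector lists the upper triangle in row-major order. The function name is hypothetical.

```python
import numpy as np


def agnes_from_similarity(sv, n_samples, T):
    """Merge the most similar clusters until T clusters remain.

    sv holds the N(N-1)/2 pairwise similarities in the order produced by
    np.triu_indices(n_samples, k=1). Returns a list of index lists.
    """
    S = np.zeros((n_samples, n_samples))
    S[np.triu_indices(n_samples, k=1)] = sv
    S = S + S.T                                   # symmetric similarity matrix
    clusters = [[i] for i in range(n_samples)]    # step 4.1: one cluster per sample
    while len(clusters) > T:                      # step 4.4: stop at T clusters
        best, pair = -np.inf, None
        for s in range(len(clusters)):
            for t in range(s + 1, len(clusters)):
                # Step 4.3 (assumed average linkage over cross-cluster pairs).
                sim = S[np.ix_(clusters[s], clusters[t])].mean()
                if sim > best:
                    best, pair = sim, (s, t)
        s, t = pair
        clusters[s] = clusters[s] + clusters[t]   # step 4.2: merge the closest pair
        del clusters[t]
    return clusters
```

In the first pass every cluster is a single sample, so the pair chosen is exactly the one with the largest element of the similarity vector, as in step 4.2.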

The method of the invention is demonstrated experimentally below with a specific example, as follows:

1. Data set

This embodiment uses the electricity consumption behavior data of commercial users in a coastal city of China to verify the validity of the clustering ensemble method for user behavior analysis. The data set covers 169 commercial users and contains annual electricity consumption data for the 7 years from 2010 to 2016.

2. Evaluation indices

This embodiment uses the silhouette coefficient (SC), the sum of squared errors (SSE), and the consistency index (C-index), all common in the clustering field, as the experimental evaluation indices. SSE is obtained by summing the distances from the center of each cluster to all of its sample points; it is a widely used evaluation index in clustering, and the smaller its value, the better the clustering. The C-index mainly reflects clustering quality in terms of cohesion; the smaller its value, the better the clustering. SC takes both cohesion and separation into account and can effectively judge the quality of different clustering algorithms on the same data set; the larger its value, the better the clustering. The silhouette coefficient, the sum of squared errors, and the C-index are computed by formula (9), formula (21), and formula (22), respectively.

In formula (21), $N_t$ denotes the number of samples in the $t$-th cluster $C_t$, and the center term denotes the center of cluster $C_t$. In formula (22), the three terms denote, respectively, the sum of the Euclidean distances between all pairs of samples within cluster $C_t$, the minimum distance between all samples, and the maximum distance between all samples.
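
The SSE index can be illustrated with a short sketch. It uses the usual definition, the sum of squared Euclidean distances from each sample to the center (mean) of its cluster; whether formula (21) squares the distances in exactly this way is an assumption, and the function name is hypothetical.

```python
import numpy as np


def sse(X, labels):
    """Sum of squared distances from each sample to its cluster mean."""
    labels = np.asarray(labels)
    total = 0.0
    for t in np.unique(labels):
        members = X[labels == t]
        center = members.mean(axis=0)          # center of cluster C_t
        total += ((members - center) ** 2).sum()
    return float(total)
```

A smaller value means the samples lie closer to their cluster centers, which matches the reading of SSE given above.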

3. Experimental results

To verify the validity of the proposed method, experiments were carried out on the commercial user electricity consumption behavior data set, and the results of the clustering ensemble method based on evidential reasoning for user behavior analysis provided by the invention were compared with those of six comparison methods: fuzzy C-means clustering (FCM), K-means clustering (K-Means), density clustering (DBSCAN), hierarchical clustering (Hierarchy), projective clustering (ProClus), and voting K-means clustering (Voting-Kmeans). The experimental results are shown in Table 1, Fig. 2, Fig. 3, and Fig. 4. The abscissa in Fig. 2 is the number of clusters and the ordinate is the SSE value; the ordinate in Fig. 3 is the C-index value; and the ordinate in Fig. 4 is the SC value.

Table 1. Comparison of the clustering results of ERCE and the comparison algorithms (number of clusters: 2-10)

Table 1 shows that the proposed clustering ensemble method for user behavior analysis, ERCE, outperforms the other six clustering methods on all three evaluation indices: SSE, C-index, and SC. Table 1 also shows that ERCE improves considerably on the base clusterer FCM, which further demonstrates the validity of the proposed clustering fusion method based on evidential reasoning.

Fig. 2, Fig. 3, and Fig. 4 show clearly that the clustering ensemble method of the invention based on evidence theory performs well on every index and obtains better results for every number of clusters. In addition, the curves in these figures show that a cluster number of 6 corresponds exactly to the "inflection point" of the curves, and that when the number of clusters exceeds 6 the clustering evaluation indices tend to stabilize. Therefore, for this user behavior data set, the best choice for the number of clusters is 6. When the method of the invention is applied to other similar data sets, the best number of clusters can be determined in the same way.

Those skilled in the art will readily understand that the above is merely a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its protection scope.

Claims

1. A clustering ensemble method based on evidential reasoning for user behavior analysis, suitable for stream data sets with temporal characteristics, characterized by comprising the following steps:

Step 1: for the user behavior data sets $\{D^1, D^2, \ldots, D^k, \ldots, D^K\}$ of different time periods, use fuzzy C-means algorithms with different parameters to generate $K$ membership matrices $\{U^1, U^2, \ldots, U^k, \ldots, U^K\}$, where $D^k$ denotes the data of the $k$-th time period and $U^k$ denotes the $k$-th membership matrix; the user behavior data sets are obtained by splitting the original stream data, which has temporal characteristics, by time window;

Step 2: convert the $K$ membership matrices $\{U^1, U^2, \ldots, U^k, \ldots, U^K\}$ obtained in step 1 into $K$ similarity matrices $\{SM^1, SM^2, \ldots, SM^k, \ldots, SM^K\}$, convert the $K$ similarity matrices into similarity vectors $\{SV^1, SV^2, \ldots, SV^k, \ldots, SV^K\}$, and normalize them, where a similarity vector is denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$;

Step 3: let the power set of $\Omega$ be given by formula (7):

$$P(\Omega) = 2^{\Omega} = \{\emptyset, \{H_1\}, \ldots, \{H_M\}, \{H_1, H_2\}, \ldots, \{H_1, H_M\}, \ldots, \{H_1, \ldots, H_{M-1}\}, \Omega\} \tag{7}$$

Then, according to the evidential reasoning rule, the similarity vectors $\{SV^1, SV^2, \ldots, SV^K\}$ are merged by an iterative algorithm into the integrated similarity vector $SV^* = E(K) = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$, where $p_{H,E(K)}$ denotes the belief degree that the evidence $E(K)$ assigns to $H$;

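2. The clustering ensemble method based on evidential reasoning for user behavior analysis according to claim 1, characterized in that step 1 specifically comprises:

Step 1.1: initialize the membership matrix $U$ with random numbers in the interval (0, 1), such that $U$ satisfies the constraint of formula (1):

$$\sum_{i=1}^{C} u_{ij} = 1, \quad j = 1, \ldots, N \tag{1}$$

In formula (1), $u_{ij}$ denotes the probability, in the membership matrix $U$, that the $j$-th sample point belongs to the $i$-th cluster center;

$C$ denotes the number of clusters of the $k$-th fuzzy C-means algorithm, and $N$ is the number of samples;

Step 1.2: construct the objective function of the fuzzy C-means algorithm using formula (2):

$$J(U, c_1, \ldots, c_C) = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m} d_{ij}^{2} \tag{2}$$

In formula (2), the exponent $m$ in $u_{ij}^{m}$ is the weighting exponent of the membership $u_{ij}$ and generally takes the value 2; $d_{ij} = \lVert c_i - x_j \rVert$ denotes the Euclidean distance between the $i$-th cluster center $c_i$ and the $j$-th data point $x_j$; given a threshold $\delta$ or a maximum number of iterations Max_iteration, if the value of formula (2) is less than the threshold $\delta$ or the maximum number of iterations is reached, go directly to step 1.5; otherwise go to step 1.3;

Step 1.3: update the cluster centers $c_i$ and the elements $u_{ij}$ of the membership matrix $U$ using formula (3) and formula (4):

$$c_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} x_j}{\sum_{j=1}^{N} u_{ij}^{m}} \tag{3}$$

$$u_{ij} = \frac{1}{\sum_{l=1}^{C} \left( d_{ij} / d_{lj} \right)^{2/(m-1)}} \tag{4}$$

Step 1.4: return to step 1.2 and substitute the cluster centers $c_i$ and the elements $u_{ij}$ updated in step 1.3 into formula (2);

Step 1.5: return to step 1.1 and repeat steps 1.1 to 1.4 $K$ times to obtain the $K$ membership matrices $\{U^1, U^2, \ldots, U^k, \ldots, U^K\}$.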

3. The clustering ensemble method based on evidential reasoning for user behavior analysis according to claim 1, characterized in that step 2 specifically comprises:

Step 2.1: based on the membership matrix $U^k$ obtained in step 1, compute the similarity matrix $SM^k$ of the $k$-th clustering result according to formula (5):

$$SM^k = U^k (U^k)^T \tag{5}$$

In formula (5), the element $SM^k_{ij}$ of the similarity matrix $SM^k$ denotes the joint membership of sample $x_i$ and sample $x_j$ in the same cluster center in the $k$-th clustering result;

Step 2.2: let the similarity vector be denoted $SV = \Omega = \{H_1, H_2, \ldots, H_m, \ldots, H_M\}$; its values are the elements above the diagonal of the similarity matrix $SM$, and the number of elements is $M = N(N-1)/2$;

Step 2.3: normalize the elements of the similarity vector $SV^k$ using formula (6), which gives:

$$p_{H_m,k} = \frac{SV^k_m}{\sum_{m=1}^{M} SV^k_m} \tag{6}$$

In formula (6), $SV^k_m$ is the $m$-th element of the similarity vector $SV^k$, which has $M = N(N-1)/2$ elements in total; $\sum_{m=1}^{M} SV^k_m$ is the sum of all elements; and $p_{H_m,k}$ is the value of the element $SV^k_m$ after normalization.

4. The clustering ensemble method based on evidential reasoning for user behavior analysis according to claim 1, characterized in that step 3 specifically comprises:

Step 3.1: let $w_k$ ($0 \le w_k \le 1$) and $R_k$ ($0 \le R_k \le 1$) denote the weight of the user behavior data $D^k$ and the reliability of the similarity vector $SV^k$, respectively, where $w_k = 0$ means "least important" and $w_k = 1$ means "most important", and $R_k = 0$ means "completely unreliable" and $R_k = 1$ means "completely reliable"; combine the weight $w_k$ and the reliability $R_k$ to obtain the hybrid weight of the $k$-th similarity vector according to formula (8):

In formula (8), $w_k$ denotes the weight of the user behavior data $D^k$; the earlier the data $D^k$ was generated, the smaller $w_k$ is;

$R_k$ is the reliability of the similarity vector $SV^k$; it is measured by the silhouette coefficient, a clustering evaluation index, and is computed according to formula (9):

Here $a(i)$ denotes the average distance between sample $x_i$ and the other samples in the same cluster, computed by formula (10):

and $b(i)$ denotes the minimum, over the other clusters, of the average distance between sample $x_i$ and the samples of that cluster, computed according to formula (11):

In formula (10) and formula (11), $d(i, A)$ and $d(i, B)$ are computed as Euclidean distances, $A$ denotes the set of samples in the same cluster as $x_i$, and $B$ denotes a set of samples in a different cluster from $x_i$;

Step 3.2: compute the support of the evidence $E(2)$ for $H$ using formula (12):

In formula (12), the two weight terms denote the hybrid weights of the similarity vectors $SV^1$ and $SV^2$, and $p_{H,1}$ and $p_{H,2}$ denote the elements of the similarity vectors $SV^1$ and $SV^2$, respectively;

Step 3.3: normalize all the supports obtained with formula (12) using formula (13) to obtain the belief degree of the evidence $E(2)$ in $H$:

In formula (13), the belief degree $p_{H,E(2)}$ is the value of the support after normalization, and the summation term is the sum of those supports;

Step 3.4: compute the residual support of the evidence $E(2)$ using formula (14):

Step 3.5: let the fusion result of the first $k$ similarity vectors be denoted $E(k)$, and compute the support of $E(k)$ for $H$ according to formula (15):

In formula (15), $m_{H,E(k-1)}$ denotes the normalized support, obtained iteratively by substituting the initial value and combining with formula (16); $m_{P(\Omega),E(k-1)}$ denotes the normalized residual support, obtained iteratively by substituting the initial value into formula (18) and then substituting formula (18) and formula (15) into formula (17);

Step 3.6: normalize the support according to formula (19) to obtain $p_{H,E(k)}$:

In formula (19), the summation term denotes the sum of all supports, and the normalized belief degrees satisfy the accompanying constraint. Through the above iterative evidential reasoning steps, the fusion result $SV^* = E(K)$ of the similarity vectors $\{SV^1, SV^2, \ldots, SV^K\}$ is finally obtained.

5. The clustering ensemble method based on evidential reasoning for user behavior analysis according to claim 1, characterized in that step 4 specifically comprises:

Step 4.1: treat each sample as its own cluster, so that at this point $T = N$, where $T$ is the number of clusters and $N$ is the number of samples; the similarity between samples is given by the integrated similarity vector $SV^*$ resulting from the evidential reasoning above;

Step 4.2: find the largest element $max\_SV^*$ of the integrated similarity vector $SV^*$, and merge the samples $x_i$ and $x_j$ that $max\_SV^*$ represents into one cluster, denoted $C_t$;

Step 4.3: compute the similarity between this cluster and the other clusters using formula (20):

In formula (20), $sim(x, x')$ denotes the similarity between a sample $x$ from cluster $C_s$ and a sample $x'$ from cluster $C_t$, and corresponds one-to-one to the value of an element of the similarity vector $SV^*$; $|C_s|$ and $|C_t|$ denote the numbers of samples in clusters $C_s$ and $C_t$, respectively;

Step 4.4: if the number of clusters now equals $T$, stop; otherwise repeat step 4.2 and step 4.3 until the final number of clusters reaches $T$.