CN103995821A - Selective clustering integration method based on spectral clustering algorithm - Google Patents

Selective clustering integration method based on spectral clustering algorithm

Info

Publication number
CN103995821A
Authority
CN
China
Prior art keywords
clustering
cluster
spectral clustering
cluster member
delegates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410096258.6A
Other languages
Chinese (zh)
Other versions
CN103995821B (en)
Inventor
徐森
李先锋
曹瑞
花小朋
徐静
陈荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yancheng Institute of Technology
Original Assignee
Yancheng Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yancheng Institute of Technology
Priority to CN201410096258.6A
Publication of CN103995821A
Application granted
Publication of CN103995821B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a selective clustering integration method based on a spectral clustering algorithm. The method comprises the following steps: clustering members are generated; representative members are selected based on the spectral clustering algorithm; the representative members are integrated; and the process ends. The method is easy to implement and can effectively improve the clustering integration result.

Description

A selective clustering ensemble (integration) method based on spectral clustering
Technical field
The present invention relates to a selective clustering ensemble method based on spectral clustering and belongs to the field of data mining technology.
Background art
Cluster analysis has a research history of more than forty years and plays an extremely important role in fields such as machine learning, data mining, information retrieval, pattern recognition, and bioinformatics. Traditional clustering algorithms are numerous, yet no single algorithm can effectively identify clusters that differ in size, shape, and density and that may even contain noise. Compared with traditional clustering algorithms, clustering ensemble techniques offer advantages such as robustness, novelty, and stability, and have become one of the current research hotspots in machine learning. Existing clustering ensemble methods, however, have many problems and shortcomings: they impose a particular structure on cluster shape, place strong constraints on cluster size, have high computational complexity, or yield only locally optimal solutions.
Summary of the invention
Object of the invention: in view of the problems and deficiencies of the prior art, the invention provides a selective clustering ensemble method based on spectral clustering that can effectively improve the clustering ensemble result.
Technical solution: a selective clustering ensemble method based on spectral clustering, comprising the following steps:
1. generate clustering members; 2. select representative members based on spectral clustering; 3. integrate the representative members; 4. end.
Beneficial effects: compared with the prior art, the selective clustering ensemble method based on spectral clustering provided by the invention is simple to implement and can effectively improve the clustering ensemble result.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the invention;
Fig. 2 is the flow chart of generating clustering members;
Fig. 3 is the flow chart of selecting representative members based on spectral clustering;
Fig. 4 is the flow chart of integrating the representative members;
Fig. 5 is the flow chart of clustering the clustering members with spectral clustering;
Fig. 6 is the flow chart of clustering the data set with spectral clustering.
Detailed description of the embodiments
The invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments serve only to illustrate the invention and not to limit its scope; after reading the invention, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the claims appended to this application.
The method of the invention is shown in Fig. 1. Step 0 is the start action. Step 1 generates the clustering members and is described in detail below with reference to Fig. 2. Step 2 selects representative members based on spectral clustering and is described in detail below with reference to Fig. 3. Step 3 integrates the representative members and is described in detail below with reference to Fig. 4. Step 4 is the end state of Fig. 1.
Fig. 2 details step 1 of Fig. 1, whose purpose is to generate multiple clustering members. Step 10 is the start action. Step 11 obtains the number of clustering members l (an integer greater than 1) and the number of clusters k (k is generally set to the true number of classes in the data set). Step 12 sets the control parameter i to the initial value 1. Step 13 checks whether i is less than or equal to l; if so, the method goes to step 14, otherwise to step 17. Step 14 randomly generates k mean vectors as the initial centroids of the K-means algorithm and uses K-means to partition the data set. Step 15 obtains the clustering result $P^{(i)} = \{C_1^{(i)}, \ldots, C_k^{(i)}\}$. Step 16 increments the control parameter i by 1 and returns to step 13. Step 17 builds the clustering member set $P = \{P^{(1)}, \ldots, P^{(l)}\}$. Step 18 is the end state of Fig. 2.
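The loop of steps 11 to 17 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names `kmeans` and `generate_members`, the `seed` parameter, and the choice of drawing the initial centroids from the data points themselves are assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, rng, max_iter=100):
    """Plain K-means; returns one cluster label per point."""
    # Assumption: the k random initial centroids are sampled from the data.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def generate_members(points, k, l, seed=0):
    """Run K-means l times with different random starts (steps 12 to 17)."""
    rng = random.Random(seed)
    return [kmeans(points, k, rng) for _ in range(l)]


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
members = generate_members(points, k=2, l=5)
```

Each entry of `members` is one clustering member, stored as a list of cluster labels in the order of `points`.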
Fig. 3 details step 2 of Fig. 1, whose purpose is to select representative members based on spectral clustering for the subsequent integration. Step 20 is the start action. Step 21 computes the similarity between clustering members, namely the NMI (Normalized Mutual Information) value between them. The larger the NMI value, the better two clustering results match and the more similar the two clustering members are. It is computed as follows. Let X and Y be the random variables represented by clustering members $P^{(a)}$ and $P^{(b)}$, which have $k_a$ and $k_b$ clusters respectively. Let $n_h^a$ be the number of objects in cluster $C_h$ of $P^{(a)}$, $n_l^b$ the number of objects in cluster $C_l$ of $P^{(b)}$, $n_{h,l}$ the number of objects shared by $C_h$ and $C_l$, and $n$ the total number of objects. The NMI value between $P^{(a)}$ and $P^{(b)}$ is:
$$\mathrm{NMI}(P^{(a)}, P^{(b)}) = \frac{\sum_{h=1}^{k_a} \sum_{l=1}^{k_b} n_{h,l} \log\left(\frac{n \cdot n_{h,l}}{n_h^a \, n_l^b}\right)}{\sqrt{\left(\sum_{h=1}^{k_a} n_h^a \log\frac{n_h^a}{n}\right)\left(\sum_{l=1}^{k_b} n_l^b \log\frac{n_l^b}{n}\right)}}$$
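A short Python sketch of the NMI computation may make the counts concrete. It assumes each clustering member is given as a list of cluster labels, one per object, and it assumes the square-root normalization of the usual NMI definition; the function name `nmi` and the convention of returning 0 when either member has a single cluster are choices made for this example.

```python
from collections import Counter
from math import log, sqrt


def nmi(a, b):
    """Normalized mutual information between two label lists of equal length."""
    n = len(a)
    count_a = Counter(a)        # n_h^a: size of each cluster of the first member
    count_b = Counter(b)        # n_l^b: size of each cluster of the second member
    joint = Counter(zip(a, b))  # n_{h,l}: objects shared by a pair of clusters
    numerator = sum(
        n_hl * log(n * n_hl / (count_a[h] * count_b[l]))
        for (h, l), n_hl in joint.items()
    )
    term_a = sum(c * log(c / n) for c in count_a.values())
    term_b = sum(c * log(c / n) for c in count_b.values())
    denominator = sqrt(term_a * term_b)  # both terms are <= 0, so the product is >= 0
    # Assumption: define NMI as 0 when a member consists of a single cluster.
    return numerator / denominator if denominator > 0 else 0.0


same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # identical up to relabeling
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])      # clusters cut across each other
```

Two members that describe the same partition under different label names score 1, while members whose clusters are statistically independent score 0.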
Step 22 of Fig. 3 uses spectral clustering to cluster the clustering members according to the similarities computed in step 21; this step is described in detail below with reference to Fig. 5. Step 23 selects the representative members as follows. Based on the clustering result of step 22, from each set of clustering members the member whose sum of NMI values with all other members of that set is largest is selected as the representative member. Suppose a set of clustering members $G = \{P^{(1)}, \ldots, P^{(m)}\}$ contains m clustering members; the selected representative member $P^*$ then satisfies $P^* = \arg\max_{P^{(j)}} \sum_{i=1}^{m} \mathrm{NMI}(P^{(i)}, P^{(j)})$. Step 24 is the end state of Fig. 3.
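The selection rule of step 23 can be illustrated with a few lines of Python. The sketch assumes the pairwise NMI values are already available as a square matrix `sim` and that each group is a list of member indices; both names, and the function name `select_representatives`, are hypothetical.

```python
def select_representatives(sim, groups):
    """Pick, from each group, the member with the largest total similarity
    to the members of its own group."""
    representatives = []
    for group in groups:
        # The sum includes the member's similarity to itself; that term is the
        # same constant for every candidate, so it does not change the argmax.
        best = max(group, key=lambda j: sum(sim[i][j] for i in group))
        representatives.append(best)
    return representatives


# Pairwise NMI values between four clustering members (illustrative numbers).
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
representatives = select_representatives(sim, groups=[[0, 1, 2], [3]])
```

In this example member 0 has the largest summed similarity within the first group, and member 3 is the only candidate in the second.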
Fig. 4 is the flow chart of integrating the representative members. Step 30 is the start action. Step 31 computes the similarity between data points; the similarity of data points $d_i$ and $d_j$ is $S_{ij} = (\text{number of times } d_i \text{ and } d_j \text{ belong to the same cluster}) / r$, where r is the number of representative members. Step 32 uses spectral clustering to cluster the data set and is described in detail below with reference to Fig. 6. Step 33 is the end state of Fig. 4.
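The similarity of step 31 is a co-occurrence frequency, which can be sketched as follows. The example assumes each representative member is a list of cluster labels over the same data points; the function name `co_association` is a label chosen for this illustration.

```python
def co_association(members):
    """S[i][j] = fraction of members that put points i and j in the same cluster."""
    r = len(members)     # number of representative members
    n = len(members[0])  # number of data points
    return [
        [sum(m[i] == m[j] for m in members) / r for j in range(n)]
        for i in range(n)
    ]


# Two representative members over three data points.
S = co_association([[0, 0, 1], [0, 1, 1]])
```

Here points 0 and 1 share a cluster in one of the two members, so their similarity is 0.5, while points 0 and 2 never share a cluster and get similarity 0.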
Fig. 5 is the flow chart of clustering the clustering members with spectral clustering. Step 220 is the start action. Step 221 obtains the number $r_0$ of representative members to be selected. Step 222 constructs the transition probability matrix $P_l$ of the random walk on the graph as $P_l = D_l^{-1} S_l$, where $S_l$ is the similarity matrix between clustering members, whose entries are obtained in step 21 of Fig. 3, and $D_l$ is a diagonal matrix with diagonal entries $(D_l)_{ii} = \sum_j (S_l)_{ij}$. Step 223 computes the eigenvalues $\lambda_1 \ge \ldots \ge \lambda_l$ of $P_l$; if there is an index i such that $\lambda_i$ is strictly greater than $\lambda_{i+1}$, set r = i; otherwise set $r = r_0$. Step 224 arranges the eigenvectors corresponding to the r largest eigenvalues of $P_l$ as columns to build the matrix $U_r = [u_1 \ldots u_r]$. Step 225 uses the K-means algorithm to cluster the rows of $U_r$ into r sets of clustering members $G_1, \ldots, G_r$. Step 226 is the end state of Fig. 5.
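Steps 222 and 223 can be sketched in Python under two stated assumptions. First, the diagonal entries of the degree matrix are taken to be the row sums of the similarity matrix, which is what makes the product a random-walk transition matrix. Second, the rule for r is read literally as "the first index at which the sorted eigenvalues strictly drop", with a small tolerance `tol` added for floating-point comparison. The function names and `tol` are hypothetical.

```python
def transition_matrix(S):
    """Row-normalize a similarity matrix: P = D^{-1} S with D_ii = sum_j S_ij."""
    P = []
    for row in S:
        degree = sum(row)  # assumed diagonal entry D_ii
        if degree == 0:
            raise ValueError("row sum is zero; the random walk is undefined")
        P.append([value / degree for value in row])
    return P


def choose_group_count(eigenvalues, r0, tol=1e-8):
    """Return the first index i (1-based) with lambda_i > lambda_{i+1}, else r0."""
    lam = sorted(eigenvalues, reverse=True)
    for i in range(len(lam) - 1):
        if lam[i] - lam[i + 1] > tol:  # strict drop, up to the tolerance
            return i + 1
    return r0


P = transition_matrix([[1.0, 0.5], [0.5, 1.0]])
r_with_gap = choose_group_count([1.0, 1.0, 0.4], r0=3)
r_without_gap = choose_group_count([1.0, 1.0, 1.0], r0=2)
```

Every row of `P` sums to 1, as expected of transition probabilities. Under this literal reading, r equals the multiplicity of the largest eigenvalue whenever a drop exists, and the fallback value $r_0$ is used only when all eigenvalues coincide.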
Fig. 6 is the flow chart of clustering the data set with spectral clustering. Step 320 is the start action. Step 321 constructs the transition probability matrix P of the random walk on the graph as $P = D^{-1} S$, where S is the similarity matrix between data points, whose entries are obtained in step 31 of Fig. 4, and D is a diagonal matrix with diagonal entries $D_{ii} = \sum_j S_{ij}$. Step 322 computes the eigenvectors corresponding to the k largest eigenvalues of P and arranges them as columns to build the matrix $V_k = [v_1 \ldots v_k]$. Step 323 uses the K-means algorithm to cluster the rows of $V_k$ into k clusters $D_1, \ldots, D_k$. Step 324 is the end state of Fig. 6.
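Steps 321 and 322 can be sketched with NumPy as shown below. This is an illustration under assumptions, not the patented implementation: the degree matrix is taken to hold the row sums of S, the function name `spectral_embedding` is hypothetical, and the final K-means pass of step 323 is left out so that the example stays deterministic.

```python
import numpy as np


def spectral_embedding(S, k):
    """Return the matrix whose columns are the eigenvectors of P = D^{-1} S
    belonging to the k largest eigenvalues."""
    S = np.asarray(S, dtype=float)
    # Assumption: D_ii is the i-th row sum, so P is row-stochastic.
    P = S / S.sum(axis=1, keepdims=True)
    eigenvalues, eigenvectors = np.linalg.eig(P)
    # Indices of the k largest eigenvalues (real parts; P has a real spectrum
    # when S is symmetric with positive row sums).
    order = np.argsort(-eigenvalues.real)[:k]
    return eigenvectors[:, order].real


# Four data points forming two tight pairs with weak links between the pairs.
S = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
V = spectral_embedding(S, k=2)
```

Each row of `V` is the embedded representation of one data point. In this example the rows of points 0 and 1 coincide, as do those of points 2 and 3, so a K-means pass over the rows would separate the two pairs.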

Claims (6)

1. A selective clustering ensemble method based on spectral clustering, characterized by comprising the following steps:
(1) generating clustering members;
(2) selecting representative members based on spectral clustering;
(3) integrating the representative members;
(4) ending.
2. The selective clustering ensemble method based on spectral clustering according to claim 1, characterized in that the step of generating clustering members comprises:
(1) step 11 obtains the number of clustering members l and the number of clusters k, where l is an integer greater than 1 and k is set to the true number of classes in the data set;
(2) step 12 sets the control parameter i to the initial value 1;
(3) step 13 checks whether the control parameter i is less than or equal to the number of clustering members l; if so, step 14 is executed, otherwise the method goes to step 17;
(4) step 14 randomly generates k mean vectors as the initial centroids of the K-means algorithm and uses the K-means algorithm to partition the data set;
(5) step 15 obtains the clustering result $P^{(i)} = \{C_1^{(i)}, \ldots, C_k^{(i)}\}$;
(6) step 16 increments the control parameter i by 1 and then returns to step 13;
(7) step 17 builds the clustering member set $P = \{P^{(1)}, \ldots, P^{(l)}\}$;
(8) end.
3. The selective clustering ensemble method based on spectral clustering according to claim 1, characterized in that the step of selecting representative members based on spectral clustering comprises:
(1) step 21 computes the similarity between clustering members;
(2) step 22 uses spectral clustering to cluster the clustering members according to the similarities computed in step 21;
(3) step 23, based on the clustering result obtained in step 22, selects from each set of clustering members the member whose sum of NMI values with all other members of that set is largest as the representative member;
(4) end.
4. The selective clustering ensemble method based on spectral clustering according to claim 1, characterized in that the step of integrating the representative members comprises:
(1) step 31 computes the similarity between data points; the similarity of data points $d_i$ and $d_j$ is $S_{ij} = (\text{number of times } d_i \text{ and } d_j \text{ belong to the same cluster}) / r$;
(2) step 32 uses spectral clustering to cluster the data set;
(3) end.
5. The selective clustering ensemble method based on spectral clustering according to claim 3, characterized in that, in selecting representative members based on spectral clustering, the step of clustering the clustering members with spectral clustering comprises:
(1) step 221 obtains the number $r_0$ of representative members to be selected;
(2) step 222 constructs the transition probability matrix $P_l$ of the random walk on the graph as $P_l = D_l^{-1} S_l$, where $S_l$ is the similarity matrix between clustering members, whose entries are obtained in step 21 of claim 3, and $D_l$ is a diagonal matrix with diagonal entries $(D_l)_{ii} = \sum_j (S_l)_{ij}$;
(3) step 223 computes the eigenvalues $\lambda_1 \ge \ldots \ge \lambda_l$ of $P_l$; if there is an index i such that $\lambda_i$ is strictly greater than $\lambda_{i+1}$, set r = i; otherwise set $r = r_0$;
(4) step 224 arranges the eigenvectors corresponding to the r largest eigenvalues of $P_l$ as columns to build the matrix $U_r = [u_1 \ldots u_r]$;
(5) step 225 uses the K-means algorithm to cluster the rows of $U_r$ into r sets of clustering members $G_1, \ldots, G_r$;
(6) end.
6. The selective clustering ensemble method based on spectral clustering according to claim 4, characterized in that, in integrating the representative members, the step of clustering the data set with spectral clustering comprises:
(1) step 321 constructs the transition probability matrix P of the random walk on the graph as $P = D^{-1} S$, where S is the similarity matrix between data points, whose entries are obtained in step 31, and D is a diagonal matrix with diagonal entries $D_{ii} = \sum_j S_{ij}$;
(2) step 322 computes the eigenvectors corresponding to the k largest eigenvalues of P and arranges them as columns to build the matrix $V_k = [v_1 \ldots v_k]$;
(3) step 323 uses the K-means algorithm to cluster the rows of $V_k$ into k clusters $D_1, \ldots, D_k$;
(4) end.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410096258.6A CN103995821B (en) 2014-03-14 2014-03-14 Selective clustering integration method based on spectral clustering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410096258.6A CN103995821B (en) 2014-03-14 2014-03-14 Selective clustering integration method based on spectral clustering algorithm

Publications (2)

Publication Number Publication Date
CN103995821A 2014-08-20
CN103995821B 2017-05-10

Family

ID=51309986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410096258.6A Expired - Fee Related CN103995821B (en) 2014-03-14 2014-03-14 Selective clustering integration method based on spectral clustering algorithm

Country Status (1)

Country Link
CN (1) CN103995821B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105959270A (en) * 2016-04-25 2016-09-21 盐城工学院 Network attack detection method based on spectral clustering algorithm
CN108229507A (en) * 2016-12-14 2018-06-29 中国电信股份有限公司 Data classification method and device
CN114328922A (en) * 2021-12-28 2022-04-12 盐城工学院 Selective text clustering integration method based on spectrogram theory

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101968852A (en) * 2010-09-09 2011-02-09 西安电子科技大学 Entropy sequencing-based semi-supervision spectral clustering method for determining clustering number
CN102799891A (en) * 2012-05-24 2012-11-28 浙江大学 Spectral clustering method based on landmark point representation
CN103399852A (en) * 2013-06-27 2013-11-20 江南大学 Multi-channel spectrum clustering method based on local density estimation and neighbor relation spreading

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
卢志茂等: "近邻传播的文本聚类集成谱算法", 《哈尔滨工程大学学报》 *
徐森: "文本聚类集成关键技术研究", 《中国博士学位论文全文数据库 信息科技辑》 *
徐森等: "基于矩阵谱分析的文本聚类集成算法", 《模式识别与人工智能》 *
黄发良等: "网络重叠社区发现的谱聚类集成算法", 《控制与决策》 *

Also Published As

Publication number Publication date
CN103995821B (en) 2017-05-10

Similar Documents

Publication Publication Date Title
EP3611799A1 (en) Array element arrangement method for l-type array antenna based on inheritance of acquired characteristics
CN105303450A (en) Complex network community discovery method based on spectral clustering improved intersection
CN103020979B (en) Image segmentation method based on sparse genetic clustering
CN102800093A (en) Multi-target remote sensing image segmentation method based on decomposition
CN102521605A (en) Wave band selection method for hyperspectral remote-sensing image
CN105488562A (en) Irregular part stock layout method based on multi-factor particle swarm algorithm
CN104637057A (en) Grayscale-gradient entropy multi-threshold fast division method based on genetic algorithm
CN106934722A (en) Multi-objective community detection method based on k node updates Yu similarity matrix
CN104268629A (en) Complex network community detecting method based on prior information and network inherent information
CN110457758A (en) Prediction technique, device, system and the storage medium in Instability of Rock Body stage
CN108197708A (en) A kind of parallel time genetic algorithm based on Spark
CN103366189A (en) Intelligent classification method for high-spectrum remote sensing image
CN102880754A (en) Method for identifying action scale of land utilization fractal dimension based on genetic algorithm
CN103365999A (en) Text clustering integrated method based on similarity degree matrix spectral factorization
CN103995821A (en) Selective clustering integration method based on spectral clustering algorithm
CN105139282A (en) Power grid index data processing method, device and calculation device
CN103793438B (en) A kind of parallel clustering method based on MapReduce
CN104657472A (en) EA (Evolutionary Algorithm)-based English text clustering method
CN105740949A (en) Group global optimization method based on randomness best strategy
CN103942318B (en) Parallel AP propagating XML big data clustering integration method
CN110362606A (en) A kind of elongated die body method for digging of time series
CN104318306B (en) Self adaptation based on Non-negative Matrix Factorization and evolution algorithm Optimal Parameters overlaps community detection method
CN102982342A (en) Positive semidefinite spectral clustering method based on Lagrange dual
CN104573004B (en) A kind of double clustering methods of the gene expression data based on double rank genetic computations
CN106156854A (en) A kind of support vector machine parameter prediction method based on DNA encoding

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170510

Termination date: 20180314