CN103995821A - Selective clustering integration method based on spectral clustering algorithm - Google Patents
- Publication number
- CN103995821A
- Authority
- CN
- China
- Prior art keywords
- clustering
- cluster
- spectral clustering
- cluster member
- delegates
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a selective clustering integration method based on a spectral clustering algorithm. The method comprises the following steps: cluster members are generated; representative members are selected based on the spectral clustering algorithm; the representative members are integrated; the process ends. The method is easy to implement and can effectively improve the clustering integration result.
Description
Technical field
The present invention relates to a selective clustering ensemble method based on spectral clustering, and belongs to the technical field of data mining.
Background art
Cluster analysis has a research history of more than forty years and plays an extremely important role in fields such as machine learning, data mining, information retrieval, pattern recognition, and bioinformatics. New clustering algorithms keep appearing, yet no single algorithm can effectively identify clusters that differ in size, shape, and density and that may even contain noise. Compared with traditional clustering algorithms, clustering ensemble techniques offer advantages such as robustness, novelty, and stability, and have become one of the research hotspots of machine learning. Existing clustering ensemble methods nevertheless have many problems and shortcomings: they impose a particular structure on the shape of clusters, place strong constraints on cluster size, have high computational complexity, or reach only locally optimal solutions.
Summary of the invention
Object of the invention: in view of the problems and deficiencies of the prior art, the invention provides a selective clustering ensemble method based on spectral clustering that can effectively improve the clustering ensemble result.
Technical solution: a selective clustering ensemble method based on spectral clustering, comprising the following steps:
1. generate cluster members; 2. select representative members based on spectral clustering; 3. integrate the representative members; 4. end.
Beneficial effects: compared with the prior art, the selective clustering ensemble method based on spectral clustering provided by the invention is simple to implement and can effectively improve the clustering ensemble result.
Brief description of the drawings
Fig. 1 is the flowchart of the method of the invention;
Fig. 2 is the flowchart of cluster member generation;
Fig. 3 is the flowchart of selecting representative members based on spectral clustering;
Fig. 4 is the flowchart of integrating the representative members;
Fig. 5 is the flowchart of clustering the cluster members with spectral clustering;
Fig. 6 is the flowchart of clustering the data set with spectral clustering.
Detailed description of the embodiments
The invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments only illustrate the invention and do not limit its scope; after reading this disclosure, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims.
The method of the invention is shown in Fig. 1. Step 0 is the start. Step 1 generates the cluster members and is described in detail below with reference to Fig. 2. Step 2 selects representative members based on spectral clustering and is described in detail below with reference to Fig. 3. Step 3 integrates the representative members and is described in detail below with reference to Fig. 4. Step 4 is the end state of Fig. 1.
Fig. 2 details step 1 of Fig. 1, whose purpose is to generate multiple cluster members. Step 10 is the start. Step 11 obtains the number of cluster members $l$ (an integer greater than 1) and the number of clusters $k$ (generally set to the true number of classes contained in the data set). Step 12 sets the control parameter $i$ to the initial value 1. Step 13 checks whether $i$ is less than or equal to $l$; if so, go to step 14, otherwise go to step 17. Step 14 randomly generates $k$ mean vectors as the initial centroids of the K-means algorithm and uses K-means to partition the data set. Step 15 obtains the clustering result $P^{(i)} = \{C_1^{(i)}, \ldots, C_k^{(i)}\}$. Step 16 increments the control parameter $i$ by 1 and returns to step 13. Step 17 builds the cluster member set $P = \{P^{(1)}, \ldots, P^{(l)}\}$. Step 18 is the end state of Fig. 2.
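The loop of Fig. 2 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names `kmeans` and `generate_members`, the `seed` parameter, and the choice of drawing the initial centroids from the data points themselves are all assumptions made for the example.

```python
import random

def kmeans(points, k, rng, max_iter=100):
    """Plain K-means; returns one cluster label (0..k-1) per point."""
    # Assumption: the k random initial centroids are k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def generate_members(points, k, l, seed=0):
    """Steps 11-17: run K-means l times with different random starts."""
    rng = random.Random(seed)
    return [kmeans(points, k, rng) for _ in range(l)]
```

Each returned label list plays the role of one cluster member $P^{(i)}$; the list of all $l$ label lists corresponds to the set $P$ built in step 17.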
Fig. 3 details step 2 of Fig. 1, whose purpose is to select representative members based on spectral clustering for the subsequent integration. Step 20 is the start. Step 21 computes the similarity between cluster members, namely the NMI (Normalized Mutual Information) value between them. The larger the NMI value, the better the two clustering results match and the more similar the two cluster members are. It is computed as follows. Let $X$ and $Y$ be the random variables represented by cluster members $P^{(a)}$ and $P^{(b)}$, which have $k_a$ and $k_b$ clusters respectively. Let $n_h^{(a)}$ be the number of objects contained in cluster $C_h$ of $P^{(a)}$, let $n_l^{(b)}$ be the number of objects contained in cluster $C_l$ of $P^{(b)}$, and let $n_{h,l}$ be the number of objects shared by $C_h$ and $C_l$. The NMI value between $P^{(a)}$ and $P^{(b)}$ is then:
[Equation not preserved in this text.]
In step 22 of Fig. 3, the cluster members are clustered with spectral clustering, using the similarities computed in step 21; this step is described in detail below with reference to Fig. 5. Step 23 selects the representative members as follows. Based on the clustering result of step 22, from each group of cluster members the one member whose sum of NMI values with all other members of the same group is largest is selected as the representative. Suppose a group $G = \{P^{(1)}, \ldots, P^{(m)}\}$ contains $m$ cluster members; the selected representative member $P^*$ then satisfies:
$$P^* = \arg\max_{P^{(i)} \in G} \sum_{j=1,\, j \neq i}^{m} \mathrm{NMI}\left(P^{(i)}, P^{(j)}\right)$$
Step 24 is the end state of Fig. 3.
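The selection rule of step 23 amounts to picking, in each group, the member with the largest total similarity to the rest of its group. A minimal sketch follows; the function name `select_representatives` and the representation of groups as lists of member indices are assumptions made for the example.

```python
def select_representatives(nmi_matrix, groups):
    """Pick one representative index per group of cluster members.

    nmi_matrix[i][j] is the NMI value between members i and j;
    groups is a list of lists of member indices (the output of step 22).
    """
    representatives = []
    for group in groups:
        # The member whose summed NMI to all other members of its group is largest.
        best = max(
            group,
            key=lambda i: sum(nmi_matrix[i][j] for j in group if j != i),
        )
        representatives.append(best)
    return representatives
```

A group with a single member trivially returns that member, since its sum is empty.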
Fig. 4 is the flowchart of integrating the representative members. Step 30 is the start. Step 31 computes the similarity between data points; the similarity of data points $d_i$ and $d_j$ is $S_{ij} = (\text{number of times } d_i \text{ and } d_j \text{ belong to the same cluster}) / r$, counted over the $r$ representative members. Step 32 clusters the data set with spectral clustering and is described in detail below with reference to Fig. 6. Step 33 is the end state of Fig. 4.
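The similarity of step 31 is a co-occurrence frequency over the representative members. A small sketch, assuming each representative is given as a list of cluster labels (one per data point); the function name `co_association` is hypothetical.

```python
def co_association(members):
    """S[i][j] = fraction of members in which points i and j share a cluster."""
    r = len(members)       # number of representative members
    n = len(members[0])    # number of data points
    counts = [[0] * n for _ in range(n)]
    for labels in members:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / r for c in row] for row in counts]
```

The resulting matrix is symmetric with ones on the diagonal, so it can be used directly as the similarity matrix $S$ of Fig. 6.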
Fig. 5 is the flowchart of clustering the cluster members with spectral clustering. Step 220 is the start. Step 221 obtains the number $r_0$ of representative members to be selected. Step 222 constructs the transition probability matrix $P^l$ corresponding to the random walk on the graph: $P^l = (D^l)^{-1} S^l$, where $S^l$ is the similarity matrix between cluster members, whose entries are computed in step 21 of Fig. 3, and $D^l$ is a diagonal matrix with diagonal entries $D^l_{ii} = \sum_j S^l_{ij}$. Step 223 computes the eigenvalues $\lambda_1 \ge \ldots \ge \lambda_l$ of $P^l$; if there exists an index $i$ such that $\lambda_i$ is strictly greater than $\lambda_{i+1}$, set $r = i$; otherwise set $r = r_0$. Step 224 arranges the eigenvectors corresponding to the $r$ largest eigenvalues of $P^l$ as columns to build the matrix $U_r = [u_1 \ldots u_r]$. Step 225 uses the K-means algorithm to cluster the rows of $U_r$ into $r$ groups of cluster members $G_1, \ldots, G_r$. Step 226 is the end state of Fig. 5.
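Step 222 is a row normalization of the similarity matrix. The sketch below shows only that construction, assuming the diagonal entries of the degree matrix are the row sums of the similarity matrix (the usual random-walk definition); the function name `transition_matrix` is hypothetical.

```python
def transition_matrix(S):
    """Return P = D^{-1} S, where D is diagonal with D[i][i] = sum of row i of S."""
    P = []
    for row in S:
        degree = sum(row)
        if degree == 0:
            raise ValueError("a row of S sums to zero; D is not invertible")
        P.append([s / degree for s in row])
    return P
```

Every row of the result sums to 1, so entry $(i, j)$ can be read as the probability that a random walk at node $i$ moves to node $j$.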
Fig. 6 is the flowchart of clustering the data set with spectral clustering. Step 320 is the start. Step 321 constructs the transition probability matrix $P$ corresponding to the random walk on the graph: $P = D^{-1} S$, where $S$ is the similarity matrix between data points, whose entries are computed in step 31 of Fig. 4, and $D$ is a diagonal matrix with diagonal entries $D_{ii} = \sum_j S_{ij}$. Step 322 computes the eigenvectors corresponding to the $k$ largest eigenvalues of $P$ and arranges them as columns to build the matrix $V_k = [v_1 \ldots v_k]$. Step 323 uses the K-means algorithm to cluster the rows of $V_k$ into $k$ clusters $D_1, \ldots, D_k$. Step 324 is the end state of Fig. 6.
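Steps 321 and 322 can be sketched with NumPy as follows. One substitution is made and should be noted: instead of decomposing the non-symmetric matrix $P = D^{-1} S$ directly, the sketch decomposes the symmetric matrix $D^{-1/2} S D^{-1/2}$, which has the same eigenvalues, and maps its eigenvectors back with $D^{-1/2}$. This assumes $S$ is symmetric with positive row sums; the function name `spectral_embedding` is hypothetical.

```python
import numpy as np

def spectral_embedding(S, k):
    """Top-k eigenvalues of P = D^{-1} S and the matrix V_k of their eigenvectors."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)                      # diagonal of D (row sums of S)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric matrix with the same eigenvalues as P.
    N = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(N)   # eigenvalues in ascending order
    top_vals = eigvals[-k:][::-1]          # k largest, in descending order
    top_vecs = eigvecs[:, -k:][:, ::-1]
    # If N w = lambda w, then v = D^{-1/2} w satisfies P v = lambda v.
    V = d_inv_sqrt[:, None] * top_vecs
    return top_vals, V
```

The rows of the returned `V` are the points that step 323 passes to K-means; when the similarity matrix is block-structured, rows belonging to the same block coincide, which is what makes the final K-means step easy.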
Claims (6)
1. A selective clustering ensemble method based on spectral clustering, characterized by comprising the following steps:
(1) generating cluster members;
(2) selecting representative members based on spectral clustering;
(3) integrating the representative members;
(4) ending.
2. The selective clustering ensemble method based on spectral clustering according to claim 1, characterized in that the step of generating cluster members comprises:
(1) step 11 obtains the number of cluster members $l$ and the number of clusters $k$, where $l$ is an integer greater than 1 and $k$ is set to the true number of classes contained in the data set;
(2) step 12 sets the control parameter $i$ to the initial value 1;
(3) step 13 checks whether the control parameter $i$ is less than or equal to the number of cluster members $l$; if so, step 14 is performed, otherwise go to step 17;
(4) step 14 randomly generates $k$ mean vectors as the initial centroids of the K-means algorithm and uses K-means to partition the data set;
(5) step 15 obtains the clustering result $P^{(i)} = \{C_1^{(i)}, \ldots, C_k^{(i)}\}$;
(6) step 16 increments the control parameter $i$ by 1 and returns to step 13;
(7) step 17 builds the cluster member set $P = \{P^{(1)}, \ldots, P^{(l)}\}$;
(8) end.
3. The selective clustering ensemble method based on spectral clustering according to claim 1, characterized in that the step of selecting representative members based on spectral clustering comprises:
(1) step 21 computes the similarity between cluster members;
(2) step 22 clusters the cluster members with spectral clustering, using the similarities computed in step 21;
(3) step 23, based on the clustering result of step 22, selects from each group of cluster members the one member whose sum of NMI values with all other members of the same group is largest as the representative member;
(4) end.
4. The selective clustering ensemble method based on spectral clustering according to claim 1, characterized in that the step of integrating the representative members comprises:
(1) step 31 computes the similarity between data points, the similarity of data points $d_i$ and $d_j$ being $S_{ij} = (\text{number of times } d_i \text{ and } d_j \text{ belong to the same cluster}) / r$;
(2) step 32 clusters the data set with spectral clustering;
(3) end.
5. The selective clustering ensemble method based on spectral clustering according to claim 3, characterized in that, in selecting representative members based on spectral clustering, the step of clustering the cluster members with spectral clustering comprises:
(1) step 221 obtains the number $r_0$ of representative members to be selected;
(2) step 222 constructs the transition probability matrix $P^l$ corresponding to the random walk on the graph: $P^l = (D^l)^{-1} S^l$, where $S^l$ is the similarity matrix between cluster members, whose entries are obtained in step 21 of claim 3, and $D^l$ is a diagonal matrix with diagonal entries $D^l_{ii} = \sum_j S^l_{ij}$;
(3) step 223 computes the eigenvalues $\lambda_1 \ge \ldots \ge \lambda_l$ of $P^l$; if there exists an index $i$ such that $\lambda_i$ is strictly greater than $\lambda_{i+1}$, set $r = i$; otherwise set $r = r_0$;
(4) step 224 arranges the eigenvectors corresponding to the $r$ largest eigenvalues of $P^l$ as columns to build the matrix $U_r = [u_1 \ldots u_r]$;
(5) step 225 uses the K-means algorithm to cluster the rows of $U_r$ into $r$ groups of cluster members $G_1, \ldots, G_r$;
(6) end.
6. The selective clustering ensemble method based on spectral clustering according to claim 4, characterized in that, in integrating the representative members, the step of clustering the data set with spectral clustering comprises:
(1) step 321 constructs the transition probability matrix $P$ corresponding to the random walk on the graph: $P = D^{-1} S$, where $S$ is the similarity matrix between data points, whose entries are obtained in step 31, and $D$ is a diagonal matrix with diagonal entries $D_{ii} = \sum_j S_{ij}$;
(2) step 322 computes the eigenvectors corresponding to the $k$ largest eigenvalues of $P$ and arranges them as columns to build the matrix $V_k = [v_1 \ldots v_k]$;
(3) step 323 uses the K-means algorithm to cluster the rows of $V_k$ into $k$ clusters $D_1, \ldots, D_k$;
(4) end.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410096258.6A CN103995821B (en) | 2014-03-14 | 2014-03-14 | Selective clustering integration method based on spectral clustering algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103995821A (en) | 2014-08-20 |
CN103995821B (en) | 2017-05-10 |
Family
ID=51309986
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410096258.6A Expired - Fee Related CN103995821B (en) | 2014-03-14 | 2014-03-14 | Selective clustering integration method based on spectral clustering algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103995821B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105959270A (en) * | 2016-04-25 | 2016-09-21 | 盐城工学院 | Network attack detection method based on spectral clustering algorithm |
CN108229507A (en) * | 2016-12-14 | 2018-06-29 | 中国电信股份有限公司 | Data classification method and device |
CN114328922A (en) * | 2021-12-28 | 2022-04-12 | 盐城工学院 | Selective text clustering integration method based on spectrogram theory |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101968852A (en) * | 2010-09-09 | 2011-02-09 | 西安电子科技大学 | Entropy sequencing-based semi-supervision spectral clustering method for determining clustering number |
CN102799891A (en) * | 2012-05-24 | 2012-11-28 | 浙江大学 | Spectral clustering method based on landmark point representation |
CN103399852A (en) * | 2013-06-27 | 2013-11-20 | 江南大学 | Multi-channel spectrum clustering method based on local density estimation and neighbor relation spreading |
Non-Patent Citations (4)
Title |
---|
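Lu Zhimao et al.: "Spectral algorithm for text clustering ensemble based on affinity propagation" (近邻传播的文本聚类集成谱算法), Journal of Harbin Engineering University (《哈尔滨工程大学学报》) * |
Xu Sen: "Research on key techniques of text clustering ensemble" (文本聚类集成关键技术研究), China Doctoral Dissertations Full-text Database, Information Science and Technology (《中国博士学位论文全文数据库 信息科技辑》) * |
Xu Sen et al.: "Text clustering ensemble algorithm based on matrix spectral analysis" (基于矩阵谱分析的文本聚类集成算法), Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》) * |
Huang Faliang et al.: "Spectral clustering ensemble algorithm for overlapping community discovery in networks" (网络重叠社区发现的谱聚类集成算法), Control and Decision (《控制与决策》) * |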
Also Published As
Publication number | Publication date |
---|---|
CN103995821B (en) | 2017-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3611799A1 (en) | Array element arrangement method for l-type array antenna based on inheritance of acquired characteristics | |
CN105303450A (en) | Complex network community discovery method based on spectral clustering improved intersection | |
CN103020979B (en) | Image segmentation method based on sparse genetic clustering | |
CN102800093A (en) | Multi-target remote sensing image segmentation method based on decomposition | |
CN102521605A (en) | Wave band selection method for hyperspectral remote-sensing image | |
CN105488562A (en) | Irregular part stock layout method based on multi-factor particle swarm algorithm | |
CN104637057A (en) | Grayscale-gradient entropy multi-threshold fast division method based on genetic algorithm | |
CN106934722A (en) | Multi-objective community detection method based on k node updates Yu similarity matrix | |
CN104268629A (en) | Complex network community detecting method based on prior information and network inherent information | |
CN110457758A (en) | Prediction technique, device, system and the storage medium in Instability of Rock Body stage | |
CN108197708A (en) | A kind of parallel time genetic algorithm based on Spark | |
CN103366189A (en) | Intelligent classification method for high-spectrum remote sensing image | |
CN102880754A (en) | Method for identifying action scale of land utilization fractal dimension based on genetic algorithm | |
CN103365999A (en) | Text clustering integrated method based on similarity degree matrix spectral factorization | |
CN103995821A (en) | Selective clustering integration method based on spectral clustering algorithm | |
CN105139282A (en) | Power grid index data processing method, device and calculation device | |
CN103793438B (en) | A kind of parallel clustering method based on MapReduce | |
CN104657472A (en) | EA (Evolutionary Algorithm)-based English text clustering method | |
CN105740949A (en) | Group global optimization method based on randomness best strategy | |
CN103942318B (en) | Parallel AP propagating XML big data clustering integration method | |
CN110362606A (en) | A kind of elongated die body method for digging of time series | |
CN104318306B (en) | Self adaptation based on Non-negative Matrix Factorization and evolution algorithm Optimal Parameters overlaps community detection method | |
CN102982342A (en) | Positive semidefinite spectral clustering method based on Lagrange dual | |
CN104573004B (en) | A kind of double clustering methods of the gene expression data based on double rank genetic computations | |
CN106156854A (en) | A kind of support vector machine parameter prediction method based on DNA encoding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170510 Termination date: 20180314 |