CN104217015B - Hierarchical clustering method based on mutual shared nearest neighbors - Google Patents

Hierarchical clustering method based on mutual shared nearest neighbors

Publication number: CN104217015B
Application number: CN201410488243.4A
Authority: CN (China)
Prior art keywords: mutual nearest neighbor, sub-cluster, point, matrix, data
Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other versions: CN104217015A (Chinese)
Inventors: 周红芳, 王心怡, 刘园, 郭杰, 段文聪, 何馨依, 刘杰, 李锦
Current assignee: Xian University of Technology (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Xian University of Technology
Application filed by Xian University of Technology, priority to CN201410488243.4A
Publication of application: CN104217015A
Application granted; publication of grant: CN104217015B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification


Abstract

The invention discloses a hierarchical clustering method based on mutual shared nearest neighbors. First, the nearest-neighbor matrices T1 and T2 of the whole data set D are computed; the mutual nearest-neighbor ranking matrix M is then derived from T1 and T2; local densities are computed from M to obtain a set of sub-clusters; finally, the similarities between sub-clusters are computed and the sub-clusters are agglomerated to produce the final partition. The hierarchical clustering method based on mutual shared nearest neighbors of the invention solves the problem of existing clustering methods based on k-nearest-neighbor graphs, in which points misassigned during graph sparsification and partitioning, while the sub-cluster set is generated, propagate their errors and lower the clustering accuracy.

Description

Hierarchical clustering method based on mutual shared nearest neighbors
Technical field
The invention belongs to the field of data mining within computer science and technology, and relates to a hierarchical clustering method based on mutual shared nearest neighbors.
Background technology
Cluster analysis is an important research topic in data mining. Clustering techniques have been widely applied in fields such as telecommunications, retail, biology, and marketing. Clustering is a form of unsupervised classification: it groups the data points of a data set according to the features of the objects themselves, ensuring that the similarity within a cluster is as large as possible and the dissimilarity between clusters is as large as possible. Existing clustering algorithms are generally divided into: (1) partition-based algorithms, represented by K-means, fuzzy K-means, and K-medoids; (2) hierarchy-based algorithms, represented by QROCK, CURE, and BIRCH; (3) density-based algorithms, represented by DBSCAN and OPTICS; (4) other types of clustering algorithms, such as subspace-based or model-based algorithms.
When clustering algorithms based on k-nearest-neighbor graphs, such as the Chameleon algorithm, produce the sub-cluster set during graph sparsification and partitioning, all or most of the points contained in a sub-cluster belong to the same real cluster. However, the erroneous points that do get included cause the agglomerative hierarchical clustering of the next stage to mix in these mistakes, leading to larger deviations. The Jarvis-Patrick algorithm, based on SNN (shared nearest neighbor) similarity, tends to split a real cluster and to merge clusters that should stay separate. What these two classes of algorithms have in common is that they construct a k-nearest-neighbor similarity graph, or rely on the shared nearest neighbors of the k neighbors; while sparsifying the similarity graph or the k-nearest-neighbor graph, data points may be assigned incorrectly, and the errors are amplified during agglomeration.
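For orientation, the shared-nearest-neighbor (SNN) similarity underlying the Jarvis-Patrick algorithm counts how many neighbors two points share in their k-nearest-neighbor lists. A minimal sketch; the brute-force distance computation, the toy points, and k = 2 are illustrative assumptions rather than details from the patent:

```python
import numpy as np

def knn_lists(X, k):
    """Index lists of the k nearest neighbors of every row of X (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def snn_similarity(nn, i, j):
    """SNN similarity: how many neighbors the k-NN lists of i and j share."""
    return len(set(nn[i]) & set(nn[j]))

# Two tight groups; points inside a group share neighbors across their k-NN lists.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
nn = knn_lists(X, k=2)
print(snn_similarity(nn, 0, 1))   # 1: both lists contain point 2
```

Sparsifying a graph on such counts is exactly where, as noted above, a point on a cluster border can be cut off and the error then amplified during agglomeration.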
Summary of the invention
It is an object of the invention to provide a hierarchical clustering method based on mutual shared nearest neighbors, solving the problem of existing k-nearest-neighbor-graph clustering, in which points misassigned during sparsification and graph partitioning, while the sub-cluster set is generated, propagate their errors and cause low clustering accuracy.
The technical solution adopted by the invention is a hierarchical clustering method based on mutual shared nearest neighbors. Let the data set to be processed be D, the number of clusters be K, the first nearest-neighbor parameter be K1, and the second nearest-neighbor parameter be K2. The method is implemented according to the following steps:
Step 1: compute the nearest-neighbor matrices of data set D for the parameters K1 and K2 respectively, obtaining nearest-neighbor matrix T1 and nearest-neighbor matrix T2;
Step 2: for each data point i in D, look up, for every neighbor point listed in row i of T2, that neighbor's own nearest-neighbor row T1'; if data point i is contained in T1', keep the neighbor in row i of T2, otherwise delete it, obtaining the mutual nearest-neighbor ranking row Mi of data point i; traversing all data points in D gives the mutual nearest-neighbor ranking matrix M;
Step 3: compute the local density Di of every data point i in D from its ranking row Mi, and sort the data points in descending order of Di;
Step 4: take the first K × 10 data points after sorting as sub-cluster center points, and form a sub-cluster from each center point together with the mutual nearest neighbors in its ranking row; each data point not yet divided is assigned to the sub-cluster that appears first among its mutual nearest neighbors, yielding a number of sub-clusters;
Step 5: compute the pairwise similarity of the sub-clusters obtained in step 4, and merge the pair of sub-clusters with the largest similarity;
Step 6: if the number of sub-clusters after merging is greater than K, return to step 5; if it equals K, go to step 7;
Step 7: assign each still-unassigned data point i in D to the sub-cluster nearest to it, obtaining the final partition, the partition result being K sub-clusters.
The invention is further characterized as follows.
The local density Di in step 3 is calculated according to the formula:
Di = count(Mi), 0 < i ≤ n   (1)
where Mi is the ranking row of the i-th data point in the mutual nearest-neighbor ranking matrix M.
In step 5 the pairwise similarity of sub-clusters is calculated as follows.
Given sub-clusters Ci and Cj, 0 < i, j ≤ n, and the mutual nearest-neighbor ranking matrix M, the similarity between the two sub-clusters is:
Similarity(Ci, Cj) = NumNeighborCi(Cj) / CountNeighbor(Ci) + NumNeighborCj(Ci) / CountNeighbor(Cj)   (2)
where NumNeighborCi(Cj) is the number of times points belonging to sub-cluster Cj appear among the mutual nearest neighbors, taken from M, of the points in sub-cluster Ci;
NumNeighborCj(Ci) is the number of times points belonging to sub-cluster Ci appear among the mutual nearest neighbors of the points in sub-cluster Cj;
CountNeighbor(Ci) is the number of distinct sub-clusters to which the mutual nearest neighbors of Ci's points belong;
CountNeighbor(Cj) is the number of distinct sub-clusters to which the mutual nearest neighbors of Cj's points belong.
Assigning an undivided data point to the sub-cluster that appears first among its mutual nearest neighbors, as in step 4, means: if the ranking row Mi of the data point contains a sub-cluster center point, data point i is assigned to that center's sub-cluster; if the ranking row of data point i contains several sub-cluster center points, the point is assigned to the sub-cluster of the center point ranked foremost.
The sub-cluster nearest to an unassigned data point in step 7 is the one, among the K sub-clusters obtained in step 6, with the minimum Euclidean distance to the unassigned data point in data set D.
The beneficial effects of the invention are as follows:
1. Good clustering quality. On the synthetic data sets DB1, DB2, DB3 and the UCI standard data sets Iris, Wine, Soybean, and Unbalanced, the invention shows a clear advantage, obtaining clustering results with higher total purity and lower information entropy.
2. High clustering accuracy. The similarity function of the invention postpones the merging of clusters that would otherwise be merged wrongly to a later moment of the next-step merging, which effectively prevents errors from accumulating and amplifying step by step and yields better clustering accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of the hierarchical clustering method based on mutual shared nearest neighbors of the invention;
Fig. 2 shows the initial state of a data set during clustering by the method of the invention;
Fig. 3 shows the candidate center points formed by selecting the data points with the largest local density values in the data set;
Fig. 4 is the synthetic data set DB1 used in the experiments of the invention;
Fig. 5 is the synthetic data set DB2 used in the experiments;
Fig. 6 is the synthetic data set DB3 used in the experiments;
Fig. 7 is the synthetic data set DB4 used in the experiments;
Fig. 8 shows the clustering result of the method of the invention on synthetic data set DB1;
Fig. 9 shows the clustering result of the method on synthetic data set DB2;
Fig. 10 shows the clustering result of the method on synthetic data set DB3;
Fig. 11 shows the clustering result of the method on synthetic data set DB4.
Embodiment
The invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The definitions used in the invention are as follows:
Definition 1 (mutual nearest-neighbor ranking matrix): the matrix whose rows are built from each data point together with the data points that are its mutual nearest neighbors.
Definition 2 (local density): an expression of how dense the local region around a data point is within the whole data set; its value equals the number of mutual nearest neighbors of the data point.
Definition 3 (purity): the purity of a class cluster Ci and the total purity of a clustering result are
pi = max over j of (mij / mi)
purity = Σ (i = 1 .. K) (mi / m) · pi   (3)
where pi is the purity of class cluster Ci;
mi is the number of data points in class cluster Ci;
mij is the number of points in Ci that belong to class cluster Cj;
m is the total number of data points in the data set;
K is the number of class clusters in data set D.
Definition 4 (entropy of a clustering result):
First, compute the probability that a data point in class cluster Ci belongs to class cluster Cj:
pij = mij / mi   (4)
where mi is the number of data points in class cluster Ci;
mij is the number of points in Ci that belong to class cluster Cj.
Then compute the entropy of each class cluster Ci:
ei = -Σ (j = 1 .. L) pij · log2(pij)   (5)
where L is the number of class clusters.
Finally, compute the entropy of the whole clustering result:
e = Σ (i = 1 .. K) (mi / m) · ei   (6)
where K is the number of clusters, mi is the number of data points in class cluster Ci, and m is the total number of data points in the data set.
Definition 5 (F-measure): the combination of precision and recall.
The precision of class cluster Ci with respect to class cluster Cj:
precision(i, j) = pij   (7)
The recall of class cluster Ci with respect to class cluster Cj:
recall(i, j) = mij / mj   (8)
Thus the F-measure of class cluster Ci with respect to class cluster Cj is:
F(i, j) = 2 · precision(i, j) · recall(i, j) / (precision(i, j) + recall(i, j))   (9)
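The three evaluation functions of Definitions 3 to 5 can be sketched from a contingency matrix whose entry m[i][j] counts the points of produced cluster Ci that belong to true class Cj; the matrix values here are an illustrative assumption:

```python
import numpy as np

def total_purity(m):
    """purity = sum_i (m_i/m) * p_i with p_i = max_j m_ij/m_i, per Definition 3."""
    return float(m.max(axis=1).sum() / m.sum())

def total_entropy(m):
    """e = sum_i (m_i/m) * e_i with e_i = -sum_j p_ij*log2(p_ij), per Definition 4."""
    p = m / m.sum(axis=1, keepdims=True)
    e_i = -np.sum(np.where(p > 0, p * np.log2(np.where(p > 0, p, 1.0)), 0.0), axis=1)
    return float(np.sum(m.sum(axis=1) / m.sum() * e_i))

def f_measure(m, i, j):
    """F(i,j) = 2*prec*rec/(prec+rec), prec = m_ij/m_i, rec = m_ij/m_j, per Definition 5."""
    prec = m[i, j] / m[i].sum()
    rec = m[i, j] / m[:, j].sum()
    return 0.0 if prec + rec == 0 else float(2 * prec * rec / (prec + rec))

# Illustrative contingency matrix: rows = produced clusters, columns = true classes.
m = np.array([[9.0, 1.0], [2.0, 8.0]])
print(total_purity(m))       # (9 + 8) / 20 = 0.85
print(round(total_entropy(m), 4))
print(f_measure(m, 0, 0))    # 2 * 0.9 * (9/11) / (0.9 + 9/11) = 6/7
```

Higher purity and lower entropy both indicate that each produced cluster is dominated by a single true class.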
The invention provides a hierarchical clustering method based on mutual shared nearest neighbors. Let the data set to be processed be D, the number of clusters be K, the first nearest-neighbor parameter be K1, and the second nearest-neighbor parameter be K2. As shown in Fig. 1, the method is implemented according to the following steps:
Step 1: compute the nearest-neighbor matrices of data set D for the parameters K1 and K2 respectively, obtaining nearest-neighbor matrix T1 and nearest-neighbor matrix T2;
Step 2: for each data point i in D, look up, for every neighbor point listed in row i of T2, that neighbor's own nearest-neighbor row T1'; if data point i is contained in T1', keep the neighbor in row i of T2, otherwise delete it, obtaining the mutual nearest-neighbor ranking row Mi of data point i; traversing all data points in D gives the mutual nearest-neighbor ranking matrix M;
Step 3: compute the local density Di of every data point i in D from its ranking row Mi, and sort the data points in descending order of Di; the local density Di is calculated according to the formula
Di = count(Mi), 0 < i ≤ n   (1)
where Mi is the ranking row of the i-th data point in the mutual nearest-neighbor ranking matrix M;
Step 4: take the first K × 10 data points after sorting as sub-cluster center points, and form a sub-cluster from each center point together with the mutual nearest neighbors in its ranking row; each data point not yet divided is assigned to the sub-cluster that appears first among its mutual nearest neighbors, yielding a number of sub-clusters. Assigning a point to the sub-cluster that appears first means: if the ranking row Mi of the data point contains a sub-cluster center point, data point i is assigned to that center's sub-cluster; if the ranking row of data point i contains several sub-cluster center points, the point is assigned to the sub-cluster of the center point ranked foremost;
Step 5: compute the pairwise similarity of the sub-clusters obtained in step 4, and merge the pair of sub-clusters with the largest similarity. The pairwise similarity of sub-clusters is calculated as follows. Given sub-clusters Ci and Cj, 0 < i, j ≤ n, and the mutual nearest-neighbor ranking matrix M, the similarity between the two sub-clusters is
Similarity(Ci, Cj) = NumNeighborCi(Cj) / CountNeighbor(Ci) + NumNeighborCj(Ci) / CountNeighbor(Cj)   (2)
where NumNeighborCi(Cj) is the number of times points belonging to sub-cluster Cj appear among the mutual nearest neighbors, taken from M, of the points in sub-cluster Ci;
NumNeighborCj(Ci) is the number of times points belonging to sub-cluster Ci appear among the mutual nearest neighbors of the points in sub-cluster Cj;
CountNeighbor(Ci) is the number of distinct sub-clusters to which the mutual nearest neighbors of Ci's points belong;
CountNeighbor(Cj) is the number of distinct sub-clusters to which the mutual nearest neighbors of Cj's points belong;
Step 6: if the number of sub-clusters after merging is greater than K, return to step 5; if it equals K, go to step 7;
Step 7: assign each still-unassigned data point i in D to the sub-cluster nearest to it, obtaining the final partition, the partition result being K class clusters; the sub-cluster nearest to an unassigned data point is the one, among the K sub-clusters obtained in step 6, with the minimum Euclidean distance to the unassigned point in data set D.
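Read this way, steps 1 to 7 can be sketched end to end as follows. This is a minimal interpretation under stated assumptions, not the patented implementation: distances are computed by brute force, K × 2 candidate centers are used (as in the worked example below rather than the K × 10 of step 4), each point is kept in only one initial sub-cluster, the nearest sub-cluster of step 7 is taken as the one containing the closest member point, and the toy blob data are illustrative.

```python
import numpy as np

def mutual_nn(X, k1, k2):
    """Steps 1-2: mutual nearest-neighbor lists built from T1 (k1-NN) and T2 (k2-NN)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    order = np.argsort(d, axis=1)
    T1, T2 = order[:, :k1], order[:, :k2]
    return [[j for j in T2[i] if i in T1[j]] for i in range(len(X))]

def similarity(a, b, clusters, M, label):
    """Eq. (2): directed neighbor shares normalized by adjacent sub-cluster counts."""
    def num(src, dst):    # NumNeighbor_src(dst)
        return sum(1 for p in clusters[src] for q in M[p] if label.get(q) == dst)
    def adj(src):         # CountNeighbor(src): distinct adjacent sub-clusters
        return len({label[q] for p in clusters[src] for q in M[p]
                    if q in label and label[q] != src})
    s_ab = num(a, b) / adj(a) if adj(a) else 0.0
    s_ba = num(b, a) / adj(b) if adj(b) else 0.0
    return s_ab + s_ba

def msn_cluster(X, K, k1, k2):
    n = len(X)
    M = mutual_nn(X, k1, k2)
    density = [len(m) for m in M]                              # step 3: D_i = count(M_i)
    centers = sorted(range(n), key=lambda i: -density[i])[:K * 2]
    label = {c: c for c in centers}                            # step 4: seed centers
    for c in centers:
        for j in M[c]:
            label.setdefault(j, c)                             # center's mutual NNs join it
    for i in range(n):                                         # first center in M[i] wins
        if i not in label:
            for j in M[i]:
                if j in label and label[j] == j:
                    label[i] = j
                    break
    clusters = {c: {i for i, l in label.items() if l == c} for c in centers}
    while len(clusters) > K:                                   # steps 5-6: agglomerate
        keys = sorted(clusters)
        a, b = max(((x, y) for x in keys for y in keys if x < y),
                   key=lambda p: similarity(p[0], p[1], clusters, M, label))
        clusters[a] |= clusters.pop(b)
        for i in clusters[a]:
            label[i] = a
    for i in range(n):                                         # step 7: leftovers by distance
        if i not in label:
            c0 = min(clusters, key=lambda c: min(np.linalg.norm(X[i] - X[j])
                                                 for j in clusters[c]))
            clusters[c0].add(i)
            label[i] = c0
    return clusters

rng = np.random.default_rng(0)
blobs = [rng.normal(c, 0.3, size=(15, 2)) for c in [(0, 0), (10, 0), (0, 10)]]
X = np.vstack(blobs)
clusters = msn_cluster(X, K=3, k1=3, k2=10)
print(len(clusters), sum(len(s) for s in clusters.values()))   # 3 45
```

On well-separated data the mutual-nearest-neighbor filter keeps every sub-cluster inside one true cluster, so the merge loop mostly sees nonzero similarities only between sub-clusters of the same true cluster.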
Embodiment:
The hierarchical clustering method based on mutual shared nearest neighbors of the invention consists of three key steps: computing the nearest-neighbor matrices, partitioning the matrix, and hierarchical clustering. First the nearest-neighbor matrices T1 and T2 of the whole data set D are computed (the parameters k1 and k2 are input parameters, k2 > k1); the mutual nearest-neighbor ranking matrix M is then computed from T1 and T2; local densities are computed from M to obtain the sub-cluster set; finally the similarities between sub-clusters are computed and the sub-clusters are agglomerated into K class clusters.
First, the mutual nearest-neighbor ranking matrix is calculated, as follows.
Assume the k1 nearest-neighbor matrix of data set D is T1 = [tij], with k1 an input parameter, 0 < i ≤ n, 0 < j ≤ k1, and the k2 nearest-neighbor matrix of D is T2 = [t'ij], with k2 an input parameter, 0 < i ≤ n, 0 < j ≤ k2, k1 < k2. Then the mutual nearest-neighbor ranking matrix is M = [xij], 0 < i ≤ n, 0 < j ≤ k2, where the entry t'ij of T2 is kept only if point i appears among the first k1 columns (i.e. in T1) of the row of the neighbor t'ij, and is deleted otherwise. By controlling the sizes of the parameters k1 and k2 (decreasing k1, increasing k2), larger and denser sub-clusters can be obtained.
Taking the data set X in Fig. 2 as an example, the nearest-neighbor matrices T1 and T2 of X are computed first, as shown in Table 1. Then, based on T2, the nearest neighbors of each data point xi are filtered: if a point among xi's neighbors contains xi in its own T1 neighbors, the neighbor is kept, otherwise it is deleted. For example, the k2 nearest neighbors (k2 = 10) of point 0 are {4, 2, 1, 3, 5, 6, 9, 11, 10, 7}, and the T1 nearest neighbors (k1 = 3) of point 4 are {2, 0, 3}, which contain point 0, so point 4 is kept. Querying each neighbor in turn in the same way finally gives the mutual nearest-neighbor ranking of point 0 as {4, 2}. Traversing all data points yields the final mutual nearest-neighbor ranking matrix, as shown in Table 2.
Table 1: K nearest neighbors (K = 10) of the data set X in Fig. 2
Data point  K nearest neighbors
0 4,2,1,3,5,6,9,11,10,7
1 3,5,4,0,2,6,9,10,7,14
2 4,0,3,1,5,11,6,15,14,9
3 4,1,2,0,5,11,6,14,15,10
4 2,0,3,1,5,6,11,14,9,15
5 6,1,9,10,3,7,8,14,4,0
6 9,7,5,8,10,1,14,3,0,4
7 8,9,6,10,5,1,14,3,0,4
8 9,7,10,6,5,14,1,3,12,15
9 8,10,6,7,5,14,1,3,15,12
10 9,8,6,7,5,14,1,12,15,3
11 13,15,14,12,3,2,4,1,5,0
12 15,14,13,11,10,9,5,8,3,6
13 15,11,12,14,3,10,5,4,2,1
14 15,12,10,5,9,13,11,8,6,3
15 13,12,14,11,10,3,5,1,9,4
Table 2: mutual nearest-neighbor ranking matrix
Data point  Mutual nearest-neighbor list
0 4,2
1 3,5,0
2 4,0,3
3 4,1,2
4 2,0,3,1
5 6,1
6 9,7,5,10
7 8,6
8 9,7,10
9 8,10,6,7,5
10 9,8,14
11 13
12 15,14,13
13 15,11,12
14 15,12,11
15 13,12,14,11
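The filtering that turns Table 1 into Table 2 can be reproduced directly from the rows listed above; T2 holds the full 10-neighbor rows and T1 their first k1 = 3 entries:

```python
# k2 = 10 nearest-neighbor lists of data set X from Table 1.
T2 = {
    0: [4, 2, 1, 3, 5, 6, 9, 11, 10, 7],
    1: [3, 5, 4, 0, 2, 6, 9, 10, 7, 14],
    2: [4, 0, 3, 1, 5, 11, 6, 15, 14, 9],
    3: [4, 1, 2, 0, 5, 11, 6, 14, 15, 10],
    4: [2, 0, 3, 1, 5, 6, 11, 14, 9, 15],
    5: [6, 1, 9, 10, 3, 7, 8, 14, 4, 0],
    6: [9, 7, 5, 8, 10, 1, 14, 3, 0, 4],
    7: [8, 9, 6, 10, 5, 1, 14, 3, 0, 4],
    8: [9, 7, 10, 6, 5, 14, 1, 3, 12, 15],
    9: [8, 10, 6, 7, 5, 14, 1, 3, 15, 12],
    10: [9, 8, 6, 7, 5, 14, 1, 12, 15, 3],
    11: [13, 15, 14, 12, 3, 2, 4, 1, 5, 0],
    12: [15, 14, 13, 11, 10, 9, 5, 8, 3, 6],
    13: [15, 11, 12, 14, 3, 10, 5, 4, 2, 1],
    14: [15, 12, 10, 5, 9, 13, 11, 8, 6, 3],
    15: [13, 12, 14, 11, 10, 3, 5, 1, 9, 4],
}
k1 = 3
T1 = {i: row[:k1] for i, row in T2.items()}   # T1 = first k1 neighbors of each point

# Keep neighbor j of point i only if i is in turn among the first k1 neighbors of j.
M = {i: [j for j in row if i in T1[j]] for i, row in T2.items()}
print(M[0])   # [4, 2], as in Table 2
print(M[9])   # [8, 10, 6, 7, 5], as in Table 2
```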
Second, local densities are computed from the ranking matrix to obtain the sub-cluster set.
Given the mutual nearest-neighbor ranking matrix M, with Mi the ranking row of the i-th data point, the local density is
Di = count(Mi), 0 < i ≤ n   (1)
That is, the more mutual nearest neighbors a data point has, the denser the local region around it is within the whole data set; therefore, when the local density of a data point is very large, the point can be regarded as the center of its neighborhood.
For example, the local density and ranking of each point can be computed from Table 2, with the result shown in Table 3. Here K × 2 points are chosen as candidate cluster centers (K is the number of clusters of the data set). Moreover, as can be seen from Fig. 2, the density of the class composed of points 11-15 is far smaller than that of the two classes in the upper part of the figure, yet its center point 15 is still obtained. This is because the ranking matrix of mutual nearest neighbors does not rely directly on a distance function as its measure; instead it uses the mutual distance-ranking relation as the basis for computing local density, and the local density maxima obtained in this way make it possible to handle clusters of different densities.
Table 3: local densities in descending order
Data point Local density
9 5
4 4
6 4
15 4
1 3
2 3
3 3
8 3
10 3
12 3
13 3
0 2
5 2
7 2
14 2
11 1
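The densities of Table 3 follow directly from the row lengths of Table 2 (ties broken by point index, matching the order in the table):

```python
# Mutual nearest-neighbor lists from Table 2.
M = {
    0: [4, 2], 1: [3, 5, 0], 2: [4, 0, 3], 3: [4, 1, 2],
    4: [2, 0, 3, 1], 5: [6, 1], 6: [9, 7, 5, 10], 7: [8, 6],
    8: [9, 7, 10], 9: [8, 10, 6, 7, 5], 10: [9, 8, 14], 11: [13],
    12: [15, 14, 13], 13: [15, 11, 12], 14: [15, 12, 11], 15: [13, 12, 14, 11],
}

# Local density D_i = count(M_i), then descending sort as in Table 3.
density = {i: len(m) for i, m in M.items()}
ranking = sorted(density, key=lambda i: (-density[i], i))
print([(i, density[i]) for i in ranking[:6]])
# [(9, 5), (4, 4), (6, 4), (15, 4), (1, 3), (2, 3)]
```

With K = 3 the first K × 2 = 6 points of this ranking, {9, 4, 6, 15, 1, 2}, are exactly the candidate centers that label the sub-clusters of Table 4.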
Finally, clustering proceeds according to the class-cluster similarity.
Given sub-clusters Ci and Cj, 0 < i, j ≤ n, and the mutual nearest-neighbor ranking matrix M, the sub-cluster similarity is:
Similarity(Ci, Cj) = NumNeighborCi(Cj) / CountNeighbor(Ci) + NumNeighborCj(Ci) / CountNeighbor(Cj)   (2)
where NumNeighborCi(Cj) is the number of times points belonging to sub-cluster Cj appear among the mutual nearest neighbors, taken from M, of the points in sub-cluster Ci;
NumNeighborCj(Ci) is the number of times points belonging to sub-cluster Ci appear among the mutual nearest neighbors of the points in sub-cluster Cj;
CountNeighbor(Ci) is the number of distinct sub-clusters to which the mutual nearest neighbors of Ci's points belong;
CountNeighbor(Cj) is the number of distinct sub-clusters to which the mutual nearest neighbors of Cj's points belong.
The similarity of class Ci and class Cj thus consists of two parts: one is the similarity of sub-cluster Ci to sub-cluster Cj, the other the similarity of Cj to Ci; that is, the similarity between them is not symmetric. By analogy with interpersonal relations: person A's best friend may be person B, and B alone, while B may have many close friends of whom A is only one; so if friendship were measured numerically, the friendship degrees between the two would not be equal. Accordingly, the following strategy is adopted for merging sub-clusters: merge the pair for which each sub-cluster has the fewest adjacent sub-clusters while the number of shared nearest-neighbor points between the pair is largest.
According to the mutual nearest-neighbor ranking matrix, all data points are assigned to the nearest candidate-center sub-clusters, giving the result shown in Fig. 3 and Table 4. Next, according to the ranking matrix, the number of times each sub-cluster is adjacent to the other sub-clusters is counted, giving the result shown in Table 5. The pairwise similarities of the six sub-clusters are then computed. For example, cluster 1 (clusters are labeled by their center points) has two adjacent clusters, cluster 4 and cluster 9, so similarity(C1 → C4) = 2/2; likewise similarity(C4 → C1) = 2/3 is computed; adding the two gives the similarity of cluster 1 and cluster 4: similarity(C1, C4) = similarity(C1 → C4) + similarity(C4 → C1) ≈ 1.667.
Table 4: sub-clusters obtained after all points are assigned to the candidate-center sub-clusters
Cluster label  Data points in cluster
1 1
2 2
4 4,2,0,3,1
6 6
9 9,8,10,6,7,5
15 15,13,12,14,11
Table 5: adjacency counts of each sub-cluster with the other sub-clusters
Cluster label  Adjacent clusters and occurrence counts
1 4=2,9=1
2 4=3
4 1=2,2=3,9=1
6 9=3
9 1=1,6=4
15 9=1
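The worked similarity value can be checked against the adjacency counts of Table 5; the dictionary encoding of the table is the only assumption here:

```python
from fractions import Fraction

# Adjacency counts from Table 5: adjacent[c][d] = how many times points of
# sub-cluster c have a mutual nearest neighbor belonging to sub-cluster d.
adjacent = {
    1: {4: 2, 9: 1},
    2: {4: 3},
    4: {1: 2, 2: 3, 9: 1},
    6: {9: 3},
    9: {1: 1, 6: 4},
    15: {9: 1},
}

def similarity(cx, cy):
    """Eq. (2): directed shares normalized by the number of adjacent sub-clusters."""
    def directed(a, b):
        n = adjacent[a].get(b, 0)        # NumNeighbor_a(b)
        count = len(adjacent[a])         # CountNeighbor(a)
        return Fraction(n, count) if count else Fraction(0)
    return directed(cx, cy) + directed(cy, cx)

print(float(similarity(1, 4)))   # 2/2 + 2/3 ≈ 1.667
```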
Performance evaluation of the clustering method of the invention:
To verify the effectiveness of the clustering method of the invention, two algorithms were selected for comparison: the graph-based Chameleon method and the Jarvis-Patrick (JP) method. The Chameleon method has a strong ability to discover clusters of arbitrary size and shape; the JP method is good at finding tight clusters of strongly related objects. What both methods have in common with the hierarchical clustering method based on mutual shared nearest neighbors of the invention is that they also need to compute the K nearest neighbors, after which each method obtains its final result by its own computation.
The invention uses four synthetic data sets and six UCI standard data sets to test algorithm performance. The distributions of the four synthetic data sets DB1, DB2, DB3, and DB4 are shown in Fig. 4, Fig. 5, Fig. 6, and Fig. 7 respectively. The six UCI standard data sets are: cpu-with-vendor, glass, iris, soybean, unbalanced, and wine. The attributes of the four synthetic data sets and six UCI standard data sets are listed in Table 6 and Table 7.
Table 6: attributes of the synthetic data sets
Table 7: attributes of the UCI data sets
Comparison of experimental results:
The total purity of the clustering result, the precision-recall composite function F-measure, and the entropy of the clustering result are used here as three evaluation functions to assess the validity of the clustering results; their definitions are given in Definition 3, Definition 4, and Definition 5 above.
Table 8 compares the experimental results of the hierarchical clustering method based on mutual shared nearest neighbors of the invention with those of the Chameleon and JP methods. From Fig. 8, Fig. 9, Fig. 10, Fig. 11, and Table 8 it can be seen that the method of the invention has a clear advantage on the synthetic data sets DB1, DB2, DB3 and on the UCI standard data sets Iris, Wine, Soybean, and Unbalanced. The external cluster-validity indices show that the Chameleon and JP methods obtain very poor results on some UCI data sets; the reason is that those data sets mix in variables with categorical attributes, and with the chosen algorithm parameter settings the clustering results become very poor. On some smaller data sets, such as cpu-with-vendor and Glass, the similarity function of the invention postpones the merging of clusters that would otherwise be merged wrongly to a later moment of the next-step merging, which effectively prevents errors from accumulating and amplifying step by step.
Table 8: comparison of the experimental results of the three methods

Claims (4)

1. A hierarchical clustering method based on mutual shared nearest neighbors, characterized in that the data set to be processed is set to D, the number of clusters is K, the first nearest-neighbor parameter is K1, the second nearest-neighbor parameter is K2, K1 < K2, and the method is implemented according to the following steps:
Step 1: compute the nearest-neighbor matrices of data set D for the parameters K1 and K2 respectively, obtaining nearest-neighbor matrix T1 and nearest-neighbor matrix T2;
Step 2: for each data point i in D, look up, for every neighbor point listed in row i of T2, that neighbor's own nearest-neighbor row T1'; if data point i is contained in T1', keep the neighbor in row i of T2, otherwise delete it, obtaining the mutual nearest-neighbor ranking row Mi of data point i, where Mi refers to the row built from data point i and the data points that are its mutual nearest neighbors; traversing all data points in D gives the mutual nearest-neighbor ranking matrix M;
Step 3: compute the local density Di of every data point i in D from its ranking row Mi, where Di expresses how dense the local region around data point i is within the whole data set, and sort the data points in descending order of Di;
wherein the local density Di is calculated according to the formula
Di = count(Mi), 0 < i ≤ n   (1)
with Mi the ranking row of the i-th data point in the mutual nearest-neighbor ranking matrix M;
Step 4: take the first K × 10 data points after sorting as sub-cluster center points, and form a sub-cluster from each center point together with the group of data points contained in its mutual nearest-neighbor ranking row; each data point not yet divided is assigned to the sub-cluster that appears first among its mutual nearest neighbors, yielding a number of sub-clusters;
Step 5: compute the pairwise similarity of the sub-clusters obtained in step 4, and merge the pair of sub-clusters with the largest similarity;
Step 6: if the number of sub-clusters after merging is greater than K, return to step 5; if it equals K, go to step 7;
Step 7: assign each still-unassigned data point i in D to the sub-cluster nearest to it, obtaining the final partition, the partition result being K class clusters.
2. it is according to claim 1 based on the hierarchy clustering method for sharing arest neighbors each other, it is characterised in that in step 5 The similarity of submanifold between any two is calculated in accordance with the following methods:
Provided with submanifold Cx, submanifold Cy, 0<X, y≤z, arest neighbors ranking matrix M, then:The similarity of submanifold between any two is:
Similarity(Cx, Cy) = NumNeighbor_Cx(Cy) / CountNeighbor(Cx) + NumNeighbor_Cx(Cx) / CountNeighbor(Cy)    (2)
Wherein, NumNeighbor_Cx(Cy) is taken over all nearest neighbors, in the nearest-neighbor ranking matrix M, of the points of subcluster Cx: it is the number of times points belonging to subcluster Cy occur in the nearest-neighbor ranking matrices of those neighbor points;
NumNeighbor_Cx(Cx) is likewise taken over all nearest neighbors in M of the points of subcluster Cx: it is the number of times points belonging to subcluster Cx itself occur in the nearest-neighbor ranking matrices of those neighbor points;
CountNeighbor(Cx) is the number of distinct subclusters to which the nearest neighbors in M of the points of subcluster Cx belong;
CountNeighbor(Cy) is the number of distinct subclusters to which the nearest neighbors in M of the points of subcluster Cy belong.
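Formula (2) can be transcribed directly into code. The sketch below follows the claim text verbatim, including the second numerator NumNeighbor_Cx(Cx), which makes the measure asymmetric in Cx and Cy. `M` (the nearest-neighbor ranking matrix as lists of point ids) and `labels` (mapping each point to its current subcluster) are assumed representations, not part of the claim.

```python
def similarity(Cx, Cy, M, labels):
    """Formula (2) of claim 2, taken literally.  Cx, Cy: sets of point ids;
    M: nearest-neighbor ranking matrix; labels[p]: current subcluster of p."""
    def neighbor_pool(C):
        # all nearest neighbors (in M) of the points of subcluster C
        return {j for p in C for j in M[p]}
    def num_neighbor(C_from, C_target):
        # occurrences of C_target's points inside the neighbor lists
        # of C_from's neighbor pool
        return sum(1 for q in neighbor_pool(C_from) for r in M[q] if r in C_target)
    def count_neighbor(C):
        # number of distinct subclusters touched by C's neighbor pool
        return len({labels[q] for q in neighbor_pool(C)})
    return (num_neighbor(Cx, Cy) / count_neighbor(Cx)
            + num_neighbor(Cx, Cx) / count_neighbor(Cy))
```

A symmetric variant (using NumNeighbor_Cy(Cx) in the second term) would be the more conventional shared-nearest-neighbor similarity; the asymmetric form above is what the claim text states.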
3. The hierarchical clustering method based on mutually shared nearest neighbors according to claim 1, characterized in that, in step 4, assigning an unassigned data point to the subcluster that appears first in its nearest neighbors means: if the nearest-neighbor ranking matrix of the data point contains a subcluster center, the data point is assigned to that center's subcluster; if the nearest-neighbor ranking matrix of the data point contains several subcluster centers, the data point is assigned to the subcluster of the center ranked foremost.
4. The hierarchical clustering method based on mutually shared nearest neighbors according to claim 1, characterized in that, in step 7, the subcluster nearest to an unassigned data point is the one, among the K subclusters obtained in step 6, at minimum Euclidean distance from that data point.
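A sketch of the step-7 assignment follows. The claim fixes only that the Euclidean distance between point and subcluster be minimal; it does not say whether that means distance to a subcluster's center or to its closest member. The distance-to-closest-member (single-link) reading used here is one assumption among several possible ones.

```python
import numpy as np

def assign_remaining(D, labels, K):
    """Step 7 sketch: attach each unassigned point (label -1) to the
    subcluster at minimum Euclidean distance, measured here as the
    distance to the closest member of each subcluster (an assumption)."""
    for i, lab in enumerate(labels):
        if lab != -1:
            continue
        best, best_d = None, np.inf
        for c in range(K):
            members = [j for j, l in enumerate(labels) if l == c]
            d = min(np.linalg.norm(D[i] - D[j]) for j in members)
            if d < best_d:
                best, best_d = c, d
        labels[i] = best
    return labels

# the point at x=5 is marginally closer to the left pair than to the right point
D = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [5.0, 0.0]])
print(assign_remaining(D, [0, 0, 1, -1], 2))   # → [0, 0, 1, 0]
```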
CN201410488243.4A 2014-09-22 2014-09-22 Hierarchical clustering method based on mutually shared nearest neighbors Expired - Fee Related CN104217015B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410488243.4A CN104217015B (en) 2014-09-22 2014-09-22 Hierarchical clustering method based on mutually shared nearest neighbors

Publications (2)

Publication Number Publication Date
CN104217015A CN104217015A (en) 2014-12-17
CN104217015B true CN104217015B (en) 2017-11-03

Family

ID=52098505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410488243.4A Expired - Fee Related CN104217015B (en) 2014-09-22 2014-09-22 Hierarchical clustering method based on mutually shared nearest neighbors

Country Status (1)

Country Link
CN (1) CN104217015B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682052B (en) * 2015-11-11 2021-11-12 恩智浦美国有限公司 Data aggregation using mapping and merging
CN105787520B (en) * 2016-03-25 2019-09-20 中国农业大学 A kind of algorithm of discovery cluster and outlier based on naturally shared nearest-neighbors search
CN106570178B (en) * 2016-11-10 2020-09-29 重庆邮电大学 High-dimensional text data feature selection method based on graph clustering
CN108337226A (en) * 2017-12-19 2018-07-27 中国科学院声学研究所 The detection method and embedded intelligent terminal of embedded intelligent terminal abnormal data
CN108510615A (en) * 2018-04-02 2018-09-07 深圳智达机械技术有限公司 A kind of control system of semiconductor manufacturing facility and technique
CN108596737A (en) * 2018-05-07 2018-09-28 山东师范大学 Non-cluster Centroid distribution method based on e-commerce comment data and device
CN108932528B (en) * 2018-06-08 2021-08-31 哈尔滨工程大学 Similarity measurement and truncation method in chameleon algorithm
CN108765954B (en) * 2018-06-13 2022-05-24 上海应用技术大学 Road traffic safety condition monitoring method based on SNN density ST-OPTIC improved clustering algorithm
CN109871768B (en) * 2019-01-18 2022-04-29 西北工业大学 Hyperspectral optimal waveband selection method based on shared nearest neighbor
CN113850995B (en) * 2021-09-14 2022-12-27 华设设计集团股份有限公司 Event detection method, device and system based on tunnel radar vision data fusion

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7478090B2 (en) * 2005-01-14 2009-01-13 Saffron Technology, Inc. Methods, systems and computer program products for analogy detection among entities using reciprocal similarity measures
CN101963995A (en) * 2010-10-25 2011-02-02 哈尔滨工程大学 Image marking method based on characteristic scene

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100191731A1 (en) * 2009-01-23 2010-07-29 Vasile Rus Methods and systems for automatic clustering of defect reports

Also Published As

Publication number Publication date
CN104217015A (en) 2014-12-17

Similar Documents

Publication Publication Date Title
CN104217015B (en) Hierarchical clustering method based on mutually shared nearest neighbors
CN110532436B (en) Cross-social network user identity recognition method based on community structure
CN104462184B (en) A kind of large-scale data abnormality recognition method based on two-way sampling combination
CN105930862A (en) Density peak clustering algorithm based on density adaptive distance
CN106960390A (en) Overlapping community division method based on convergence degree
CN108319987A (en) A kind of filtering based on support vector machines-packaged type combined flow feature selection approach
CN103888541A (en) Method and system for discovering cells fused with topology potential and spectral clustering
CN107368540A (en) The film that multi-model based on user&#39;s self-similarity is combined recommends method
CN109271427A (en) A kind of clustering method based on neighbour&#39;s density and manifold distance
CN114091603A (en) Spatial transcriptome cell clustering and analyzing method
CN107944487B (en) Crop breeding variety recommendation method based on mixed collaborative filtering algorithm
CN107180079A (en) The image search method of index is combined with Hash based on convolutional neural networks and tree
CN111340069A (en) Incomplete data fine modeling and missing value filling method based on alternate learning
CN111581532A (en) Social network friend-making recommendation method and system based on random block
CN104715034A (en) Weighed graph overlapping community discovery method based on central persons
Carbonera et al. An entropy-based subspace clustering algorithm for categorical data
CN103164487B (en) A kind of data clustering method based on density and geological information
Amelio et al. A new evolutionary-based clustering framework for image databases
CN108763283A (en) A kind of unbalanced dataset oversampler method
Peng et al. DE-MC: A membrane clustering algorithm based on differential evolution mechanism
CN111914930A (en) Density peak value clustering method based on self-adaptive micro-cluster fusion
CN107169522A (en) A kind of improvement Fuzzy C means clustering algorithm based on rough set and particle cluster algorithm
CN106354886A (en) Method for screening nearest neighbor by using potential neighbor relation graph in recommendation system
CN106203469A (en) A kind of figure sorting technique based on orderly pattern
CN111444454B (en) Dynamic community division method based on spectrum method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171103

Termination date: 20200922