CN108764307A - Density peaks clustering method based on natural nearest neighbor optimization - Google Patents

Density peaks clustering method based on natural nearest neighbor optimization

Info

Publication number
CN108764307A
CN108764307A (application CN201810463136.4A)
Authority
CN
China
Prior art keywords
cluster
density
neighbours
neighbors
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810463136.4A
Other languages
Chinese (zh)
Inventor
钱雪忠
金辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN201810463136.4A priority Critical patent/CN108764307A/en
Publication of CN108764307A publication Critical patent/CN108764307A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques

Abstract

The present invention relates to a density peaks clustering method optimized by natural nearest neighbors (the TNDP method). The local density of each data point is computed from its natural nearest neighbors, so no parameter is required and the parameter-sensitivity problem is avoided. Because the natural nearest neighbors accurately reflect the attribute characteristics of a data point, the local density computed in this way accurately represents how dense each data point is, which improves the clustering result. Since the natural nearest neighbors do not include noise points and outliers, the influence of noise points and outliers on the clustering result is also reduced.

Description

Density peaks clustering method based on natural nearest neighbor optimization
Technical field
The present invention relates to clustering methods, and more particularly to a density peaks clustering method based on natural nearest neighbor optimization.
Background art
In the development of clustering analysis, a series of clustering algorithms such as K-means, DBSCAN, FCM and AP have been proposed in succession. In 2014, Science published the paper "Clustering by fast search and find of density peaks", which proposes a clustering algorithm (DPC) that quickly searches for and finds density peaks. The algorithm can automatically provide the cluster centers of a data set, imposes no harsh requirement on the shape of the data, and can therefore cluster data sets of arbitrary shape efficiently. Its core idea is that a cluster center satisfies two basic requirements: 1) its own density is large, i.e., larger than the density of the neighboring points around it; and 2) its "distance" to any data point of even higher density is relatively large. However, the shortcomings and difficulties of the DPC algorithm should not be underestimated: 1) the cutoff distance is a parameter that must be set whenever the algorithm is applied; it is always set by hand, and manual setting involves a certain randomness and human factors that affect the clustering quality; 2) the analysis of high-dimensional data has always been a weakness of DPC, because the structure of high-dimensional data is sparse and spatially complex, so the traditional Euclidean distance cannot reflect the similarity between data objects accurately and reasonably, which causes the algorithm to fail; 3) although DPC claims to determine the clustering result automatically, in practice the cluster centers must be selected manually, so the clustering result cannot be produced automatically.
To address the deficiencies of the DPC clustering algorithm, Zhang Wenkai combined it with the CHAMELEON algorithm and proposed E_CFSFDP, which solves the problem that CFSFDP cannot handle a cluster containing more than one density peak; however, the performance of that algorithm still needs to be improved and its ability to handle high-dimensional data remains to be strengthened. Liu Y proposed KNN-DPC, an algorithm that quickly searches for density peaks and efficiently assigns samples based on k-nearest neighbors; it alleviates the sensitivity of CFSFDP's clustering results to the cutoff distance dc and the chained assignment errors caused by one-step assignment, but its clustering results are sensitive to the choice of the neighbor number K. Rashid Mehmood proposed the Fuzzy-CFSFDP algorithm, which applies fuzzy rules to the selection of cluster centers in CFSFDP and improves both the selection of cluster centers and the accuracy of the clustering results, but it is somewhat inadequate when handling complex data.
The traditional techniques have the following technical problem:
existing density-based clustering algorithms suffer from parameter sensitivity and from poor clustering performance on non-spherical data and complex manifold data.
Summary of the invention
In view of the above technical problems, it is necessary to provide a density peaks clustering method optimized by natural nearest neighbors that avoids the parameter-sensitivity problem and improves the clustering result.
A density peaks clustering method optimized by natural nearest neighbors, comprising:
finding all density peaks in a data set;
randomly visiting one density peak, and assigning the density peak and the sparse neighbors of the density peak to the same cluster;
arbitrarily picking a point in the cluster, and assigning the sparse neighbors of that point to the cluster, until the sparse neighbors of all points in the cluster have been assigned to the cluster;
repeating the steps "randomly visiting one density peak, and assigning the density peak and the sparse neighbors of the density peak to the same cluster" and "arbitrarily picking a point in the cluster, and assigning the sparse neighbors of that point to the cluster, until the sparse neighbors of all points in the cluster have been assigned to the cluster", until all density peaks have been visited;
according to the inter-cluster similarity between the clusters formed by all the density peaks through the above steps, merging the clusters with high similarity.
In another embodiment, among the clusters formed by all the density peaks through the above steps, any cluster whose number of data points is smaller than the minimum natural neighbor number is removed from the clustering result, and the data points in such clusters are marked as noise points, yielding the final clustering result; the minimum natural neighbor number is the minimum of the natural nearest neighbor counts of all data points in the cluster.
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of any one of the above methods when executing the program.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of any one of the above methods.
In the above density peaks clustering method optimized by natural nearest neighbors (the TNDP method), the local density of each data point is computed from its natural nearest neighbors, so no parameter is required and the parameter-sensitivity problem is avoided. Because the natural nearest neighbors accurately reflect the attribute characteristics of a data point, the local density computed in this way accurately represents how dense each data point is, which improves the clustering result. Since the natural nearest neighbors do not include noise points and outliers, the influence of noise points and outliers on the clustering result is also reduced.
Description of the drawings
Fig. 1 is a flowchart of a density peaks clustering method optimized by natural nearest neighbors provided by an embodiment of the present application.
Detailed description of the embodiments
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
Referring to Fig. 1, a density peaks clustering method optimized by natural nearest neighbors comprises:
finding all density peaks in a data set;
randomly visiting one density peak, and assigning the density peak and the sparse neighbors of the density peak to the same cluster;
arbitrarily picking a point in the cluster, and assigning the sparse neighbors of that point to the cluster, until the sparse neighbors of all points in the cluster have been assigned to the cluster;
repeating the steps "randomly visiting one density peak, and assigning the density peak and the sparse neighbors of the density peak to the same cluster" and "arbitrarily picking a point in the cluster, and assigning the sparse neighbors of that point to the cluster, until the sparse neighbors of all points in the cluster have been assigned to the cluster", until all density peaks have been visited;
according to the inter-cluster similarity between the clusters formed by all the density peaks through the above steps, merging the clusters with high similarity.
In another embodiment, among the clusters formed by all the density peaks through the above steps, any cluster whose number of data points is smaller than the minimum natural neighbor number is removed from the clustering result, and the data points in such clusters are marked as noise points, yielding the final clustering result; the minimum natural neighbor number is the minimum of the natural nearest neighbor counts of all data points in the cluster.
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of any one of the above methods when executing the program.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of any one of the above methods.
In the above density peaks clustering method optimized by natural nearest neighbors (the TNDP method), the local density of each data point is computed from its natural nearest neighbors, so no parameter is required and the parameter-sensitivity problem is avoided. Because the natural nearest neighbors accurately reflect the attribute characteristics of a data point, the local density computed in this way accurately represents how dense each data point is, which improves the clustering result. Since the natural nearest neighbors do not include noise points and outliers, the influence of noise points and outliers on the clustering result is also reduced.
First, the local density of each data point is determined according to the concept of natural nearest neighbors; then the cluster centers are determined from the facts that a density peak has the highest local density in its region and that density peaks are separated by sparse regions; finally, a new inter-cluster similarity concept is proposed to solve the complex-manifold problem.
Natural nearest neighbors (Natural Nearest Neighbor, TN) are a new nearest-neighbor concept. They form a scale-free nearest-neighbor relation, which is also their biggest difference from the K-nearest neighbors and the ε-nearest neighbors. The basic idea of natural nearest neighbors is that data points in dense regions of a data set have more neighbors, data points in sparse regions have fewer neighbors, and the most outlying data points have only a few neighbors or none at all. A characteristic of natural nearest neighbors is that the computation requires no parameter: each data point obtains its exact neighbors from the attribute characteristics of the data set itself, and the number of neighbors differs with the local density of the data. Since noise points and outliers have no neighbors, normal points will not take noise points or outliers as neighbors.
Definition 1, natural nearest neighbor (Natural Nearest Neighbor, TN): based on the natural nearest neighbor search algorithm (the TN-Searching algorithm), if point X belongs to the neighbors of point Y and point Y belongs to the neighbors of point X, then X and Y are natural nearest neighbors of each other.
Definition 2, natural eigenvalue (supk): according to the TN-Searching algorithm, each point has a different number of neighbors; for any point i, its neighbor count is nb(i). TN-Searching also yields an average number of neighbors, referred to as supk, the natural eigenvalue. The formula for computing supk is as follows:
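A plausible form of this equation, offered only as an assumption consistent with the description of supk as the average neighbor count over the n points of the data set (possibly rounded to an integer), is:

supk = \frac{1}{n} \sum_{i=1}^{n} nb(i)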
Definition 3, R-neighborhood (R-neighbor): findKNN(xi, r) denotes the KNN search function, which returns the r-th neighbor of xi; KNNr(xi) is a subset of X and is defined as follows:
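The defining equation is not spelled out here; a plausible reconstruction, assumed from Step 1 of the TN-Searching flow below (the r-th neighbor found in each round is merged into the R-neighborhood), is:

KNN_r(x_i) = \bigcup_{j=1}^{r} \{\, findKNN(x_i, j) \,\}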
Definition 4, density of a data point (Den(Pi)): the density defined on the basis of natural nearest neighbors is as follows:
Here nb(i) is the natural nearest neighbor count of each point obtained by the TN-Searching algorithm, N(i, nb(i)) denotes the nb(i) natural nearest neighbors of point i, and dist(i, j) is the distance between data points i and j.
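Using these symbols, one plausible reconstruction of the density, given only as an assumption, is the neighbor count divided by the total distance from i to its natural nearest neighbors, so that points with many close neighbors receive a high density:

Den(i) = \frac{nb(i)}{\sum_{j \in N(i, nb(i))} dist(i, j)}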
Definition 5, representative point (Exemplar): the representative point of data point q is the highest-density point among q and its natural nearest neighbors, i.e.:
Exemplar(q) = \arg\max_{p \in NN(q) \cup \{q\}} Den(p)
Definition 6, density peak (DensityPeak): if data point p satisfies the following condition, p is called a density peak:
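The condition can plausibly be read, as an assumption consistent with Definition 5, as requiring that p has no denser point among its natural nearest neighbors, i.e. that p is its own representative point:

Den(p) \ge Den(q) \;\; \forall\, q \in NN(p), \quad \text{equivalently} \;\; Exemplar(p) = p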
Definition 7, inter-cluster similarity (Similarity Between Clusters):
|Ci ∩ Cj| refers to the common portion of clusters Ci and Cj, and supk is the natural eigenvalue; the value of Sim(Ci, Cj) is not less than 0. If two adjacent initial clusters are separated by a sparse region, the similarity between them will be very small and they remain two separate clusters. Conversely, if the two adjacent initial clusters are connected by a dense region, the similarity between them will be very large, and the two clusters will be merged into one cluster.
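Given the quantities just described, a plausible form of the similarity, normalizing the common portion of the two clusters by the natural eigenvalue and stated here only as an assumption, is:

Sim(C_i, C_j) = \frac{|C_i \cap C_j|}{supk}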
Definition 8, sparse and dense neighbors (Sparse and Dense Neighbor): if the density of data point q is less than the density of data point p and q is a natural nearest neighbor of p, then q is called a sparse neighbor of p; conversely, if the density of data point q is greater than or equal to the density of data point p and q is a natural nearest neighbor of p, then q is called a dense neighbor of p. They are defined as follows:
SN(p) = { q | Den(q) < Den(p) && q ∈ NN(p) }
DN(p) = { q | Den(q) ≥ Den(p) && q ∈ NN(p) }
The main flow of the TN-Searching algorithm used in the present invention is:
Step 1: input the data set X and let r = 1; for each point xi in the data set, find the r-th neighbor knnr(xi) of xi by K-d tree search, and merge knnr(xi) into the R-neighborhood KNNr(xi) of xi;
Step 2: if xi lies in the R-neighborhood KNNr(knnr(xi)) of its r-th neighbor knnr(xi), and xi and knnr(xi) are not yet natural nearest neighbors of each other, then define xi and knnr(xi) as natural nearest neighbors of each other;
Step 3: let r = r + 1 and repeat Steps 1 and 2; if the number of points whose natural nearest neighbor count is 0 no longer changes, jump to Step 4, otherwise repeat Step 3;
Step 4: output the natural eigenvalue r, the natural nearest neighbor count of each data point, and the natural nearest neighbor set of each point.
One possible specific implementation of TN-Searching is as follows:
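The sketch below is one possible Python rendering of Steps 1 to 4; the function name tn_searching and the use of scipy's cKDTree for the K-d tree search are choices made here for illustration, not taken from the patent.

import numpy as np
from scipy.spatial import cKDTree

def tn_searching(X):
    """Return (r, nb, neighbors): the natural eigenvalue r, the natural nearest
    neighbor count nb(i) of every point, and each point's natural neighbor set."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    nb = np.zeros(n, dtype=int)                  # natural nearest neighbor counts nb(i)
    neighbors = [set() for _ in range(n)]        # natural nearest neighbor sets NN(i)
    if n < 2:
        return 1, nb, neighbors
    tree = cKDTree(X)                            # K-d tree used for the r-th neighbor search
    r_neighborhood = [set() for _ in range(n)]   # R-neighborhoods KNNr(i)
    prev_zero, r = -1, 0
    while True:
        r += 1
        # Step 1: the r-th neighbor of every point (column 0 of the query is the point itself).
        _, idx = tree.query(X, k=r + 1)
        knn_r = idx[:, r]
        for i in range(n):
            r_neighborhood[i].add(int(knn_r[i]))
        # Step 2: xi and knnr(xi) become natural nearest neighbors of each other
        # when xi also lies in the R-neighborhood of knnr(xi).
        for i in range(n):
            j = int(knn_r[i])
            if i in r_neighborhood[j] and j not in neighbors[i]:
                neighbors[i].add(j)
                neighbors[j].add(i)
                nb[i] += 1
                nb[j] += 1
        # Step 3: stop once the number of points with no natural neighbor stops changing.
        zero = int(np.sum(nb == 0))
        if zero == prev_zero or zero == 0 or r >= n - 1:
            break
        prev_zero = zero
    # Step 4: natural eigenvalue r, neighbor counts, neighbor sets.
    return r, nb, neighbors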
One possible specific implementation of the density peaks clustering method optimized by natural nearest neighbors according to the present invention is as follows:
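The following Python sketch walks through the seven steps of the embodiment described below, building on the tn_searching sketch above. The density, density-peak, similarity and merging rules follow the hedged reconstructions given with Definitions 2 to 8, and sim_threshold is an illustrative parameter, so the sketch approximates the flow rather than reproducing the patented code.

import numpy as np
from scipy.spatial.distance import cdist

def tndp(X, sim_threshold=1.0):
    """Approximate TNDP clustering; returns one label per point, with -1 marking noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    _, nb, neighbors = tn_searching(X)           # Step 1: natural nearest neighbors
    supk = max(int(round(nb.mean())), 1)         # natural eigenvalue (Definition 2, reconstructed)
    dist = cdist(X, X)

    # Step 1 (cont.): density of every point from its natural neighbors (Definition 4, reconstructed).
    den = np.zeros(n)
    for i in range(n):
        if neighbors[i]:
            den[i] = nb[i] / sum(dist[i, j] for j in neighbors[i])

    # Step 2: sparse neighbors (Definition 8) and density peaks (Definition 6, reconstructed).
    sparse = [{j for j in neighbors[i] if den[j] < den[i]} for i in range(n)]
    peaks = [i for i in range(n)
             if neighbors[i] and all(den[j] <= den[i] for j in neighbors[i])]

    # Steps 3-5: grow one initial cluster per density peak by repeatedly absorbing
    # the sparse neighbors of points already assigned to the cluster.
    label = -np.ones(n, dtype=int)
    for c, p in enumerate(peaks):
        if label[p] != -1:
            continue
        label[p] = c
        queue = [p]
        while queue:
            q = queue.pop()
            for s in sparse[q]:
                if label[s] == -1:
                    label[s] = c
                    queue.append(s)

    # Step 6: merge initial clusters with high inter-cluster similarity. As a stand-in for
    # Definition 7, clusters are merged when enough natural-neighbor links cross between
    # them relative to supk.
    clusters = {c: set(np.flatnonzero(label == c)) for c in set(label.tolist()) if c != -1}
    merged = True
    while merged:
        merged = False
        ids = sorted(clusters)
        for a in ids:
            for b in ids:
                if a >= b or a not in clusters or b not in clusters:
                    continue
                links = sum(1 for i in clusters[a] for j in neighbors[i] if j in clusters[b])
                if links / supk >= sim_threshold:
                    clusters[a] |= clusters[b]
                    del clusters[b]
                    merged = True

    # Step 7: clusters smaller than their minimum natural neighbor count become noise.
    result = -np.ones(n, dtype=int)
    next_id = 0
    for members in clusters.values():
        min_nb = min(nb[i] for i in members)
        if len(members) >= max(min_nb, 1):
            for i in members:
                result[i] = next_id
            next_id += 1
    return result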
The experimental comparison shows that, in terms of accuracy, the TNDP method is substantially better than the DPC, DBSCAN and KMEANS algorithms; in terms of the F-measure, except that DBSCAN is better than TNDP on the wpbc data set, the TNDP method is also clearly better than DPC, DBSCAN and KMEANS on the other data sets; and the TNDP method clusters out the correct number of classes on these data sets. Taking these three aspects together, the TNDP method clearly performs best.
A concrete application embodiment is described below; a brief usage sketch follows the steps:
Step 1: TNDP uses the TN-Searching algorithm to obtain the natural nearest neighbors of each data point in the data set X, and then computes the density of each data point;
Step 2: the representative point and the sparse neighbors of each data point are found according to Definition 5 and Definition 8;
Step 3: all density peaks are found, one density peak is visited at random, and it is assigned, together with its sparse neighbors, to the same cluster;
Step 4: a point is picked arbitrarily in this cluster, and this point and its sparse neighbors are assigned to the same cluster, until all points of this cluster have been visited;
Step 5: a density peak that has not yet been visited is found and the above steps are repeated, until all density peaks have been visited;
Step 6: once the initial clusters have been formed, the initial clusters with high similarity are merged according to the similarity relations between them;
Step 7: any cluster whose number of data points is smaller than the minimum natural neighbor number is removed from the clustering result, and the data points in such clusters are marked as noise points, giving the final clustering result.
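As a quick illustration of how these seven steps fit together, a hypothetical call to the tndp sketch given earlier might look as follows (the synthetic data set and the threshold value are arbitrary choices, not taken from the patent):

from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
labels = tndp(X, sim_threshold=1.0)
print("clusters found:", len(set(labels.tolist()) - {-1}),
      "| noise points:", int((labels == -1).sum()))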
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present invention, and these all belong to the protection scope of the present invention. Therefore, the protection scope of the patent of the present invention shall be determined by the appended claims.

Claims (4)

1. A density peaks clustering method optimized by natural nearest neighbors, characterized by comprising:
finding all density peaks in the data set;
randomly visiting one density peak, and assigning the density peak and the sparse neighbors of the density peak to the same cluster;
arbitrarily picking a point in the cluster, and assigning the sparse neighbors of that point to the cluster, until the sparse neighbors of all points in the cluster have been assigned to the cluster;
repeating the steps "randomly visiting one density peak, and assigning the density peak and the sparse neighbors of the density peak to the same cluster" and "arbitrarily picking a point in the cluster, and assigning the sparse neighbors of that point to the cluster, until the sparse neighbors of all points in the cluster have been assigned to the cluster", until all density peaks have been visited;
according to the inter-cluster similarity between the clusters formed by all the density peaks through the above steps, merging the clusters with high similarity.
2. The density peaks clustering method optimized by natural nearest neighbors according to claim 1, characterized by further comprising: among the clusters formed by all the density peaks through the above steps, removing from the clustering result any cluster whose number of data points is smaller than the minimum natural neighbor number, and marking the data points in such clusters as noise points, to obtain the final clustering result, wherein the minimum natural neighbor number is the minimum of the natural nearest neighbor counts of all data points in the cluster.
3. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1-2 when executing the program.
4. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method of any one of claims 1-2.
CN201810463136.4A 2018-05-15 2018-05-15 Density peaks clustering method based on natural nearest neighbor optimization Pending CN108764307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810463136.4A CN108764307A (en) 2018-05-15 2018-05-15 Density peaks clustering method based on natural nearest neighbor optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810463136.4A CN108764307A (en) 2018-05-15 2018-05-15 Density peaks clustering method based on natural nearest neighbor optimization

Publications (1)

Publication Number Publication Date
CN108764307A true CN108764307A (en) 2018-11-06

Family

ID=64007783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810463136.4A Pending CN108764307A (en) 2018-05-15 2018-05-15 Density peaks clustering method based on natural nearest neighbor optimization

Country Status (1)

Country Link
CN (1) CN108764307A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046665A (en) * 2019-04-17 2019-07-23 成都信息工程大学 Based on isolated two abnormal classification point detecting method of forest, information data processing terminal
TWI717259B (en) * 2019-04-17 2021-01-21 南韓商韓領有限公司 Computer-implemented system and method for batch picking optimization
TWI750947B (en) * 2019-04-17 2021-12-21 南韓商韓領有限公司 Computer-implemented system and method for batch picking optimization
CN111260503A (en) * 2020-01-13 2020-06-09 浙江大学 Wind turbine generator power curve outlier detection method based on cluster center optimization
CN111260503B (en) * 2020-01-13 2023-10-27 浙江大学 Wind turbine generator power curve outlier detection method based on cluster center optimization
CN116756526A (en) * 2023-08-17 2023-09-15 北京英沣特能源技术有限公司 Full life cycle performance detection and analysis system of energy storage equipment
CN116756526B (en) * 2023-08-17 2023-10-13 北京英沣特能源技术有限公司 Full life cycle performance detection and analysis system of energy storage equipment

Similar Documents

Publication Publication Date Title
Lengyel et al. Silhouette width using generalized mean—A flexible method for assessing clustering efficiency
US11132388B2 (en) Efficient spatial queries in large data tables
CN108764307A (en) Density peaks clustering method based on natural nearest neighbor optimization
CN109034562B (en) Social network node importance evaluation method and system
CN111259933B (en) High-dimensional characteristic data classification method and system based on distributed parallel decision tree
CN103888541A (en) Method and system for discovering cells fused with topology potential and spectral clustering
CN106796589A (en) The indexing means and system of spatial data object
CN106228554A (en) Fuzzy coarse central coal dust image partition methods based on many attribute reductions
CN106951526B (en) Entity set extension method and device
CN108829804A (en) Based on the high dimensional data similarity join querying method and device apart from partition tree
CN106934410A (en) The sorting technique and system of data
CN108549696B (en) Time series data similarity query method based on memory calculation
CN110135180A (en) Meet the degree distribution histogram dissemination method of node difference privacy
CN108304404B (en) Data frequency estimation method based on improved Sketch structure
Bruzzese et al. DESPOTA: DEndrogram slicing through a permutation test approach
Yin et al. Finding the informative and concise set through approximate skyline queries
CN110781943A (en) Clustering method based on adjacent grid search
CN110580252A (en) Space object indexing and query method under multi-objective optimization
KR101994871B1 (en) Apparatus for generating index to multi dimensional data
CN104794237B (en) web information processing method and device
CN109245948B (en) Security-aware virtual network mapping method and device
CN115205699B (en) Map image spot clustering fusion processing method based on CFSFDP improved algorithm
Zhang et al. A novel method for detecting outlying subspaces in high-dimensional databases using genetic algorithm
CN107562872A (en) Metric space data similarity search method and device based on SQL
Mouratidis et al. Medoid queries in large spatial databases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181106