CN108764307A - The density peaks clustering method of natural nearest neighbors optimization - Google Patents
The density peaks clustering method of natural nearest neighbors optimization
- Publication number
- CN108764307A (application CN201810463136.4A)
- Authority
- CN
- China
- Prior art keywords
- cluster
- density
- neighbors
- point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Abstract
The present invention relates to a density peaks clustering method optimized by natural nearest neighbors (the TNDP method). The local density of each data point is computed from its natural nearest neighbors, which requires no parameter and thereby avoids the parameter-sensitivity problem. Because natural nearest neighbors accurately reflect the distributional properties of a data point, the local density computed this way accurately represents the density of each point and improves the clustering effect. Furthermore, since natural nearest neighbors do not include noise points or outliers, their influence on the clustering result is reduced.
Description
Technical field
The present invention relates to clustering methods, and more particularly to a density peaks clustering method optimized by natural nearest neighbors.
Background technology
In the evolution of clustering, KMEANS, DBSCAN, FCM, AP and a series of other clustering algorithms were proposed in succession. In 2014, Science published the paper "Clustering by fast search and find of density peaks", which proposed a clustering algorithm (DPC) that quickly searches for and finds density peaks. The algorithm can automatically provide the cluster centers of a data set, imposes no harsh requirements on the shape of the data, and can cluster data sets of arbitrary shape efficiently. Its core idea is that a cluster center must simultaneously satisfy two basic requirements: 1) its own density is large, i.e., none of its surrounding neighbors is denser than it; 2) its "distance" to any data point of higher density is comparatively large. However, the drawbacks and difficulties of the DPC algorithm should not be underestimated: 1) the cutoff distance is a parameter that must be set in every field where DPC is applied; it has always been set by hand, and this manual setting introduces randomness and human factors that affect clustering quality; 2) the analysis of high-dimensional data has always been a weak point of DPC, because the sparsity and spatial complexity inherent in high-dimensional data prevent the traditional Euclidean distance from reflecting the similarity between data objects accurately and reasonably, causing the algorithm to fail; 3) although DPC claims to determine the clustering result automatically, in practical clustering the cluster centers must still be selected manually, so the result cannot be produced automatically.
To address these deficiencies of the DPC (CFSFDP) clustering algorithm, Zhang WenKai combined it with the CHAMELEON algorithm and proposed E_CFSFDP, which solves CFSFDP's inability to handle a cluster containing more than one density peak; however, the algorithm's performance still needs to be further improved and its ability to process high-dimensional data remains to be strengthened. Liu Y proposed KNN-DPC, a fast density-peak search and efficient sample-assignment algorithm based on k-nearest neighbors; it alleviates CFSFDP's sensitivity to the cutoff distance dc and the chained assignment errors caused by one-step allocation, but its clustering result is rather sensitive to the choice of the neighbor number K. Rashid Mehmood proposed the Fuzzy-CFSFDP algorithm, which applies fuzzy rules to the selection of cluster centers in CFSFDP and improves both center selection and clustering accuracy, but it falls slightly short when handling complex data.
Traditional techniques therefore suffer from the following technical problem:
Existing density-based clustering algorithms are parameter-sensitive and cluster non-spherical data and complex manifold data poorly.
Invention content
Based on this, it is necessary, in view of the above technical problems, to provide a density peaks clustering method optimized by natural nearest neighbors that avoids the parameter-sensitivity problem and improves the clustering effect.
A density peaks clustering method optimized by natural nearest neighbors, comprising:
finding all density peaks in a data set;
randomly accessing one density peak, and assigning the density peak and its sparse neighbors to the same cluster;
arbitrarily selecting a point in the cluster and assigning the sparse neighbors of that point to the cluster, until the sparse neighbors of every point in the cluster have been assigned to it;
repeating the steps "randomly accessing one density peak, and assigning the density peak and its sparse neighbors to the same cluster" and "arbitrarily selecting a point in the cluster and assigning the sparse neighbors of that point to the cluster, until the sparse neighbors of every point in the cluster have been assigned to it", until all density peaks have been accessed;
merging clusters with high similarity according to the inter-cluster similarity between the clusters formed from all density peaks by the above steps.
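A minimal sketch of the expansion loop in the steps above, assuming the density peaks and each point's sparse-neighbor list have already been computed; `expand_clusters` and its breadth-first queue are illustrative choices, not the patent's prescribed data structures:

```python
from collections import deque

def expand_clusters(peaks, sparse_neighbors):
    """Grow one cluster per density peak by repeatedly assigning each
    member's sparse neighbors to the cluster, until no point's sparse
    neighbors remain outside it."""
    clusters, assigned = [], set()
    for peak in peaks:  # deterministic visit order stands in for "random access"
        if peak in assigned:
            continue
        cluster = {peak}
        queue = deque([peak])
        while queue:
            p = queue.popleft()
            for q in sparse_neighbors.get(p, ()):
                if q not in cluster and q not in assigned:
                    cluster.add(q)
                    queue.append(q)
        assigned |= cluster
        clusters.append(cluster)
    return clusters
```

The later merging step, by inter-cluster similarity, would then operate on the returned clusters.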
In another embodiment, among the clusters formed from all density peaks by the above steps, any cluster whose number of data points is smaller than the smallest natural-neighbor number is removed from the clustering result, and the data points in such clusters are marked as noise, yielding the final clustering result; the smallest natural-neighbor number is the minimum of the natural nearest-neighbor counts of all data points in a cluster.
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of any one of the above methods.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of any one of the above methods.
In the density peaks clustering method optimized by natural nearest neighbors (the TNDP method) described above, the local density of each data point is computed from its natural nearest neighbors, which requires no parameter and avoids the parameter-sensitivity problem. Because natural nearest neighbors accurately reflect the distributional properties of a data point, the local density computed this way accurately represents the density of each point, improving the clustering effect. And since natural nearest neighbors do not include noise points or outliers, their influence on the clustering result is reduced.
Description of the drawings
Fig. 1 is a flow chart of a density peaks clustering method optimized by natural nearest neighbors provided by an embodiment of the present application.
Specific implementation mode
To make the purpose, technical scheme, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the present invention and do not limit it.
Referring to Fig. 1, a density peaks clustering method optimized by natural nearest neighbors comprises:
finding all density peaks in a data set;
randomly accessing one density peak, and assigning the density peak and its sparse neighbors to the same cluster;
arbitrarily selecting a point in the cluster and assigning the sparse neighbors of that point to the cluster, until the sparse neighbors of every point in the cluster have been assigned to it;
repeating the steps "randomly accessing one density peak, and assigning the density peak and its sparse neighbors to the same cluster" and "arbitrarily selecting a point in the cluster and assigning the sparse neighbors of that point to the cluster, until the sparse neighbors of every point in the cluster have been assigned to it", until all density peaks have been accessed;
merging clusters with high similarity according to the inter-cluster similarity between the clusters formed from all density peaks by the above steps.
In another embodiment, among the clusters formed from all density peaks by the above steps, any cluster whose number of data points is smaller than the smallest natural-neighbor number is removed from the clustering result, and the data points in such clusters are marked as noise, yielding the final clustering result; the smallest natural-neighbor number is the minimum of the natural nearest-neighbor counts of all data points in a cluster.
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of any one of the above methods.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of any one of the above methods.
In the density peaks clustering method optimized by natural nearest neighbors (the TNDP method) described above, the local density of each data point is computed from its natural nearest neighbors, which requires no parameter and avoids the parameter-sensitivity problem. Because natural nearest neighbors accurately reflect the distributional properties of a data point, the local density computed this way accurately represents the density of each point, improving the clustering effect. And since natural nearest neighbors do not include noise points or outliers, their influence on the clustering result is reduced.
First, the local density of each data point is determined from the concept of natural nearest neighbors; then cluster centers are determined from the facts that a density peak has the locally highest density and that clusters are divided by sparse regions; finally, a new inter-cluster similarity concept is proposed to solve the complex-manifold problem.
A natural nearest neighbor (Natural Nearest Neighbor, TN) is a new nearest-neighbor concept: a scale-free nearest neighbor, which is also its greatest difference from the K-nearest neighbor and the ε-nearest neighbor. The basic idea of natural nearest neighbors is that data points in dense regions of a data set possess more neighbors, data points in sparse regions possess fewer neighbors, and the most outlying points in the data set have only a few nearest neighbors or none at all. A characteristic of natural nearest neighbors is that their computation needs no parameter: each data point obtains accurate neighbors according to the distributional properties of the data set itself, and neighbor counts differ with the local density of the data. Since noise points and outliers have no neighbors, normal points will not take noise points or outliers as neighbors.
Definition 1, natural nearest neighbor (Natural Nearest Neighbor, TN): under the natural nearest-neighbor search algorithm (the TN-Searching algorithm), if point X belongs to the neighbors of point Y and point Y belongs to the neighbors of point X, then X and Y are natural nearest neighbors of each other.
Definition 2, natural eigenvalue (supk): under the TN-Searching algorithm, each point has a different number of neighbors; for any point i, the neighbor count is nb(i). TN-Searching, however, yields one average number of neighbors, called supk, the natural eigenvalue. The formula for computing supk is as follows:
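The formula itself appears in the published text only as an image. As an assumption drawn from the definition's wording ("average number of neighbors"), supk can be read as the mean natural-neighbor count, rounded up; the function name is illustrative:

```python
import math

def natural_eigenvalue(nb_counts):
    """supk read as the ceiling of the mean natural-neighbor count
    over all points (an assumed reconstruction of Definition 2)."""
    return math.ceil(sum(nb_counts) / len(nb_counts))
```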
Definition 3, R-neighborhood (R-neighbor): FindKNN(xi, r) denotes the KNN search function, which returns the r-th neighbor of xi; KNNr(xi) is a subset of X, defined as follows:
Definition 4, density of a data point (Den(Pi)): the density defined from natural nearest neighbors is as follows:
Here nb(i) is the natural nearest-neighbor count of each point obtained by the TN-Searching algorithm, N(i, nb(i)) is the set of the nb(i) natural nearest neighbors of point i, and dist(i, j) is the distance between data points i and j.
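Definition 4's formula is likewise an image in the published text. One plausible reading, assembled only from the stated ingredients nb(i), N(i, nb(i)), and dist(i, j) and offered as an assumption, divides the neighbor count by the summed distance to the natural neighbors, so that points with many close neighbors score high:

```python
import math

def density(i, natural_neighbors, points):
    """Den(i) = nb(i) / sum of dist(i, j) over j in N(i, nb(i)).
    An assumed reconstruction, not the patent's verified formula."""
    nbrs = natural_neighbors[i]
    if not nbrs:
        return 0.0  # isolated points (noise/outliers) get zero density
    total = sum(math.dist(points[i], points[j]) for j in nbrs)
    return len(nbrs) / total if total else float("inf")
```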
Definition 5, exemplar (Exemplar): the exemplar of a data point q is defined as:
Exemplar(q) = max{ Den(NN(p)) && p ≠ q }
Definition 6, density peak (DensityPeak): if a data point p satisfies the following condition, p is called a density peak:
Definition 7, inter-cluster similarity (Similarity Between Clusters):
|Ci ∩ Cj| denotes the common portion of clusters Ci and Cj, and supk is the natural eigenvalue; the value of Sim(Ci, Cj) is never less than 0. If two adjacent initial clusters are divided by a sparse region, the similarity between them will be very small, and they remain two separate clusters. Conversely, if two adjacent initial clusters are connected through a dense region, the similarity between them will be very large, and the two clusters will be merged into one.
Definition 8, sparse and dense neighbors (Sparse and Dense Neighbor): if the density of data point q is less than the density of data point p and q is a natural nearest neighbor of p, then q is called a sparse neighbor of p; conversely, if the density of q is greater than or equal to the density of p and q is a natural nearest neighbor of p, then q is called a dense neighbor of p. Formally:
SN(p) = { q | Den(q) < Den(p) && q ∈ NN(p) }
DN(p) = { q | Den(q) ≥ Den(p) && q ∈ NN(p) }
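Definition 8 translates directly into code; the function name and dictionary arguments are illustrative:

```python
def sparse_and_dense_neighbors(p, den, natural_neighbors):
    """Partition NN(p) per Definition 8: sparse neighbors are strictly
    less dense than p, dense neighbors at least as dense."""
    sn = [q for q in natural_neighbors[p] if den[q] < den[p]]
    dn = [q for q in natural_neighbors[p] if den[q] >= den[p]]
    return sn, dn
```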
The main flow of the TN-Searching algorithm used in the present invention:
Step 1: input the data set X and let r = 1; for each point xi in the data set, find the r-th neighbor knnr(xi) of xi by k-d tree search, and merge knnr(xi) into the R-neighborhood KNNr(xi) of xi;
Step 2: if xi lies in the R-neighborhood KNNr(knnr(xi)) of its r-th neighbor knnr(xi), and xi and knnr(xi) are not yet mutual natural nearest neighbors, define xi and knnr(xi) as natural nearest neighbors of each other;
Step 3: let r = r + 1 and repeat Steps 1 and 2; if the number of points with zero natural nearest neighbors no longer changes, jump to Step 4, otherwise repeat Step 3;
Step 4: output the natural eigenvalue r, the natural nearest-neighbor count of each data point, and the natural nearest-neighbor set of each point.
One possible concrete implementation of TN-Searching is as follows:
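The implementation code referred to here is an image in the published text and is not recoverable. The sketch below reconstructs Steps 1-4 under stated assumptions: a brute-force neighbor ranking replaces the k-d tree search, and the loop stops when the count of neighbor-less points stops changing, with an early exit once every point has a neighbor:

```python
import math

def tn_searching(points):
    """Assumed reconstruction of TN-Searching (Steps 1-4 above)."""
    n = len(points)
    # Rank all other points by distance from each point
    # (brute force standing in for the k-d tree of Step 1).
    order = []
    for i in range(n):
        order.append(sorted((j for j in range(n) if j != i),
                            key=lambda j: math.dist(points[i], points[j])))
    neighbors = [set() for _ in range(n)]  # natural nearest-neighbor sets
    prev_zero = None
    r = 0
    for r in range(n - 1):
        for i in range(n):
            j = order[i][r]                # the (r+1)-th neighbor of i
            # Step 2: i and j become mutual natural neighbors once i also
            # appears among the first r+1 neighbors of j.
            if i in order[j][:r + 1]:
                neighbors[i].add(j)
                neighbors[j].add(i)
        zero = sum(1 for s in neighbors if not s)
        if zero == 0 or zero == prev_zero:
            break                          # Step 3's stopping rule
        prev_zero = zero
    # Step 4: natural eigenvalue, per-point counts, neighbor sets.
    return r + 1, [len(s) for s in neighbors], neighbors
```

For n points this brute-force ranking costs O(n² log n); the k-d tree search the patent specifies would lower the query cost.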
One possible concrete implementation of the density peaks clustering method of natural nearest neighbors optimization of the present invention is as follows:
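The TNDP implementation code is also an image in the published text. The following end-to-end sketch strings together natural-neighbor search, a density reading that divides neighbor count by summed neighbor distance (an assumption, since Definition 4's formula is not reproduced), density-peak detection, and sparse-neighbor expansion; the Definition 7 merging step and the noise-removal step are omitted, and all names are illustrative:

```python
import math
from collections import deque

def tndp(points):
    """Sketch of the claimed method: natural-neighbor search, density,
    peak detection, and expansion through sparse neighbors."""
    n = len(points)
    order = [sorted((j for j in range(n) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
             for i in range(n)]
    nn = [set() for _ in range(n)]
    prev = None
    for r in range(n - 1):
        for i in range(n):
            j = order[i][r]
            if i in order[j][:r + 1]:      # mutual query => natural neighbors
                nn[i].add(j)
                nn[j].add(i)
        zero = sum(1 for s in nn if not s)
        if zero == 0 or zero == prev:
            break
        prev = zero
    # Density: neighbor count over summed neighbor distance (assumed reading).
    den = [len(nn[i]) / (sum(math.dist(points[i], points[j]) for j in nn[i]) or 1.0)
           if nn[i] else 0.0
           for i in range(n)]
    # Density peaks: points denser than every one of their natural neighbors.
    peaks = [i for i in range(n)
             if nn[i] and all(den[j] < den[i] for j in nn[i])]
    # Expand each peak through its sparse neighbors (Definition 8).
    label = {}
    for c, p in enumerate(peaks):
        label[p] = c
        queue = deque([p])
        while queue:
            x = queue.popleft()
            for q in nn[x]:
                if den[q] < den[x] and q not in label:
                    label[q] = c
                    queue.append(q)
    return label, peaks
```

On well-separated groups this already yields one cluster per density peak; merging by inter-cluster similarity would then combine clusters joined through dense regions.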
As the results table shows, the TNDP method is substantially better than the DPC, DBSCAN, and KMEANS algorithms in accuracy. In F-measure, TNDP is clearly better than DPC, DBSCAN, and KMEANS on every data set except wpbc, on which DBSCAN beats TNDP. The TNDP method also clusters out the correct number of classes on these data sets. Taking these three aspects together, the TNDP method is clearly the best.
A concrete application embodiment is described below:
Step 1: TNDP obtains the natural nearest neighbors of each data point in the data set X using the TN-Searching algorithm, then computes the density of each data point;
Step 2: find the exemplar and the sparse neighbors of each data point using Definitions 5 and 8;
Step 3: find all density peaks, randomly access one density peak, and assign it together with its sparse neighbors to one cluster;
Step 4: arbitrarily select a point in this cluster and put that point's sparse neighbors into the same cluster, until all points of this cluster have been visited;
Step 5: find an unvisited density peak and repeat the above steps, until all density peaks have been visited;
Step 6: with the initial clusters now formed, merge initial clusters with high similarity according to the similarity relationship between them;
Step 7: remove from the clustering result any cluster whose number of data points is smaller than the smallest natural-neighbor number, and mark the data points in those clusters as noise, obtaining the final clustering result.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments merely express several implementations of the present invention, and their descriptions are comparatively specific and detailed, but they must not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.
Claims (4)
1. A density peaks clustering method optimized by natural nearest neighbors, characterized by comprising:
finding all density peaks in a data set;
randomly accessing one density peak, and assigning the density peak and its sparse neighbors to the same cluster;
arbitrarily selecting a point in the cluster and assigning the sparse neighbors of that point to the cluster, until the sparse neighbors of every point in the cluster have been assigned to it;
repeating the steps "randomly accessing one density peak, and assigning the density peak and its sparse neighbors to the same cluster" and "arbitrarily selecting a point in the cluster and assigning the sparse neighbors of that point to the cluster, until the sparse neighbors of every point in the cluster have been assigned to it", until all density peaks have been accessed;
merging clusters with high similarity according to the inter-cluster similarity between the clusters formed from all density peaks by the above steps.
2. The density peaks clustering method optimized by natural nearest neighbors according to claim 1, characterized by further comprising: removing, from the clustering result, any cluster among those formed from all density peaks by the above steps whose number of data points is smaller than the smallest natural-neighbor number, and marking the data points in such clusters as noise, obtaining the final clustering result, wherein the smallest natural-neighbor number is the minimum of the natural nearest-neighbor counts of all data points in a cluster.
3. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method of any one of claims 1-2.
4. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method of any one of claims 1-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810463136.4A CN108764307A (en) | 2018-05-15 | 2018-05-15 | The density peaks clustering method of natural arest neighbors optimization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108764307A true CN108764307A (en) | 2018-11-06 |
Family
ID=64007783
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810463136.4A Pending CN108764307A (en) | 2018-05-15 | 2018-05-15 | The density peaks clustering method of natural arest neighbors optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108764307A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110046665A (en) * | 2019-04-17 | 2019-07-23 | 成都信息工程大学 | Based on isolated two abnormal classification point detecting method of forest, information data processing terminal |
TWI717259B (en) * | 2019-04-17 | 2021-01-21 | 南韓商韓領有限公司 | Computer-implemented system and method for batch picking optimization |
TWI750947B (en) * | 2019-04-17 | 2021-12-21 | 南韓商韓領有限公司 | Computer-implemented system and method for batch picking optimization |
CN111260503A (en) * | 2020-01-13 | 2020-06-09 | 浙江大学 | Wind turbine generator power curve outlier detection method based on cluster center optimization |
CN111260503B (en) * | 2020-01-13 | 2023-10-27 | 浙江大学 | Wind turbine generator power curve outlier detection method based on cluster center optimization |
CN116756526A (en) * | 2023-08-17 | 2023-09-15 | 北京英沣特能源技术有限公司 | Full life cycle performance detection and analysis system of energy storage equipment |
CN116756526B (en) * | 2023-08-17 | 2023-10-13 | 北京英沣特能源技术有限公司 | Full life cycle performance detection and analysis system of energy storage equipment |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181106 |