CN107563400A - A kind of density peaks clustering method and system based on grid - Google Patents
- Publication number: CN107563400A (application number CN201610515319.7A)
- Authority: CN (China)
- Prior art keywords: cell, data, grid, density, cluster
- Legal status: Pending (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Landscapes
- Image Analysis (AREA)
Abstract
The present invention proposes a grid-based density peaks clustering method and system. First, the data space is divided into rectangular unit grid cells of equal size; then each data point is mapped to its corresponding cell and the data of each cell are summarized; each cell is then treated as a single data point, and finally the cells are clustered with the density peaks algorithm. The method not only effectively improves the running efficiency of the density peaks algorithm, but also handles large data sets well, finds clusters of arbitrary shape, handles high-dimensional data effectively, and handles noise and isolated points well, giving a good clustering effect.
Description
Technical field
The present invention relates to the fields of pattern recognition and machine learning, and in particular to a grid-based density peaks clustering method and system.
Background art
Cluster analysis is a form of unsupervised learning whose goal is to make samples within the same cluster highly similar and samples in different clusters dissimilar. Cluster analysis is an active research direction in data mining and has practical value in fields such as market analysis, pattern recognition, gene research, and image processing. Clustering algorithms can be broadly divided into partition-based, hierarchy-based, model-based, density-based, and grid-based algorithms.
Grid-based clustering algorithms perform well and are efficient: their running time is independent of the number of data points and depends only on the number of grid cells in each dimension of the partitioned space. They are highly practical for analyzing large data sets, and their clustering results do not depend on the order of the input data, so they are widely used. However, grid-based clustering depends heavily on the choice of a density threshold and is poor at recognizing noise data in border cells. Density-based clustering algorithms group data according to how densely they are distributed in the data space; they impose no predefined cluster shape and can remove noise when needed, but their computational complexity is high. Although grid-based clustering is efficient, its inherent defects limit its clustering precision, so it is best regarded as a compression step that is combined with density-based methods to improve clustering performance. Conversely, because density-based algorithms are computationally expensive, they are often combined with grids to reduce the amount of computation, and combining the two can effectively improve running efficiency.
The density-based DPC (density peaks clustering) algorithm can be used for cluster analysis of many kinds of data. It does not require the number of clusters to be set in advance, can identify cluster centres from a decision graph, and applies to data of arbitrary shape. However, DPC must first compute the distances between all pairs of points; as data sets keep growing, especially in the big-data era, this way of computing local density incurs a considerable time cost.
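For reference, the point-level local density of standard DPC can be sketched as below, assuming the cut-off kernel of the original DPC formulation (ρ_i is the number of points closer than d_c to point i). The function name is hypothetical and the patent does not state which kernel its baseline uses; the sketch only makes the cost visible: every unordered pair of points is examined once, i.e. n(n-1)/2 distance evaluations.

```python
import math
from itertools import combinations


def cutoff_local_density(points, dc):
    """Point-level DPC local density with the cut-off kernel:
    rho_i = number of other points j with dist(i, j) < dc.

    Every unordered pair is examined once, so the cost grows as
    n * (n - 1) / 2 distance evaluations for n points.
    """
    rho = [0] * len(points)
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < dc:
            rho[i] += 1
            rho[j] += 1
    return rho


print(cutoff_local_density([(0, 0), (0.5, 0), (3, 3)], dc=1.0))  # [1, 1, 0]
```

The grid-based scheme described next replaces the n points with the non-empty cells before any pairwise work is done.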
Summary of the invention
To solve the above problems, the present invention proposes a grid-based density peaks clustering method and system. First, the data space is divided into rectangular unit grid cells of equal size; then each data point is mapped to its corresponding cell and the data of each cell are summarized; each cell is then treated as a single data point, and finally the cells are clustered with the density peaks algorithm. The method not only effectively improves the running efficiency of the density peaks algorithm, but also handles large data sets well, finds clusters of arbitrary shape, handles high-dimensional data effectively, and handles noise and isolated points well, giving a good clustering effect.
The present invention is achieved by the following scheme:
The present invention relates to a grid-based density peaks clustering method. Building on the density-based DPC algorithm, it introduces the grid idea when computing the local density attribute of each data point, in order to reduce the amount of computation and improve running efficiency.
The specific steps of the present invention are as follows:
Step 1: Using the grid idea, divide every dimension of the space S into mutually disjoint grid cells of equal size.
Step 2: Map each data point to its corresponding grid cell.
Step 3: Count the number of data points in each grid cell and use it as the local density $\rho_i$ of that cell.
Step 4: Following the DPC algorithm, treat each cell as a data point and form the distance matrix $d_{ij}$.
Step 5: Compute for each cell the distance attribute $\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}$, i.e. the distance to the nearest cell with higher density.
Step 6: From the local density attribute $\rho_i$ and distance attribute $\delta_i$ obtained above, draw the decision graph and take the cells for which both attribute values are high as cluster centres.
Step 7: Cluster the remaining cells: assign the current cell to the same cluster as the nearest cell whose density is equal to or higher than its own.
Step 8: Compute the border region of the current cluster, take the density of the highest-density cell in the border region as a threshold, and remove from the cluster the cells whose density is below this threshold.
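Steps 1 to 7 above can be strung together in a minimal Python sketch. Several details are assumptions not fixed by the text: cells are indexed by floor(x / ξ) in each dimension, cell distances are Euclidean distances between index vectors, δ is measured to the nearest cell ranked above it in decreasing-density order (equal to the strict definition when densities are distinct), the decision-graph choice is replaced by two hypothetical cut-offs `rho_min` and `delta_min`, and the densest cell is always kept as a centre so that every cell receives a label. Step 8 is omitted to keep the sketch short; `grid_dpc` is a hypothetical name.

```python
import math
from collections import Counter


def grid_dpc(points, xi, rho_min, delta_min):
    """Sketch of Steps 1-7 of the grid-based density peaks scheme.

    Returns the cluster label of every input point. Cell indexing,
    tie-breaking and the centre cut-offs are assumptions of this sketch.
    """
    # Steps 1-2: map every point to the index vector of its grid cell.
    cell_of_point = [tuple(math.floor(c / xi) for c in p) for p in points]
    # Step 3: local density of a cell = number of points that fall inside it.
    counts = Counter(cell_of_point)
    cells = list(counts)
    rho = [counts[c] for c in cells]
    # Step 4: distance matrix between cells, each cell treated as one point.
    d = [[math.dist(a, b) for b in cells] for a in cells]
    # Step 5: visit cells by decreasing density (stable sort breaks ties);
    # delta is the distance to the nearest cell visited earlier.
    order = sorted(range(len(cells)), key=lambda i: rho[i], reverse=True)
    delta = [0.0] * len(cells)
    nearest_above = [None] * len(cells)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])  # densest cell: delta = max_j d_ij
            continue
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i], nearest_above[i] = d[i][j], j
    # Step 6: centres are cells high in both attributes (hypothetical
    # cut-offs); the densest cell is always a centre.
    label = [None] * len(cells)
    n_clusters = 0
    for rank, i in enumerate(order):
        if rank == 0 or (rho[i] >= rho_min and delta[i] >= delta_min):
            label[i] = n_clusters
            n_clusters += 1
    # Step 7: in decreasing density order, every other cell joins the
    # cluster of its nearest cell of equal or higher density.
    for i in order:
        if label[i] is None:
            label[i] = label[nearest_above[i]]
    # Give each point the label of its cell.
    index = {c: k for k, c in enumerate(cells)}
    return [label[index[c]] for c in cell_of_point]


pts = [(0.2, 0.3), (0.5, 0.5), (0.7, 0.1), (1.2, 0.4),
       (10.1, 10.2), (10.5, 10.5), (11.3, 10.4)]
print(grid_dpc(pts, xi=1.0, rho_min=2, delta_min=5.0))  # [0, 0, 0, 0, 1, 1, 1]
```

The quadratic work in Steps 4 and 5 now scales with the number of non-empty cells rather than the number of points; the trade-off is that points sharing a cell can never be separated, so ξ bounds the resolution of the clustering.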
In summary, the present application provides a grid-based density peaks clustering method and system. The data are first given an initial clustering with the grid-based CLIQUE algorithm: the space of the input data is partitioned into rectangular grid cells of equal size, all data points are mapped to the cells, and the data of each cell are summarized. Each cell is then treated as a data point, and the cells are clustered with the DPC algorithm. The application not only effectively improves the running efficiency of the density peaks algorithm, but also handles large data sets well, finds clusters of arbitrary shape, handles high-dimensional data effectively, and handles noise and isolated points well, giving a good clustering effect.
Brief description of the drawings
To give a further understanding of the invention and to illustrate its embodiments more clearly, the drawing used in the following description of the embodiments is briefly introduced.
Fig. 1 is a flow chart of the grid-based density peaks clustering method provided by an embodiment of the present application.
Embodiment
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawing. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
Embodiment 1
As shown in Fig. 1, this embodiment comprises the following steps:
Input: data set X = {X1, X2, X3, ..., Xn}, cell local density parameter dc, grid spacing parameter ξ.
Output: clustering result.
Step 1: Perform an initial clustering of the data with the grid-based CLIQUE algorithm: partition the region of the input data into rectangular grid cells of equal size, map all data points to the cells, and compute the data statistics of each cell.
Step 1.1: Let $A = \{A_1, A_2, \dots, A_N\}$ be a set of $N$ domains, so that $S = A_1 \times A_2 \times \dots \times A_N$ is an $N$-dimensional space, and let $V = \{v_1, v_2, \dots, v_n\}$ be the data points, where $v_i = \{v_{i1}, v_{i2}, \dots, v_{iN}\}$ and $v_{ij} \in A_j$. Choose the parameter ξ according to the data distribution of the data set, and use ξ as the step size to divide every dimension of S into mutually disjoint grid cells of equal size.
Step 1.2: Denote the grid cells by $U = \{u_1, u_2, \dots\}$ and map the data points $V = \{v_1, v_2, \dots, v_n\}$ into these cells.
Step 1.3: Count the number of data points in each cell $u_i$ and use it as the local density $\rho_i$ of that cell.
For example, take the points v1 = {2.2, 2.3}, v2 = {3.1, 3.2}, and v3 = {2.5, 2.9}, and build the grid with ξ = 1. Then v1 and v3 fall into grid cell {2, 2} and v2 falls into cell {3, 3}, so the density of cell {2, 2} is ρ({2, 2}) = 2 and the density of cell {3, 3} is ρ({3, 3}) = 1.
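Steps 1.1 to 1.3 and the example can be checked with a short sketch. It assumes the grid is anchored at the origin, so a point's cell index in each dimension is floor(x / ξ); this matches the example, although the text does not state the grid origin. Only non-empty cells are stored, in a `Counter` keyed by cell index, which avoids allocating the full grid (whose size grows exponentially with the dimension). The function name `grid_density` is hypothetical.

```python
import math
from collections import Counter


def grid_density(points, xi):
    """Map each point to the cell whose index in every dimension is
    floor(coordinate / xi) and return {cell index: number of points},
    i.e. the local density rho of every non-empty cell."""
    return Counter(tuple(math.floor(c / xi) for c in p) for p in points)


# The example above: v1, v2, v3 with xi = 1.
rho = grid_density([(2.2, 2.3), (3.1, 3.2), (2.5, 2.9)], xi=1.0)
print(dict(rho))  # {(2, 2): 2, (3, 3): 1}
```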
Step 2: Treat each cell as a data point and cluster the cells with the DPC algorithm.
Step 2.1: Following the DPC algorithm and treating cells as data points, compute the pairwise distances between cells from the coordinates of each cell to form the distance matrix $d_{ij}$. For two cells with coordinates $a = (x_{11}, x_{12}, \dots, x_{1N})$ and $b = (x_{21}, x_{22}, \dots, x_{2N})$, the Euclidean distance between them is

$$d(a, b) = \sqrt{\sum_{k=1}^{N} (x_{1k} - x_{2k})^2}$$
Step 2.2: Compute for each cell the distance attribute $\delta_i$, the distance to the nearest cell with higher density:

$$\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}$$
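Steps 2.1 and 2.2 can be sketched as follows. Two points are assumptions: cells are represented by their index vectors, and a cell with no denser cell receives $\delta_i = \max_j d_{ij}$, the convention of the original DPC formulation, because the minimum over an empty set is undefined. With integer cell counts, ties in ρ are common; this sketch keeps the strict `>` of the definition, so every cell tied for the highest density receives the maximum δ. The function name is hypothetical.

```python
import math


def cell_distances_and_delta(cells, rho):
    """Step 2.1: Euclidean distance matrix d between cell index vectors.
    Step 2.2: delta_i = min d_ij over cells j with rho_j > rho_i; a cell
    with no denser cell gets max_j d_ij (DPC convention, assumed here)."""
    d = [[math.dist(a, b) for b in cells] for a in cells]
    delta = []
    for i, row in enumerate(d):
        denser = [row[j] for j in range(len(cells)) if rho[j] > rho[i]]
        delta.append(min(denser) if denser else max(row))
    return d, delta


d, delta = cell_distances_and_delta([(2, 2), (3, 3), (6, 2)], [2, 1, 1])
print(delta)  # approximately [4.0, 1.414, 4.0]
```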
Step 2.3: From the local density attribute $\rho_i$ obtained in Step 1 and the distance attribute $\delta_i$, draw the decision graph and take the cells for which both attribute values are high as cluster centres.
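The text leaves "both values are high" to the user reading the decision graph. The sketch below uses two hypothetical cut-offs, `rho_min` and `delta_min`, and also shows the ranking by $\gamma_i = \rho_i \delta_i$ that the original DPC work suggests when a fixed number of centres is wanted; neither rule is specified by the patent.

```python
def select_centres(rho, delta, rho_min, delta_min):
    """Cells in the upper-right region of the decision graph,
    i.e. rho >= rho_min and delta >= delta_min (hypothetical cut-offs)."""
    return [i for i, (r, dl) in enumerate(zip(rho, delta))
            if r >= rho_min and dl >= delta_min]


def top_k_by_gamma(rho, delta, k):
    """Alternative rule: the k cells with the largest gamma = rho * delta."""
    gamma = [r * dl for r, dl in zip(rho, delta)]
    return sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)[:k]


rho, delta = [5, 1, 4, 2], [9.0, 0.5, 8.0, 1.0]
print(select_centres(rho, delta, rho_min=3, delta_min=5.0))  # [0, 2]
print(top_k_by_gamma(rho, delta, k=2))                       # [0, 2]
```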
Step 2.4: Assign the remaining cells with a nearest-neighbour rule: the current point joins the cluster of the nearest point whose density is equal to or higher than its own.
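The nearest-neighbour assignment of Step 2.4 can be sketched by visiting cells in decreasing density, so that every cell of equal or higher density is already labelled when a cell is reached. Ties are broken by visiting order, and the densest cell is promoted to a centre if it was not selected so that every cell ends up labelled; both are assumptions of this sketch. `d` is a precomputed cell distance matrix and `assign_remaining` is a hypothetical name.

```python
def assign_remaining(rho, d, centres):
    """Label every cell: centres get labels 0, 1, ...; each other cell takes
    the label of its nearest cell of equal or higher density."""
    order = sorted(range(len(rho)), key=lambda i: rho[i], reverse=True)
    label = {c: n for n, c in enumerate(centres)}
    if order and order[0] not in label:
        label[order[0]] = len(label)  # densest cell becomes a centre
    for rank, i in enumerate(order):
        if i in label:
            continue
        # order[:rank] holds the already-labelled cells of density >= rho[i]
        j = min(order[:rank], key=lambda m: d[i][m])
        label[i] = label[j]
    return [label[i] for i in range(len(rho))]


d = [[0, 1, 14, 15], [1, 0, 13, 14], [14, 13, 0, 1], [15, 14, 1, 0]]
print(assign_remaining([3, 1, 2, 1], d, centres=[0, 2]))  # [0, 0, 1, 1]
```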
Step 2.5: Using the border (boundary value) method of the DPC algorithm, compute the border region of the current cluster, take the density of the highest-density point in the border region as a threshold, and remove from the cluster the points whose density is below this threshold.
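The text does not spell out how the border region is computed. The sketch below follows the border-region definition of the original DPC work: a cell belongs to the border region of its cluster if it lies within $d_c$ of a cell assigned to a different cluster; the highest density in that region is the threshold $\rho_b$, and cells of the cluster with lower density are marked as noise. Using the embodiment's `dc` input as this cut-off, and the function name, are assumptions.

```python
def remove_halo(labels, rho, d, dc):
    """Mark low-density cells of each cluster as noise (label None).

    A cell is in the border region of its cluster if some cell of another
    cluster lies within distance dc; rho_b is the largest density in that
    region, and cells of the cluster with rho < rho_b are removed.
    """
    n = len(labels)
    rho_b = {}
    for i in range(n):
        if any(labels[j] != labels[i] and d[i][j] < dc for j in range(n)):
            rho_b[labels[i]] = max(rho_b.get(labels[i], 0), rho[i])
    # Clusters without a border region keep all their cells.
    return [None if rho[i] < rho_b.get(labels[i], 0) else labels[i]
            for i in range(n)]


d = [[0 if i == j else 10 for j in range(5)] for i in range(5)]
d[1][4] = d[4][1] = d[2][4] = d[4][2] = 1  # cells 1, 2 touch cell 4
print(remove_halo([0, 0, 0, 1, 1], [5, 2, 1, 4, 3], d, dc=1.5))
# [0, 0, None, 1, 1]
```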
Step 3: Return the final clustering result.
Claims (5)
1. A grid-based density peaks clustering method and system, characterized in that the data space is divided into grid cells of equal size using the grid idea; an initial clustering of the data is then performed by mapping the data points to their corresponding grid cells and computing the data statistics of each grid cell; each cell is then treated as a data point, and the cells are clustered with the DPC algorithm to obtain the clustering result.
2. The method according to claim 1, characterized in that the data set X = {X1, X2, X3, ..., Xn} is an n×d matrix in which each row represents a data point and each column represents an attribute, so that the data set contains n data points, each with d attributes.
3. The method according to claim 1, characterized in that the initial clustering comprises: dividing each dimension of the data space into grid cells of equal size using the CLIQUE algorithm, then mapping all data points to the corresponding cells, and using the number of data points in each cell as the local density $\rho_i$ of that cell.
4. The method according to claim 1, characterized in that clustering the cells with the DPC algorithm comprises:
Step 1: treating each partitioned grid cell as a data point;
Step 2: computing the pairwise distances between cells from the coordinates of each cell to form the distance matrix $d_{ij}$;
Step 3: computing, with the formula $\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}$, the distance attribute $\delta_i$ between each cell and the nearest cell with higher density;
Step 4: drawing the cell decision graph from the local density attribute $\rho_i$ and distance attribute $\delta_i$ obtained above, and taking the cells for which both attribute values are high as cluster centres;
Step 5: assigning the remaining cells with a nearest-neighbour rule, whereby the current point joins the cluster of the nearest point whose density is equal to or higher than its own;
Step 6: using the border method of the DPC algorithm, computing the border region of the current cluster, taking the density of the highest-density point in the border region as a threshold, and removing from the cluster the points whose density is below this threshold.
5. A system for implementing the method according to any of the above claims, characterized in that it comprises a grid division module and a density peaks clustering module, wherein the grid division module performs a preliminary clustering of the data points: it first divides the data space into grid cells of equal size, then maps the data points into the corresponding grid cells and counts the number of data points in each grid cell; and the density peaks clustering module first computes $\delta_i$ for each grid cell, then draws the decision graph to select cluster centres, assigns all remaining grid cells, removes noise cells, and outputs the clustering result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610515319.7A CN107563400A (en) | 2016-06-30 | 2016-06-30 | A kind of density peaks clustering method and system based on grid |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610515319.7A CN107563400A (en) | 2016-06-30 | 2016-06-30 | A kind of density peaks clustering method and system based on grid |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107563400A true CN107563400A (en) | 2018-01-09 |
Family
ID=60968747
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610515319.7A Pending CN107563400A (en) | 2016-06-30 | 2016-06-30 | A kind of density peaks clustering method and system based on grid |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107563400A (en) |
- 2016-06-30: application CN201610515319.7A filed in CN (CN107563400A, active, pending)
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108594250A (en) * | 2018-05-15 | 2018-09-28 | 北京石油化工学院 | A kind of point cloud data denoising point methods and device |
CN108897847A (en) * | 2018-06-28 | 2018-11-27 | 中国人民解放军国防科技大学 | Multi-GPU Density Peak Clustering Method Based on Locality Sensitive Hashing |
CN108897847B (en) * | 2018-06-28 | 2021-05-14 | 中国人民解放军国防科技大学 | Multi-GPU density peak clustering method based on locality sensitive hashing |
CN109255384A (en) * | 2018-09-12 | 2019-01-22 | 湖州市特种设备检测研究院 | A kind of traffic flow pattern recognition methods based on density peaks clustering algorithm |
CN109461198A (en) * | 2018-11-12 | 2019-03-12 | 网易(杭州)网络有限公司 | The processing method and processing device of grid model |
CN109461198B (en) * | 2018-11-12 | 2023-05-26 | 网易(杭州)网络有限公司 | Grid model processing method and device |
CN109658265A (en) * | 2018-12-13 | 2019-04-19 | 平安医疗健康管理股份有限公司 | The recognition methods of payment excess, equipment, storage medium and device based on big data |
CN110083475B (en) * | 2019-04-23 | 2022-10-25 | 新华三信息安全技术有限公司 | Abnormal data detection method and device |
CN110083475A (en) * | 2019-04-23 | 2019-08-02 | 新华三信息安全技术有限公司 | A kind of detection method and device of abnormal data |
CN110161464A (en) * | 2019-06-14 | 2019-08-23 | 成都纳雷科技有限公司 | A kind of Radar Multi Target clustering method and device |
CN110161464B (en) * | 2019-06-14 | 2023-03-10 | 成都纳雷科技有限公司 | Radar multi-target clustering method and device |
CN110488259A (en) * | 2019-08-30 | 2019-11-22 | 成都纳雷科技有限公司 | A kind of classification of radar targets method and device based on GDBSCAN |
CN113449208A (en) * | 2020-03-26 | 2021-09-28 | 阿里巴巴集团控股有限公司 | Space query method, device, system and storage medium |
CN113112069A (en) * | 2021-04-13 | 2021-07-13 | 北京阿帕科蓝科技有限公司 | Population distribution prediction method, population distribution prediction system and electronic equipment |
CN113361411A (en) * | 2021-06-07 | 2021-09-07 | 国网新疆电力有限公司哈密供电公司 | Random pulse interference signal elimination method based on grid and density clustering algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107563400A (en) | A kind of density peaks clustering method and system based on grid | |
JP5167442B2 (en) | Image identification apparatus and program | |
CN102222092B (en) | Massive high-dimension data clustering method for MapReduce platform | |
CN107016407A (en) | A kind of reaction type density peaks clustering method and system | |
CN102682477B (en) | Regular scene three-dimensional information extracting method based on structure prior | |
Wu et al. | Entropy-based active learning for object detection with progressive diversity constraint | |
Ghoshdastidar et al. | Spectral clustering using multilinear SVD: Analysis, approximations and applications | |
CN102682115B (en) | Dot density thematic map making method based on Voronoi picture | |
CN106845536A (en) | A kind of parallel clustering method based on image scaling | |
CN106650744A (en) | Image object co-segmentation method guided by local shape migration | |
Kaur | A survey of clustering techniques and algorithms | |
CN107194415A (en) | Peak clustering method based on Laplace centrality | |
CN111339924A (en) | Polarized SAR image classification method based on superpixel and full convolution network | |
CN107180079A (en) | The image search method of index is combined with Hash based on convolutional neural networks and tree | |
CN116304768A (en) | High-dimensional density peak clustering method based on improved equidistant mapping | |
CN106022359A (en) | Fuzzy entropy space clustering analysis method based on orderly information entropy | |
CN110781943A (en) | Clustering method based on adjacent grid search | |
He et al. | Nas-lid: Efficient neural architecture search with local intrinsic dimension | |
CN108510010A (en) | A kind of density peaks clustering method and system based on prescreening | |
CN113112177A (en) | Transformer area line loss processing method and system based on mixed indexes | |
Yarramalle et al. | Unsupervised image segmentation using finite doubly truncated Gaussian mixture model and hierarchical clustering | |
CN115344996A (en) | Wind power typical scene construction method and system based on multi-characteristic quantity indexes of improved K-means algorithm | |
Li et al. | High resolution radar data fusion based on clustering algorithm | |
CN104021563B (en) | Method for segmenting noise image based on multi-objective fuzzy clustering and opposing learning | |
Zhao et al. | Mining co-location patterns with spatial distribution characteristics |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180109 |