CN107563400A - A grid-based density peak clustering method and system - Google Patents

A grid-based density peak clustering method and system

Info

Publication number
CN107563400A
CN107563400A CN201610515319.7A CN201610515319A CN107563400A CN 107563400 A CN107563400 A CN 107563400A CN 201610515319 A CN201610515319 A CN 201610515319A CN 107563400 A CN107563400 A CN 107563400A
Authority
CN
China
Prior art keywords
cell
data
grid
density
cluster
Prior art date
Legal status
Pending
Application number
CN201610515319.7A
Other languages
Chinese (zh)
Inventor
丁世飞
徐晓
Current Assignee
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date
2016-06-30
Filing date
2016-06-30
Publication date
2018-01-09
Application filed by China University of Mining and Technology CUMT
Priority to CN201610515319.7A
Publication of CN107563400A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention proposes a grid-based density peak clustering method and system. First, the data space is divided into rectangular grid cells of equal size; next, each data point is mapped to its corresponding cell and the data of each cell are aggregated; each cell is then treated as a single data point, and finally the cells are clustered with the density peak algorithm. The method not only effectively improves the running efficiency of the density peak algorithm, but also handles large data sets well, finds clusters of arbitrary shape, handles high-dimensional data effectively, copes well with isolated noise points, and gives good clustering results.

Description

A grid-based density peak clustering method and system
Technical field
The present invention relates to the fields of pattern recognition and machine learning, and in particular to a grid-based density peak clustering method and system.
Background
Cluster analysis is a form of unsupervised learning whose goal is to make samples within the same cluster highly similar and samples in different clusters dissimilar. It is an active research direction in data mining and has practical value in fields such as market analysis, pattern recognition, gene research and image processing. Clustering algorithms can be roughly divided into partition-based, hierarchical, model-based, density-based and grid-based algorithms.
Grid-based clustering algorithms perform well and are efficient: their running time is independent of the number of data points and depends only on the number of grid cells in each dimension of the partitioned space. They are highly practical for analyzing large data sets, and their results do not depend on the order of the input data, so they are widely used. However, grid-based clustering depends heavily on the choice of density threshold and is poor at recognizing noise in boundary cells. Density-based clustering uses how densely the data are distributed in the data space as the clustering criterion; it imposes no fixed cluster shape and can remove noise when needed, but its computational complexity is high. Although grid-based clustering is efficient, its inherent shortcomings limit its accuracy, so it is best regarded as a compression technique and combined with density-based methods to improve clustering performance. Conversely, density-based algorithms, being computationally expensive, are often combined with grids to reduce the amount of computation; combining the two can effectively improve running efficiency.
The density-based DPC (density peak clustering) algorithm can be used to analyze many kinds of data. It does not require the number of clusters to be set in advance, can identify cluster centers from a decision graph, and applies to data of arbitrary shape. However, DPC must compute the distances between all pairs of points in advance, and the number of such distances, n(n−1)/2 for n points, grows quadratically. As data sets keep growing, particularly in the era of big data, this way of computing local density becomes time-consuming.
Summary of the invention
To solve the above problems, the present invention proposes a grid-based density peak clustering method and system. First, the data space is divided into rectangular grid cells of equal size; next, each data point is mapped to its corresponding cell and the data of each cell are aggregated; each cell is then treated as a single data point, and finally the cells are clustered with the density peak algorithm. The method not only effectively improves the running efficiency of the density peak algorithm, but also handles large data sets well, finds clusters of arbitrary shape, handles high-dimensional data effectively, copes well with isolated noise points, and gives good clustering results.
The present invention is achieved by the following scheme:
The present invention relates to a grid-based density peak clustering method. Building on the density-based DPC algorithm, it introduces a grid when computing the local density attribute of each data point, in order to reduce the amount of computation and improve running efficiency.
The specific steps of the present invention are as follows:
Step 1: Using a grid, divide each dimension of the space S into mutually disjoint grid cells of equal size.
Step 2: Map each data point to its corresponding grid cell.
Step 3: Count the number of data points in each grid cell and use it as the local density ρ_i of that cell.
Step 4: Following the DPC algorithm, treat each cell as a data point and form the distance matrix d_ij.
Step 5: Use $\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}$ to compute the distance attribute δ_i between each cell and the nearest cell of higher density.
Step 6: Using the local density attribute ρ_i and the distance attribute δ_i obtained above, draw the decision graph and take the cells for which both attribute values are high as cluster centers.
Step 7: Cluster the remaining cells by assigning each cell to the same cluster as its nearest cell whose density is equal to or higher than its own.
Step 8: Compute the border of the current cluster, take the density of the highest-density cell in the border as a threshold, and remove the cells in the current cluster whose density is below this threshold.
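Steps 1 to 7 can be sketched end to end in Python. This is a minimal, hypothetical illustration rather than the applicants' implementation: the function name grid_dpc and the decision-graph thresholds rho_min and delta_min are assumptions, each cell is represented by its integer grid index, and the halo removal of Step 8 is omitted. Cells are ordered by decreasing density with ties broken by position, so "equal or higher density" in Step 7 is well defined; this means a cell tied with an earlier one measures δ to that cell, a small departure from the strict formula in Step 5. The densest cell is always kept as a center so that every cell receives a label.

```python
import math
from collections import Counter


def grid_dpc(points, xi, rho_min, delta_min):
    """Hypothetical end-to-end sketch of Steps 1-7 (halo removal omitted).

    points: sequence of coordinate tuples; xi: grid step size;
    rho_min, delta_min: assumed decision-graph thresholds.
    Returns {cell_index_tuple: cluster_label}.
    """
    # Steps 1-3: map each point to cell floor(x / xi) and count points per cell.
    counts = Counter(tuple(math.floor(c / xi) for c in p) for p in points)
    cells = sorted(counts)
    n = len(cells)
    rho = [counts[c] for c in cells]

    # Step 4: Euclidean distances between cells, using grid indices as coordinates.
    d = [[math.dist(a, b) for b in cells] for a in cells]

    # Decreasing density; ties broken by position in `cells`.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # Step 5: delta = distance to the nearest cell earlier in that order
    # (equal or higher density); the densest cell gets its maximum distance.
    delta, parent = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])
        else:
            j = min(order[:rank], key=lambda k: d[i][k])
            delta[i], parent[i] = d[i][j], j

    # Step 6: centers are cells where both rho and delta are high;
    # the densest cell is always a center so every cell can be labeled.
    centers = [i for i in order
               if i == order[0] or (rho[i] >= rho_min and delta[i] >= delta_min)]
    labels = [-1] * n
    for label, i in enumerate(centers):
        labels[i] = label

    # Step 7: in decreasing-density order, copy the parent cell's label.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return dict(zip(cells, labels))


pts = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6), (1.2, 0.3),
       (10.1, 10.2), (10.4, 10.5), (10.6, 11.3)]
print(grid_dpc(pts, xi=1, rho_min=2, delta_min=5))
# {(0, 0): 0, (1, 0): 0, (10, 10): 1, (10, 11): 1}
```

In this toy run the two dense cells (0, 0) and (10, 10) become centers, and each sparse neighboring cell joins the closer one. The distance matrix is built over the non-empty cells instead of the raw points, which is where the grid saves work.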
In summary, the present application provides a grid-based density peak clustering method and system. The data are first given an initial clustering with the grid-based CLIQUE algorithm: the region of the input space is partitioned into rectangular grid cells of equal size, all data points are mapped to cells, and the data of each cell are aggregated. Each cell is then treated as a data point and the cells are clustered with the DPC algorithm. The application not only effectively improves the running efficiency of the density peak algorithm, but also handles large data sets well, finds clusters of arbitrary shape, handles high-dimensional data effectively, copes well with isolated noise points, and gives good clustering results.
Brief description of the drawings
To give a further understanding of the present invention and to illustrate its embodiments more clearly, the drawings used in the description of the embodiments are briefly introduced below.
Fig. 1 is a flowchart of grid-based density peak clustering according to an embodiment of the present application.
Embodiment
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present application.
Embodiment 1
As shown in Fig. 1, this embodiment comprises the following steps:
Input: data set X = {X1, X2, X3, ..., Xn}, cell local density parameter dc, grid spacing parameter ξ.
Output: clustering result.
Step 1: Perform an initial clustering of the data with the grid-based CLIQUE algorithm: partition the region of the input space into rectangular grid cells of equal size, map all data points to cells, and aggregate the data of each cell.
Step 1.1: Let A = {A1, A2, A3, ..., An} be a set of N dimensions, S = A1 × A2 × A3 × ... × An an N-dimensional space, and V = {v1, v2, v3, ..., vn}, where vi = {vi1, vi2, ..., vin} and vij ∈ Aj. According to the data distribution of the given data set, input the parameter ξ and, using ξ as the step size, divide each dimension of S into mutually disjoint grid cells of equal size.
Step 1.2: Denote the grid cells by {u1, u2, ..., un} and map the data points V = {v1, v2, v3, ..., vn} to the cells u = {u1, u2, ..., un}.
Step 1.3: Count the number of data points in each cell u and use it as the local density ρ_i of that cell.
For example, take the points v1 = {2.2, 2.3}, v2 = {3.1, 3.2} and v3 = {2.5, 2.9} and partition the space with ξ = 1. Then v1 and v3 fall into grid cell {2, 2} and v2 falls into grid cell {3, 3}, so the density of cell {2, 2} is ρ({2, 2}) = 2 and the density of cell {3, 3} is ρ({3, 3}) = 1.
Step 2: Treat each cell as a data point and cluster the cells with the DPC algorithm.
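The worked example can be checked with a short Python sketch. It assumes, as the example implies, that the grid starts at the origin, so a coordinate x falls into cell ⌊x/ξ⌋ in each dimension; the function name grid_density is hypothetical.

```python
import math
from collections import Counter


def grid_density(points, xi):
    """Hypothetical helper for Steps 1.1-1.3.

    Maps each point to the cell floor(x / xi) in every dimension (grid origin
    assumed at 0) and returns a Counter {cell_index_tuple: number_of_points},
    i.e. the local density rho of each non-empty cell.
    """
    return Counter(tuple(math.floor(c / xi) for c in p) for p in points)


rho = grid_density([(2.2, 2.3), (3.1, 3.2), (2.5, 2.9)], xi=1)
print(dict(rho))  # {(2, 2): 2, (3, 3): 1}
```

Only non-empty cells appear in the result, so later steps work on at most as many cells as there are points, and usually far fewer.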
Step 2.1: Following the DPC algorithm, treat each cell as a data point: take a representative point of each cell, compute the pairwise distances between cells, and form the distance matrix d_ij. For two cells with representative points a = (x11, x12, ..., x1n) and b = (x21, x22, ..., x2n), the Euclidean distance between them is $d(a, b) = \sqrt{\sum_{k=1}^{n} (x_{1k} - x_{2k})^2}$.
Step 2.2: Compute the distance attribute δ_i between each cell and the nearest cell of higher density: $\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}$, and, for the cell of highest density, $\delta_i = \max_j d_{ij}$ (the standard DPC convention).
Step 2.3: Using the local density attribute ρ_i from Step 1 and the distance attribute δ_i, draw the decision graph and take the cells for which both attribute values are high as cluster centers.
Step 2.4: Cluster the remaining cells with a nearest-neighbor rule: assign the current point to the same cluster as its nearest point whose density is equal to or higher than its own.
Step 2.5: Using the border method of the DPC algorithm, compute the border of the current cluster, take the density of the highest-density point in the border as a threshold, and remove the points in the current cluster whose density is below this threshold.
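A minimal sketch of Step 2.1 in Python. The text does not pin down the representative point of a cell, so this sketch assumes each cell is represented by its integer grid index (a cell centroid could be passed in the same way); the helper name cell_distance_matrix is hypothetical.

```python
import math


def cell_distance_matrix(cells):
    """Hypothetical helper: symmetric matrix d[i][j] of Euclidean distances.

    cells: list of coordinate tuples, one representative point per cell
    (assumed here to be the cell's grid index).
    """
    return [[math.dist(a, b) for b in cells] for a in cells]


# Cells {2, 2} and {3, 3} from the Step 1.3 example.
d = cell_distance_matrix([(2, 2), (3, 3)])
print(round(d[0][1], 3))  # 1.414
```

Because the matrix covers the m non-empty cells instead of the n raw points, it has m × m entries, which is where the grid reduces the work of DPC when m is much smaller than n.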
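Step 2.2 in Python, as a sketch following the formula above: strictly higher density for δ_i, and the maximum distance for a cell with no denser cell. The function name delta_attribute is hypothetical; it also returns the index of the nearest higher-density cell, which an assignment step could reuse.

```python
def delta_attribute(rho, d):
    """Hypothetical helper: delta_i and the index of the nearest denser cell.

    rho: list of cell densities; d: distance matrix (list of lists).
    A cell with no strictly denser cell gets delta_i = max_j d_ij and index -1.
    """
    n = len(rho)
    delta, nearest_higher = [0.0] * n, [-1] * n
    for i in range(n):
        higher = [j for j in range(n) if rho[j] > rho[i]]
        if higher:
            # Nearest among the strictly denser cells.
            j = min(higher, key=lambda k: d[i][k])
            delta[i], nearest_higher[i] = d[i][j], j
        else:
            delta[i] = max(d[i])
    return delta, nearest_higher


rho = [2, 1, 3]
d = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
print(delta_attribute(rho, d))  # ([4, 1, 4], [2, 0, -1])
```

The densest cell (index 2) receives a large δ, which is what makes it stand out in the decision graph.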
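The text only states that cells with both a high ρ_i and a high δ_i become centers. One simple, hypothetical way to make that concrete is a pair of thresholds read off the decision graph; a common alternative in DPC practice is to rank cells by γ_i = ρ_i · δ_i and keep the top k. Both helpers below are illustrative assumptions, not the method fixed by the text.

```python
def select_centers(rho, delta, rho_min, delta_min):
    """Hypothetical helper: cells whose rho and delta both reach the thresholds."""
    return [i for i, (r, dl) in enumerate(zip(rho, delta))
            if r >= rho_min and dl >= delta_min]


def top_k_by_gamma(rho, delta, k):
    """Hypothetical alternative: the k cells with the largest gamma = rho * delta."""
    gamma = [r * dl for r, dl in zip(rho, delta)]
    return sorted(range(len(gamma)), key=lambda i: -gamma[i])[:k]


rho, delta = [2, 1, 3], [4.0, 1.0, 4.0]
print(select_centers(rho, delta, rho_min=2, delta_min=3))  # [0, 2]
print(top_k_by_gamma(rho, delta, k=2))                     # [2, 0]
```

Thresholds mirror reading a rectangle off the decision graph, while the γ ranking only needs the number of clusters k.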
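A sketch of Step 2.4, assuming cells are processed in decreasing density order (ties broken by index), so the nearest cell of "equal or higher" density has always been labeled before it is needed. The name assign_cells is hypothetical; if the densest cell is not among the centers, it and any cells that inherit from it stay at -1.

```python
def assign_cells(rho, d, centers):
    """Hypothetical helper: propagate center labels by nearest neighbor.

    rho: cell densities; d: distance matrix; centers: center indices,
    whose position in the list becomes their cluster label.
    """
    n = len(rho)
    # Decreasing density; ties broken by index so "equal or higher" is well defined.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    for rank, i in enumerate(order):
        if labels[i] == -1 and rank > 0:
            # Nearest cell among those already visited (equal or higher density).
            j = min(order[:rank], key=lambda m: d[i][m])
            labels[i] = labels[j]
    return labels


rho = [2, 1, 3]
d = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
print(assign_cells(rho, d, centers=[2, 0]))  # [1, 1, 0]
```

Cell 1 joins cluster 1 because its nearest denser cell is cell 0 (distance 1), not cell 2 (distance 2).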
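A sketch of Step 2.5 under stated assumptions. A cell is treated as a border cell of its cluster when it lies within distance dc of a cell assigned to a different cluster; this reading of the input parameter dc follows the original DPC border definition and is an assumption here. Cells whose density falls below their cluster's border threshold are relabeled -1 as noise. The function name remove_halo is hypothetical.

```python
def remove_halo(rho, d, labels, dc):
    """Hypothetical helper: relabel as noise (-1) cells below the border threshold.

    rho: cell densities; d: distance matrix; labels: cluster label per cell;
    dc: distance cutoff that defines the border region (assumed meaning).
    """
    n = len(rho)
    rho_b = {}  # cluster label -> highest density among its border cells
    for i in range(n):
        for j in range(n):
            if labels[i] != labels[j] and d[i][j] <= dc:
                rho_b[labels[i]] = max(rho_b.get(labels[i], 0), rho[i])
    # Clusters without any border cell keep all their cells (threshold 0).
    return [lab if rho[i] >= rho_b.get(lab, 0) else -1
            for i, lab in enumerate(labels)]


# Five cells on a line at positions 0..4; clusters 0 and 1 meet between cells 2 and 3.
d = [[abs(i - j) for j in range(5)] for i in range(5)]
print(remove_halo([5, 2, 1, 4, 2], d, [0, 0, 0, 1, 1], dc=1))  # [0, 0, 0, 1, -1]
```

Here cluster 1's border cell has density 4, so its sparser tail cell (density 2) is dropped, while cluster 0's border cell has density 1 and the whole cluster is kept.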
Step 3: Return the final clustering result.

Claims (5)

1. A grid-based density peak clustering method and system, characterized in that the data space is divided into grid cells of equal size using a grid; an initial clustering of the data is then performed by mapping the data points to the corresponding grid cells and aggregating the data of each grid cell; each cell is then treated as a data point, the cells are clustered with the DPC algorithm, and the clustering result is obtained.
2. The method according to claim 1, characterized in that the data set X = {X1, X2, X3, ..., Xn} is an n × d matrix in which each row represents a data point and each column an attribute; the data set therefore contains n data points, each with d attributes.
3. The method according to claim 1, characterized in that the initial clustering comprises: dividing each dimension of the data space into grid cells of equal size using the CLIQUE algorithm, mapping all data points to the corresponding cells, and taking the number of data points in each cell as the local density ρ_i of that cell.
4. The method according to claim 1, characterized in that clustering the cells with the DPC algorithm comprises:
Step 1: treating each partitioned grid cell as a data point;
Step 2: taking a representative point of each cell, computing the pairwise distances between cells, and forming the distance matrix d_ij;
Step 3: computing, with the formula $\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}$, the distance attribute δ_i between each cell and the nearest cell of higher density;
Step 4: drawing the cell decision graph from the local density attribute ρ_i and the distance attribute δ_i obtained above, and taking the cells for which both attribute values are high as cluster centers;
Step 5: clustering the remaining cells with a nearest-neighbor rule, assigning the current point to the same cluster as its nearest point whose density is equal to or higher than its own;
Step 6: using the border method of the DPC algorithm, computing the border of the current cluster, taking the density of the highest-density point in the border as a threshold, and removing the points in the current cluster whose density is below this threshold.
5. A system implementing the method according to any of the preceding claims, characterized in that it comprises a grid partitioning module and a density peak clustering module, wherein the grid partitioning module performs the preliminary clustering of the data points: it first divides the data space into grid cells of equal size, then maps the data points to the corresponding cells and counts the number of data points in each cell; and the density peak clustering module first computes δ_i for each grid cell, then draws the decision graph to select the cluster centers, assigns all remaining grid cells, removes noise cells, and outputs the clustering result.
CN201610515319.7A 2016-06-30 2016-06-30 A grid-based density peak clustering method and system Pending CN107563400A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610515319.7A CN107563400A (en) 2016-06-30 2016-06-30 A grid-based density peak clustering method and system

Publications (1)

Publication Number Publication Date
CN107563400A 2018-01-09

Family

ID=60968747

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610515319.7A Pending CN107563400A (en) 2016-06-30 2016-06-30 A grid-based density peak clustering method and system

Country Status (1)

Country Link
CN (1) CN107563400A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108594250A (en) * 2018-05-15 2018-09-28 北京石油化工学院 A kind of point cloud data denoising point methods and device
CN108897847A (en) * 2018-06-28 2018-11-27 中国人民解放军国防科技大学 Multi-GPU Density Peak Clustering Method Based on Locality Sensitive Hashing
CN109255384A (en) * 2018-09-12 2019-01-22 湖州市特种设备检测研究院 A kind of traffic flow pattern recognition methods based on density peaks clustering algorithm
CN109461198A (en) * 2018-11-12 2019-03-12 网易(杭州)网络有限公司 The processing method and processing device of grid model
CN109658265A (en) * 2018-12-13 2019-04-19 平安医疗健康管理股份有限公司 The recognition methods of payment excess, equipment, storage medium and device based on big data
CN110083475A (en) * 2019-04-23 2019-08-02 新华三信息安全技术有限公司 A kind of detection method and device of abnormal data
CN110161464A (en) * 2019-06-14 2019-08-23 成都纳雷科技有限公司 A kind of Radar Multi Target clustering method and device
CN110488259A (en) * 2019-08-30 2019-11-22 成都纳雷科技有限公司 A kind of classification of radar targets method and device based on GDBSCAN
CN113112069A (en) * 2021-04-13 2021-07-13 北京阿帕科蓝科技有限公司 Population distribution prediction method, population distribution prediction system and electronic equipment
CN113361411A (en) * 2021-06-07 2021-09-07 国网新疆电力有限公司哈密供电公司 Random pulse interference signal elimination method based on grid and density clustering algorithm
CN113449208A (en) * 2020-03-26 2021-09-28 阿里巴巴集团控股有限公司 Space query method, device, system and storage medium

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108594250A (en) * 2018-05-15 2018-09-28 北京石油化工学院 A kind of point cloud data denoising point methods and device
CN108897847A (en) * 2018-06-28 2018-11-27 中国人民解放军国防科技大学 Multi-GPU Density Peak Clustering Method Based on Locality Sensitive Hashing
CN108897847B (en) * 2018-06-28 2021-05-14 中国人民解放军国防科技大学 Multi-GPU density peak clustering method based on locality sensitive hashing
CN109255384A (en) * 2018-09-12 2019-01-22 湖州市特种设备检测研究院 A kind of traffic flow pattern recognition methods based on density peaks clustering algorithm
CN109461198A (en) * 2018-11-12 2019-03-12 网易(杭州)网络有限公司 The processing method and processing device of grid model
CN109461198B (en) * 2018-11-12 2023-05-26 网易(杭州)网络有限公司 Grid model processing method and device
CN109658265A (en) * 2018-12-13 2019-04-19 平安医疗健康管理股份有限公司 The recognition methods of payment excess, equipment, storage medium and device based on big data
CN110083475B (en) * 2019-04-23 2022-10-25 新华三信息安全技术有限公司 Abnormal data detection method and device
CN110083475A (en) * 2019-04-23 2019-08-02 新华三信息安全技术有限公司 A kind of detection method and device of abnormal data
CN110161464A (en) * 2019-06-14 2019-08-23 成都纳雷科技有限公司 A kind of Radar Multi Target clustering method and device
CN110161464B (en) * 2019-06-14 2023-03-10 成都纳雷科技有限公司 Radar multi-target clustering method and device
CN110488259A (en) * 2019-08-30 2019-11-22 成都纳雷科技有限公司 A kind of classification of radar targets method and device based on GDBSCAN
CN113449208A (en) * 2020-03-26 2021-09-28 阿里巴巴集团控股有限公司 Space query method, device, system and storage medium
CN113112069A (en) * 2021-04-13 2021-07-13 北京阿帕科蓝科技有限公司 Population distribution prediction method, population distribution prediction system and electronic equipment
CN113361411A (en) * 2021-06-07 2021-09-07 国网新疆电力有限公司哈密供电公司 Random pulse interference signal elimination method based on grid and density clustering algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180109