CN107038248A - A density clustering method for massive spatial data based on resilient distributed datasets - Google Patents

A density clustering method for massive spatial data based on resilient distributed datasets

Info

Publication number
CN107038248A
Authority
CN
China
Prior art keywords
grid
data
density
halo
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710298705.XA
Other languages
Chinese (zh)
Inventor
沈晔
周天和
李思剑
任培荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yang Fan Technology Co Ltd
Original Assignee
Hangzhou Yang Fan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yang Fan Technology Co Ltd
Priority to CN201710298705.XA
Publication of CN107038248A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Abstract

The present invention relates to a density clustering method for massive spatial data based on resilient distributed datasets (RDDs). The method targets fast mining of the aggregation characteristics of large-scale spatial data and is designed around the idea of "RDD partitioning, parallel computation within partitions, merging of local results". First, according to the spatial distribution of the data, the space is automatically divided into grids and the data are assigned to them so that the data volume in each grid is relatively balanced, which balances the load across compute nodes. Then, a definition of local density suited to parallel computation is proposed and the computation of cluster centers is improved, which removes the original algorithm's need to identify cluster-center objects by drawing a decision graph. Finally, optimization strategies such as merging clusters within and between grids achieve fast clustering of large-scale spatial data. Compared with traditional density clustering methods, the invention clusters large-scale spatial data quickly, with higher accuracy and better system processing performance.

Description

A density clustering method for massive spatial data based on resilient distributed datasets
Technical field
The present invention relates to mobile devices, and in particular to a density clustering method for massive spatial data based on resilient distributed datasets.
Background
Cluster analysis plays an important role in spatial data mining. Spatial cluster analysis divides spatial data into clusters according to their aggregation characteristics, so that data in the same cluster are highly similar and data in different clusters are highly dissimilar. According to their guiding principle, clustering algorithms can be divided into partition-based, hierarchy-based, density-based, grid-based, and model-based clustering. The classical partitioning algorithm k-means and its improved variants k-medoids and k-means++ determine the cluster centers through repeated iteration and then assign the data to them. These algorithms are simple to implement, but they are sensitive to noise and handle non-spherical clusters poorly.
As data volumes surge, traditional clustering algorithms often cannot run because the data volume is too large, and fast, effective, highly scalable clustering algorithms for massive data are urgently needed. The cluster-oriented technologies GFS, BigTable, and MapReduce offer an approach to clustering massive data. As an open-source implementation of these technologies, the Hadoop parallel computing framework is widely used in the clustering field. Because it is designed for high throughput, a parallel clustering algorithm based on the Hadoop MapReduce framework must repeatedly read and write disk to store and fetch intermediate results. This makes the algorithm's I/O overhead large and its latency high, so it cannot be used for real-time clustering.
Summary of the invention
To overcome the above shortcomings, the present invention provides a density clustering method for massive spatial data based on resilient distributed datasets (RDDs). The method targets fast mining of the aggregation characteristics of large-scale spatial data and is designed around the idea of "RDD partitioning, parallel computation within partitions, merging of local results". First, according to the spatial distribution of the data, the space is automatically divided into grids and the data are assigned to them so that the data volume in each grid is relatively balanced, which balances the load across compute nodes. Then, a definition of local density suited to parallel computation is proposed and the computation of cluster centers is improved, which removes the original algorithm's need to identify cluster-center objects by drawing a decision graph. Finally, optimization strategies such as merging clusters within and between grids achieve fast clustering of large-scale spatial data.
The present invention achieves the above object through the following technical solution: a density clustering method for massive spatial data based on resilient distributed datasets, comprising the following steps:
(1) Based on the spatial distribution of the data, introduce a spatial grid index and generate grid-based RDD partitions:
(1.1) Generate the spatial grids with a binary-tree index: following a top-down strategy, divide the space layer by layer and build the grid index until the boundary length of the generated sub-grids is no greater than a given threshold;
(1.2) Following the MapReduce idea, count the data objects in each layer of grids: broadcast the index structure, distribute the data objects to be clustered evenly to the compute nodes, and merge the per-grid counts to obtain the complete grid index structure;
(1.3) Traverse the index to find the largest grids whose data volume is below a given value, generate a key-value RDD keyed by grid number from the result, and generate the grid-based RDD partitions;
(2) Perform the clustering computation within partitions: based on the improved local-density definition, run the cluster_dp algorithm in parallel on each partition obtained to determine the cluster centers, so that data objects have the same local density;
(3) Merge the local results through the optimization strategy of merging clusters within grids and between adjacent grids, completing the clustering.
Preferably, the spatial grid is defined as follows:
The space S is divided into several non-overlapping sub-regions; each region is a spatial grid, denoted G, where [expression missing in source] is the projection of the grid's boundary endpoints onto the k-th dimension axis.
Preferably, adjacent grids are defined as follows:
For any [expression missing in source], if there exists [expression missing in source], then grids g1 and g2 are said to be adjacent.
Preferably, step (1.1) is specifically as follows:
The spatial grids are generated with a binary-tree index, in which each node records the basic information of a grid and the number of data objects in that grid. Following a top-down strategy, the space is divided layer by layer and the grid index is built: the space is halved and the information of the generated sub-grids is stored in the grid index; each newly generated grid is then visited, halved again, and stored, until the boundary length of the generated sub-grids is no greater than the given threshold.
Preferably, step (1.3) is specifically: after the complete index structure is obtained, traverse the index to find the largest grids whose data volume is below the given value; once such a grid is found, stop descending further. This yields the space division result, according to which the data objects are mapped to generate a Key-ValueRDD keyed by grid number. Using the MapPartitionWithIndex function interface of the Key-ValueRDD, the grid-based RDD partitions are generated automatically.
Preferably, the improved local density is defined as follows: let ρ'_i be the improved local density of data object p_i; then [formula missing in source],
where the k-dimensional region of radius dc centered on a given data object p_i is called the dc-neighborhood of p_i, and every data object p_j in the dc-neighborhood satisfies dist(p_i, p_j) < dc.
Preferably, when the cluster_dp algorithm is run, an auxiliary function γ is designed to identify the cluster centers automatically, as follows:
Given a data object with local density ρ'_i and minimum higher-density distance δ_i, set:
$$\gamma_i = \frac{\rho'_i \, \delta_i}{\max(\rho) \cdot \max(\delta)}$$
where max(ρ)·max(δ) is the product of the maximum local density and the maximum minimum higher-density distance in the grid. ρ_i is the local density: the number of data objects in the dc-neighborhood of p_i is called the local density of p_i, denoted ρ_i:
$$\rho_i = \sum_{j \neq i} \chi\big(\mathrm{dist}(p_i, p_j) - d_c\big)$$
where
$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases}$$
Preferably, the minimum higher-density distance δ_i is defined as follows: if p_j is the object nearest to p_i among all data objects whose local density is higher than ρ_i, then NN(p_i) = p_j is called the nearest higher-density neighbor of p_i, and δ_i = dist(p_i, p_j) is called the minimum higher-density distance of p_i:
$$\delta_i = \min_{j:\, \rho_j > \rho_i} \mathrm{dist}(p_i, p_j)$$
Preferably, step (3) is specifically: by computing the average local density between two clusters, the members of a cluster are labeled as core members or halo members. Core members form the core of the cluster; they consist of high-density points and are a stable aggregation of data objects. The halo corresponds to the periphery of the cluster; it contains low-density data points and is the unstable part of the cluster. Using the concepts of core and halo, the following strategy for merging clusters between grids is proposed:
If the distribution of data objects close to the grid boundary in adjacent grids falls under either of the following two cases, the cluster to which the data objects belong must be adjusted:
(a) Cluster core objects exist near the boundary in adjacent grids and the core objects are close to each other; the two clusters are then merged;
(b) Halo objects exist at the boundary of two adjacent grids; the cluster to which each halo point belongs must then be re-evaluated.
Preferably, the method for re-evaluating the cluster to which a halo point belongs is: search the grids adjacent to the halo object's grid for data objects whose density is higher than that of the halo object, and compute the distance from each qualifying data object to the halo point. If the minimum distance obtained is less than the halo object's current minimum higher-density distance, update the halo point's nearest higher-density neighbor and minimum higher-density distance, and assign the halo object to a new cluster according to the updated nearest higher-density neighbor.
The beneficial effects of the invention are: the method achieves fast clustering of large-scale spatial data and overcomes the latency problem in clustering.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Detailed description
The invention is further described below with reference to a specific embodiment, but the scope of protection of the invention is not limited thereto:
Embodiment: In this embodiment, let D = {p1, p2, ..., pn} be the dataset to be clustered; the k-dimensional region S in which it lies is the computing space of D. For a data object p_i (1 ≤ i ≤ n) in the k-dimensional space, [symbol missing in source] is the projection of p_i on the k-th dimension axis. The main steps by which the method produces clusters are shown in Fig. 1. The implementation of the invention is based on the following basic concepts:
Definition 1 (dc-neighborhood): The k-dimensional region of radius dc centered on a given data object p_i is called the dc-neighborhood of p_i. Every data object p_j in the dc-neighborhood satisfies dist(p_i, p_j) < dc.
Definition 2 (local density ρ_i): The number of data objects in the dc-neighborhood of p_i is called the local density of p_i, denoted ρ_i:
$$\rho_i = \sum_{j \neq i} \chi\big(\mathrm{dist}(p_i, p_j) - d_c\big) \qquad (1)$$
where
$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases}$$
Definition 3 (minimum higher-density distance δ_i): If p_j is the object nearest to p_i among all data objects whose local density is higher than ρ_i, then NN(p_i) = p_j is called the nearest higher-density neighbor of p_i, and δ_i = dist(p_i, p_j) is called the minimum higher-density distance of p_i:
$$\delta_i = \min_{j:\, \rho_j > \rho_i} \mathrm{dist}(p_i, p_j) \qquad (2)$$
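Definitions 1 to 3 can be illustrated with a small brute-force sketch in Python. This is only an illustration under stated assumptions, not the patent's implementation: the function names `local_density` and `min_high_density_distance` are hypothetical, and the handling of an object that has no denser neighbor (its δ is set to its largest distance to any other object) follows the convention of the original cluster_dp algorithm, which the text above does not spell out.

```python
import math

def local_density(points, dc):
    """rho_i: number of other points strictly within distance dc of p_i."""
    n = len(points)
    rho = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < dc:
                rho[i] += 1
                rho[j] += 1
    return rho

def min_high_density_distance(points, rho):
    """Return (delta, nn): distance to, and index of, the nearest denser point.

    Assumption: a point with no denser point gets nn = None and
    delta = its largest distance to any other point.
    """
    n = len(points)
    delta = [0.0] * n
    nn = [None] * n
    for i in range(n):
        best = math.inf
        for j in range(n):
            if rho[j] > rho[i]:
                d = math.dist(points[i], points[j])
                if d < best:
                    best, nn[i] = d, j
        if nn[i] is None:
            best = max(
                (math.dist(points[i], points[j]) for j in range(n) if j != i),
                default=0.0,
            )
        delta[i] = best
    return delta, nn

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
rho = local_density(points, dc=1.5)
delta, nn = min_high_density_distance(points, rho)
print(rho, delta, nn)
```

The quadratic loops are kept for clarity; the grid partitioning described later in the embodiment is what limits how many objects each node has to compare.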
Definition 4 (spatial grid): S is divided into several non-overlapping sub-regions; each region is a spatial grid, denoted G, where [expression missing in source] is the projection of the grid's boundary endpoints onto the k-th dimension axis.
Definition 5 (adjacent grids): For any [expression missing in source], if there exists [expression missing in source], then grids g1 and g2 are said to be adjacent.
In this embodiment, the overall framework of PClusterdp is as follows. PClusterdp stores data in RDDs and is designed around the idea of "RDD partitioning, parallel computation within partitions, merging of local results"; RDD partitioning and parallelism are achieved by dividing S. The overall algorithm framework is:
Input:
D: a set of points to be clustered
S: the computing space of D
dc: a user-supplied radius distance
maxPointInGrid: a parameter that determines the maximum number of points in a grid
Output:
C: a set of clusters
Method:
/* partitioning phase */
datasetRDD ← D
Execute a space partition algorithm to split S into grids
Get a grid set G from the split result
for each point in datasetRDD do
    map the point to its corresponding grid g ∈ G
end for
Generate partitionedPointsRDD based on the assigned points
/* parallel computing phase */
Map partition:
for each partition in partitionedPointsRDD do
    execute a modified cluster_dp algorithm to generate the local cluster set C'
end map
/* merging phase */
Execute the merge-local-clusters algorithm on C' to build the final clustered data C
Data objects are mapped to the spatial grids G produced by the division, generating a Key-ValueRDD keyed by grid number. Using the MapPartition interface, the RDD is partitioned by grid number so that data objects in the same partition are sent to the same compute node. Each node independently runs the density clustering algorithm to obtain local clusters based on the grid division. The local clusters in adjacent grids are then merged to produce the final clustering result.
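The three phases of the framework can be mimicked in plain Python as a rough sketch. It deliberately avoids Spark: a dictionary stands in for the key-value RDD and a thread pool stands in for the compute nodes. All names (`partition_by_grid`, `run_pipeline`, `one_cluster_per_grid`, `concatenate`) are hypothetical, and the local clustering and merge steps are trivial placeholders rather than cluster_dp or the patent's merge strategy.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition_by_grid(keyed_points):
    """Partitioning phase: group (grid_id, point) pairs by grid_id."""
    parts = defaultdict(list)
    for grid_id, point in keyed_points:
        parts[grid_id].append(point)
    return dict(parts)

def run_pipeline(keyed_points, local_cluster, merge):
    """Partition, cluster each partition independently, then merge."""
    parts = partition_by_grid(keyed_points)
    grid_ids = sorted(parts)
    # Parallel computing phase: one task per partition.
    with ThreadPoolExecutor() as pool:
        local = list(pool.map(lambda g: (g, local_cluster(parts[g])), grid_ids))
    # Merging phase: combine the per-grid results.
    return merge(local)

def one_cluster_per_grid(points):
    """Placeholder for the modified cluster_dp step."""
    return [list(points)]

def concatenate(local_results):
    """Placeholder merge: keep every local cluster as it is."""
    return [cluster for _, clusters in local_results for cluster in clusters]

keyed = [(0, (0.5, 0.5)), (1, (3.0, 3.0)), (0, (1.0, 1.0))]
clusters = run_pipeline(keyed, one_cluster_per_grid, concatenate)
print(clusters)
```

Passing the local clustering and merge steps as functions keeps the data flow visible without committing to any particular implementation of either step.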
In this embodiment, the PClusterdp algorithm introduces a spatial grid index to keep the data volume in each grid relatively balanced, and generates the spatial grids with a binary-tree index. Each node of the index records the basic information of a grid (grid) and the number of data objects in it (count); the root node stores S and D. The algorithm follows a top-down strategy, dividing the space layer by layer and building the grid index. S is halved and the information of the generated sub-grids is stored in the grid index. Each newly generated grid is then visited, halved again, and stored, until the boundary length of the generated sub-grids is less than or equal to the given threshold. For a fixed S, the grid index can be saved and reused, which saves the time needed to build the index and improves the efficiency of the algorithm. After the basic information of each layer of grids is obtained, the number of data objects in each layer's grids is counted. The counting algorithm is designed along the Map-Reduce idea to improve computation speed: the index structure is broadcast and the data objects to be clustered are distributed evenly to the compute nodes; each node counts the data in its grids, and the per-grid counts are then merged. The specific space partitioning algorithm is as follows:
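The partitioning listing itself is not reproduced in this text. As a minimal stand-in, the following Python sketch builds a binary grid index top-down and counts the points per grid. Several choices are assumptions, not taken from the patent: each grid is halved along its longest axis, grids are half-open boxes so that a point on a shared edge falls in exactly one child, and counting is done sequentially rather than with Map-Reduce. `GridNode`, `build_index`, and `count_points` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]

@dataclass
class GridNode:
    lower: Point                      # lower corner of the grid
    upper: Point                      # upper corner of the grid
    count: int = 0                    # number of data objects in the grid
    children: List["GridNode"] = field(default_factory=list)

def build_index(lower: Point, upper: Point, min_side: float) -> GridNode:
    """Halve the grid along its longest axis until that side is <= min_side."""
    node = GridNode(lower, upper)
    sides = [u - l for l, u in zip(lower, upper)]
    axis = max(range(len(sides)), key=sides.__getitem__)
    if sides[axis] > min_side:
        mid = (lower[axis] + upper[axis]) / 2
        left_upper = upper[:axis] + (mid,) + upper[axis + 1:]
        right_lower = lower[:axis] + (mid,) + lower[axis + 1:]
        node.children = [
            build_index(lower, left_upper, min_side),
            build_index(right_lower, upper, min_side),
        ]
    return node

def contains(node: GridNode, p: Point) -> bool:
    """Half-open box test: lower <= p < upper on every axis."""
    return all(l <= x < u for l, x, u in zip(node.lower, p, node.upper))

def count_points(root: GridNode, points: List[Point]) -> None:
    """Increment count on every grid along each point's root-to-leaf path."""
    for p in points:
        node = root if contains(root, p) else None
        while node is not None:
            node.count += 1
            node = next((c for c in node.children if contains(c, p)), None)

root = build_index((0.0, 0.0), (4.0, 4.0), min_side=2.0)
count_points(root, [(0.5, 0.5), (1.0, 1.5), (3.0, 3.0), (3.5, 0.5)])
print(root.count, [c.count for c in root.children])
```

In a distributed setting, each node would run the counting loop on its own share of the points against the broadcast index, and the per-grid counts would then be summed.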
After the complete index structure is obtained, the index is traversed to find the largest grids whose data volume is below the given value; once such a grid is found, the traversal stops descending further. This yields the space division result, according to which the data objects are mapped to generate a Key-ValueRDD keyed by grid number. Using the MapPartitionWithIndex function interface of the Key-ValueRDD, the grid-based RDD partitions are generated automatically.
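The traversal can be sketched in Python as follows, assuming the index is available as nested dictionaries with `lower`, `upper`, `count`, and `children` entries (a hypothetical representation chosen for brevity). `select_partitions` stops descending as soon as a grid holds fewer than `max_points` objects, and `key_points` tags each point with the number of the selected grid that contains it, which is the pairing a key-value RDD would hold. A leaf that still exceeds the limit is kept as it is, which is an assumption.

```python
def grid(lower, upper, count, children=()):
    """Build one index node as a plain dictionary (hypothetical layout)."""
    return {"lower": lower, "upper": upper, "count": count,
            "children": list(children)}

def select_partitions(node, max_points):
    """Return the largest grids whose count is below max_points."""
    if node["count"] < max_points or not node["children"]:
        return [node]
    selected = []
    for child in node["children"]:
        selected.extend(select_partitions(child, max_points))
    return selected

def key_points(points, grids):
    """Pair each point with the index of the selected grid containing it."""
    keyed = []
    for p in points:
        for grid_id, g in enumerate(grids):
            if all(l <= x < u for l, x, u in zip(g["lower"], p, g["upper"])):
                keyed.append((grid_id, p))
                break
    return keyed

index = grid((0.0, 0.0), (4.0, 4.0), 4, [
    grid((0.0, 0.0), (2.0, 4.0), 3, [
        grid((0.0, 0.0), (2.0, 2.0), 2),
        grid((0.0, 2.0), (2.0, 4.0), 1),
    ]),
    grid((2.0, 0.0), (4.0, 4.0), 1),
])
grids = select_partitions(index, max_points=3)
keyed = key_points([(0.5, 0.5), (1.0, 1.5), (1.0, 3.0), (3.0, 3.0)], grids)
print([g["count"] for g in grids], keyed)
```

Dense regions end up split into small grids while sparse regions stay as large ones, which is how the data volume per partition is kept roughly balanced.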
In this embodiment, to balance computation speed against accuracy of the result and to enable parallel computation, the following improved computation of local density is defined.
Definition 6 (improved local density ρ'_i): Let ρ'_i be the improved local density of data object p_i, given by formula (3) [formula missing in source].
Formula (3) builds on formula (1) by also taking into account how tightly the data objects in the neighborhood are packed. For data objects with the same number of neighbors in their dc-neighborhoods, it widens the difference in local density by computing the distances between a data object and its neighbors. In other words, among objects with the same number of neighbors within the dc-neighborhood, the object that is packed more tightly with its neighbors has the larger local density. Specifically, the local-density definition proposed by the invention restricts the computation of local density to the neighborhood: only the objects in the data object's own grid and in the adjacent grids are considered when computing local density, which avoids traversing the whole dataset and reduces the workload of the compute nodes.
When determining cluster centers, the original cluster_dp must draw a decision graph and rely on human-computer interaction to make the judgment. To free the algorithm from its dependence on the decision graph and on human intervention, an auxiliary function γ is designed to identify the cluster centers automatically. Given a data object with local density ρ'_i and minimum higher-density distance δ_i, set:
$$\gamma_i = \frac{\rho'_i \, \delta_i}{\max(\rho) \cdot \max(\delta)}$$
where max(ρ)·max(δ) is the product of the maximum local density and the maximum minimum higher-density distance in the grid. Because local density and minimum higher-density distance have different scales, they are normalized simply by their maximum values in the grid. After γ_i is limited to [0, 1] and sorted in descending order, the γ of objects that are not cluster centers tends to 0, while the γ values of cluster-center objects are discretely distributed and far from the origin. Candidate cluster centers can therefore be determined with a preset threshold. The choice of threshold depends on the actual application environment; selecting the data objects with γ > 0.2 as cluster-core candidates and generating the local clusters from them gives a satisfactory clustering result. Once the cluster-core candidates are obtained, the number of clusters is determined, and the data in the grid are then assigned to the corresponding local clusters.
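A short Python sketch of the γ-based center selection follows. It assumes the densities, distances, and nearest higher-density neighbors have already been computed for one grid; the function names are hypothetical. The assignment step, in which each remaining object takes the label of its nearest higher-density neighbor in decreasing order of density, follows the usual cluster_dp convention and is an assumption here, since the text only says that the data are assigned to the corresponding local clusters.

```python
def gamma_scores(rho, delta):
    """gamma_i = rho_i * delta_i / (max(rho) * max(delta)), in [0, 1]."""
    denom = max(rho) * max(delta)
    if denom == 0:
        return [0.0] * len(rho)
    return [r * d / denom for r, d in zip(rho, delta)]

def select_centers(rho, delta, threshold=0.2):
    """Indices with gamma above the threshold, in descending gamma order."""
    gamma = gamma_scores(rho, delta)
    ranked = sorted(enumerate(gamma), key=lambda t: -t[1])
    return [i for i, g in ranked if g > threshold]

def assign_clusters(rho, nn, centers):
    """Label centers 0..m-1, then let every other object inherit the label
    of its nearest higher-density neighbor (processed densest first).
    Objects with no denser neighbor that are not centers keep label -1."""
    labels = [-1] * len(rho)
    for label, c in enumerate(centers):
        labels[c] = label
    for i in sorted(range(len(rho)), key=lambda i: -rho[i]):
        if labels[i] == -1 and nn[i] is not None:
            labels[i] = labels[nn[i]]
    return labels

# Hypothetical per-grid values: (improved) density, minimum higher-density
# distance, and index of the nearest higher-density neighbor.
rho = [4, 2, 2, 3, 1]
delta = [10.0, 1.0, 1.2, 8.0, 0.9]
nn = [None, 0, 0, 0, 3]

centers = select_centers(rho, delta)
labels = assign_clusters(rho, nn, centers)
print(centers, labels)
```

Objects 0 and 3 combine high density with a large distance to any denser object, so only they exceed the 0.2 threshold and become centers.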
Inter-partition cluster merging strategy: each RDD partition corresponds to one spatial grid. For data objects close to a grid boundary, their aggregation characteristics must be re-assessed across adjacent grids to avoid classification errors caused by dividing the RDD. By computing the average local density between two clusters, the members of a cluster are labeled as core members (cluster core) or halo members (cluster halo). Core members form the core of the cluster; they consist of high-density points and are a stable aggregation of data objects. The halo corresponds to the periphery of the cluster; it contains low-density data points and is the unstable part of the cluster. Using the concepts of core and halo, a method for merging clusters between grids is proposed: if the distribution of data objects close to the grid boundary in adjacent grids falls under one of the following cases, the cluster to which the data objects belong must be adjusted.
Case 1: cluster core objects exist near the boundary in adjacent grids, and the core objects are close to each other. Because of the grid division, data objects that should belong to the same cluster have been assigned to different clusters; in this case the two clusters are merged.
Case 2: halo objects exist at the boundary of two adjacent grids. The cluster to which each halo point belongs must then be re-evaluated. The specific adjustment algorithm is as follows.
Search the grids adjacent to the halo object's grid for data objects whose density is higher than that of the halo object, and compute the distance from each qualifying data object to the halo point. If the minimum distance obtained is less than the halo object's current minimum higher-density distance, update the halo point's nearest higher-density neighbor and minimum higher-density distance, and assign the halo object to a new cluster according to the updated nearest higher-density neighbor.
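Both cases can be sketched in Python under explicit assumptions. For case 1, the text does not say how close two core objects must be, so the sketch treats a distance below dc as "close" and uses a small union-find to merge cluster labels. For case 2, the halo object is represented as a dictionary with hypothetical fields (`pos`, `rho`, `delta`, `nn`, `cluster`). All identifiers and sample values are illustrative only.

```python
import math

def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def merge_border_cores(cores_a, cores_b, dc):
    """Case 1: merge clusters whose border core objects lie within dc.

    cores_a, cores_b: lists of (cluster_id, position) from two adjacent grids.
    Returns a mapping from each cluster id to its merged representative.
    """
    parent = {cid: cid for cid, _ in cores_a + cores_b}
    for ca, pa in cores_a:
        for cb, pb in cores_b:
            if math.dist(pa, pb) < dc:
                ra, rb = find(parent, ca), find(parent, cb)
                if ra != rb:
                    parent[max(ra, rb)] = min(ra, rb)
    return {cid: find(parent, cid) for cid in parent}

def reassign_halo(halo, neighbors):
    """Case 2: move a halo object if an adjacent grid holds a denser object
    that is closer than its current nearest higher-density neighbor."""
    best = None
    for q in neighbors:
        if q["rho"] > halo["rho"]:
            d = math.dist(halo["pos"], q["pos"])
            if best is None or d < best[0]:
                best = (d, q)
    if best is not None and best[0] < halo["delta"]:
        d, q = best
        return {**halo, "delta": d, "nn": q["id"], "cluster": q["cluster"]}
    return halo

mapping = merge_border_cores(
    [("g0-c0", (1.9, 1.0)), ("g0-c1", (1.8, 3.5))],
    [("g1-c0", (2.1, 1.0)), ("g1-c1", (3.9, 3.9))],
    dc=0.5,
)
halo = {"pos": (1.9, 1.0), "rho": 1, "delta": 0.8, "nn": "a7", "cluster": "g0-c1"}
adjacent = [
    {"id": "b3", "pos": (2.2, 1.0), "rho": 5, "cluster": "g1-c0"},
    {"id": "b9", "pos": (2.1, 1.0), "rho": 1, "cluster": "g1-c0"},
    {"id": "b4", "pos": (3.5, 1.0), "rho": 6, "cluster": "g1-c2"},
]
moved = reassign_halo(halo, adjacent)
print(mapping, moved["cluster"], moved["nn"])
```

Object b9 is nearer than b3 but not denser than the halo object, so it is ignored; b3 is both denser and closer than the current minimum higher-density distance, so the halo object moves to b3's cluster.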
The above are the specific embodiment of the invention and the technical principles employed. Any change made according to the concept of the invention, whose resulting function does not go beyond the spirit covered by the specification and drawings, shall fall within the scope of protection of the invention.

Claims (10)

1. A density clustering method for massive spatial data based on resilient distributed datasets, characterized by comprising the following steps:
(1) Based on the spatial distribution of the data, introduce a spatial grid index and generate grid-based RDD partitions:
(1.1) Generate the spatial grids with a binary-tree index: following a top-down strategy, divide the space layer by layer and build the grid index until the boundary length of the generated sub-grids is no greater than a given threshold;
(1.2) Following the MapReduce idea, count the data objects in each layer of grids: broadcast the index structure, distribute the data objects to be clustered evenly to the compute nodes, and merge the per-grid counts to obtain the complete grid index structure;
(1.3) Traverse the index to find the largest grids whose data volume is below a given value, generate a key-value RDD keyed by grid number from the result, and generate the grid-based RDD partitions;
(2) Perform the clustering computation within partitions: based on the improved local-density definition, run the cluster_dp algorithm in parallel on each partition obtained to determine the cluster centers, so that data objects have the same local density;
(3) Merge the local results through the optimization strategy of merging clusters within grids and between adjacent grids, completing the clustering.
2. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that the spatial grid is defined as follows:
The space S is divided into several non-overlapping sub-regions; each region is a spatial grid, denoted G, where [expression missing in source] is the projection of the grid's boundary endpoints onto the k-th dimension axis.
3. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that adjacent grids are defined as follows:
For any [expression missing in source], if there exists [expression missing in source], then grids g1 and g2 are said to be adjacent.
4. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that step (1.1) is specifically as follows:
The spatial grids are generated with a binary-tree index, in which each node records the basic information of a grid and the number of data objects in that grid. Following a top-down strategy, the space is divided layer by layer and the grid index is built: the space is halved and the information of the generated sub-grids is stored in the grid index; each newly generated grid is then visited, halved again, and stored, until the boundary length of the generated sub-grids is no greater than the given threshold.
5. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that step (1.3) is specifically: after the complete index structure is obtained, traverse the index to find the largest grids whose data volume is below the given value; once such a grid is found, stop descending further. This yields the space division result, according to which the data objects are mapped to generate a Key-ValueRDD keyed by grid number. Using the MapPartitionWithIndex function interface of the Key-ValueRDD, the grid-based RDD partitions are generated automatically.
6. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that the improved local density is defined as follows: let ρ'_i be the improved local density of data object p_i; then [formula missing in source],
where the k-dimensional region of radius dc centered on a given data object p_i is called the dc-neighborhood of p_i, and every data object p_j in the dc-neighborhood satisfies dist(p_i, p_j) < dc.
7. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that, when the cluster_dp algorithm is run, an auxiliary function γ is designed to identify the cluster centers automatically, as follows:
Given a data object with local density ρ'_i and minimum higher-density distance δ_i, set:
$$\gamma_i = \frac{\rho'_i \, \delta_i}{\max(\rho) \cdot \max(\delta)}$$
where max(ρ)·max(δ) is the product of the maximum local density and the maximum minimum higher-density distance in the grid. ρ_i is the local density: the number of data objects in the dc-neighborhood of p_i is called the local density of p_i, denoted ρ_i:
$$\rho_i = \sum_{j \neq i} \chi\big(\mathrm{dist}(p_i, p_j) - d_c\big)$$
where
$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases}$$
8. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 7, characterized in that the minimum higher-density distance δ_i is defined as follows: if p_j is the object nearest to p_i among all data objects whose local density is higher than ρ_i, then NN(p_i) = p_j is called the nearest higher-density neighbor of p_i, and δ_i = dist(p_i, p_j) is called the minimum higher-density distance of p_i:
$$\delta_i = \min_{j:\, \rho_j > \rho_i} \mathrm{dist}(p_i, p_j)$$
9. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that step (3) is specifically: by computing the average local density between two clusters, the members of a cluster are labeled as core members or halo members. Core members form the core of the cluster; they consist of high-density points and are a stable aggregation of data objects. The halo corresponds to the periphery of the cluster; it contains low-density data points and is the unstable part of the cluster. Using the concepts of core and halo, the following strategy for merging clusters between grids is proposed: if the distribution of data objects close to the grid boundary in adjacent grids falls under either of the following two cases, the cluster to which the data objects belong must be adjusted:
(a) Cluster core objects exist near the boundary in adjacent grids and the core objects are close to each other; the two clusters are then merged;
(b) Halo objects exist at the boundary of two adjacent grids; the cluster to which each halo point belongs must then be re-evaluated.
10. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 9, characterized in that the method for re-evaluating the cluster to which a halo point belongs is: search the grids adjacent to the halo object's grid for data objects whose density is higher than that of the halo object, and compute the distance from each qualifying data object to the halo point. If the minimum distance obtained is less than the halo object's current minimum higher-density distance, update the halo point's nearest higher-density neighbor and minimum higher-density distance, and assign the halo object to a new cluster according to the updated nearest higher-density neighbor.
CN201710298705.XA 2017-04-27 2017-04-27 A density clustering method for massive spatial data based on resilient distributed datasets Withdrawn CN107038248A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710298705.XA CN107038248A (en) 2017-04-27 2017-04-27 A density clustering method for massive spatial data based on resilient distributed datasets

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710298705.XA CN107038248A (en) 2017-04-27 2017-04-27 A density clustering method for massive spatial data based on resilient distributed datasets

Publications (1)

Publication Number Publication Date
CN107038248A true CN107038248A (en) 2017-08-11

Family

ID=59538432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710298705.XA Withdrawn CN107038248A (en) 2017-04-27 2017-04-27 A density clustering method for massive spatial data based on resilient distributed datasets

Country Status (1)

Country Link
CN (1) CN107038248A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163224A (en) * 2011-04-06 2011-08-24 中南大学 Adaptive spatial clustering method
CN105404648A (en) * 2015-10-29 2016-03-16 东北大学 Density and closeness clustering based user moving behavior determination method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
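Li Luming et al.: "Density clustering of massive spatial data based on resilient distributed datasets", Journal of Hunan University (Natural Sciences) *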

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389140A (en) * 2017-08-14 2019-02-26 中国科学院计算技术研究所 The method and system of quick searching cluster centre based on Spark
CN109452935A (en) * 2017-09-06 2019-03-12 塔塔咨询服务有限公司 The non-invasive methods and system from photoplethysmogram estimated blood pressure are post-processed using statistics
US10803096B2 (en) 2017-09-28 2020-10-13 Here Global B.V. Parallelized clustering of geospatial data
CN108537274B (en) * 2018-04-08 2020-06-19 武汉大学 Method for rapidly clustering POI (Point of interest) position points in space on multiple scales based on grids
CN108537274A (en) * 2018-04-08 2018-09-14 武汉大学 A kind of Multi scale Fast Speed Clustering based on grid
CN110224847A (en) * 2018-05-02 2019-09-10 腾讯科技(深圳)有限公司 Group dividing method, device, storage medium and equipment based on social networks
CN109408562A (en) * 2018-11-07 2019-03-01 广东工业大学 A kind of grouping recommended method and its device based on client characteristics
CN109408562B (en) * 2018-11-07 2021-11-26 广东工业大学 Grouping recommendation method and device based on client characteristics
CN109783240A (en) * 2019-01-27 2019-05-21 中国人民解放军国防科技大学 local optimization structured grid load balancing method based on MINMAX
CN109783240B (en) * 2019-01-27 2020-08-25 中国人民解放军国防科技大学 Local optimization structured grid parallel computing load balancing method based on MINMAX
CN110008215A (en) * 2019-03-22 2019-07-12 武汉大学 A kind of big data searching method based on improved KD tree parallel algorithm
CN110161464A (en) * 2019-06-14 2019-08-23 成都纳雷科技有限公司 A kind of Radar Multi Target clustering method and device
CN110161464B (en) * 2019-06-14 2023-03-10 成都纳雷科技有限公司 Radar multi-target clustering method and device
CN110427531A (en) * 2019-07-19 2019-11-08 清华大学 Grid layout visualization method and system are carried out to multiple samples
CN110414587A (en) * 2019-07-23 2019-11-05 南京邮电大学 Depth convolutional neural networks training method and system based on progressive learning
CN110597935A (en) * 2019-08-05 2019-12-20 北京云和时空科技有限公司 Space analysis method and device
TWI764205B (en) * 2019-08-27 2022-05-11 南韓商韓領有限公司 Computer-implemented system and method
CN111291276A (en) * 2020-01-13 2020-06-16 武汉大学 Clustering method based on local direction centrality measurement
CN111291276B (en) * 2020-01-13 2023-05-19 武汉大学 Clustering method based on local direction centrality measurement
CN113449052A (en) * 2020-03-26 2021-09-28 丰图科技(深圳)有限公司 Method for establishing spatial index, method and device for querying spatial region
CN112100243A (en) * 2020-09-15 2020-12-18 山东理工大学 Abnormal aggregation detection method based on mass space-time data analysis
CN112100243B (en) * 2020-09-15 2024-02-20 山东理工大学 Abnormal aggregation detection method based on massive space-time data analysis

Similar Documents

Publication Publication Date Title
CN107038248A (en) A kind of massive spatial data Density Clustering method based on elasticity distribution data set
Cai et al. Evolving an optimal kernel extreme learning machine by using an enhanced grey wolf optimization strategy
Gufler et al. Load balancing in mapreduce based on scalable cardinality estimates
CN110222029A (en) A kind of big data multidimensional analysis computational efficiency method for improving and system
Ferranti et al. A distributed approach to multi-objective evolutionary generation of fuzzy rule-based classifiers from big data
CN105631068B (en) A kind of net boundary conditional processing method that unstrctured grid CFD is calculated
CN109033340A (en) A kind of searching method and device of the point cloud K neighborhood based on Spark platform
CN106127244A (en) A kind of parallelization K means improved method and system
CN106096052A (en) A kind of consumer's clustering method towards wechat marketing
CN111382320A (en) Large-scale data increment processing method for knowledge graph
CN108764307A (en) The density peaks clustering method of natural arest neighbors optimization
CN105447519A (en) Model detection method based on feature selection
CN108416381A (en) A kind of multi-density clustering method towards three-dimensional point set
CN113128617B (en) Spark and ASPSO based parallelization K-means optimization method
CN107301094A (en) The dynamic self-adapting data model inquired about towards extensive dynamic transaction
CN108052832B (en) Sorting-based micro-aggregation anonymization method
Zhang et al. A multiobjective cellular genetic algorithm based on 3D structure and cosine crowding measurement
CN108897820B (en) Parallelization method of DENCLUE algorithm
CN106780747A (en) A kind of method that Fast Segmentation CFD calculates grid
Yu et al. DBWGIE-MR: A density-based clustering algorithm by using the weighted grid and information entropy based on MapReduce
CN105337759B (en) It is a kind of based on inside and outside community structure than measure and community discovery method
Tan et al. An improved cuckoo search algorithm for multilevel color image thresholding based on modified fuzzy entropy
Diao et al. An improved DBSCAN algorithm using local parameters
KR101953479B1 (en) Group search optimization data clustering method and system using the relative ratio of distance
Wang et al. Research on Clustream Algorithm Based on Spark

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 2017-08-11