CN107038248A - Density clustering method for massive spatial data based on resilient distributed datasets - Google Patents
- Publication number
- CN107038248A
- Authority
- CN
- China
- Prior art keywords
- grid
- data
- density
- halo
- cluster
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Abstract
The present invention relates to a density clustering method for massive spatial data based on resilient distributed datasets (RDDs). Aimed at quickly mining the aggregation characteristics of large-scale spatial data, the method is designed around the idea of "RDD partitioning, parallel computation within partitions, merging of local results". First, according to the spatial distribution of the data, the space is automatically divided into grids and the data are distributed so that the amount of data in each grid is relatively balanced, which balances the load across computing nodes. Next, a local density definition suited to parallel computation is proposed and the computation of cluster centers is improved, removing the original algorithm's need to identify cluster center objects by drawing a decision graph. Finally, optimization strategies such as merging clusters within grids and between grids enable fast clustering of large-scale spatial data. The invention effectively achieves fast clustering of large-scale spatial data, with higher accuracy and better processing performance than traditional density clustering methods.
Description
Technical field
The present invention relates to mobile devices, and more particularly to a density clustering method for massive spatial data based on resilient distributed datasets.
Background technology
Clustering plays an important role in spatial data mining. Spatial cluster analysis divides spatial data into a number of clusters according to their aggregation characteristics, so that data in the same cluster are highly similar and data in different clusters differ greatly. According to their guiding principles, clustering algorithms can be divided into partition-based, hierarchical, density-based, grid-based and model-based clustering. The classical partitioning algorithm k-means and its improved variants k-medoids and k-means++ determine cluster centers and assign data to clusters through repeated iteration. They are simple to implement, but they are sensitive to noise and handle non-spherical clusters poorly.
With the surge in data scale, traditional clustering algorithms often cannot run because the data volume is too large, so fast, effective and highly scalable clustering algorithms for massive data are urgently needed. The cluster-computing technologies GFS, BigTable and MapReduce offer an approach to clustering massive data. As the open-source implementation of these technologies, the Hadoop parallel computing framework is widely used for clustering. However, because it pursues high throughput, a parallel clustering algorithm built on the Hadoop MapReduce framework must repeatedly read and write disk to access intermediate results. This causes large I/O overhead and high latency, so such algorithms cannot be used for real-time clustering.
Summary of the invention
To overcome the above shortcomings, the present invention provides a density clustering method for massive spatial data based on resilient distributed datasets. Aimed at quickly mining the aggregation characteristics of large-scale spatial data, the method is designed around the idea of "RDD partitioning, parallel computation within partitions, merging of local results". First, according to the spatial distribution of the data, the space is automatically divided into grids and the data are distributed so that the amount of data in each grid is relatively balanced, which balances the load across computing nodes. Next, a local density definition suited to parallel computation is proposed and the computation of cluster centers is improved, removing the original algorithm's need to identify cluster center objects by drawing a decision graph. Finally, optimization strategies such as merging clusters within grids and between grids enable fast clustering of large-scale spatial data.
The present invention achieves the above objective through the following technical solution. A density clustering method for massive spatial data based on resilient distributed datasets comprises the following steps:
(1) introducing a spatial grid index according to the spatial distribution of the data and generating grid-based RDD partitions:
(1.1) generating spatial grids with a binary index: the space is partitioned level by level with a top-down strategy and the grid index is built, until the boundary length of the generated sub-grids is not greater than a given threshold;
(1.2) following the MapReduce idea, counting the number of data objects in the grids of every level: the index structure is broadcast to every computing node, the data objects to be clustered are distributed evenly among the nodes, and the per-grid data counts are merged to obtain the complete grid index structure;
(1.3) traversing the index to find the largest grids whose data amount is less than a given value, generating a Key-Value RDD keyed by grid number from the search result, and thereby generating grid-based RDD partitions;
(2) clustering within partitions: based on the improved local density definition, the cluster_dp algorithm is run in parallel on every partition to determine the cluster centers, so that data objects with the same neighbor count do not end up with identical local density;
(3) merging local results with the cluster-merging optimization strategy within grids and between adjacent grids to complete the clustering.
Preferably, the spatial grid is defined as follows:
The space S is divided into several non-overlapping subregions; each subregion is a spatial grid, denoted G, with $G = [l_1, u_1] \times [l_2, u_2] \times \cdots \times [l_k, u_k]$, where $l_j$ and $u_j$ are the projections of the grid boundary endpoints on the $j$-th dimension axis.
Preferably, adjacent grids are defined as follows:
For any $g_1, g_2 \in G$, if there exists [formula not reproduced in the source text], then grids $g_1$ and $g_2$ are said to be adjacent.
Preferably, step (1.1) is specifically as follows:
Spatial grids are generated with a binary index, in which each index node records the basic information of a grid and the number of data objects in it. A top-down strategy partitions the space level by level to build the grid index: the space is halved, the generated sub-grid information is stored in the grid index, and each newly generated grid is visited, halved and stored again, until the boundary length of the generated sub-grids is not greater than the given threshold.
Preferably, step (1.3) is specifically: after the complete index structure is obtained, the index is traversed to find the largest grids whose data amount is less than the given value; once such a grid is found, the traversal does not descend below it. The data objects are mapped according to the resulting space division to generate a Key-Value RDD keyed by grid number, and grid-based RDD partitions are generated automatically through the MapPartitionWithIndex function interface of the Key-Value RDD.
Preferably, the improved local density is defined as follows: let $\rho'_i$ be the improved local density of data object $p_i$; then
[formula (3) not reproduced in the source text]
where the k-dimensional region centered at the given data object $p_i$ with radius dc is called the dc-neighborhood of $p_i$, and every data object $p_j$ in the dc-neighborhood satisfies $\mathrm{dist}(p_i, p_j) < d_c$.
Preferably, when the cluster_dp algorithm is run, an auxiliary function γ is designed to identify cluster centers automatically, as follows:
Given a data object with local density $\rho'_i$ and minimum high-density distance $\delta_i$, set
$$\gamma_i = \frac{\rho'_i \, \delta_i}{\max(\rho') \cdot \max(\delta)}$$
where $\max(\rho') \cdot \max(\delta)$ is the product of the maximum local density and the maximum minimum-high-density distance in the grid. The local density $\rho_i$ of $p_i$ is defined as the number of data objects in the dc-neighborhood of $p_i$:
$$\rho_i = \sum_{j \ne i} \chi\big(\mathrm{dist}(p_i, p_j) - d_c\big)$$
where $\chi(x) = 1$ if $x < 0$ and $\chi(x) = 0$ otherwise.
Preferably, the minimum high-density distance $\delta_i$ is defined as follows: if $p_j$ is the object closest to $p_i$ among all data objects whose local density is higher than $\rho_i$, then $NN(p_i) = p_j$ is called the nearest high-density neighbor of $p_i$, and $\delta_i = \mathrm{dist}(p_i, p_j)$ is called the minimum high-density distance of $p_i$:
$$\delta_i = \min_{j:\, \rho_j > \rho_i} \mathrm{dist}(p_i, p_j)$$
Preferably, step (3) is specifically: by computing the average local density between two clusters, cluster members are labeled as core members or halo. Core members form the core of a cluster; they consist of high-density points and are a stable aggregation of data objects. The halo corresponds to the periphery of a cluster; it is an unstable part of the cluster containing low-density data points.
Using the concepts of core and halo, the following strategy for merging clusters between grids is proposed:
If the data objects close to the boundary between adjacent grids are distributed in either of the following two ways, the clusters to which those data objects belong must be adjusted:
(a) if cluster core objects exist near the shared boundary of adjacent grids and these core objects are close to each other, the two clusters are merged;
(b) if halo objects exist at the boundary of two adjacent grids, the cluster to which each halo point belongs must be re-evaluated.
Preferably, the cluster to which a halo point belongs is re-evaluated as follows: in the grids adjacent to the grid containing the halo object, search for data objects whose density is higher than that of the halo object, and compute the distance from each qualifying object to the halo point. If the minimum computed distance is less than the current minimum high-density distance of the halo object, update the halo point's nearest high-density neighbor and minimum high-density distance, and assign the halo object to a new cluster according to the updated nearest high-density neighbor.
The beneficial effects of the present invention are: the method achieves fast clustering of large-scale spatial data and overcomes the latency problem in clustering.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to a specific embodiment, but the scope of protection of the present invention is not limited thereto:
Embodiment: In this embodiment, let $D = \{p_1, p_2, \ldots, p_n\}$ be the data set to be clustered; the k-dimensional space region S in which it resides is the computing space of D. For a data object $p_i = (p_i^1, p_i^2, \ldots, p_i^k)$ $(1 \le i \le n)$ in the k-dimensional space, $p_i^j$ is the projection of $p_i$ on the $j$-th dimension axis. The main steps by which the density clustering method for massive spatial data based on resilient distributed datasets produces clusters are shown in Fig. 1. The implementation of the present invention is based on the following basic concepts:
Definition 1 (dc-neighborhood): the k-dimensional region centered at a given data object $p_i$ with radius dc is called the dc-neighborhood of $p_i$; every data object $p_j$ in the dc-neighborhood satisfies $\mathrm{dist}(p_i, p_j) < d_c$.
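Definition 2 (local density $\rho_i$): the number of data objects in the dc-neighborhood of $p_i$ is called the local density of $p_i$, denoted $\rho_i$:
$$\rho_i = \sum_{j \ne i} \chi\big(\mathrm{dist}(p_i, p_j) - d_c\big) \qquad (1)$$
where $\chi(x) = 1$ if $x < 0$ and $\chi(x) = 0$ otherwise.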
Definition 3 (minimum high-density distance $\delta_i$): if $p_j$ is the object closest to $p_i$ among all data objects whose local density is higher than $\rho_i$, then $NN(p_i) = p_j$ is called the nearest high-density neighbor of $p_i$, and $\delta_i = \mathrm{dist}(p_i, p_j)$ is called the minimum high-density distance of $p_i$:
$$\delta_i = \min_{j:\, \rho_j > \rho_i} \mathrm{dist}(p_i, p_j) \qquad (2)$$
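Definitions 2 and 3 can be checked with a small brute-force sketch: $\rho_i$ counts the other objects closer than dc, and $\delta_i$ is the distance to the nearest denser object. Definition 3 leaves $\delta_i$ undefined for an object with no denser object; this sketch assumes the usual convention of the original density-peaks algorithm and uses that object's largest distance to any object. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]

def local_density(points: List[Point], dc: float) -> List[int]:
    """Formula (1): count the other objects strictly closer than dc."""
    n = len(points)
    return [sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) < dc)
            for i in range(n)]

def min_high_density_distance(points: List[Point],
                              rho: List[float]) -> Tuple[List[float], List[int]]:
    """Formula (2): distance to, and index of, the nearest object with higher density.
    Objects with no denser object get nn = -1 and their largest distance to any
    object (density-peaks convention, assumed here)."""
    n = len(points)
    delta, nn = [0.0] * n, [-1] * n
    for i in range(n):
        best = math.inf
        for j in range(n):
            if rho[j] > rho[i]:
                d = math.dist(points[i], points[j])
                if d < best:
                    best, nn[i] = d, j
        delta[i] = best if nn[i] != -1 else max(math.dist(points[i], q) for q in points)
    return delta, nn

pts = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0)]
rho = local_density(pts, dc=0.6)
delta, nn = min_high_density_distance(pts, rho)
print(rho, nn)  # [1, 2, 1, 0] [1, -1, 1, 2]
```

The quadratic loops are only for clarity; within a grid partition the candidate set is small.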
Definition 4 (spatial grid): S is divided into several non-overlapping subregions; each subregion is a spatial grid, denoted G, with $G = [l_1, u_1] \times [l_2, u_2] \times \cdots \times [l_k, u_k]$, where $l_j$ and $u_j$ are the projections of the grid boundary endpoints on the $j$-th dimension axis.
Definition 5 (adjacent grids): for any $g_1, g_2 \in G$, if there exists [formula not reproduced in the source text], then grids $g_1$ and $g_2$ are said to be adjacent.
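A spatial grid (Definition 4) can be represented as an axis-aligned box. Because the adjacency formula of Definition 5 is not preserved, the sketch below assumes that two distinct, non-overlapping grids are adjacent when their closed boxes touch, i.e. their projections on every axis intersect, possibly only at an endpoint; corner contact therefore counts as adjacency. The class `Grid` and the function `adjacent` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class Grid:
    """Definition 4: an axis-aligned box; low[j] and high[j] are the projections
    of its boundary endpoints on axis j."""
    low: Tuple[float, ...]
    high: Tuple[float, ...]

    def contains(self, p: Sequence[float]) -> bool:
        # Half-open [low, high) per axis, so a point on a shared face belongs
        # to exactly one grid (assumption; the text does not fix this).
        return all(l <= x < h for l, x, h in zip(self.low, p, self.high))

def adjacent(g1: Grid, g2: Grid) -> bool:
    """Assumed reading of Definition 5: distinct, non-overlapping grids are
    adjacent when their closed intervals intersect on every axis."""
    if g1 == g2:
        return False
    return all(l1 <= h2 and l2 <= h1
               for l1, h1, l2, h2 in zip(g1.low, g1.high, g2.low, g2.high))

a = Grid((0.0, 0.0), (1.0, 1.0))
b = Grid((1.0, 0.0), (2.0, 1.0))
c = Grid((2.0, 2.0), (3.0, 3.0))
print(adjacent(a, b), adjacent(a, c))  # True False
```

If only face-sharing grids should count as adjacent, the test would additionally require the intervals to overlap with positive length on all axes but one.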
In this embodiment, the overall framework of PClusterdp is as follows. PClusterdp stores data in RDDs and is designed around the idea of "RDD partitioning, parallel computation within partitions, merging of local results"; RDD partitioning and parallelism are achieved by splitting S. The overall framework of the algorithm is:
Input:
  D: a set of points to be clustered
  S: the computing space of D
  dc: a user-specified radius
  maxPointInGrid: the maximum number of points allowed in a grid
Output:
  C: a set of clusters
Method:
  /* partitioning phase */
  datasetRDD <- D
  Execute the space partition algorithm to split S into grids
  Get a grid set G from the split result
  for each point in datasetRDD do
    map the point to the grid g ∈ G it belongs to
  end for
  Generate partitionedPointsRDD from the assigned points
  /* parallel computing phase */
  map partitions:
    for each partition in partitionedPointsRDD do
      execute the modified cluster_dp algorithm to generate a local cluster set C'
    end for
  end map
  /* merging phase */
  Execute the local-cluster merging algorithm on C' to build the final clustered data C
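The three phases of the framework can be mimicked on a single machine to make the data flow concrete. In this sketch a dictionary keyed by grid number stands in for the Key-Value RDD, and a thread pool stands in for the Spark executors (threads give no CPU parallelism for pure-Python work, so this shows structure, not speed). `grid_of`, `local_cluster` and `merge` are hypothetical callables supplied by the caller; none of them is a Spark API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Hashable, Iterable, List, Sequence

Point = Sequence[float]

def partition_by_grid(points: Iterable[Point],
                      grid_of: Callable[[Point], Hashable]) -> Dict[Hashable, List[Point]]:
    """Partitioning phase: key every point by its grid number and group by key,
    which is what a Key-Value RDD partitioned by grid number holds."""
    parts: Dict[Hashable, List[Point]] = defaultdict(list)
    for p in points:
        parts[grid_of(p)].append(p)
    return dict(parts)

def run_framework(points: Iterable[Point],
                  grid_of: Callable[[Point], Hashable],
                  local_cluster: Callable[[List[Point]], Any],
                  merge: Callable[[Dict[Hashable, Any]], Any],
                  workers: int = 4) -> Any:
    parts = partition_by_grid(points, grid_of)
    keys = list(parts)
    # Parallel computing phase: one task per partition (threads stand in for executors).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local = dict(zip(keys, pool.map(lambda k: local_cluster(parts[k]), keys)))
    # Merging phase: combine the per-grid results into the final answer.
    return merge(local)

def unit_cell(p: Point) -> Hashable:
    return (int(p[0] // 1), int(p[1] // 1))   # hypothetical grid numbering

result = run_framework([(0.1, 0.2), (0.5, 0.5), (1.5, 0.2)], unit_cell,
                       local_cluster=len, merge=lambda local: local)
print(result)  # {(0, 0): 2, (1, 0): 1}
```

Here `local_cluster=len` only counts points per grid; in the method it would be the modified cluster_dp run on one partition.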
The data objects are mapped to the spatial grids G obtained by the split, generating a Key-Value RDD keyed by grid number. Through the MapPartition interface, the RDD is partitioned by grid number and the data objects of the same partition are assigned to the same computing node. Each node independently runs the density clustering algorithm to obtain local clusters based on the grid division. The local clusters in adjacent grids are then merged to produce the final clustering result.
In this embodiment, the PClusterdp algorithm introduces a spatial grid index to keep the amount of data per grid relatively balanced, and generates spatial grids with a binary index. Each node of the index records the basic information of a grid (grid) and the number of data objects in it (count); the root node stores S and D. The algorithm proceeds top-down, partitioning the space level by level and building the grid index: S is halved and the generated sub-grid information is stored in the grid index; each newly generated grid is then visited, halved and stored again, until the boundary length of the generated sub-grids is less than or equal to the given threshold. For a fixed S, the grid index can be saved and reused, which saves index-building time and improves the efficiency of the algorithm. After the basic information of the grids at every level is obtained, the number of data objects in each grid at every level is counted. To speed this up, the counting algorithm follows the MapReduce idea: the index structure is broadcast to every computing node and the data objects to be clustered are distributed evenly among the nodes; each node counts its data in each grid, and the per-grid counts are then merged. The specific space partition algorithm is as follows:
After the complete index structure is obtained, the index is traversed to find the largest grids whose data amount is less than the given value; once such a grid is found, the traversal does not descend below it. The data objects are mapped according to the resulting space division to generate a Key-Value RDD keyed by grid number, and grid-based RDD partitions are generated automatically through the MapPartitionWithIndex function interface of the Key-Value RDD.
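Steps (1.1) to (1.3) can be sketched in plain Python as follows. The index is a binary tree whose nodes store a box and a count; the counts are produced by several simulated workers (map) whose counters are summed (reduce), and the traversal keeps the largest grids holding fewer than `max_points` objects. Halving the widest side, identifying grids by their child-index path, and keeping a still-overfull leaf as its own partition are assumptions of this sketch, as are all names.

```python
from collections import Counter
from dataclasses import dataclass, field
from functools import reduce
from typing import List, Sequence, Tuple

Path = Tuple[int, ...]          # child indices from the root identify a grid
Point = Sequence[float]

@dataclass
class Node:
    low: Tuple[float, ...]      # grid bounds per axis
    high: Tuple[float, ...]
    dim: int = -1               # split axis (-1 for a leaf)
    mid: float = 0.0            # split coordinate
    count: int = 0              # number of data objects in the grid
    children: List["Node"] = field(default_factory=list)

def build_index(low, high, min_side: float) -> Node:
    """(1.1) Halve grids top-down until no side exceeds min_side."""
    node = Node(tuple(low), tuple(high))
    sides = [h - l for l, h in zip(low, high)]
    j = max(range(len(sides)), key=lambda d: sides[d])
    if sides[j] > min_side:
        node.dim, node.mid = j, (low[j] + high[j]) / 2
        left_high, right_low = list(high), list(low)
        left_high[j] = right_low[j] = node.mid
        node.children = [build_index(low, left_high, min_side),
                         build_index(right_low, high, min_side)]
    return node

def path_of(root: Node, p: Point) -> Path:
    """Child-index path from the root to the leaf grid containing p."""
    path, node = [], root
    while node.children:
        i = 0 if p[node.dim] < node.mid else 1
        path.append(i)
        node = node.children[i]
    return tuple(path)

def count_chunk(root: Node, chunk: List[Point]) -> Counter:
    """Map: one worker counts its points in every grid along each point's path."""
    c: Counter = Counter()
    for p in chunk:
        path = path_of(root, p)
        for k in range(len(path) + 1):
            c[path[:k]] += 1
    return c

def fill_counts(root: Node, points: List[Point], n_workers: int = 3) -> None:
    """(1.2) Split points among workers, count, merge (reduce), store in the tree."""
    chunks = [points[w::n_workers] for w in range(n_workers)]
    total = reduce(lambda a, b: a + b, (count_chunk(root, ch) for ch in chunks), Counter())
    stack = [(root, ())]
    while stack:
        node, path = stack.pop()
        node.count = total.get(path, 0)
        stack.extend((ch, path + (i,)) for i, ch in enumerate(node.children))

def select_grids(node: Node, max_points: int, path: Path = ()) -> List[Path]:
    """(1.3) Largest grids with fewer than max_points objects; stop descending there."""
    if node.count < max_points or not node.children:
        return [path]
    return [g for i, ch in enumerate(node.children)
            for g in select_grids(ch, max_points, path + (i,))]

def key_by_grid(root: Node, points: List[Point], max_points: int):
    """(grid number, point) pairs: the contents of the Key-Value RDD."""
    number = {g: n for n, g in enumerate(select_grids(root, max_points))}
    pairs = []
    for p in points:
        path = path_of(root, p)
        key = next(number[path[:k]] for k in range(len(path) + 1) if path[:k] in number)
        pairs.append((key, p))
    return pairs

pts = [(0.2, 0.2), (0.4, 0.4), (0.6, 0.6), (0.8, 0.8), (0.3, 0.7), (3.5, 3.5)]
root = build_index((0.0, 0.0), (4.0, 4.0), min_side=1.0)
fill_counts(root, pts)
pairs = key_by_grid(root, pts, max_points=4)
print([k for k, _ in pairs])  # [0, 0, 0, 0, 0, 4]
```

Because the tree depends only on S and the threshold, `build_index` can be run once and reused, as the text notes; only `fill_counts` must be repeated for new data.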
In this embodiment, to balance algorithm speed against solution accuracy and to enable parallel computation, the following improved local density is defined.
Definition 6 (improved local density $\rho'_i$): let $\rho'_i$ be the improved local density of data object $p_i$; then
[formula (3) not reproduced in the source text]
Formula (3) builds on formula (1) by taking into account how compact the data objects in the neighborhood are. For data objects that have the same number of neighbors in their dc-neighborhoods, the difference in local density is enlarged by computing the distances between each object and its neighbors. In other words, among objects with the same number of neighbors within dc, the one whose neighbors are more tightly packed has the larger local density. Moreover, the local density proposed by the present invention confines the computation to the neighborhood: when computing the local density of a data object, only the objects in its own grid partition and in the adjacent grid partitions are considered. This avoids traversing the whole data set and reduces the workload of the computing nodes.
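The following sketch shows how the density computation can be confined to a grid and its adjacent grids. Since formula (3) is not preserved, it uses a hypothetical stand-in with the property described above: $\rho'_i = \rho_i + \frac{1}{\rho_i + 1}\sum_j (1 - d_{ij}/d_c)$ over the dc-neighbors $p_j$. The added term lies in [0, 1), so it never reorders objects with different neighbor counts and gives the object with tighter neighbors the larger density. The square cells and `cell_neighbors` are also illustrative assumptions; searching only adjacent cells finds every neighbor only when dc does not exceed the cell side.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence

Point = Sequence[float]

def improved_density(grids: Dict[Hashable, List[Point]],
                     neighbors_of: Callable[[Hashable], List[Hashable]],
                     gid: Hashable, dc: float) -> List[float]:
    """Improved local density of every object in grid `gid`, looking only at
    that grid and its adjacent grids. HYPOTHETICAL stand-in for formula (3):
    rho'_i = rho_i + sum_j (1 - d_ij / dc) / (rho_i + 1) over dc-neighbors p_j."""
    own = grids[gid]
    others = [q for g in neighbors_of(gid) for q in grids.get(g, [])]
    result = []
    for i, p in enumerate(own):
        dists = [math.dist(p, q) for j, q in enumerate(own) if j != i]
        dists += [math.dist(p, q) for q in others]
        near = [d for d in dists if d < dc]           # dc-neighborhood (Definition 1)
        rho = len(near)                               # formula (1)
        closeness = sum(1 - d / dc for d in near)     # larger when neighbors are tighter
        result.append(rho + closeness / (rho + 1))    # tie-breaker in [0, 1)
    return result

def cell_neighbors(gid):
    """Adjacent cells of square cell (i, j), diagonals included (illustrative grid)."""
    i, j = gid
    return [(i + a, j + b) for a in (-1, 0, 1) for b in (-1, 0, 1) if (a, b) != (0, 0)]

grids = {(0, 0): [(0.5, 0.5), (0.9, 0.5)], (1, 0): [(1.1, 0.5)]}
print(improved_density(grids, cell_neighbors, (0, 0), dc=0.5))
```

The second object in cell (0, 0) also counts its neighbor across the cell boundary, which is the reason adjacent grids are included.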
When the original cluster_dp determines cluster centers, a decision graph must be drawn and the centers judged through human-computer interaction. To remove the algorithm's dependence on the decision graph and on human intervention, an auxiliary function γ is designed to identify cluster centers automatically. Given a data object with local density $\rho'_i$ and minimum high-density distance $\delta_i$, set
$$\gamma_i = \frac{\rho'_i \, \delta_i}{\max(\rho') \cdot \max(\delta)}$$
where $\max(\rho') \cdot \max(\delta)$ is the product of the maximum local density and the maximum minimum-high-density distance in the grid. Because local density and minimum high-density distance have different scales, both are normalized simply by their maxima within the grid. With $\gamma_i$ limited to [0, 1] and sorted in descending order, the γ values of non-center objects approach 0, while those of cluster center objects are scattered and lie far from the origin. Candidate cluster centers can therefore be determined with a preset threshold. The choice of threshold depends on the application environment; selecting data objects with γ > 0.2 as cluster-core candidates to generate local clusters gives satisfactory clustering results. Once the cluster-core candidates are obtained, the number of clusters is determined, and the data in the grid are then assigned to the corresponding local clusters.
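The automatic center selection can be sketched as follows. Within one grid it computes $\delta_i$ and the nearest denser neighbor, normalizes $\gamma_i = \rho'_i \delta_i / (\max(\rho') \max(\delta))$, takes objects with γ > 0.2 as centers, and assigns every other object to the cluster of its nearest denser neighbor in order of decreasing density. That assignment rule comes from the original density-peaks algorithm and is not spelled out in the text, so treat it as an assumption; objects with no denser neighbor are also made centers so that every chain of neighbors ends at a center.

```python
import math
from typing import List, Sequence, Tuple

def dp_local_clusters(points: List[Sequence[float]], rho: List[float],
                      threshold: float = 0.2) -> Tuple[List[float], List[int], List[int]]:
    """cluster_dp inside one grid with automatic center selection via gamma."""
    n = len(points)
    if n == 0:
        return [], [], []
    delta, nn = [0.0] * n, [-1] * n
    for i in range(n):                      # formula (2) and nearest denser neighbor
        best = math.inf
        for j in range(n):
            if rho[j] > rho[i]:
                d = math.dist(points[i], points[j])
                if d < best:
                    best, nn[i] = d, j
        # densest objects: largest distance to any object (assumed convention)
        delta[i] = best if nn[i] != -1 else max(math.dist(points[i], q) for q in points)
    denom = max(rho) * max(delta)           # normalizes gamma into [0, 1]
    gamma = [r * d / denom if denom > 0 else 0.0 for r, d in zip(rho, delta)]
    centers = [i for i in range(n) if gamma[i] > threshold or nn[i] == -1]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in sorted(range(n), key=lambda k: -rho[k]):   # denser objects first
        if labels[i] == -1:
            labels[i] = labels[nn[i]]       # inherit the nearest denser neighbor's cluster
    return gamma, centers, labels

pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
rho_improved = [3.0, 2.0, 2.5, 3.5, 2.2, 2.1]   # example improved local densities
gamma, centers, labels = dp_local_clusters(pts, rho_improved)
print(centers, labels)  # [0, 3] [0, 0, 0, 1, 1, 1]
```

The threshold 0.2 is the value the text reports as working well; it is exposed as a parameter because the text says the choice depends on the application.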
Inter-partition cluster merging strategy: each RDD partition corresponds to one spatial grid. Data objects close to a grid boundary must have their aggregation characteristics re-evaluated across adjacent grids to avoid classification errors caused by the RDD division. By computing the average local density between two clusters, cluster members are labeled as core members (cluster core) or halo (cluster halo). Core members form the core of a cluster; they consist of high-density points and are a stable aggregation of data objects. The halo corresponds to the periphery of a cluster; it is an unstable part of the cluster containing low-density data points. Using the concepts of core and halo, a method for merging clusters between grids is proposed: if the data objects close to the boundary of adjacent grids are distributed in either of the following ways, the clusters to which those data objects belong must be adjusted.
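One possible reading of "computing the average local density between two clusters" is sketched below: the border objects are the members of either cluster that lie within dc of the other cluster, their mean density is the threshold, and members below it are labeled halo while the rest are core. This interpretation is an assumption, loosely modeled on the border-density rule of the original density-peaks algorithm (which uses the maximum border density rather than the mean); the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]

def label_core_halo(a_pts: List[Point], a_rho: List[float],
                    b_pts: List[Point], b_rho: List[float],
                    dc: float) -> Tuple[List[str], List[str]]:
    """Label members of two clusters from adjacent grids as 'core' or 'halo'."""
    # Border objects: members lying within dc of some member of the other cluster.
    border = [r for p, r in zip(a_pts, a_rho) if any(math.dist(p, q) < dc for q in b_pts)]
    border += [r for q, r in zip(b_pts, b_rho) if any(math.dist(q, p) < dc for p in a_pts)]
    if not border:                       # clusters do not touch: nothing to re-evaluate
        return ["core"] * len(a_pts), ["core"] * len(b_pts)
    rho_b = sum(border) / len(border)    # average local density between the two clusters

    def label(rhos: List[float]) -> List[str]:
        return ["core" if r >= rho_b else "halo" for r in rhos]

    return label(a_rho), label(b_rho)

a_labels, b_labels = label_core_halo([(0.0, 0.0), (0.9, 0.0)], [5.0, 1.0],
                                     [(1.1, 0.0), (2.0, 0.0)], [2.0, 6.0], dc=0.5)
print(a_labels, b_labels)  # ['core', 'halo'] ['core', 'core']
```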
Case 1: cluster core objects exist near the shared boundary of adjacent grids, and these core objects are close to each other. Because of the grid division, data objects that should belong to the same cluster have been assigned to different clusters; in this case the two clusters are merged.
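Case 1 amounts to taking the union of clusters across grid boundaries, which a union-find structure over cluster identifiers handles well because merges can chain across several grids. The sketch assumes that "close to each other" means some pair of core members from the two clusters is closer than dc, and that the caller passes only pairs of clusters lying in adjacent grids; the identifiers are hypothetical.

```python
import math
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

def merge_core_clusters(core_points: Dict[Hashable, List[Sequence[float]]],
                        candidate_pairs: Iterable[Tuple[Hashable, Hashable]],
                        dc: float) -> Dict[Hashable, Hashable]:
    """Case 1: merge clusters from adjacent grids whose core members are close.
    Returns a map from each cluster id to the id representing its merged cluster."""
    parent = {c: c for c in core_points}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]   # path halving
            c = parent[c]
        return c

    for a, b in candidate_pairs:
        # assumed meaning of "close to each other": some core pair closer than dc
        if any(math.dist(p, q) < dc for p in core_points[a] for q in core_points[b]):
            parent[find(a)] = find(b)
    return {c: find(c) for c in core_points}

cores = {"g0:c0": [(0.9, 0.5)], "g1:c0": [(1.1, 0.5)], "g1:c1": [(1.9, 1.9)]}
roots = merge_core_clusters(cores, [("g0:c0", "g1:c0"), ("g0:c0", "g1:c1")], dc=0.5)
print(roots)  # {'g0:c0': 'g1:c0', 'g1:c0': 'g1:c0', 'g1:c1': 'g1:c1'}
```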
Case 2: halo objects exist at the boundary of two adjacent grids; the cluster to which each halo point belongs must then be re-evaluated. The adjustment algorithm is as follows.
In the grids adjacent to the grid containing the halo object, search for data objects whose density is higher than that of the halo object, and compute the distance from each qualifying object to the halo point. If the minimum computed distance is less than the current minimum high-density distance of the halo object, update the halo point's nearest high-density neighbor and minimum high-density distance, and assign the halo object to a new cluster according to the updated nearest high-density neighbor.
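The Case 2 adjustment can be expressed as a small function. It takes a halo object with its current nearest high-density neighbor, minimum high-density distance and cluster, plus candidate objects from the adjacent grids, and returns the possibly updated triple. The tuple layout of the candidates and the function name are assumptions of this sketch.

```python
import math
from typing import Hashable, Iterable, Optional, Sequence, Tuple

Point = Sequence[float]

def reassign_halo(h: Point, rho_h: float, nn_h: Optional[Point], delta_h: float,
                  cluster_h: Hashable,
                  candidates: Iterable[Tuple[Point, float, Hashable]]):
    """Case 2: re-evaluate the cluster of halo object h.
    candidates: (point, local density, cluster id) from grids adjacent to h's grid.
    Returns the possibly updated (nearest denser neighbor, delta, cluster id)."""
    for q, rho_q, cluster_q in candidates:
        if rho_q > rho_h:                    # only objects denser than the halo object
            d = math.dist(h, q)
            if d < delta_h:                  # closer than the current minimum high-density distance
                nn_h, delta_h, cluster_h = q, d, cluster_q
    return nn_h, delta_h, cluster_h

nn, delta, cluster = reassign_halo((0.9, 0.5), 1.0, (0.2, 0.5), 0.7, "g0:c0",
                                   [((1.1, 0.5), 3.0, "g1:c0"),
                                    ((1.0, 0.5), 0.5, "g1:c0"),
                                    ((1.8, 0.5), 4.0, "g1:c1")])
print(nn, cluster)  # (1.1, 0.5) g1:c0
```

The object at (1.0, 0.5) is nearer but less dense than the halo object, so it is ignored, exactly as the rule requires.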
The above is a specific embodiment of the present invention and the technical principles it uses. Any change made according to the concept of the present invention whose resulting functions do not go beyond the spirit covered by the description and drawings shall fall within the scope of protection of the present invention.
Claims (10)
1. A density clustering method for massive spatial data based on resilient distributed datasets, characterized by comprising the following steps:
(1) introducing a spatial grid index according to the spatial distribution of the data and generating grid-based RDD partitions:
(1.1) generating spatial grids with a binary index, partitioning the space level by level with a top-down strategy and building the grid index, until the boundary length of the generated sub-grids is not greater than a given threshold;
(1.2) following the MapReduce idea, counting the number of data objects in the grids of every level, broadcasting the index structure to every computing node and distributing the data objects to be clustered evenly among the nodes, and merging the per-grid data counts to obtain the complete grid index structure;
(1.3) traversing the index to find the largest grids whose data amount is less than a given value, generating a Key-Value RDD keyed by grid number according to the search result, and generating grid-based RDD partitions;
(2) clustering within partitions: based on the improved local density definition, running the cluster_dp algorithm in parallel on every partition to determine cluster centers, so that data objects with the same neighbor count do not end up with identical local density;
(3) merging local results with the cluster-merging optimization strategy within grids and between adjacent grids to complete the clustering.
2. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that the spatial grid is defined as follows:
the space S is divided into several non-overlapping subregions; each subregion is a spatial grid, denoted G, with $G = [l_1, u_1] \times [l_2, u_2] \times \cdots \times [l_k, u_k]$, where $l_j$ and $u_j$ are the projections of the grid boundary endpoints on the $j$-th dimension axis.
3. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that adjacent grids are defined as follows:
for any $g_1, g_2 \in G$, if there exists [formula not reproduced in the source text], then grids $g_1$ and $g_2$ are said to be adjacent.
4. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that step (1.1) is specifically as follows:
spatial grids are generated with a binary index, in which each index node records the basic information of a grid and the number of data objects in it; a top-down strategy partitions the space level by level to build the grid index: the space is halved, the generated sub-grid information is stored in the grid index, and each newly generated grid is visited, halved and stored again, until the boundary length of the generated sub-grids is not greater than the given threshold.
5. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that step (1.3) is specifically: after the complete index structure is obtained, the index is traversed to find the largest grids whose data amount is less than the given value, and once such a grid is found the traversal does not descend below it; the data objects are mapped according to the resulting space division to generate a Key-Value RDD keyed by grid number, and grid-based RDD partitions are generated automatically through the MapPartitionWithIndex function interface of the Key-Value RDD.
6. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that the improved local density is defined as follows: let $\rho'_i$ be the improved local density of data object $p_i$; then
[formula (3) not reproduced in the source text]
where the k-dimensional region centered at the given data object $p_i$ with radius dc is called the dc-neighborhood of $p_i$, and every data object $p_j$ in the dc-neighborhood satisfies $\mathrm{dist}(p_i, p_j) < d_c$.
7. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that, when the cluster_dp algorithm is run, an auxiliary function γ is designed to identify cluster centers automatically, specifically as follows:
given a data object with local density $\rho'_i$ and minimum high-density distance $\delta_i$, set
$$\gamma_i = \frac{\rho'_i \, \delta_i}{\max(\rho') \cdot \max(\delta)}$$
where $\max(\rho') \cdot \max(\delta)$ is the product of the maximum local density and the maximum minimum-high-density distance in the grid; the local density $\rho_i$ of $p_i$ is defined as the number of data objects in the dc-neighborhood of $p_i$:
$$\rho_i = \sum_{j \ne i} \chi\big(\mathrm{dist}(p_i, p_j) - d_c\big)$$
where $\chi(x) = 1$ if $x < 0$ and $\chi(x) = 0$ otherwise.
8. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 7, characterized in that the minimum high-density distance $\delta_i$ is defined as follows: if $p_j$ is the object closest to $p_i$ among all data objects whose local density is higher than $\rho_i$, then $NN(p_i) = p_j$ is called the nearest high-density neighbor of $p_i$, and $\delta_i = \mathrm{dist}(p_i, p_j)$ is called the minimum high-density distance of $p_i$:
$$\delta_i = \min_{j:\, \rho_j > \rho_i} \mathrm{dist}(p_i, p_j)$$
9. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 1, characterized in that step (3) is specifically: by computing the average local density between two clusters, cluster members are labeled as core members or halo, where core members form the core of a cluster, consist of high-density points and are a stable aggregation of data objects, and the halo corresponds to the periphery of a cluster and is an unstable part of the cluster containing low-density data points; using the concepts of core and halo, a strategy for merging clusters between grids is proposed: if the data objects close to the boundary between adjacent grids are distributed in either of the following two ways, the clusters to which those data objects belong must be adjusted:
(a) if cluster core objects exist near the shared boundary of adjacent grids and these core objects are close to each other, the two clusters are merged;
(b) if halo objects exist at the boundary of two adjacent grids, the cluster to which each halo point belongs must be re-evaluated.
10. The density clustering method for massive spatial data based on resilient distributed datasets according to claim 9, characterized in that the cluster to which a halo point belongs is re-evaluated as follows: in the grids adjacent to the grid containing the halo object, search for data objects whose density is higher than that of the halo object, and compute the distance from each qualifying object to the halo point; if the minimum computed distance is less than the current minimum high-density distance of the halo object, update the halo point's nearest high-density neighbor and minimum high-density distance, and assign the halo object to a new cluster according to the updated nearest high-density neighbor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710298705.XA (CN107038248A) | 2017-04-27 | 2017-04-27 | Density clustering method for massive spatial data based on resilient distributed datasets |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107038248A | 2017-08-11 |
Family
ID=59538432
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102163224A (en) * | 2011-04-06 | 2011-08-24 | 中南大学 | Adaptive spatial clustering method |
CN105404648A (en) * | 2015-10-29 | 2016-03-16 | 东北大学 | Density and closeness clustering based user moving behavior determination method |
Non-Patent Citations (1)
Title |
---|
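Li Luming (李璐明) et al.: "Density clustering of massive spatial data based on resilient distributed datasets", Journal of Hunan University (Natural Sciences) (《湖南大学学报(自然科学版)》) * |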
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109389140A (en) * | 2017-08-14 | 2019-02-26 | 中国科学院计算技术研究所 | The method and system of quick searching cluster centre based on Spark |
CN109452935A (en) * | 2017-09-06 | 2019-03-12 | 塔塔咨询服务有限公司 | The non-invasive methods and system from photoplethysmogram estimated blood pressure are post-processed using statistics |
US10803096B2 (en) | 2017-09-28 | 2020-10-13 | Here Global B.V. | Parallelized clustering of geospatial data |
CN108537274B (en) * | 2018-04-08 | 2020-06-19 | 武汉大学 | Method for rapidly clustering POI (Point of interest) position points in space on multiple scales based on grids |
CN108537274A (en) * | 2018-04-08 | 2018-09-14 | 武汉大学 | A kind of Multi scale Fast Speed Clustering based on grid |
CN110224847A (en) * | 2018-05-02 | 2019-09-10 | 腾讯科技(深圳)有限公司 | Group dividing method, device, storage medium and equipment based on social networks |
CN109408562A (en) * | 2018-11-07 | 2019-03-01 | 广东工业大学 | A kind of grouping recommended method and its device based on client characteristics |
CN109408562B (en) * | 2018-11-07 | 2021-11-26 | 广东工业大学 | Grouping recommendation method and device based on client characteristics |
CN109783240A (en) * | 2019-01-27 | 2019-05-21 | 中国人民解放军国防科技大学 | local optimization structured grid load balancing method based on MINMAX |
CN109783240B (en) * | 2019-01-27 | 2020-08-25 | 中国人民解放军国防科技大学 | Local optimization structured grid parallel computing load balancing method based on MINMAX |
CN110008215A (en) * | 2019-03-22 | 2019-07-12 | 武汉大学 | A kind of big data searching method based on improved KD tree parallel algorithm |
CN110161464A (en) * | 2019-06-14 | 2019-08-23 | 成都纳雷科技有限公司 | A kind of Radar Multi Target clustering method and device |
CN110161464B (en) * | 2019-06-14 | 2023-03-10 | 成都纳雷科技有限公司 | Radar multi-target clustering method and device |
CN110427531A (en) * | 2019-07-19 | 2019-11-08 | 清华大学 | Grid layout visualization method and system are carried out to multiple samples |
CN110414587A (en) * | 2019-07-23 | 2019-11-05 | 南京邮电大学 | Depth convolutional neural networks training method and system based on progressive learning |
CN110597935A (en) * | 2019-08-05 | 2019-12-20 | 北京云和时空科技有限公司 | Space analysis method and device |
TWI764205B (en) * | 2019-08-27 | 2022-05-11 | 南韓商韓領有限公司 | Computer-implemented system and method |
CN111291276A (en) * | 2020-01-13 | 2020-06-16 | 武汉大学 | Clustering method based on local direction centrality measurement |
CN111291276B (en) * | 2020-01-13 | 2023-05-19 | 武汉大学 | Clustering method based on local direction centrality measurement |
CN113449052A (en) * | 2020-03-26 | 2021-09-28 | 丰图科技(深圳)有限公司 | Method for establishing spatial index, method and device for querying spatial region |
CN112100243A (en) * | 2020-09-15 | 2020-12-18 | 山东理工大学 | Abnormal aggregation detection method based on mass space-time data analysis |
CN112100243B (en) * | 2020-09-15 | 2024-02-20 | 山东理工大学 | Abnormal aggregation detection method based on massive space-time data analysis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107038248A (en) | A kind of massive spatial data Density Clustering method based on elasticity distribution data set | |
Cai et al. | Evolving an optimal kernel extreme learning machine by using an enhanced grey wolf optimization strategy | |
Gufler et al. | Load balancing in mapreduce based on scalable cardinality estimates | |
CN110222029A (en) | A kind of big data multidimensional analysis computational efficiency method for improving and system | |
Ferranti et al. | A distributed approach to multi-objective evolutionary generation of fuzzy rule-based classifiers from big data | |
CN105631068B (en) | A kind of net boundary conditional processing method that unstrctured grid CFD is calculated | |
CN109033340A (en) | A kind of searching method and device of the point cloud K neighborhood based on Spark platform | |
CN106127244A (en) | A kind of parallelization K means improved method and system | |
CN106096052A (en) | A kind of consumer's clustering method towards wechat marketing | |
CN111382320A (en) | Large-scale data increment processing method for knowledge graph | |
CN108764307A (en) | The density peaks clustering method of natural arest neighbors optimization | |
CN105447519A (en) | Model detection method based on feature selection | |
CN108416381A (en) | A kind of multi-density clustering method towards three-dimensional point set | |
CN113128617B (en) | Spark and ASPSO based parallelization K-means optimization method | |
CN107301094A (en) | The dynamic self-adapting data model inquired about towards extensive dynamic transaction | |
CN108052832B (en) | Sorting-based micro-aggregation anonymization method | |
Zhang et al. | A multiobjective cellular genetic algorithm based on 3D structure and cosine crowding measurement | |
CN108897820B (en) | Parallelization method of DENCLUE algorithm | |
CN106780747A (en) | A kind of method that Fast Segmentation CFD calculates grid | |
Yu et al. | DBWGIE-MR: A density-based clustering algorithm by using the weighted grid and information entropy based on MapReduce | |
CN105337759B (en) | It is a kind of based on inside and outside community structure than measure and community discovery method | |
Tan et al. | An improved cuckoo search algorithm for multilevel color image thresholding based on modified fuzzy entropy | |
Diao et al. | An improved DBSCAN algorithm using local parameters | |
KR101953479B1 (en) | Group search optimization data clustering method and system using the relative ratio of distance | |
Wang et al. | Research on Clustream Algorithm Based on Spark |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20170811 |