CN107330466A - Very fast geographical GeoHash clustering methods - Google Patents

Info

Publication number
CN107330466A
Authority
CN
China
Prior art keywords
node
poi
layer
geohash
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710527438.9A
Other languages
Chinese (zh)
Other versions
CN107330466B (en
Inventor
蔡启振
张圭煜
杨林畅
季波
季一波
孙嘉磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lianshang Network Technology Co Ltd
Original Assignee
Shanghai Lianshang Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lianshang Network Technology Co Ltd filed Critical Shanghai Lianshang Network Technology Co Ltd
Priority to CN201710527438.9A priority Critical patent/CN107330466B/en
Publication of CN107330466A publication Critical patent/CN107330466A/en
Priority to PCT/CN2018/089639 priority patent/WO2019001223A1/en
Application granted granted Critical
Publication of CN107330466B publication Critical patent/CN107330466B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/231Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a very fast geographic GeoHash clustering method. One embodiment of the method includes: determining the target layer, in a tree-structured clustering database, that corresponds to the clustering precision required for clustering POI samples; selecting nodes for clustering from the target layer; and clustering the POI samples in the regions corresponding to those nodes to obtain a clustering result. On the one hand, massive numbers of POI samples can be clustered quickly; on the other hand, the clustering precision can be adjusted flexibly.

Description

Very fast geographical GeoHash clustering methods
Technical field
The present application relates to the field of computers, specifically to the Internet field, and more particularly to a very fast geographic GeoHash clustering method.
Background
In LBS (Location Based Services), there is often a need to cluster POI (Point of Interest) samples by geographic location. The algorithms commonly used at present include K-Means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
However, when POI samples are clustered with these algorithms, two problems arise. On the one hand, the positional relationship between the geographic locations of all POI samples must be computed, and because the number of POI samples is massive, clustering is slow. On the other hand, the clustering precision cannot be adjusted flexibly.
Summary of the invention
This application provides a very fast geographic GeoHash clustering method to solve the technical problems described in the Background section above.
The method includes: determining the target layer, in a tree-structured clustering database, that corresponds to the clustering precision required for clustering POI samples; selecting nodes for clustering from the target layer; and clustering the POI samples in the regions corresponding to those nodes to obtain a clustering result.
By determining the target layer that corresponds to the required clustering precision, selecting nodes for clustering from that layer, and clustering the POI samples in the regions corresponding to those nodes, the method can, on the one hand, cluster massive numbers of POI samples quickly and, on the other hand, adjust the clustering precision flexibly.
Brief description of the drawings
Other features, objects, and advantages of the application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a flow chart of one embodiment of the very fast geographic GeoHash clustering method according to the application;
Fig. 2 is a structural diagram of a tree-structured clustering database;
Fig. 3 is a diagram showing the effect of a POI sample participating in the construction of the tree-structured clustering database;
Fig. 4 is a diagram of clustering POI samples;
Fig. 5 is another diagram of clustering POI samples.
Detailed description
The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, where no conflict arises, the embodiments of the application and the features of those embodiments can be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Refer to Fig. 1, which shows the flow of one embodiment of the very fast geographic GeoHash clustering method according to the application. The method comprises the following steps:
Step 101: determine the target layer, in the tree-structured clustering database, that corresponds to the clustering precision required for clustering the POI samples.
In this embodiment, a POI sample is the set of information associated with one POI. POI samples can be clustered using a tree-structured clustering database built in advance. The database can be built on a tree structure, which may include but is not limited to: a Trie (prefix tree), a Suffix Tree, or a B+Tree (multi-way search tree).
In this embodiment, a POI sample may include but is not limited to: an identifier, a geographic location, an address, a merchant name, and a telephone number.
For example, if the POI is a restaurant, the restaurant's ID, geographic location, address, merchant name, and telephone number can form one POI sample. The geographic location of a POI comprises a longitude and a latitude.
In this embodiment, the geographic location of a POI sample is the geographic location of the POI to which the sample corresponds. The GeoHash code string of a POI sample is the GeoHash code string of that POI's geographic location.
In some optional implementations of this embodiment, the tree-structured clustering database is built from the GeoHash code strings of the POI samples and comprises multiple layers; the length of the GeoHash code string corresponds to the number of layers. Each layer corresponds to one clustering precision, the characters of the GeoHash code strings correspond to nodes, each layer comprises one or more nodes, and the region corresponding to a node contains one or more POI samples.
In this embodiment, a clustering precision can be a numerical interval. When POI samples are clustered at a given clustering precision, the distance between the geographic locations of any two POI samples in one clustering result should fall within that interval. In the GeoHash algorithm, a GeoHash code string represents a region; the longer the string, the smaller the region it represents. The length of a GeoHash code string also corresponds to a precision, which is likewise a numerical interval: the distance between the geographic locations of any two POI samples inside the region represented by the string falls within that interval.
Therefore, in this embodiment, the tree-structured clustering database can be built from the GeoHash code strings of the POI samples. A node in a layer can be represented by one or more characters, and every node within the same layer is represented by the same number of characters.
For example, if the GeoHash code string of a POI sample contains 8 characters and each node in every layer is represented by 2 characters, the tree-structured clustering database has 4 layers. As another example, if the GeoHash code string contains 8 characters and the database has 3 layers, a node may be represented by 4 characters in layer 1, by 2 characters in layer 2, and by 2 characters in layer 3.
In this embodiment, the region corresponding to a first-layer node of the tree-structured clustering database can be the region represented by the GeoHash code string formed from the characters that represent that node. The region corresponding to a node in any other layer can be the region represented by the string formed from the characters of that node together with the characters of its parent node and of its ancestor nodes up to the first layer.
Take the case in which the GeoHash code string of a POI sample contains 8 characters, each node is represented by 2 characters, and the database has 4 layers. Suppose layer 1 contains the node represented by kj, layer 2 contains its child node represented by b3, layer 3 contains the child node of b3 represented by dk, and layer 4 contains the child node of dk represented by p9. The region corresponding to the node p9 is then the region represented by the GeoHash code string kjb3dkp9.
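The layering in the two examples above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names `split_into_node_labels` and `node_regions` are hypothetical and do not appear in the application.

```python
from itertools import accumulate


def split_into_node_labels(geohash, chars_per_layer):
    """Split a GeoHash code string into one label per layer.

    chars_per_layer[i] is the number of characters that represent a node
    in layer i + 1 (hypothetical parameter, e.g. [2, 2, 2, 2] or [4, 2, 2]).
    """
    if sum(chars_per_layer) != len(geohash):
        raise ValueError("layer widths must add up to the code length")
    labels, start = [], 0
    for width in chars_per_layer:
        labels.append(geohash[start:start + width])
        start += width
    return labels


def node_regions(labels):
    """Return, for each layer, the GeoHash string of the node's region.

    The region of a node is the concatenation of the labels on the path
    from the first-layer node down to that node.
    """
    return list(accumulate(labels))


labels = split_into_node_labels("kjb3dkp9", [2, 2, 2, 2])
print(labels)                # ['kj', 'b3', 'dk', 'p9']
print(node_regions(labels))  # ['kj', 'kjb3', 'kjb3dk', 'kjb3dkp9']
```

With the layer widths `[4, 2, 2]`, the same string would be split into the three labels kjb3, dk, and p9.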
Because every node in a layer is represented by the same number of characters, the string formed from a node's characters together with those of its parent and ancestor nodes has the same length for every node in that layer. In other words, the regions of all nodes in one layer have the same extent and, accordingly, the same precision. The precision of the region corresponding to a node in a layer can therefore be used as the clustering precision, so that each layer corresponds to one clustering precision.
In some optional implementations of this embodiment, the layer in which each character of a GeoHash code string is placed corresponds to the ordinal position of that character in the string. The region corresponding to a node is the region represented by the GeoHash code string that corresponds to the path between that node and a first-layer node; every node on the path lies in a different layer, and adjacent nodes on the path are parent and child.
In the tree-structured clustering database, each layer corresponds to one clustering precision and the characters of the GeoHash code strings correspond to nodes. Each layer comprises one or more nodes, and a node can correspond to one character. A node can have multiple child nodes but only one parent node. The parent-child relationships between nodes express the containment relationships between the regions corresponding to those nodes. A layer can contain multiple nodes represented by the same character, in which case each of those nodes has a different parent node. The region corresponding to each node in every layer contains one or more POI samples.
In this embodiment, the region corresponding to each node is the region represented by the GeoHash code string that corresponds to the path between that node and a first-layer node; every node on the path lies in a different layer, and adjacent nodes on the path are parent and child.
In this embodiment, because a node has only one parent node, the path between a node and a first-layer node is unique. The GeoHash code string corresponding to that path is obtained by taking the character of the first-layer node as the first character and then appending, in order, the characters of the nodes in the intermediate layers on the path and finally the character of the node itself.
In this embodiment, each layer of the tree-structured clustering database corresponds to one clustering precision. The clustering precision of a layer can be expressed as the error that corresponds to a GeoHash code string whose number of characters equals the ordinal number of that layer.
For example, in the GeoHash algorithm, a GeoHash code string of 2 characters corresponds to an error of -630 km to +630 km, and this error can be used to express the clustering precision of layer 2 of the database. When the clustering precision required for clustering the POI samples is the error of a 2-character GeoHash code string, that is, -630 km to +630 km, the precision corresponds to layer 2, and layer 2 can be taken as the layer of the tree-structured clustering database that corresponds to the required clustering precision.
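The mapping from a required precision to a target layer can be sketched as a table lookup. This is a minimal sketch, assuming one character per layer. The error values for lengths 1 and 2 are those stated in this description; the values for lengths 3 to 8 are commonly cited approximate figures and are an assumption, as are the names `GEOHASH_ERROR_KM` and `target_layer_for`.

```python
# Approximate +/- error in km for a GeoHash code string of a given length.
# Lengths 1 and 2 come from the description; lengths 3-8 are commonly
# cited approximations and are an assumption of this sketch.
GEOHASH_ERROR_KM = {
    1: 2500.0,
    2: 630.0,
    3: 78.0,
    4: 20.0,
    5: 2.4,
    6: 0.61,
    7: 0.076,
    8: 0.019,
}


def target_layer_for(max_error_km):
    """Return the shallowest layer whose error does not exceed max_error_km.

    Falls back to the deepest layer if no layer is precise enough.
    """
    for layer in sorted(GEOHASH_ERROR_KM):
        if GEOHASH_ERROR_KM[layer] <= max_error_km:
            return layer
    return max(GEOHASH_ERROR_KM)


print(target_layer_for(630))   # 2
print(target_layer_for(2500))  # 1
```

Choosing the shallowest sufficiently precise layer keeps the number of nodes to visit small; a different selection rule could be substituted without changing the rest of the method.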
In this embodiment, the node data of a node can include the number of POI samples whose geographic location lies in the region corresponding to the node, and the number of times POI samples have been queried. The node data of a bottom-layer node of the tree-structured clustering database can also include the POI samples themselves.
Refer to Fig. 2, which shows a structural diagram of a tree-structured clustering database.
It should be understood that Fig. 2 shows, by way of example only, nodes represented by different characters in each layer. In the tree-structured clustering database, a layer can contain multiple nodes represented by the same character. For example, layer 2 can contain multiple nodes h; each node h has one parent node, and the parent nodes of the different nodes h are different.
The containment relationships between the regions of nodes are illustrated below using node 0, node h, node 2, and node 5 shown in Fig. 2. In the figure, g denotes the layer of the tree-structured clustering database. Each layer corresponds to one clustering precision, which can be expressed as the error of a GeoHash code string whose number of characters equals the ordinal number of the layer. For example, for g=1 the clustering precision is the error of a 1-character GeoHash code string, -2500 km to +2500 km; for g=2 it is the error of a 2-character GeoHash code string, -630 km to +630 km.
The child nodes of node 0 in layer 1 of the tree-structured clustering database include node h in layer 2. The region corresponding to node 0 is the region represented by the GeoHash code string of the path from node 0 to the first-layer node, which is node 0 itself; that is, the region represented by the GeoHash code string 0. The region corresponding to node h is the region represented by the GeoHash code string of the path from node h to node 0 in layer 1, that is, the region represented by 0h. Because the region represented by 0h is one of the 32 sub-regions into which the region represented by 0 is divided, the region corresponding to node h is a sub-region of the region corresponding to node 0.
The child nodes of node h in layer 2 include node 2 and node 5 in layer 3. The region corresponding to node 2 is the region represented by the GeoHash code string of the path from node 2 to node 0 in layer 1. That path contains node 2, node h, and node 0, so the corresponding GeoHash code string is 0h2, and the region corresponding to node 2 is the region represented by 0h2. Likewise, the region corresponding to node 5 is the region represented by 0h5. Because the regions represented by 0h2 and 0h5 are each one of the 32 sub-regions into which the region represented by 0h is divided, the regions corresponding to node 2 and node 5 are sub-regions of the region corresponding to node h.
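The division into 32 sub-regions follows from the 32-character GeoHash alphabet: appending one more character to a code string selects one of 32 sub-regions. A brief sketch, in which the function name `child_regions` is hypothetical:

```python
# The standard GeoHash base-32 alphabet (the letters a, i, l, o are not used).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def child_regions(prefix):
    """Return the code strings of the 32 sub-regions of the region `prefix`."""
    return [prefix + ch for ch in BASE32]


print(len(child_regions("0")))        # 32
print("0h" in child_regions("0"))     # True
print("0h2" in child_regions("0h"))   # True
```

In the tree-structured clustering database only those sub-regions that actually contain POI samples need to exist as child nodes, so a node typically has far fewer than 32 children.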
The label n=33 on node h in layer 2 of the tree-structured clustering database indicates that 33 POI samples have a geographic location inside the region corresponding to node h. In other words, when the database was built, 33 POI samples yielded, after their longitudes and latitudes were encoded, GeoHash code strings whose 2nd character is h (following the 1st character 0).
In some optional implementations of this embodiment, when the tree-structured clustering database is built from the GeoHash code strings of POI samples, multiple POI samples can be obtained in advance and used to build the database. The longitude and latitude of the geographic location of each POI sample are encoded with the GeoHash algorithm at a preset code length, giving the GeoHash code string of each POI sample. The preset code length equals the number of layers of the tree-structured clustering database.
For example, if the preset code length is 8, encoding the longitude and latitude of a POI sample yields a GeoHash code string of length 8; that is, the GeoHash code string of the POI sample contains 8 characters and the tree-structured clustering database has 8 layers.
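The encoding step can be illustrated with the standard GeoHash procedure, which alternately bisects the longitude and latitude ranges and packs every 5 bits into one base-32 character. The following is a minimal sketch of that general algorithm, not code from the application; the function name `geohash_encode` and its parameters are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, length=8):
    """Encode a latitude/longitude pair as a GeoHash string of `length` chars."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


code = geohash_encode(57.64911, 10.40744, length=8)
print(len(code))   # 8
print(code[:5])    # u4pru
```

Because a shorter code is always a prefix of a longer code for the same point, encoding once at the preset length is enough to place the sample in every layer.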
After the GeoHash code string of each POI sample has been obtained, the following operation can be performed for each string: determine the path that corresponds to the string in the tree-structured clustering database, where the layer in which the node for each character on the path should lie corresponds to the ordinal position of that character in the string.
To determine the path that corresponds to the GeoHash code string of a POI sample, the nodes on the path are determined first. Each node on the path is represented by one character of the string, and the layer in which each node should lie corresponds one-to-one to the ordinal position of its character in the string. In this way the path corresponding to the GeoHash code string in the tree-structured clustering database can be determined.
In this embodiment, adjacent nodes on the path corresponding to the GeoHash code string of a POI sample are parent and child; that is, the nodes in the tree-structured clustering database that represent consecutive characters of the string are in a parent-child relationship.
For example, suppose that encoding the longitude and latitude of a POI sample at a code length of 8 gives the GeoHash code string kjb3dkp9. The path corresponding to kjb3dkp9 in the tree-structured clustering database contains, in order, the nodes represented by the characters k, j, b, 3, d, k, p, and 9. The node for k should lie in layer 1, the node for j in layer 2, and so on, so that the layer for the node of each character on the path can be determined.
Because a node has only one parent node, the path corresponding to kjb3dkp9 in the tree-structured clustering database is unique. Consequently, even when a layer contains multiple nodes represented by the same character, the node that each layer should contain for the path can be determined exactly. It is then checked, layer by layer, whether the node for the corresponding character on the path already exists.
For example, layer 8 of the database may contain multiple nodes represented by the character 9. The node for the character 9 on the path of kjb3dkp9 should lie in layer 8; that is, layer 8 should contain the node represented by the character 9 on this path. It can then be checked whether that node already exists in layer 8.
If the node for a character on the path does not yet exist in the layer in which it should lie, the node is created in that layer and the POI sample count in its node data is updated.
If the node for a character on the path already exists in the layer in which it should lie, the POI sample count in its node data is updated.
For example, the node for the character 9 on the path should lie in layer 8 of the tree-structured clustering database. If layer 8 does not yet contain the node 9 for this path, node 9 is created in layer 8 and the POI sample count in its node data is incremented by 1.
If layer 8 already contains the node 9 for this path, the POI sample count in the node data of that node is incremented by 1.
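The create-or-update logic described above matches insertion into a prefix tree. The following is a minimal sketch, assuming one character per layer; the class `Node`, its field names, and the virtual root node (which sits above layer 1 and is not part of the description) are assumptions of this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    char: str
    poi_count: int = 0                             # POI samples in the region
    children: dict = field(default_factory=dict)   # next-layer nodes by char
    samples: list = field(default_factory=list)    # filled on bottom layer only


def insert_sample(root, geohash, sample):
    """Add one POI sample along the path given by its GeoHash code string.

    `root` is a virtual node above layer 1 (an assumption of this sketch).
    A missing node is created; every node on the path has its count
    incremented by 1. The sample itself is stored on the bottom-layer node.
    """
    node = root
    for ch in geohash:
        child = node.children.get(ch)
        if child is None:            # node for this character does not exist yet
            child = Node(ch)
            node.children[ch] = child
        child.poi_count += 1
        node = child
    node.samples.append(sample)


root = Node("")
insert_sample(root, "kjb3dkp9", {"id": "poi-1"})
insert_sample(root, "kjb3dkp8", {"id": "poi-2"})
print(root.children["k"].poi_count)  # 2
```

Keying the children by character under each parent is what keeps two nodes with the same character but different parents apart, so the path for a given code string stays unique.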
Refer to Fig. 3, which shows the effect of a POI sample participating in the construction of the tree-structured clustering database.
In Fig. 3, 301 denotes the connections between adjacent nodes on the path that corresponds to the GeoHash code string kjb3dkp9 of a POI sample, and g denotes the layer of the tree-structured clustering database.
Following the order of the characters k, j, b, 3, d, k, p, 9 in kjb3dkp9, node k, node j, node b, node 3, node d, node k, node p, and node 9 on the path should lie in layers 1 to 8 respectively, and adjacent nodes on the path are parent and child.
Because a node has only one parent node, the path corresponding to kjb3dkp9 in the tree-structured clustering database is unique. Consequently, even when a layer contains multiple nodes represented by the same character, the node for each character on the path can be found exactly. For example, layer 8 may contain multiple nodes represented by the character 9; because the node 9 being sought is the one on the path of kjb3dkp9, it can be found exactly, and the POI sample count in its node data is incremented by 1.
In the tree-structured clustering database, the following nodes on the path are located in turn: node k in layer 1, node j in layer 2, node b in layer 3, node 3 in layer 4, node d in layer 5, node k in layer 6, node p in layer 7, and node 9 in layer 8. The POI sample count in the node data of each of these nodes is incremented by 1.
In some optional implementations of this embodiment, determining the layer of the tree-structured clustering database that corresponds to the clustering precision required for clustering the POI samples includes: encoding the longitude and latitude of the geographic location of the POI sample to be queried with the GeoHash algorithm at the preset code length, giving the GeoHash code string of that POI sample; determining the nodes in the tree-structured clustering database that correspond to the characters of the string; determining, within the string, the character length required by the clustering precision; and querying the nodes for the characters of the string in the tree-structured clustering database, layer by layer, until the target layer that corresponds to that character length is reached.
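The layer-by-layer query can be sketched as a walk along the first characters of the code string. This sketch assumes one character per layer and represents the database as nested dictionaries; the keys `count` and `children`, the function name `find_target_node`, and the example counts for nodes 0 and 9 are hypothetical.

```python
def find_target_node(layer1_nodes, geohash, target_layer):
    """Follow the characters of `geohash` down to `target_layer`.

    `layer1_nodes` maps each first-layer character to a node dictionary of
    the form {"count": int, "children": {...}}. Returns the node reached
    in the target layer, or None if the path does not exist.
    """
    if not 1 <= target_layer <= len(geohash):
        raise ValueError("target layer must lie within the code length")
    children = layer1_nodes
    node = None
    for ch in geohash[:target_layer]:
        node = children.get(ch)
        if node is None:
            return None
        children = node["children"]
    return node


# Hypothetical two-layer excerpt with nodes h, m, and n in layer 2.
EXAMPLE_TREE = {
    "0": {"count": 33, "children": {"h": {"count": 33, "children": {}}}},
    "9": {"count": 33, "children": {
        "m": {"count": 20, "children": {}},
        "n": {"count": 13, "children": {}},
    }},
}

print(find_target_node(EXAMPLE_TREE, "0h2", 2)["count"])  # 33
```

The walk visits at most one node per layer, so its cost depends on the target layer and not on the number of POI samples stored.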
Step 102: select nodes for clustering from the target layer and cluster the POI samples in the regions corresponding to those nodes to obtain a clustering result.
In this embodiment, after step 101 has determined the layer of the tree-structured clustering database that corresponds to the clustering precision required for clustering the POI samples, nodes for clustering are selected from that target layer and the POI samples in the regions corresponding to the selected nodes are clustered. That is, the POI samples whose geographic locations lie in the regions of the selected nodes are clustered to obtain the clustering result.
For example, when the required clustering precision is the error of a 2-character GeoHash code string, that is, -630 km to +630 km, the precision corresponds to layer 2 of the tree-structured clustering database, and layer 2 is taken as the layer that corresponds to the required clustering precision. Nodes for clustering can then be selected from layer 2, and the POI samples whose geographic locations lie in the regions of the selected nodes are clustered to obtain the clustering result.
Refer to Fig. 4, which shows a schematic diagram of clustering POI samples.
Fig. 4 shows the nodes 401 finally used for clustering, where g denotes a layer of the tree-structure clustering database. The clustering precision required for clustering the POI samples is the error that corresponds to a GeoHash code string of 2 characters, namely ±630 km, and this clustering precision corresponds to layer 2 of the tree-structure clustering database.
When the POI samples are clustered, node h, node m, and node n are selected from layer 2 as the nodes for clustering. The POI sample count in the node data of node h is 33; that is, 33 POI samples have geographic locations within the region corresponding to node h, which is the region represented by the GeoHash code string 0h. The POI sample count in the node data of node m is 20; that is, 20 POI samples have geographic locations within the region corresponding to node m, which is the region represented by the GeoHash code string 9m. The POI sample count in the node data of node n is 13; that is, 13 POI samples have geographic locations within the region corresponding to node n, which is the region represented by the GeoHash code string 9n.
Clustering the POI samples whose geographic locations fall within the region corresponding to the selected node h yields the clustering result for node h, which contains 33 POI samples. Clustering the POI samples whose geographic locations fall within the region corresponding to the selected node m yields the clustering result for node m, which contains 20 POI samples. Clustering the POI samples whose geographic locations fall within the region corresponding to the selected node n yields the clustering result for node n, which contains 13 POI samples.
The clustering result corresponding to the clustering precision comprises the respective clustering results of node h, node m, and node n.
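The Fig. 4 example can be reproduced with a toy tree. This is a sketch only: the nested-dictionary layout and the helper `nodes_at_layer` are hypothetical, and the first-layer counts are assumed to be the sums of their children, since the text gives counts only for the layer-2 nodes 0h, 9m, and 9n.

```python
# Toy tree for the Fig. 4 example. Layer-2 counts come from the text;
# layer-1 counts are assumed to be the sums of their children.
tree = {
    "0": {"count": 33, "children": {"h": {"count": 33, "children": {}}}},
    "9": {
        "count": 33,
        "children": {
            "m": {"count": 20, "children": {}},
            "n": {"count": 13, "children": {}},
        },
    },
}


def nodes_at_layer(children, layer, prefix=""):
    """Yield (geohash_prefix, node) for every node in the given 1-based layer."""
    for ch, node in children.items():
        code = prefix + ch
        if layer == 1:
            yield code, node
        else:
            yield from nodes_at_layer(node["children"], layer - 1, code)


# One cluster per layer-2 node, sized by the count stored in the node data.
clusters = {code: node["count"] for code, node in nodes_at_layer(tree, 2)}
print(clusters)
```

Because each node already stores its POI count, the cluster sizes are read directly from the node data without scanning individual POI samples.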
In some optional implementations of this embodiment, when the number of POI samples whose geographic locations fall within the region corresponding to a selected node is less than a quantity threshold, the POI samples in the region corresponding to the selected node are clustered to obtain the clustering result corresponding to the clustering precision; when the number of POI samples whose geographic locations fall within the region corresponding to a selected node is equal to or greater than the quantity threshold, the POI samples in the regions corresponding to the child nodes of the selected node are clustered to obtain the clustering result corresponding to the clustering precision.
Refer to Fig. 5, which shows another schematic diagram of clustering POI samples.
Fig. 5 shows the nodes 501 finally used for clustering, where g denotes a layer of the tree-structure clustering database. The clustering precision required for clustering the POI samples is the error that corresponds to a GeoHash code string of 2 characters, namely ±630 km, and this clustering precision corresponds to layer 2 of the tree-structure clustering database.
When the POI samples are clustered, node h, node m, and node n are selected from layer 2 as the nodes for clustering. The POI sample count in the node data of node h is 33, the POI sample count in the node data of node m is 20, and the POI sample count in the node data of node n is 13.
Because the number of POI samples whose geographic locations fall within the region corresponding to node h exceeds the threshold, the POI samples whose geographic locations fall within the regions corresponding to child nodes 2 and 5 of node h can be clustered instead. Clustering the POI samples in the region corresponding to node 2 yields the clustering result for node 2, which contains 18 POI samples. Clustering the POI samples in the region corresponding to node 5 yields the clustering result for node 5, which contains 15 POI samples.
Clustering the POI samples whose geographic locations fall within the region corresponding to node m yields the clustering result for node m, which contains 20 POI samples. Clustering the POI samples whose geographic locations fall within the region corresponding to node n yields the clustering result for node n, which contains 13 POI samples.
After the POI samples have been clustered, the clustering result corresponding to the clustering precision is obtained. It comprises the clustering result for node 2, the clustering result for node 5, the clustering result for node m, and the clustering result for node n.
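The threshold rule of the Fig. 5 example can be sketched as follows. The text does not state the threshold value, so `THRESHOLD = 30` is an assumption chosen so that only node h (33 samples) is split; the dictionary layout and the function `select_clusters` are likewise hypothetical. Children of nodes m and n are omitted because the text gives no counts for them, and falling back to the node itself when it has no children is also an assumption.

```python
# Layer-2 nodes from the Fig. 5 example, keyed by their GeoHash prefix.
layer2 = {
    "0h": {
        "count": 33,
        "children": {
            "2": {"count": 18, "children": {}},
            "5": {"count": 15, "children": {}},
        },
    },
    "9m": {"count": 20, "children": {}},
    "9n": {"count": 13, "children": {}},
}

THRESHOLD = 30  # assumed value; the text does not specify one


def select_clusters(code, node, threshold):
    """Return {prefix: count} for one selected node.

    A node whose count is at or above the threshold is replaced by its
    child nodes (one level down); otherwise the node itself is used.
    """
    if node["count"] >= threshold and node["children"]:
        return {code + ch: child["count"] for ch, child in node["children"].items()}
    return {code: node["count"]}


result = {}
for code, node in layer2.items():
    result.update(select_clusters(code, node, THRESHOLD))
print(result)
```

Splitting only one level down mirrors the description, which clusters the child nodes of an over-full node; a recursive variant that keeps splitting until every cluster is under the threshold would be a different design and is not described in the text.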
The present application further provides an electronic device, which may be configured with one or more processors and a memory for storing one or more programs. The one or more programs may include instructions for performing the operations described in steps 101-102 above. When the one or more programs are executed by the one or more processors, the one or more processors perform the operations described in steps 101-102 above.
The present application further provides a computer-readable medium, which may be included in the electronic device or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine the target layer in the tree-structure clustering database that corresponds to the clustering precision required for clustering POI samples; and select nodes for clustering from the target layer and cluster the POI samples in the regions corresponding to the nodes to obtain a clustering result.
It should be noted that the computer-readable medium described above may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of a computer-readable storage medium include but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The above description covers only the preferred embodiments of the application and explains the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in this application.

Claims (8)

1. A very fast geographic GeoHash clustering method, characterized in that the method comprises:
determining the target layer in a tree-structure clustering database that corresponds to the clustering precision required for clustering POI samples;
selecting nodes for clustering from the target layer, and clustering the POI samples in the regions corresponding to the nodes to obtain a clustering result.
2. The method according to claim 1, characterized in that the tree-structure clustering database is constructed from the GeoHash code strings corresponding to POI samples and comprises multiple layers; the GeoHash code string length corresponds to the number of layers in the tree-structure clustering database; each layer corresponds to one clustering precision; the characters in a GeoHash code string correspond to nodes; each layer comprises one or more nodes; and one or more POI samples exist in the region corresponding to each node.
3. The method according to claim 2, characterized in that, in the tree-structure clustering database, the layer number of the layer corresponding to each character of a GeoHash code string corresponds to the ordinal position of that character within the GeoHash code string; the region corresponding to a node is the region represented by the path from that node to the first-layer node in the tree-structure clustering database; the nodes on the path lie in different layers; and adjacent nodes on the path are in a parent-child relationship.
4. The method according to claim 2, characterized in that constructing the tree-structure clustering database from the GeoHash code strings corresponding to POI samples comprises:
obtaining multiple POI samples;
encoding, with the GeoHash algorithm and a preset code length, the longitude and latitude of the geographic location of each POI sample to obtain the GeoHash code string of each POI sample, wherein the GeoHash code string length corresponds to the number of layers in the tree-structure clustering database;
performing the following operations on the GeoHash code string of each POI sample:
determining the path in the tree-structure clustering database that corresponds to the nodes represented by the characters of the GeoHash code string of the POI sample;
when a character representing a node does not exist on the path, creating a new character node in the layer corresponding to the character, and updating the POI sample count of the node in each layer on the path;
when the character representing the node exists on the path, updating the POI sample count of the node in each layer on the path.
5. The method according to claim 2 or 3, characterized in that determining the target layer in the tree-structure clustering database that corresponds to the clustering precision required for clustering POI samples comprises:
encoding, with the GeoHash algorithm and a preset code length, the longitude and latitude of the geographic location of the POI sample to be queried, to obtain the GeoHash code string of the POI sample to be queried;
determining the nodes in the tree-structure clustering database that correspond to the characters of the GeoHash code string;
determining, within the GeoHash code string, the character length required by the clustering precision of the POI sample clustering;
querying the tree-structure clustering database for the nodes represented by the characters of the GeoHash code string until the target layer corresponding to the character length is reached.
6. The method according to claim 4, characterized in that selecting nodes for clustering from the target layer and clustering the POI samples in the regions corresponding to the nodes to obtain a clustering result comprises:
when the number of POI samples in the region corresponding to a selected node is less than a preset quantity threshold, clustering the POI samples in the region corresponding to the selected node to obtain the clustering result corresponding to the clustering precision;
when the number of POI samples in the region corresponding to a selected node is equal to or greater than the quantity threshold, clustering the POI samples in the regions corresponding to the child nodes of the selected node to obtain the clustering result corresponding to the clustering precision.
7. An electronic device, characterized by comprising:
one or more processors;
a memory for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-6.
8. A computer-readable storage medium, characterized in that a computer program is stored thereon, and the program, when executed by a processor, implements the method according to any one of claims 1-6.
CN201710527438.9A 2017-06-30 2017-06-30 Extremely-fast geographic GeoHash clustering method Active CN107330466B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710527438.9A CN107330466B (en) 2017-06-30 2017-06-30 Extremely-fast geographic GeoHash clustering method
PCT/CN2018/089639 WO2019001223A1 (en) 2017-06-30 2018-06-01 Extreme geographical geohash clustering method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710527438.9A CN107330466B (en) 2017-06-30 2017-06-30 Extremely-fast geographic GeoHash clustering method

Publications (2)

Publication Number Publication Date
CN107330466A true CN107330466A (en) 2017-11-07
CN107330466B CN107330466B (en) 2023-01-24

Family

ID=60199544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710527438.9A Active CN107330466B (en) 2017-06-30 2017-06-30 Extremely-fast geographic GeoHash clustering method

Country Status (2)

Country Link
CN (1) CN107330466B (en)
WO (1) WO2019001223A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019001223A1 (en) * 2017-06-30 2019-01-03 上海连尚网络科技有限公司 Extreme geographical geohash clustering method
CN109299747A (en) * 2018-10-24 2019-02-01 北京字节跳动网络技术有限公司 Determination method, apparatus, computer equipment and the storage medium at one type cluster center
CN109992638A (en) * 2019-03-29 2019-07-09 北京三快在线科技有限公司 Generation method, device, electronic equipment and the storage medium of geographical location POI
CN113378922A (en) * 2021-06-09 2021-09-10 南京邮电大学 GeoHash-based geographic coordinate point density clustering method

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259076A (en) * 2020-01-14 2020-06-09 航科院中宇(北京)新技术发展有限公司 Cluster storage method of airborne navigation data
CN113868487B (en) * 2021-09-29 2024-06-07 平安银行股份有限公司 Method, device, equipment and medium for selecting member based on GeoHash address codes
CN115827814B (en) * 2023-02-13 2023-06-06 深圳市泰比特科技有限公司 Method, system and related equipment for loading and displaying vehicle points in visual field area

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2926914A1 (en) * 2008-01-28 2009-07-31 Viamichelin Soc Par Actions Si Geocoding method for digital road network system, involves dividing initial zone to obtain sub-zones, where dividing phase is recursively performed by generating small sub-zones according to reduction criteria till stop threshold is reached
US20130324162A1 (en) * 2012-05-29 2013-12-05 Alibaba Group Holding Limited Method and Apparatus of Recommending Candidate Terms Based on Geographical Location
WO2015023482A1 (en) * 2013-08-13 2015-02-19 Mapquest, Inc. Systems and methods for processing search queries utilizing hierarchically organized data
CN105677804A (en) * 2015-12-31 2016-06-15 百度在线网络技术(北京)有限公司 Determination of authority stations and building method and device of authority station database
CN105701123A (en) * 2014-11-27 2016-06-22 阿里巴巴集团控股有限公司 Passenger-vehicle relationship identification method and apparatus
US20160321351A1 (en) * 2015-04-30 2016-11-03 Verint Systems Ltd. System and method for spatial clustering using multiple-resolution grids
CN106528597A (en) * 2016-09-23 2017-03-22 百度在线网络技术(北京)有限公司 POI (Point Of Interest) labeling method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103383682B (en) * 2012-05-01 2017-12-26 刘龙 A kind of Geocoding, position enquiring system and method
CN105531698B (en) * 2013-03-15 2019-08-13 美国结构数据有限公司 Equipment, system and method for batch and real time data processing
CN107330466B (en) * 2017-06-30 2023-01-24 上海连尚网络科技有限公司 Extremely-fast geographic GeoHash clustering method

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019001223A1 (en) * 2017-06-30 2019-01-03 上海连尚网络科技有限公司 Extreme geographical geohash clustering method
CN109299747A (en) * 2018-10-24 2019-02-01 北京字节跳动网络技术有限公司 Determination method, apparatus, computer equipment and the storage medium at one type cluster center
CN109299747B (en) * 2018-10-24 2020-12-15 北京字节跳动网络技术有限公司 Method and device for determining cluster center, computer equipment and storage medium
CN109992638A (en) * 2019-03-29 2019-07-09 北京三快在线科技有限公司 Generation method, device, electronic equipment and the storage medium of geographical location POI
CN113378922A (en) * 2021-06-09 2021-09-10 南京邮电大学 GeoHash-based geographic coordinate point density clustering method
CN113378922B (en) * 2021-06-09 2022-07-15 南京邮电大学 GeoHash-based geographic coordinate point density clustering method

Also Published As

Publication number Publication date
CN107330466B (en) 2023-01-24
WO2019001223A1 (en) 2019-01-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant