CN107330466B - Extremely-fast geographic GeoHash clustering method - Google Patents

Publication number
CN107330466B
Authority
CN
China
Prior art keywords
clustering
node
poi
geohash
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710527438.9A
Other languages
Chinese (zh)
Other versions
CN107330466A (en
Inventor
蔡启振
张圭煜
杨林畅
季一波
孙嘉磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lianshang Network Technology Co Ltd
Original Assignee
Shanghai Lianshang Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lianshang Network Technology Co Ltd
Priority to CN201710527438.9A
Publication of CN107330466A
Priority to PCT/CN2018/089639 (WO2019001223A1)
Application granted
Publication of CN107330466B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/231 Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an extremely fast geographic GeoHash clustering method. One embodiment of the method comprises: determining, in a tree-structured clustering database, the target layer that corresponds to the clustering precision required for clustering POI samples; and selecting nodes for clustering from the target layer and clustering the POI samples within the region corresponding to each selected node to obtain a clustering result. On the one hand, clustering of massive numbers of POI samples is completed rapidly; on the other hand, the clustering precision can be adjusted flexibly.

Description

Extremely-fast geographic GeoHash clustering method
Technical Field
The application relates to the field of computers, in particular to the field of the Internet, and more particularly to an extremely fast geographic GeoHash clustering method.
Background
In LBS (Location Based Services), POI (Point of Interest) samples often need to be clustered according to geographic location. Commonly used algorithms currently include K-Means (the k-means clustering algorithm) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
However, when these algorithms are used to cluster POI samples, on the one hand the positional relationship between the geographic locations of the POI samples must be calculated, and because the number of POI samples is massive, clustering is slow; on the other hand, the clustering precision cannot be adjusted flexibly.
Disclosure of Invention
The application provides an extremely fast geographic GeoHash clustering method for solving the technical problems described in the Background section.
The method comprises the following steps: determining, in a tree-structured clustering database, the target layer that corresponds to the clustering precision required for clustering the POI samples; and selecting nodes for clustering from the target layer and clustering the POI samples within the region corresponding to each selected node to obtain a clustering result.
By determining the target layer in this way and clustering the POI samples within the regions of the nodes selected from that layer, the method on the one hand completes clustering of massive numbers of POI samples rapidly and on the other hand allows the clustering precision to be adjusted flexibly.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a flow diagram of one embodiment of a very fast GeoHash clustering method according to the application;
FIG. 2 is a schematic diagram of a tree-structured clustering database;
FIG. 3 is a schematic diagram illustrating an effect of POI sample participation in construction of a tree-structured clustering database;
FIG. 4 shows a schematic diagram of clustering POI samples;
FIG. 5 shows another schematic diagram of clustering POI samples.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It is to be understood that the specific embodiments described herein merely illustrate the relevant invention and do not limit it. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that the embodiments of the present application and the features of the embodiments may be combined with each other provided there is no conflict. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Referring to FIG. 1, a flow diagram of one embodiment of an extremely fast GeoHash clustering method according to the present application is shown. The method comprises the following steps:
Step 101, determining the target layer in the tree-structured clustering database that corresponds to the clustering precision required for clustering the POI samples.
In the present embodiment, a POI sample is a set of information associated with one POI. The POI samples may be clustered using a pre-constructed tree-structured clustering database. The database may be built on a tree structure, which may include, but is not limited to: a Trie (prefix tree), a suffix tree, or a B+ tree (multi-way search tree).
In this embodiment, POI samples may include, but are not limited to: identification, geographic location, address, store name, telephone number.
For example, if a POI is a restaurant, the restaurant ID, geographic location, address, store name, and phone number may constitute a POI sample, and the geographic location of the POI includes longitude and latitude.
In this embodiment, the geographic location of the POI sample is the geographic location of the POI corresponding to the POI sample. The GeoHash code character string corresponding to the POI sample is the GeoHash code character string corresponding to the geographic position of the POI corresponding to the POI sample.
In some optional implementation manners of this embodiment, the tree structure clustering database is constructed by GeoHash code strings corresponding to the POI samples, and includes a plurality of layers, and the length of the GeoHash code string corresponds to the number of the layers of the tree structure clustering database. Each layer corresponds to a clustering precision, characters in the GeoHash code character string correspond to nodes, each layer comprises one or more nodes, and one or more POI samples exist in the corresponding region of the nodes.
In this embodiment, the clustering precision may be a numerical range: when POI samples are clustered at a given clustering precision, the distance between the geographic locations of any two POI samples in the same clustering result should fall within that range. In the GeoHash algorithm, a GeoHash code string represents a region, and the longer the string, the smaller the region it represents. The length of a GeoHash code string therefore also corresponds to a precision, which is likewise a numerical range: the distance between the geographic locations of any two POI samples in the region represented by the string falls within that range.
Therefore, in this embodiment, the tree structure cluster database may be constructed by a GeoHash code string corresponding to the POI sample. Nodes in a layer can be represented by one or more characters, and the number of the characters corresponding to each node in each layer is the same.
For example, if the GeoHash code string corresponding to the POI sample contains 8 characters, and the number of characters representing one node in each layer is 2, the tree structured clustering database contains 4 layers. For another example, the GeoHash code string corresponding to the POI sample includes 8 characters, the tree structure clustering database includes 3 layers, the number of characters representing one node in layer 1 is 4, the number of characters representing one node in layer 2 is 2, and the number of characters representing one node in layer 3 is 2.
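The two layouts in the preceding example can be sketched as a simple split of the code string into per-layer node labels. This is a minimal illustration only: the function name `split_into_layers` and the sample string `kjb3dkp9` are assumptions chosen for the sketch, not part of the application.

```python
from typing import List


def split_into_layers(code: str, chars_per_layer: List[int]) -> List[str]:
    """Split a GeoHash code string into one node label per layer.

    chars_per_layer[i] is the number of characters that represent a node
    in layer i + 1 (hypothetical parameter for this sketch).
    """
    if sum(chars_per_layer) != len(code):
        raise ValueError("layer widths must add up to the code length")
    labels, start = [], 0
    for width in chars_per_layer:
        labels.append(code[start:start + width])
        start += width
    return labels


# 8 characters, 2 characters per node: 4 layers
print(split_into_layers("kjb3dkp9", [2, 2, 2, 2]))
# 8 characters, widths 4 / 2 / 2: 3 layers
print(split_into_layers("kjb3dkp9", [4, 2, 2]))
```

The number of layers is simply the number of widths supplied, so the same 8-character string can back a 4-layer or a 3-layer database.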
In this embodiment, the region corresponding to a first-layer node in the tree-structured clustering database may be the region represented by the GeoHash code string composed of the characters representing that node. The region corresponding to a node in any other layer may be the region represented by the string composed of the characters representing its ancestor nodes (starting from the first layer and ending with its parent) followed by the characters representing the node itself.
Take the example in which the GeoHash code string corresponding to the POI sample contains 8 characters, each node is represented by 2 characters, and the tree-structured clustering database contains 4 layers. Assume that layer 1 contains the node kj, layer 2 contains the node b3 as a child of node kj, layer 3 contains the node dk as a child of node b3, and layer 4 contains the node p9 as a child of node dk. The region corresponding to node p9 is then the region represented by the GeoHash code string kjb3dkp9.
Since every node in a given layer is represented by the same number of characters, the string formed by the characters of any node in that layer together with the characters of its parent and higher ancestors always has the same length. In other words, the regions corresponding to the nodes of one layer all have the same extent, and accordingly the precision associated with those regions is the same. The precision associated with the regions of the nodes in a layer can therefore be taken as the clustering precision, so that each layer corresponds to one clustering precision.
In some optional implementations of this embodiment, the ordinal number of the layer corresponding to each character of the GeoHash code string equals the ordinal position of that character in the string. The region corresponding to a node is the region represented by the path from that node to a first-layer node of the tree-structured clustering database; the nodes on the path are all in different layers, and adjacent nodes on the path are in a parent-child relationship.
In the tree structure clustering database, each layer corresponds to one clustering precision, characters in the GeoHash code character string correspond to nodes, each layer comprises one or more nodes, and one node can correspond to one character. A node may contain multiple children, with a node having only one parent. Parent-child relationships between nodes in the tree-structured clustering database may represent containment relationships between regions to which the nodes correspond. Each level in the tree-structured clustering database may contain a plurality of nodes represented by the same character, and the parent node of each node represented by the same character is different. There are one or more POI samples within the region corresponding to the nodes in each layer.
In this embodiment, the region corresponding to each node in the tree-structured clustering database is a region represented by a GeoHash code string corresponding to a path from the node to a first-layer node in the tree-structured clustering database, layers in which each node on the path is located are different, and a relationship between adjacent nodes on the path is a parent-child relationship.
In this embodiment, since a node has only one parent node, the path from the node to the first-layer node is unique. The GeoHash code string corresponding to that path may be obtained by concatenation in layer order: the character of the first-layer node on the path comes first, followed by the characters of the nodes on the path in the layers between the first layer and the layer of the node, and finally the character of the node itself.
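The concatenation described above can be sketched by following parent links from a node up to the first layer. The `Node` class, its fields and the labels `0`, `h`, `2` are assumptions made for this sketch; the application does not prescribe a particular in-memory representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str                       # character(s) representing this node
    parent: Optional["Node"] = None  # None for a first-layer node


def region_code(node: Node) -> str:
    """Return the GeoHash string for the unique path from the first layer to node."""
    parts = []
    current: Optional[Node] = node
    while current is not None:
        parts.append(current.label)
        current = current.parent
    # Labels were collected bottom-up, so reverse them to get layer order.
    return "".join(reversed(parts))


first = Node("0")
second = Node("h", parent=first)
third = Node("2", parent=second)
print(region_code(third))  # 0h2
```

Because each node stores a single parent, the walk never branches, which is why the resulting string is unique.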
In this embodiment, each layer of the tree-structured clustering database corresponds to one clustering precision. The clustering precision of a layer can be represented by the error of a GeoHash code string whose number of characters equals the ordinal number of that layer.
For example, in the GeoHash algorithm, a GeoHash code string of 2 characters has an error of -630 km to 630 km, and the clustering precision corresponding to layer 2 of the tree-structured clustering database can be represented by this error. When the clustering precision required for clustering the POI samples is -630 km to 630 km, layer 2 of the tree-structured clustering database can be taken as the layer corresponding to the required clustering precision.
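The mapping from a required precision to a layer can be sketched as a table lookup. Only the figures for lengths 1 and 2 (2500 km and 630 km) appear in this application; the remaining entries are commonly quoted approximate GeoHash errors and are included as assumptions, as are the names `GEOHASH_ERROR_KM` and `target_layer`.

```python
# Approximate +/- error in km for a GeoHash of the given length.
# Lengths 1 and 2 come from the text; lengths 3-8 are commonly quoted
# approximations and are assumptions of this sketch.
GEOHASH_ERROR_KM = {
    1: 2500.0, 2: 630.0, 3: 78.0, 4: 20.0,
    5: 2.4, 6: 0.61, 7: 0.076, 8: 0.019,
}


def target_layer(required_error_km: float) -> int:
    """Return the shallowest layer whose error does not exceed the requirement."""
    for layer in sorted(GEOHASH_ERROR_KM):
        if GEOHASH_ERROR_KM[layer] <= required_error_km:
            return layer
    # Requirement is finer than the table supports: fall back to the deepest layer.
    return max(GEOHASH_ERROR_KM)


print(target_layer(630.0))   # 2
print(target_layer(2500.0))  # 1
```

Choosing the shallowest qualifying layer keeps clusters as coarse as the requirement allows; a caller that wants exact matches only could instead look up the error value directly.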
In this embodiment, the node data of a node may include the number of POI samples and the number of times of query of the POI samples whose geographic positions are within the area corresponding to the node. The node data of the lowest level node in the tree-structured clustering database may include POI samples.
Please refer to FIG. 2, which shows a schematic structural diagram of a tree-structured clustering database.
It should be understood that FIG. 2 shows, by way of example only, nodes represented by different characters in each layer. In a tree-structured clustering database, each layer may contain multiple nodes represented by the same character. For example, layer 2 of the tree-structured clustering database may include a plurality of nodes h, each of which has exactly one parent node, and these parent nodes differ from one another.
Taking node 0, node h, node 2 and node 5 shown in FIG. 2 as an example, the containment relationship between the regions corresponding to the nodes is described below. In FIG. 2, g denotes the layer of the tree-structured clustering database. Each layer corresponds to one clustering precision, which can be represented by the error of a GeoHash code string whose number of characters equals the ordinal number of the layer. For example, for g = 1 the clustering precision is -2500 km to 2500 km, the error of a 1-character GeoHash code string; for g = 2 the clustering precision is -630 km to 630 km, the error of a 2-character GeoHash code string.
Node 0, located in layer 1 of the tree-structured clustering database, has node h in layer 2 among its child nodes. The region corresponding to node 0 is the region represented by the GeoHash code string for the path from node 0 to the layer-1 node, which is node 0 itself, namely the GeoHash code string 0. The region corresponding to node h is the region represented by the GeoHash code string for the path from node h to node 0 in layer 1, namely the GeoHash code string 0h. Since the region represented by the GeoHash code string 0h is one of the 32 sub-regions into which the region represented by the GeoHash code string 0 is divided, the region corresponding to node h is a sub-region of the region corresponding to node 0.
The child nodes of node h in layer 2 include node 2 and node 5 in layer 3. The region corresponding to node 2 is the region represented by the GeoHash code string for the path from node 2 to node 0 in layer 1. That path comprises node 2, node h and node 0, so the corresponding GeoHash code string is 0h2, and the region corresponding to node 2 is the region represented by 0h2. Similarly, the region corresponding to node 5 is the region represented by the GeoHash code string 0h5. Since the regions represented by 0h2 and 0h5 are both among the 32 sub-regions into which the region represented by 0h is divided, the regions corresponding to node 2 and node 5 are sub-regions of the region corresponding to node h.
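The containment relationship between parent and child regions can be checked numerically by decoding each code string into its bounding box. The sketch below uses the standard GeoHash base-32 alphabet and bit interleaving (longitude first); the helper names `geohash_bbox` and `contains` are assumptions for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_bbox(code: str):
    """Decode a GeoHash string to (lat_lo, lat_hi, lon_lo, lon_hi)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    use_lon = True  # bits alternate, starting with longitude
    for ch in code:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            use_lon = not use_lon
    return (lat_lo, lat_hi, lon_lo, lon_hi)


def contains(outer, inner) -> bool:
    """True if the inner box lies entirely within the outer box."""
    return (outer[0] <= inner[0] and inner[1] <= outer[1]
            and outer[2] <= inner[2] and inner[3] <= outer[3])


print(contains(geohash_bbox("0"), geohash_bbox("0h")))    # True
print(contains(geohash_bbox("0h"), geohash_bbox("0h2")))  # True
print(contains(geohash_bbox("0h"), geohash_bbox("0h5")))  # True
```

Each extra character contributes 5 bits, which is where the 32 sub-regions per parent region come from.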
N = 33 at node h in layer 2 of the tree-structured clustering database indicates that 33 POI samples have geographic locations within the region corresponding to node h. In other words, when the tree-structured clustering database was constructed, 33 POI samples yielded, after the longitude and latitude of their geographic locations were encoded, GeoHash code strings whose path passes through this node h, that is, whose 2nd character is h and whose 1st character is that of node h's parent.
In some optional implementations of this embodiment, when the tree-structured clustering database is constructed from the GeoHash code strings corresponding to the POI samples, a plurality of POI samples may be obtained in advance and used to build the database. The longitude and latitude of the geographic location of each POI sample are encoded with the GeoHash algorithm at a preset code length to obtain the GeoHash code string corresponding to that POI sample, where the preset code length equals the number of layers of the tree-structured clustering database.
For example, if the preset coding length is 8, the length of the GeoHash coding string obtained by coding the longitude and latitude of the geographic position of the POI sample is 8, that is, the GeoHash coding string corresponding to the POI sample contains 8 characters, and the number of layers in the tree-structured clustering database is 8.
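For readers unfamiliar with the encoding step, the following is a minimal sketch of the standard GeoHash encoding at a preset length. The function name `geohash_encode` and the sample coordinates are assumptions for illustration; a production system would more likely rely on an existing library.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat: float, lon: float, length: int) -> str:
    """Encode latitude/longitude into a GeoHash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | bit
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


# Preset code length 8, so the database would have 8 layers.
code = geohash_encode(57.64911, 10.40744, 8)
print(code, len(code))
```

A shorter encoding of the same point is always a prefix of a longer one, which is the property the layered database relies on.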
After the GeoHash code string corresponding to each POI sample is obtained, the following operation may be performed on each string: determine the path in the tree-structured clustering database that corresponds to the GeoHash code string, where the layer of each node on the path corresponds to the ordinal position, within the GeoHash code string, of the character that the node represents.
To determine this path, the nodes on the path may first be determined. Each node on the path is represented by one character of the GeoHash code string corresponding to the POI sample, and the layer in which the node is located corresponds to the ordinal position of that character in the string. The path corresponding to the GeoHash code string in the tree-structured clustering database can thus be determined.
In this embodiment, adjacent nodes on the path corresponding to a POI sample's GeoHash code string are in a parent-child relationship; that is, nodes represented by consecutive characters of the string are parent and child in the tree-structured clustering database.
For example, after the longitude and latitude of the geographic location of a POI sample are encoded with a code length of 8 characters, the GeoHash code string obtained for the POI sample is kjb3dkp9. The path corresponding to kjb3dkp9 in the tree-structured clustering database includes, in order, the nodes represented by the characters k, j, b, 3, d, k, p and 9. The node represented by k should be in layer 1 of the tree-structured clustering database, the node represented by j should be in layer 2, and so on, so that the layer in which the node represented by each character should be located can be determined.
Since a node in the tree-structured clustering database has only one parent node, the path corresponding to the GeoHash code string kjb3dkp9 is unique. Thus, even when each layer contains a plurality of nodes represented by the same character, the node on the path that each layer should contain can be determined accurately. It is then judged, layer by layer, whether the node represented by the corresponding character on the path already exists.
For example, layer 8 of the tree-structured clustering database may contain a plurality of nodes represented by the character 9. The node represented by the character 9 on the path corresponding to kjb3dkp9 should be in layer 8, that is, layer 8 should contain that node. It can therefore be determined whether the node represented by the character 9 on this path already exists in layer 8.
When the node represented by a character on the path does not yet exist in the layer where it should be, the node is created in that layer and the number of POI samples in its node data is updated.
When the node represented by a character on the path already exists in the layer where it should be, the number of POI samples in its node data is updated.
For example, the node represented by the character 9 on the path should be in layer 8. When this node 9 does not exist in layer 8 of the tree-structured clustering database, node 9 is created in layer 8 and the number of POI samples in its node data is increased by 1.
When node 9 on the path already exists in layer 8 of the tree-structured clustering database, the number of POI samples in its node data is increased by 1.
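The create-or-update rule above is essentially a prefix-tree insertion with a counter on every node along the path. The sketch below assumes one character per node and a Trie layout; `TrieNode`, `insert_code`, `find` and the second sample string `kjb3dkq0` are hypothetical names and data used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TrieNode:
    poi_count: int = 0  # number of POI samples inside this node's region
    children: Dict[str, "TrieNode"] = field(default_factory=dict)


def insert_code(root: TrieNode, code: str) -> None:
    """Walk the path for one GeoHash string, creating nodes and adding 1 to each count."""
    node = root
    for ch in code:
        child = node.children.get(ch)
        if child is None:          # node missing in this layer: create it
            child = TrieNode()
            node.children[ch] = child
        child.poi_count += 1       # new or existing: count this POI sample
        node = child


def find(root: TrieNode, prefix: str) -> Optional[TrieNode]:
    """Return the node reached by following prefix from the root, or None."""
    node: Optional[TrieNode] = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


root = TrieNode()  # artificial root above layer 1
for sample_code in ["kjb3dkp9", "kjb3dkp9", "kjb3dkq0"]:
    insert_code(root, sample_code)

print(find(root, "k").poi_count)         # 3
print(find(root, "kjb3dkp9").poi_count)  # 2
```

Keying children by character under each parent is what lets several nodes with the same character coexist in one layer without ambiguity.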
Please refer to FIG. 3, which shows a schematic diagram of the effect of a POI sample participating in the construction of the tree-structured clustering database.
In FIG. 3, 301 indicates the connections between adjacent nodes on the path corresponding to the GeoHash code string kjb3dkp9 of the POI sample in the tree-structured clustering database, and g denotes the layer of the tree-structured clustering database.
Following the order of the characters k, j, b, 3, d, k, p and 9 in the GeoHash code string kjb3dkp9, node k, node j, node b, node 3, node d, node k, node p and node 9 on the path are located in layers 1 to 8 of the tree-structured clustering database respectively, and adjacent nodes on the path are in a parent-child relationship.
Since a node in the tree-structured clustering database has only one parent node, the path corresponding to the GeoHash code string kjb3dkp9 is unique. Therefore, even when each layer contains a plurality of nodes represented by the same character, the nodes on the path can be found accurately. For example, among the plurality of nodes represented by the character 9 in layer 8, the node 9 being sought is the one on the path corresponding to kjb3dkp9, so it can be located exactly and the number of POI samples in its node data increased by 1.
Node k in layer 1, node j in layer 2, node b in layer 3, node 3 in layer 4, node d in layer 5, node k in layer 6, node p in layer 7 and node 9 in layer 8 on the path are each found in the tree-structured clustering database, and the number of POI samples in the node data of each of these nodes is increased by 1.
In some optional implementations of this embodiment, determining the layer of the tree-structured clustering database that corresponds to the clustering precision required for clustering the POI samples includes: encoding the longitude and latitude of the geographic location of the POI sample to be queried with the GeoHash algorithm at the preset code length to obtain the GeoHash code string corresponding to that POI sample; determining the nodes in the tree-structured clustering database that correspond to the characters of the GeoHash code string; determining the character length, within the GeoHash code string, that corresponds to the clustering precision required for clustering the POI samples; and querying the nodes represented by the characters of the GeoHash code string in the tree-structured clustering database, layer by layer, until the target layer corresponding to that character length is reached.
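The layer-by-layer query can be sketched as a walk along the first characters of the query's code string, stopping at the target length. The nested-dictionary tree, the helper names `build_tree` and `node_at_target_layer`, and the sample code strings are all assumptions of this sketch.

```python
def build_tree(codes):
    """Build a nested-dict prefix tree with a POI count on every node."""
    root = {"count": 0, "children": {}}
    for code in codes:
        node = root
        for ch in code:
            node = node["children"].setdefault(ch, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def node_at_target_layer(root, query_code, target_length):
    """Follow the query's characters down to the target layer.

    Returns the node reached, or None if the path does not exist.
    """
    node = root
    for ch in query_code[:target_length]:
        node = node["children"].get(ch)
        if node is None:
            return None
    return node


tree = build_tree(["0h2", "0h5", "9m1"])
hit = node_at_target_layer(tree, "0h7", 2)  # only the first 2 characters are used
print(hit["count"])                          # 2
print(node_at_target_layer(tree, "zz1", 2))  # None
```

The walk touches one node per layer, so its cost depends on the target length rather than on the number of POI samples stored.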
Step 102, selecting nodes for clustering from the target layer, and clustering the POI samples within the region corresponding to each node to obtain a clustering result.
In this embodiment, after the layer corresponding to the clustering precision required for clustering the POI samples has been determined in step 101, nodes for clustering can be selected from that target layer of the tree-structured clustering database, and the POI samples within the region corresponding to each selected node are clustered to obtain a clustering result.
For example, when the clustering precision required for clustering the POI samples is -630 km to 630 km, the error of a 2-character GeoHash code string, the clustering precision corresponds to layer 2 of the tree-structured clustering database, and layer 2 is taken as the layer corresponding to the required clustering precision. Nodes for clustering are selected from layer 2, and the POI samples whose geographic locations lie within the region corresponding to each selected node are clustered to obtain a clustering result.
Please refer to FIG. 4, which shows a schematic diagram of clustering POI samples.
In FIG. 4, 401 indicates the nodes finally used for clustering, and g denotes the layer of the tree-structured clustering database. The clustering precision required for clustering the POI samples is -630 km to 630 km, the error of a 2-character GeoHash code string, and this clustering precision corresponds to layer 2 of the tree-structured clustering database.
When the POI samples are clustered, node h, node m and node n are selected from layer 2 as the nodes for clustering. The number of POI samples in the node data of node h is 33; that is, 33 POI samples have geographic positions within the region corresponding to node h, which is the region represented by the GeoHash code string 0h. The number of POI samples in the node data of node m is 20; that is, 20 POI samples have geographic positions within the region corresponding to node m, which is the region represented by the GeoHash code string 9m. The number of POI samples in the node data of node n is 13; that is, 13 POI samples have geographic positions within the region corresponding to node n, which is the region represented by the GeoHash code string 9n.
The POI samples whose geographic positions are within the region corresponding to node h are clustered to obtain the clustering result for node h, which contains 33 POI samples. The POI samples whose geographic positions are within the region corresponding to node m are clustered to obtain the clustering result for node m, which contains 20 POI samples. The POI samples whose geographic positions are within the region corresponding to node n are clustered to obtain the clustering result for node n, which contains 13 POI samples.
The clustering result corresponding to the clustering precision comprises the clustering results corresponding to node h, node m and node n respectively.
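The Fig. 4 example can be reproduced with a small prefix tree. The sketch below is illustrative only: the `Node` class, the helper names and the synthetic sample list are assumptions, with the sample list chosen so that the per-node counts match the figures quoted above (33, 20 and 13). Each character of a GeoHash code string becomes a node one layer deeper, and each node keeps a count of the POI samples in its region.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0                                 # POI samples inside this node's region
    children: dict = field(default_factory=dict)   # next character -> child Node


def build_tree(geohashes):
    """Insert every GeoHash string, incrementing the count of each node on its path."""
    root = Node()
    for code in geohashes:
        node = root
        for ch in code:
            node = node.children.setdefault(ch, Node())
            node.count += 1
    return root


def clusters_at_layer(root, layer):
    """Map each GeoHash prefix at the given layer to its POI sample count."""
    result = {}

    def walk(node, prefix, depth):
        if depth == layer:
            result[prefix] = node.count
            return
        for ch, child in node.children.items():
            walk(child, prefix + ch, depth + 1)

    walk(root, "", 0)
    return result


# Synthetic 2-character codes matching the counts in the Fig. 4 example.
samples = ["0h"] * 33 + ["9m"] * 20 + ["9n"] * 13
clusters = clusters_at_layer(build_tree(samples), 2)
```

Because the counts are maintained while the tree is built, producing the clusters for a layer only requires reading the nodes of that layer rather than recomputing distances between samples.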
In some optional implementations of this embodiment, when the number of POI samples whose geographic positions are within the region corresponding to the selected node is smaller than a number threshold, the POI samples within the region corresponding to the selected node are clustered to obtain the clustering result corresponding to the clustering precision. When that number is equal to or larger than the number threshold, the POI samples whose geographic positions are within the regions corresponding to the child nodes of the selected node are clustered to obtain the clustering result corresponding to the clustering precision.
Please refer to fig. 5, which shows another schematic diagram of clustering POI samples.
Fig. 5 shows the nodes 501 finally used for clustering, where g denotes the layer of the tree-structured clustering database. The clustering precision required for clustering the POI samples is -630 km to 630 km, which is the error corresponding to a GeoHash code string of 2 characters, so the clustering precision corresponds to layer 2 of the tree-structured clustering database.
When the POI samples are clustered, a node h, a node m and a node n are selected from the layer 2 for clustering. The number of POI samples in the node data of the node h is 33, the number of POI samples in the node data of the node m is 20, and the number of POI samples in the node data of the node n is 13.
Since the number of POI samples whose geographic positions are within the region corresponding to node h is greater than the number threshold, the POI samples within the regions corresponding to child nodes 2 and 5 of node h may be clustered instead. The POI samples whose geographic positions are within the region corresponding to node 2 are clustered to obtain the clustering result for node 2, which contains 18 POI samples. The POI samples whose geographic positions are within the region corresponding to node 5 are clustered to obtain the clustering result for node 5, which contains 15 POI samples.
The POI samples whose geographic positions are within the region corresponding to node m are clustered to obtain the clustering result for node m, which contains 20 POI samples. The POI samples whose geographic positions are within the region corresponding to node n are clustered to obtain the clustering result for node n, which contains 13 POI samples.
After the POI samples are clustered, the clustering result corresponding to the clustering precision is obtained. It comprises the clustering result for node 2, the clustering result for node 5, the clustering result for node m and the clustering result for node n.
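The threshold rule of the Fig. 5 example can be sketched in the same spirit. Everything not stated in the text is an assumption made for illustration: the dictionary-based tree, the function names, the threshold value of 30 (the text only implies that it lies above 20 and at or below 33), and the third characters of the synthetic codes under nodes m and n.

```python
def build_tree(geohashes):
    """Prefix tree of nested dicts; every node stores the POI count of its region."""
    root = {"count": 0, "children": {}}
    for code in geohashes:
        node = root
        for ch in code:
            node = node["children"].setdefault(ch, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def cluster_with_threshold(root, layer, threshold):
    """Cluster at the target layer, descending to child nodes for crowded nodes."""
    result = {}

    def walk(node, prefix, depth):
        if depth < layer:
            for ch, child in node["children"].items():
                walk(child, prefix + ch, depth + 1)
        elif node["count"] >= threshold and node["children"]:
            # At or above the threshold: use the child nodes' regions instead.
            for ch, child in node["children"].items():
                result[prefix + ch] = child["count"]
        else:
            # Below the threshold: the node's own region forms one cluster.
            result[prefix] = node["count"]

    walk(root, "", 0)
    return result


# Synthetic 3-character codes; "9m0" and "9n0" use a made-up third character.
samples = ["0h2"] * 18 + ["0h5"] * 15 + ["9m0"] * 20 + ["9n0"] * 13
clusters = cluster_with_threshold(build_tree(samples), layer=2, threshold=30)
```

With these inputs node h (33 samples) is replaced by its child nodes 2 and 5, while nodes m and n stay as single clusters, which matches the four clustering results listed above.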
The present application further provides an electronic device that may be configured with one or more processors and a memory for storing one or more programs. The one or more programs may include instructions for performing the operations described in steps 101-102 above. When executed by the one or more processors, the one or more programs cause the one or more processors to perform the operations described in steps 101-102 above.
The present application also provides a computer readable medium, which may be included in an electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine the target layer in the tree-structured clustering database that corresponds to the clustering precision required for clustering the POI samples; and select a node for clustering from the target layer, and cluster the POI samples in the region corresponding to the node to obtain a clustering result.
It should be noted that the computer readable medium can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The above description is only a preferred embodiment of the application and an illustration of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention disclosed herein is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the present application. For example, it covers technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in the present application.

Claims (7)

1. A method for extremely-fast GeoHash clustering is applied to electronic equipment and is characterized in that the method is applied to services based on geographic positions and comprises the following steps:
determining a corresponding target layer of the clustering precision required for clustering the POI samples in the tree structure clustering database;
selecting a node for clustering from the target layer, and clustering POI samples in the region corresponding to the node to obtain a clustering result;
the tree structure clustering database is constructed by GeoHash coding character strings corresponding to POI samples and comprises a plurality of layers, the length of the GeoHash coding character strings corresponds to the number of the layers of the tree structure clustering database, each layer corresponds to a clustering precision, characters in the GeoHash coding character strings correspond to nodes, each layer comprises one or more nodes, and one or more POI samples exist in a corresponding region of each node;
the POI sample comprises the geographic location of the corresponding POI, and the geographic location of the POI comprises longitude and latitude;
the tree structure clustering database is constructed by GeoHash coding character strings corresponding to POI samples and comprises the following steps:
and encoding the longitude and the latitude corresponding to the geographic position of each POI sample by adopting a GeoHash algorithm according to a preset encoding length.
2. The method according to claim 1, wherein in the tree-structured clustering database, the number of layers of the corresponding layers of each character in the GeoHash code string in the tree-structured clustering database corresponds to the sequence order of the characters in the GeoHash code string, the node corresponding region is a region represented by a path from a node in the tree-structured clustering database to a first layer node, the layers of each node on the path are different, and the relationship between adjacent nodes on the path is a parent-child relationship.
3. The method as claimed in claim 1, wherein the tree-structured clustering database is constructed by a GeoHash code string corresponding to the POI sample, and comprises:
acquiring a plurality of POI samples;
obtaining a GeoHash code character string corresponding to each POI sample, wherein the length of the GeoHash code character string corresponds to the number of layers of a tree structure clustering database;
performing the following operations on the GeoHash coding string corresponding to each POI sample:
determining a path of a node represented by characters in a GeoHash code character string corresponding to the POI sample in a tree structure clustering database;
when the characters representing the nodes do not exist in the path, creating new character nodes in layers corresponding to the characters, and updating the number of POI samples corresponding to the nodes in each layer on the path;
and when the characters representing the nodes exist on the path, updating the number of POI samples corresponding to the nodes on each layer on the path.
4. The method according to claim 1 or 2, wherein the determining of the corresponding target layer of the clustering precision required for clustering the POI samples in the tree-structured clustering database comprises:
encoding longitude and latitude corresponding to the geographic position of the POI sample to be queried by adopting a GeoHash algorithm with a preset encoding length to obtain a GeoHash encoding character string corresponding to the POI sample to be queried;
determining a node corresponding to the character in the GeoHash code character string in a tree structure clustering database;
determining the character length required by the clustering precision of POI sample clustering in a GeoHash coding character string;
and querying nodes represented by characters in the GeoHash code character string in a tree structure clustering database until a target layer corresponding to the character length is reached.
5. The method of claim 3, wherein selecting a node for clustering from the target layer, and clustering POI samples in a region corresponding to the node to obtain a clustering result comprises:
when the number of POI samples in the area corresponding to the selected node is smaller than a preset number threshold, clustering the POI samples in the area corresponding to the selected node to obtain a clustering result corresponding to the clustering precision;
and when the number of the POI samples in the area corresponding to the selected node is equal to or larger than a number threshold value, clustering the POI samples in the area corresponding to the sub-node of the selected node to obtain a clustering result corresponding to the clustering precision.
6. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-5.
7. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201710527438.9A 2017-06-30 2017-06-30 Extremely-fast geographic GeoHash clustering method Active CN107330466B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710527438.9A CN107330466B (en) 2017-06-30 2017-06-30 Extremely-fast geographic GeoHash clustering method
PCT/CN2018/089639 WO2019001223A1 (en) 2017-06-30 2018-06-01 Extreme geographical geohash clustering method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710527438.9A CN107330466B (en) 2017-06-30 2017-06-30 Extremely-fast geographic GeoHash clustering method

Publications (2)

Publication Number Publication Date
CN107330466A CN107330466A (en) 2017-11-07
CN107330466B true CN107330466B (en) 2023-01-24

Family

ID=60199544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710527438.9A Active CN107330466B (en) 2017-06-30 2017-06-30 Extremely-fast geographic GeoHash clustering method

Country Status (2)

Country Link
CN (1) CN107330466B (en)
WO (1) WO2019001223A1 (en)

Also Published As

Publication number Publication date
WO2019001223A1 (en) 2019-01-03
CN107330466A (en) 2017-11-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant