CN114791942B - Spatial text density clustering retrieval method - Google Patents

Publication number
CN114791942B
Authority
CN
China
Prior art keywords
text
cluster
distance
clusters
objects
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210704570.3A
Other languages
Chinese (zh)
Other versions
CN114791942A (en)
Inventor
李晓涛
王艺沾
朱海平
罗昌银
张卫平
金炯华
倪明堂
黄培
吴淑敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Intelligent Robotics Institute
Original Assignee
Guangdong Intelligent Robotics Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Intelligent Robotics Institute
Priority to CN202210704570.3A
Publication of CN114791942A
Application granted
Publication of CN114791942B
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Abstract

The invention discloses a spatial text density clustering retrieval method, which comprises the following steps: acquiring road network information and query keywords, the road network information including text objects; constructing a hybrid index structure based on an inverted file and the road network information, the inverted file including the posting lists of the query keywords; obtaining an object set according to the hybrid index structure and the posting lists of the query keywords; calculating the shortest path between any two text objects in the object set and the number of text objects on that shortest path; determining the mutual reachable distance of any two text objects; establishing a minimum spanning tree according to the mutual reachable distances, processing it, and storing the result in a queue; and taking the values in the queue as the retrieval result. The method addresses the problem of top-k spatial text clustering retrieval.

Description

Spatial text density clustering retrieval method
Technical Field
The invention relates to the technical field of spatial keyword retrieval, in particular to a spatial text density clustering retrieval method.
Background
The spatial keyword query takes a position and a group of keywords as parameters, returns objects related to the parameters, and plays an indispensable role in geographic text information retrieval and personalized services. The location in the query represents the user's location intent, while the keywords describe the user's actual needs.
In recent years, spatial keyword queries have become an active research direction, and many different types of spatial keyword query have been proposed. However, these queries all differ from the top-k spatial text clustering retrieval query. Top-k spatial text clustering retrieval returns k sets of spatial text objects containing the query keywords. Each set is a cluster obtained through a density function, so that each cluster contains spatial web objects related to the query keywords and the density of each cluster satisfies the query constraints. A cost function is established from the spatial distance of the clusters and their text relevance to the query parameters, and the clusters are ranked by it. With this design the shape of the retrieved region is no longer a fixed-size rectangle or circle, and the robustness of the algorithm is improved.
However, existing spatial text clustering retrieval methods consider only Euclidean space and ignore the actual distance to the target. In practical applications, the position and accessibility of spatial text objects are constrained by network connectivity. Under this premise, a method that solves the top-k spatial text clustering query on road networks has practical value and significance.
Disclosure of Invention
Therefore, the technical problem to be solved by the present invention is to overcome the defects in the prior art, and to provide a spatial text density clustering retrieval method.
The invention provides a spatial text density clustering retrieval method, which comprises the following steps:
acquiring road network information and query keywords; the road network information includes text objects;
constructing a hybrid index structure based on an inverted file and according to the road network information; the inverted file comprises the posting lists of the query keywords;
the hybrid index structure is used for organizing and storing the text objects in the road network information; obtaining an object set according to the hybrid index structure and the posting lists of the query keywords;
calculating the shortest path between any two text objects in the object set and the number of text objects on the shortest path; determining the mutual reachable distance of any two text objects according to the shortest path and the number of text objects on the shortest path; establishing a minimum spanning tree according to the mutual reachable distance between any two text objects, processing it, and storing the result in a queue; and taking the values in the queue as the retrieval result.
Preferably, the hybrid index structure is constructed based on the road network information, a G-tree and the inverted file;
the process of obtaining the hybrid index structure is as follows: a G-tree is constructed according to the road network information, and, on the basis of the G-tree, a distance matrix and a pointer to an inverted file are added to each node of the G-tree from bottom to top, which yields the hybrid index structure;
text objects are stored in the leaf nodes of the hybrid index structure.
Preferably, any two text objects are denoted a and b, and their mutual reachable distance is denoted d_mreach-k(a, b); the calculation formula is as follows:
$$d_{\mathrm{mreach}\text{-}k}(a, b) = \max\{\mathrm{core}_k(a),\ \mathrm{core}_k(b),\ d(a, b)\}$$
where core_k(a) represents the spatial distance between text object a and its k-th nearest neighbor text object; core_k(b) represents the spatial distance between text object b and its k-th nearest neighbor text object; d(a, b) represents the road network distance between text object a and text object b.
Preferably, a k-order triangulation structure is adopted, a subgraph is formed according to the mutual reachable distance of any two text objects, and then a minimum spanning tree is established.
Preferably, the specific process of establishing the minimum spanning tree is as follows:
the object set is regarded as a weighted graph in which the text objects are the vertices and the mutual reachable distance between any two text objects is the weight of the edge between them; a k-order triangulation structure is used to reduce the edges that need to be considered when establishing the minimum spanning tree; the remaining edges of the weighted graph and the text objects form a subgraph, from which the minimum spanning tree is built.
Preferably, the set of objects is a set of objects related to the query keyword.
Preferably, compressing the minimum spanning tree, extracting density clusters, and storing the density clusters into a queue; and selecting the density clusters in the queue as retrieval results.
Preferably, density clusters include cluster stability; the stability of a cluster is denoted:
$$S(\mathrm{cluster}) = \sum_{p \in \mathrm{cluster}} \left(\lambda_p - \lambda_{\mathrm{birth}}\right)$$
and the persistence of a cluster is denoted:
$$\lambda = \frac{1}{\mathrm{distance}}$$
where λ_p represents the reciprocal of distance at the moment node p of the minimum spanning tree under the current cluster separates from the current cluster; λ_birth represents the reciprocal of distance at the moment the current cluster is produced by a split; distance is the weight of the edge of the minimum spanning tree at which the current cluster separates; λ represents the persistence of the cluster and is the reciprocal of distance. As λ increases the cluster shrinks, until it disappears or splits into sub-clusters; cluster denotes the current cluster.
Preferably, the process of extracting density clusters is as follows: all edges in the minimum spanning tree are sorted in increasing order of weight, and for each edge a union-find (disjoint-set) structure is used to merge the two subgraphs that the edge links, so that the minimum spanning tree is compressed and converted into a tree structure; the tree structure is traversed bottom-up from the leaf nodes, the stability of every cluster in the tree structure is calculated, and the clusters with the best stability are extracted; when the sum of the stabilities of the sub-clusters is greater than the stability of the cluster, the stability of the cluster is replaced with that sum; otherwise, all the sub-clusters are merged into the cluster; when the traversal reaches the root node of the tree structure, the extracted clusters are taken as the density clusters.
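The compression step can be illustrated with a minimal Python sketch. It assumes the minimum spanning tree is given as `(a, b, weight)` tuples; the function name `build_hierarchy` and the merge-node labels `m0`, `m1`, … are hypothetical and do not appear in the patent. The sketch covers only the union-find merge that turns sorted edges into a tree structure, not the stability computation.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]

def build_hierarchy(points: List[str], mst_edges: List[Edge]) -> List[Tuple[str, str, str, float]]:
    """Merge subgraphs along MST edges in increasing weight order.

    Returns one record (new_node, left_root, right_root, weight) per merge;
    read bottom-up, the records form the tree structure.
    """
    parent: Dict[str, str] = {p: p for p in points}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges = []
    for a, b, w in sorted(mst_edges, key=lambda e: e[2]):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # cannot happen for a true tree; kept as a guard
        node = f"m{len(merges)}"  # hypothetical label for the merged cluster
        parent[node] = node
        parent[ra] = node
        parent[rb] = node
        merges.append((node, ra, rb, w))
    return merges
```

Each record's weight is the distance at which two clusters join, so its reciprocal gives the λ value at which the merged cluster splits when the tree is read top-down.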
Preferably, the density clusters selected in the queue are screened through a cost function, denoted cost; the cost function calculation formula is as follows:
[Formula: cost function (image not reproduced in the extracted text)]
where α ∈ (0, 1] indicates the user preference, and tr_q.ψ(R) represents the maximum text relevance value of the text objects in the density cluster.
The technical scheme of the invention has the following advantages: the efficient pruning of the spatial information can be realized through the mixed index structure, and the efficient retrieval can be further realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is an exemplary diagram of k-STC on a road network in an implementation of the present invention;
FIG. 2 is a flow chart of a retrieval method in an implementation of the present invention;
FIG. 3 is a schematic diagram of a road network in which the present invention is implemented;
FIG. 4 is a schematic view of the road network shown in FIG. 3 after being divided;
FIG. 5 is a diagram illustrating a hybrid index structure in accordance with an embodiment of the present invention;
FIG. 6 is a diagram illustrating inverted files in the hybrid index structure shown in FIG. 5;
FIG. 7 is a diagram illustrating the distance matrices and shortcuts in the hybrid index structure shown in FIG. 5.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in FIG. 1, there are 8 objects, o_1, o_2, o_3, …, o_8, dispersed in the road network, and each object has a separate keyword as a tag. In FIG. 1 a set of results containing restaurants, hotels and shops is searched for. If the actual distance between objects is not taken into account, the four objects in the dashed box in the lower right corner, o_2, o_4, o_5, o_7, are the optimal result. In practice, however, the distance between objects is determined by the shortest path between them, and the query point cannot cross the highway directly to reach o_2, o_4, o_5; the four objects in the dashed box in the upper left corner, o_1, o_3, o_6, o_8, may be a better choice. Therefore, research on spatial text clustering queries on road networks cannot ignore the actual distance to the target.
As shown in fig. 2, this embodiment provides a spatial text density clustering retrieval method, which adopts an HDBSCAN clustering method, and the retrieval method includes the steps of:
acquiring road network information and query keywords; the road network information includes a text object;
as shown in FIG. 3, this implementation provides a road network comprising 14 nodes and 17 edges, in which 11 text objects are distributed; v_0, v_1, v_2, …, v_13 represent nodes of the road network; o_1, o_2, …, o_11 represent text objects;
Table 1 gives the document vectors of the text objects;
[Table 1: document vectors of the text objects (image not reproduced in the extracted text)]
the document vector of each text object can be read from Table 1.
Constructing a hybrid index structure based on the inverted file and according to the road network information; the inverted file comprises the posting lists of the query keywords;
in this embodiment, the inverted file further includes the query keywords, the objects containing each query keyword, and their weights (frequencies).
Specifically, the hybrid index structure is constructed based on the road network information, a G-tree and inverted files;
the process of obtaining the hybrid index structure is as follows: a G-tree is constructed according to the road network information; on the basis of the G-tree, a distance matrix and a pointer to an inverted file are added to each node of the G-tree from bottom to top.
The leaf nodes also store the text objects. The distance matrix of a leaf node stores the distances from the vertices to the boundary points of the corresponding subgraph, and the inverted file it points to contains the keywords and weights of all related text objects in the corresponding subgraph; that is, the inverted file indexes the text information of all the text objects stored in the leaf node;
for non-leaf nodes, the distance matrix stores the shortest distances between the boundary points of all child nodes of the non-leaf node; the inverted file it points to stores the keywords contained in its child nodes and the maximum weight of each keyword; the hybrid index structure is constructed by this process.
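The bottom-up construction of the inverted files can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `IGNode`, its field names, and the function `build_inverted_files` are hypothetical, and the keyword weights in the usage note are invented for the example. The distance matrix is held as a plain dictionary and is not filled in here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class IGNode:
    """One node of the hybrid index (hypothetical layout)."""
    name: str
    children: List["IGNode"] = field(default_factory=list)
    # Leaf nodes only: object id -> {keyword: weight}
    objects: Dict[str, Dict[str, float]] = field(default_factory=dict)
    # (vertex or boundary point, boundary point) -> shortest distance
    dist_matrix: Dict[Tuple[str, str], float] = field(default_factory=dict)
    # Leaf: keyword -> [(object id, weight)]
    # Non-leaf: keyword -> [(child name, maximum weight in that child)]
    inverted_file: Dict[str, List[Tuple[str, float]]] = field(default_factory=dict)

def build_inverted_files(node: IGNode) -> None:
    """Fill the inverted files from the leaves up to the given node."""
    node.inverted_file = {}
    if not node.children:
        for oid, doc in node.objects.items():
            for kw, w in doc.items():
                node.inverted_file.setdefault(kw, []).append((oid, w))
        return
    for child in node.children:
        build_inverted_files(child)
        for kw, postings in child.inverted_file.items():
            # A non-leaf node keeps only the maximum weight per child.
            max_w = max(w for _, w in postings)
            node.inverted_file.setdefault(kw, []).append((child.name, max_w))
```

For example, a parent over two leaves holding `{"o6": {"hotel": 0.5}, "o7": {"hotel": 0.8, "shop": 0.2}}` and `{"o9": {"shop": 0.6}}` ends up with one `hotel` entry and two `shop` entries, each carrying the largest weight found in the corresponding child. Storing the maximum lets a search skip a whole subtree when even its best weight is too low.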
As shown in FIG. 4, the G-tree divides the road network into several subgraphs of almost the same size and creates a tree over the subgraphs. On the basis of the G-tree, shortcuts are built at selected nodes (v_3, v_4, v_5, v_6, v_7, v_8, v_9, v_10, v_11, v_12, v_13); each shortcut stores the distance between the boundaries of two nodes, so that the shortest path distance between two vertices can be calculated efficiently. The nodes of FIG. 3 are grouped according to the division rule of the G-tree, first into G_1 and G_2; then, with the number of parts per division specified as 2 and the maximum number of nodes per group specified as 4, they are finally further divided into four groups G_3, G_4, G_5, G_6. In the figure, if an edge exists between two groups and the groups can be connected through that edge, the two end points of the edge are boundary points of the two groups respectively and are stored in the nodes. Taking FIG. 4 as an example, v_0, v_2, v_3 are the boundary points of G_5 and G_6, and v_0, v_1, v_4 are the boundary points of G_1 and G_2. The G-tree does not store the distances for every node; it stores only the shortest path distances between boundary points, in a distance matrix. For example, the boundary point of G_5 is v_2, so the distance matrix of G_5 contains the shortest paths from v_2 to all the other nodes in G_5 (v_2, v_10, v_11). The G-tree uses two shortcuts, S_1 and S_2, to connect G_3 with G_5 and G_4 with G_6. Thus, to query the distance between v_4 and v_10, it is only necessary to access the shortcut S_1 and the distance matrices of the two groups.
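The idea of answering a distance query from boundary-point distances can be sketched as below. This is a simplified illustration, not the G-tree algorithm itself: it assumes the two vertices lie in different groups and that the distance between the relevant boundary points is already available (for instance from a shortcut). The function name, the parameter names and the numbers in the usage note are all hypothetical.

```python
from typing import Dict, Tuple

def network_distance(
    u: str,
    v: str,
    group_of: Dict[str, str],
    to_boundary: Dict[str, Dict[str, Dict[str, float]]],
    between_boundaries: Dict[Tuple[str, str], float],
) -> float:
    """Shortest distance from u to v through the boundary points of their groups.

    to_boundary[group][vertex][boundary] is the stored vertex-to-boundary distance;
    between_boundaries[(b1, b2)] is the stored boundary-to-boundary distance.
    """
    gu, gv = group_of[u], group_of[v]
    if gu == gv:
        raise ValueError("this sketch only covers vertices in different groups")
    best = float("inf")
    for bu, d_u in to_boundary[gu][u].items():
        for bv, d_v in to_boundary[gv][v].items():
            if bu == bv:
                middle = 0.0  # the two groups share this boundary point
            else:
                middle = between_boundaries.get((bu, bv), float("inf"))
            best = min(best, d_u + middle + d_v)
    return best
```

With a vertex that is 2.0 and 5.0 away from boundary points `b1` and `b2`, a target 1.0 away from `b3`, and stored distances 4.0 for `(b1, b3)` and 0.5 for `(b2, b3)`, the function returns the cheaper of the two routes. No search over the road network is needed at query time, which is the point of storing the matrices.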
As shown in FIG. 5, a structural example of the hybrid index structure (IG-tree) is provided; the inverted files in FIG. 5 (IF_0, IF_1, IF_2, IF_3, IF_4, IF_5, IF_6) are shown in FIG. 6; the distance matrices in FIG. 5 (G_0, G_1, G_2, G_3, G_4, G_5, G_6) and the shortcuts (S_1, S_2) are shown in FIG. 7.
Each group stores all the nodes it contains, together with a matrix holding the distances from those nodes to the boundary points. Each group also contains a pointer to an inverted file that indexes the text information of all the text objects stored in the node. As shown in FIG. 5, group G_3 contains objects o_6, o_7, o_8; group G_4 contains objects o_9, o_10, o_11; group G_5 contains objects o_1, o_2; group G_6 contains objects o_3, o_4, o_5. Each group constitutes a leaf node of the hybrid index structure (IG-tree), and each leaf node also contains a pointer to its inverted file.
The hybrid index structure is used for organizing and storing the text objects in the road network information; an object set is obtained according to the hybrid index structure and the posting lists of the query keywords; the object set is the set of objects related to the query keywords.
Calculating the shortest path between any two text objects in the object set and the number of text objects on the shortest path; determining the mutual reachable distance of any two text objects according to the shortest path and the number of text objects on the shortest path; establishing a minimum spanning tree according to the mutual reachable distance between any two text objects, compressing the minimum spanning tree, extracting density clusters, and storing the density clusters in a queue; and selecting the top-k density clusters in the queue as the retrieval result.
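A minimal Python sketch of collecting the object set from posting lists is given here. It assumes each index node is a plain dictionary with an `"inverted"` mapping and a `"children"` list, and that an object is related to the query if it contains at least one query keyword; both the layout and that matching rule are assumptions, not statements from the patent.

```python
from typing import Dict, List, Set

def collect_objects(node: Dict, keywords: List[str]) -> Set[str]:
    """Return the ids of objects whose posting lists match any query keyword.

    node["inverted"] maps keyword -> list of (id, weight);
    node["children"] is empty for a leaf node.
    """
    # Prune the whole subtree if no query keyword occurs below this node.
    if not any(kw in node["inverted"] for kw in keywords):
        return set()
    if not node["children"]:
        return {oid for kw in keywords for oid, _ in node["inverted"].get(kw, [])}
    found: Set[str] = set()
    for child in node["children"]:
        found |= collect_objects(child, keywords)
    return found
```

Because a non-leaf node lists every keyword that occurs in its children, a single dictionary lookup is enough to decide whether a subtree can be skipped.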
In this embodiment, any two text objects are denoted a and b, and their mutual reachable distance is denoted d_mreach-k(a, b); the calculation formula is as follows:
$$d_{\mathrm{mreach}\text{-}k}(a, b) = \max\{\mathrm{core}_k(a),\ \mathrm{core}_k(b),\ d(a, b)\}$$
where core_k(a) represents the spatial distance between text object a and its k-th nearest neighbor text object; core_k(b) represents the spatial distance between text object b and its k-th nearest neighbor text object; d(a, b) represents the road network distance between text object a and text object b.
When the number of text objects on the shortest path between a and b (not including a and b) is larger than k-2, the length of the shortest path between the two text objects is their mutual reachable distance. When a lies within the circle centered at b with radius core_k(b), the mutual reachable distance is core_k(b). When b lies within the circle centered at a with radius core_k(a), the mutual reachable distance is core_k(a).
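The calculation can be sketched in Python as follows, assuming the pairwise road network distances are already available as a nested dictionary. The function names are hypothetical, and the sketch takes the k-th nearest neighbor to exclude the object itself, which is one common convention and an assumption here.

```python
from typing import Dict

Dist = Dict[str, Dict[str, float]]

def core_distance(dist: Dist, a: str, k: int) -> float:
    """Distance from a to its k-th nearest neighbor (a itself excluded)."""
    others = sorted(d for o, d in dist[a].items() if o != a)
    return others[k - 1]

def mutual_reachability(dist: Dist, a: str, b: str, k: int) -> float:
    """Largest of the two core distances and the network distance d(a, b)."""
    return max(core_distance(dist, a, k), core_distance(dist, b, k), dist[a][b])

# Illustrative distances between four objects (invented values).
pairs = {("a", "b"): 1.0, ("a", "c"): 4.0, ("a", "d"): 6.0,
         ("b", "c"): 2.0, ("b", "d"): 5.0, ("c", "d"): 3.0}
dist: Dist = {}
for (x, y), d in pairs.items():
    dist.setdefault(x, {})[y] = d
    dist.setdefault(y, {})[x] = d
```

With k = 2 in this example, a and b are only 1.0 apart, yet their mutual reachable distance is 4.0 because the second-nearest neighbor of a is 4.0 away; objects in sparse surroundings are pushed apart, which is what makes the later clustering density-aware.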
Furthermore, a k-order triangulation structure is adopted, a subgraph is formed according to the mutual reachable distance of any two text objects, and then a minimum spanning tree is established.
The specific process of establishing the minimum spanning tree is as follows:
the object set is regarded as a weighted graph in which the text objects are the vertices and the mutual reachable distance between any two text objects is the weight of the edge between them; a k-order triangulation structure is used to reduce the edges that need to be considered when establishing the minimum spanning tree. Because of the HDBSCAN clustering, the state of a text object can change as its neighbor objects increase: a new cluster can be generated, and two old clusters can be fused into a new cluster. For two points p and q of the data set D, if there exists a circle with p and q on its boundary and at most k points of D inside it, the edge between p and q is called a k-order Delaunay edge, abbreviated k-od edge. According to the definition of the mutual reachable distance of any two text objects, if an edge is not a k-od edge, p and q are already in the same connected component, so the subgraph containing only the k-od edges yields the same minimum spanning tree as the original graph. The k-order triangulation structure is therefore used as an auxiliary computation so that the minimum spanning tree is constructed faster; the remaining edges of the weighted graph and the text objects form a subgraph, from which the minimum spanning tree is built.
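As a rough illustration, the minimum spanning tree over mutual reachable distances can be built with Prim's algorithm, sketched below in Python. The sketch works on the complete graph and does not implement the k-order triangulation pruning described above; the function name and the `weight` callback are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

def minimum_spanning_tree(
    nodes: List[str], weight: Callable[[str, str], float]
) -> List[Tuple[str, str, float]]:
    """Prim's algorithm on a complete graph; returns (parent, node, weight) edges."""
    # best[n] = (cheapest known edge weight into the tree, tree endpoint of that edge)
    best: Dict[str, Tuple[float, str]] = {n: (weight(nodes[0], n), nodes[0]) for n in nodes[1:]}
    edges = []
    while best:
        n = min(best, key=lambda x: best[x][0])
        w, parent = best.pop(n)
        edges.append((parent, n, w))
        for m in best:
            wm = weight(n, m)
            if wm < best[m][0]:
                best[m] = (wm, n)
    return edges
```

On a complete graph this takes quadratic time in the number of objects; restricting the candidate edges, as the k-order triangulation does, reduces how many weights have to be examined.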
In this embodiment, density clusters include cluster stability and cluster persistence; the stability of a cluster is denoted:
$$S(\mathrm{cluster}) = \sum_{p \in \mathrm{cluster}} \left(\lambda_p - \lambda_{\mathrm{birth}}\right)$$
and the persistence of a cluster is denoted:
$$\lambda = \frac{1}{\mathrm{distance}}$$
where λ_p represents the reciprocal of distance at the moment node p of the minimum spanning tree under the current cluster separates from the current cluster; λ_birth represents the reciprocal of distance at the moment the current cluster is produced by a split; distance is the weight of the corresponding edge of the minimum spanning tree at which the current cluster separates; λ represents the persistence of the cluster and is the reciprocal of distance. As λ increases (that is, as distance decreases), the cluster becomes smaller and smaller until it disappears or splits into sub-clusters; cluster denotes the current cluster. By selecting a suitable λ, a more stable cluster can be selected.
The process of extracting density clusters is as follows: all edges in the minimum spanning tree are sorted in increasing order of weight, and for each edge a union-find (disjoint-set) structure is used to merge the two subgraphs that the edge links, so that the minimum spanning tree is compressed and converted into a tree structure; the tree structure is traversed bottom-up from the leaf nodes, the stability of every cluster in the tree structure is calculated, and the clusters with the best stability are extracted; when the sum of the stabilities of the sub-clusters is greater than the stability of the cluster, the stability of the cluster is replaced with that sum; otherwise, all the sub-clusters are merged into the cluster; when the traversal reaches the root node of the tree structure, the extracted clusters are taken as the density clusters.
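The bottom-up selection can be sketched in Python as follows. The sketch assumes the tree structure has already been built and that each cluster records the λ at which it was born and the λ at which each of its points left it; the class `Cluster`, its fields and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Cluster:
    name: str
    lambda_birth: float          # 1/distance when the cluster was created by a split
    point_lambdas: List[float]   # 1/distance when each point left the cluster
    children: List["Cluster"] = field(default_factory=list)

def stability(c: Cluster) -> float:
    """Sum over the points of (lambda_p - lambda_birth)."""
    return sum(lp - c.lambda_birth for lp in c.point_lambdas)

def select_clusters(c: Cluster) -> Tuple[float, List[Cluster]]:
    """Return (best stability, chosen clusters) for the subtree rooted at c."""
    own = stability(c)
    if not c.children:
        return own, [c]
    child_total = 0.0
    chosen: List[Cluster] = []
    for ch in c.children:
        s, sel = select_clusters(ch)
        child_total += s
        chosen.extend(sel)
    if child_total > own:
        # The sub-clusters are jointly more stable: keep them, pass their sum upward.
        return child_total, chosen
    # Otherwise the sub-clusters are merged back into this cluster.
    return own, [c]
```

Calling `select_clusters` on the root returns the clusters that would be extracted as density clusters. The recursion visits each cluster once, so the selection is linear in the size of the tree.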
For the density clusters selected in the queue, in this embodiment, the density clusters are screened through a cost function, denoted cost; the cost function calculation formula is as follows:
[Formula: cost function (image not reproduced in the extracted text)]
where α ∈ (0, 1] indicates the user preference, and tr_q.ψ(R) represents the maximum text relevance value of the text objects in the density cluster.
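Since the cost formula itself is not legible in this text, the following Python sketch only illustrates how such a screening step might look. It assumes a linear trade-off of the form cost = α · distance + (1 − α) · (1 − relevance), with distance normalized to [0, 1]; that form, the dictionary keys and the function names are assumptions for illustration and should not be read as the patent's formula.

```python
import heapq
from typing import Dict, List

def cost(alpha: float, distance: float, text_relevance: float) -> float:
    """Assumed cost: lower is better; alpha weights distance against relevance."""
    return alpha * distance + (1.0 - alpha) * (1.0 - text_relevance)

def top_k_clusters(clusters: List[Dict], alpha: float, k: int) -> List[Dict]:
    """Return the k clusters with the smallest assumed cost."""
    return heapq.nsmallest(
        k, clusters, key=lambda c: cost(alpha, c["distance"], c["relevance"])
    )
```

With α close to 1 the ranking is driven almost entirely by distance to the query; with α close to 0 it is driven by text relevance.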
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here, and obvious variations or modifications derived therefrom remain within the scope of the invention.

Claims (7)

1. A spatial text density clustering retrieval method is characterized by comprising the following steps:
acquiring road network information and query keywords; the road network information comprises a text object;
constructing a hybrid index structure based on an inverted file and according to the road network information; the inverted file comprises the posting lists of the query keywords;
constructing the hybrid index structure based on the road network information, a G-tree and the inverted file;
the process of obtaining the hybrid index structure is as follows: constructing a G-tree according to the road network information, and, on the basis of the G-tree, adding a distance matrix and a pointer to an inverted file to each node of the G-tree from bottom to top, to construct the hybrid index structure;
the text objects are stored in leaf nodes of the hybrid index structure;
for non-leaf nodes, the distance matrix stores the shortest distances between the boundary points of all child nodes of the non-leaf node; the inverted file pointed to stores the keywords contained in the child nodes of the non-leaf node and the maximum weight of each keyword;
the hybrid index structure is used for organizing and storing the text objects in the road network information; obtaining an object set according to the hybrid index structure and the posting lists of the query keywords;
calculating the shortest path between any two text objects in the object set and the number of text objects on the shortest path; determining the mutual reachable distance of any two text objects according to the shortest path and the number of text objects on the shortest path; establishing a minimum spanning tree according to the mutual reachable distance between any two text objects, processing it, and storing the result in a queue; taking the values in the queue as the retrieval result;
forming a subgraph according to the mutual reachable distance of any two text objects by adopting a k-order triangulation structure, and further establishing the minimum spanning tree;
the specific process of establishing the minimum spanning tree is as follows:
regarding the object set as a weighted graph, wherein the text objects are the vertices and the mutual reachable distance between any two text objects is the weight of the edge between them; reducing the edges to be considered for establishing the minimum spanning tree by adopting the k-order triangulation structure; the remaining edges of the weighted graph and the text objects form a subgraph, and the minimum spanning tree is built according to the subgraph.
2. The spatial text density clustering retrieval method according to claim 1, characterized in that any two text objects are denoted a and b, and their mutual reachable distance is denoted d_mreach-k(a, b); the calculation formula is as follows:
$$d_{\mathrm{mreach}\text{-}k}(a, b) = \max\{\mathrm{core}_k(a),\ \mathrm{core}_k(b),\ d(a, b)\}$$
where core_k(a) represents the spatial distance between text object a and its k-th nearest neighbor text object; core_k(b) represents the spatial distance between text object b and its k-th nearest neighbor text object; d(a, b) represents the road network distance between text object a and text object b.
3. The method of claim 1, wherein the set of objects is a set of objects related to the query keyword.
4. The spatial text density clustering retrieval method according to claim 2, characterized in that the minimum spanning tree is compressed, density clusters are extracted, and the density clusters are stored in the queue; and selecting the density clusters in the queue as retrieval results.
5. The spatial text density clustering retrieval method according to claim 4, wherein the density cluster comprises cluster stability and cluster persistence; the stability of a cluster is denoted:
$$S(\mathrm{cluster}) = \sum_{p \in \mathrm{cluster}} \left(\lambda_p - \lambda_{\mathrm{birth}}\right)$$
and the persistence of a cluster is denoted:
$$\lambda = \frac{1}{\mathrm{distance}}$$
where λ_p represents the reciprocal of distance at the moment node p of the minimum spanning tree under the current cluster separates from the current cluster; λ_birth represents the reciprocal of distance at the moment the current cluster is produced by a split; distance is the weight of the corresponding edge of the minimum spanning tree at which the current cluster separates; λ represents the persistence of the cluster and is the reciprocal of distance; as λ increases the cluster shrinks, until it disappears or splits into sub-clusters; cluster denotes the current cluster.
6. The spatial text density clustering retrieval method according to claim 5, wherein the process of extracting density clusters is: all edges in the minimum spanning tree are sorted in increasing order of weight, and for each edge a union-find (disjoint-set) structure is used to merge the two subgraphs that the edge links, so that the minimum spanning tree is compressed and converted into a tree structure; the tree structure is traversed bottom-up from the leaf nodes, the stability of every cluster in the tree structure is calculated, and the clusters with the best stability are extracted; when the sum of the stabilities of the sub-clusters is greater than the stability of the cluster, the stability of the cluster is replaced with that sum; otherwise, all the sub-clusters are merged into the cluster; and when the traversal reaches the root node of the tree structure, the extracted clusters are taken as the density clusters.
7. The spatial text density clustering retrieval method according to claim 6, characterized in that the density clusters selected from the queue are screened by a cost function, denoted cost, whose calculation formula is:
[formula image not recoverable from the extracted text]
where α ∈ (0, 1] denotes a user preference, and $tr_{q.\psi}(R)$ denotes the maximum text relevance value of the text objects in the density cluster.
CN202210704570.3A 2022-06-21 2022-06-21 Spatial text density clustering retrieval method Active CN114791942B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210704570.3A CN114791942B (en) 2022-06-21 2022-06-21 Spatial text density clustering retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210704570.3A CN114791942B (en) 2022-06-21 2022-06-21 Spatial text density clustering retrieval method

Publications (2)

Publication Number Publication Date
CN114791942A (en) 2022-07-26
CN114791942B (en) 2022-09-20

Family

ID=82463538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210704570.3A Active CN114791942B (en) 2022-06-21 2022-06-21 Spatial text density clustering retrieval method

Country Status (1)

Country Link
CN (1) CN114791942B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2005246368A1 (en) * 2004-05-19 2005-12-01 Metacarta, Inc. Systems and methods of geographical text indexing
US9721157B2 (en) * 2006-08-04 2017-08-01 Nokia Technologies Oy Systems and methods for obtaining and using information from map images
CN104376112B (en) * 2014-11-27 2018-09-14 苏州大学 A kind of method of road cyberspace key search

Also Published As

Publication number Publication date
CN114791942A (en) 2022-07-26

Similar Documents

Publication Publication Date Title
Rocha-Junior et al. Top-k spatial keyword queries on road networks
Ester et al. Clustering for mining in large spatial databases
JP6183376B2 (en) Index generation apparatus and method, search apparatus, and search method
CN106095920B (en) Distributed index method towards extensive High dimensional space data
Demiryurek et al. Indexing network voronoi diagrams
CN111639075B (en) Non-relational database vector data management method based on flattened R tree
Chen et al. Metric similarity joins using MapReduce
CN111813778B (en) Approximate keyword storage and query method for large-scale road network data
CN103500165B (en) A kind of combination cluster and the high-dimensional vector quantity search method of double key value
CN101266607A (en) High dimension data index method based on maximum clearance space mappings
Zheng et al. Searching activity trajectory with keywords
CN114791942B (en) Spatial text density clustering retrieval method
Desai et al. Issues and challenges in big graph modelling for smart city: an extensive survey
He et al. Imagerank: spectral techniques for structural analysis of image database
CN110297874B (en) Multi-scale road network skyline query method based on Voronoi
CN113407669A (en) Semantic track query method based on activity influence
Srividhya et al. Comparative analysis of r-tree and r-tree in spatial database
Lin Efficient and compact indexing structure for processing of spatial queries in line-based databases
CN110928968B (en) Two-dimensional geographic space big data storage and query computer medium
CN107463673A (en) The expansion searching algorithm of track inquiry based on interest region
CN107463672A (en) Expansion search extension algorithm based on the track inquiry with sequence interest region
Wang et al. A hyperplane based indexing technique for high-dimensional data
Planas et al. MeTree: A Metric Spatial Index
Dohnal et al. Similarity searching: Towards bulk-loading peer-to-peer networks
Luo et al. Accelerate data retrieval by multi-dimensional indexing in switch-centric data centers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant