CN107145526B - Reverse-nearest neighbor query processing method for geographic social keywords under road network - Google Patents

Publication number
CN107145526B
CN107145526B CN201710244072.4A CN201710244072A CN107145526B CN 107145526 B CN107145526 B CN 107145526B CN 201710244072 A CN201710244072 A CN 201710244072A CN 107145526 B CN107145526 B CN 107145526B
Authority
CN
China
Prior art keywords
user
users
count table
road network
social
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710244072.4A
Other languages
Chinese (zh)
Other versions
CN107145526A (en)
Inventor
高云君
赵靖文
陈刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201710244072.4A
Publication of CN107145526A
Application granted
Publication of CN107145526B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a reverse nearest neighbor query processing method for geo-social keywords under a road network. A GIM (geographic information model) tree stores the spatial road network, text, and social data, and a branch-and-bound method traverses the index. While traversing the index, the method first calculates the minimum similarity count table and the maximum similarity count table of each index node, then prunes with these two tables, and accelerates query execution with a filter-and-refine algorithm. By building on existing spatial database techniques, the invention reduces the number of geo-social text similarity calculations and thereby improves query performance.

Description

Reverse-nearest neighbor query processing method for geographic social keywords under road network
Technical Field
The invention relates to indexing and query techniques for spatial databases, and in particular to a method for processing reverse nearest neighbor queries of geo-social keywords under a road network.
Background
Spatial data refers to the collection of application-related geospatial data that a geographic information system keeps on a physical storage medium in order to store, manage, and retrieve geospatial data. Road network spatial data, an important component of spatial databases, has drawn increasing attention. To access road network spatial data quickly and effectively, researchers have proposed many road network indexing methods. At present, the G-tree index is the most effective of them. It divides the road network into a number of subgraphs and calculates the road network distances between boundary points in advance, which reduces the cost of shortest path calculation.
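The precomputation described above can be illustrated with a minimal sketch. It assumes that a boundary point is a node with at least one edge leading into a different subgraph, and it runs Dijkstra's algorithm over the whole graph rather than reproducing the hierarchical distance matrices of an actual G-tree. The names `dijkstra`, `boundary_points`, `precompute_boundary_distances`, and the toy graph are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Shortest road network distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def boundary_points(graph, partition):
    """Assumed definition: nodes with an edge into another subgraph."""
    return {u for u in graph for v in graph[u] if partition[u] != partition[v]}

def precompute_boundary_distances(graph, partition):
    """Distance table restricted to pairs of boundary points."""
    borders = boundary_points(graph, partition)
    return {b: {t: d for t, d in dijkstra(graph, b).items() if t in borders}
            for b in borders}

# Toy road network: a - b - c - d, split into subgraphs {a, b} and {c, d}.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"b": 2.0, "d": 3.0},
    "d": {"c": 3.0},
}
partition = {"a": 0, "b": 0, "c": 1, "d": 1}
border_dist = precompute_boundary_distances(graph, partition)
print(border_dist["b"]["c"])
```

At query time, a distance between nodes in different subgraphs can then be assembled from distances to boundary points instead of a fresh search over the whole network.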
Reverse nearest neighbor queries have received extensive attention from academia because of their important applications in decision support and potential user discovery. In related research, the spatial keyword reverse nearest neighbor query under a road network is used to find an interest set, that is, a group of people interested in a certain point of interest. However, this query considers only textual and spatial information when finding the people most likely to become potential users.
As social networks evolve, the volume of social network data keeps growing. In a social network, users with social connections may have similar interests, so such data can support prediction and recommendation. On this basis, geo-social keyword queries have been studied. Given a geo-social keyword query and the user submitting it, the query returns the points of interest that are spatially closest to the user, textually most similar to the query keywords, and most frequently visited by the user's friends.
Mature solutions exist for both the spatial keyword reverse nearest neighbor query under a road network and the geo-social keyword query. In some application scenarios, however, a reverse nearest neighbor query must consider not only spatial and textual information but also the social relationships between users and the users' check-ins at points of interest. Existing query processing methods cannot solve this query problem effectively.
Disclosure of Invention
The invention addresses the inability of the prior art to process reverse nearest neighbor queries of geo-social keywords under a road network effectively, and provides a reverse nearest neighbor query processing method for geo-social keywords under a road network.
The technical solution adopted by the invention is a reverse nearest neighbor query processing method for geo-social keywords under a road network, comprising the following steps:
step (1): collecting users and interest points, and constructing a GIM tree index structure for the users and the interest points;
step (2): calculating a minimum similarity count table and a maximum similarity count table of the geo-social keywords for each node of the GIM tree index structure;
step (3): filtering the users and the interest points collected in step (1) by using a pruning algorithm;
step (4): according to the filtered result of step (3), rejecting users that do not meet the requirement through a refining algorithm to obtain a final result set.
Further, the GIM tree index structure in step (1) is constructed as follows: the whole road network is divided into a plurality of subgraphs, and the road network nodes belonging to the subgraphs are defined as boundary points; the road network distances between all boundary points are calculated in advance; each GIM tree index node comprises a road network subgraph, an intersection-union inverted file, and two matrices; the intersection-union inverted file describes the text information of the users and the interest points; the two matrices are a user check-in matrix, which stores the number of check-ins of each user at each interest point, and a user social relationship matrix, which stores the social relationships among the users.
Further, the minimum similarity count table and the maximum similarity count table in step (2) are calculated as follows:
given a group of users and a group of interest points, the minimum value and the maximum value of the geo-social keyword similarity between the users and the interest points are calculated by multiplying the user check-in matrix and the user social relationship matrix of step (1); the minimum similarity count table and the maximum similarity count table of the users are then constructed from these minimum and maximum values.
Further, the pruning algorithm in step (3) is as follows:
given a query point, the minimum value and the maximum value of the similarity between the query point and the users are obtained by the calculation method of step (2), and the users are pruned by combining the minimum similarity count table and the maximum similarity count table obtained in step (2), wherein:
1) if the maximum value of the similarity between the query point and the user set is smaller than the lower bound value of the minimum similarity count table, the group of users is discarded;
2) if the minimum value of the similarity between the query point and the user set is larger than the upper bound value of the maximum similarity count table, the group of users is inserted into the final result set.
Further, the filtering process in the step (3) is as follows:
1) initializing a user queue and an interest point queue, putting a user data set of a GIM tree index root node into the user queue, and putting an interest point data set into the interest point queue;
2) initializing a candidate user set and a final result set, and respectively storing users which are not pruned and users confirmed as final results in the currently accessed GIM tree index node;
3) if the user queue is empty, returning the candidate user set and the final result set; otherwise, taking out the first element of the user queue and applying the pruning algorithm of step (3) to the child nodes of that element in the GIM tree index structure; a child node that satisfies the acceptance condition is inserted into the final result set, and a child node that is neither accepted nor discarded is inserted into the candidate user set.
Further, the refining algorithm in the step (4) comprises the following specific steps:
1) taking out each user in the candidate user set in the step (3);
2) retrieving, in order of spatial distance, the result set of the user's geo-social keyword query under the road network;
3) if the query point is in the result set, inserting the user into the final result set; otherwise, discarding the user;
4) returning the final result set.
The invention has the following beneficial effects: it makes full use of existing indexing, reverse nearest neighbor query, and spatial keyword query techniques for spatial databases; it divides the road network into a plurality of subnets and calculates the shortest path distances between the subnets in advance, which reduces the cost of shortest path calculation; it designs the minimum count table and maximum count table index structures for pruning the subnets; it designs an efficient pruning algorithm, which greatly reduces the number of I/O operations and the CPU time; it provides a matrix-based method for calculating social similarity, which reduces the calculation cost; and it provides a branch-and-bound algorithm, which avoids repeated access to the index structure and improves query efficiency.
Drawings
FIG. 1 is a flow chart of the steps of the present invention.
Detailed Description
The technical solution of the present invention is further explained below with reference to the accompanying drawing and a specific implementation.
As shown in FIG. 1, the specific implementation process and working principle of the present invention are as follows:
step (1): collecting users and interest points, and constructing a GIM tree index structure for the users and the interest points;
step (2): calculating a minimum similarity count table and a maximum similarity count table of the geo-social keywords for each node of the GIM tree index structure;
step (3): filtering the users and the interest points collected in step (1) by using a pruning algorithm;
step (4): according to the filtered result of step (3), rejecting users that do not meet the requirement through a refining algorithm to obtain a final result set.
Further, the information of each point of interest in step (1) includes location information, text information, and check-in information, wherein the location information is a geographical coordinate, the text information is a set of keywords, and the check-in information is a set of records, each of which states when a user visited the point of interest. The user information includes location information, text information, and social information, wherein the location information is the current location of the user, the text information is a set of keywords, and the social information is the friendships between users. All information is stored in the GIM tree index structure. The GIM tree index structure is constructed as follows: the whole road network is divided into a plurality of subgraphs, and the road network nodes belonging to the subgraphs are defined as boundary points; the road network distances between all boundary points are calculated in advance to accelerate the calculation of shortest path distances; each GIM tree index node comprises a road network subgraph, an intersection-union inverted file, and two matrices; the intersection-union inverted file describes the text information of the users and the interest points; the two matrices are a user check-in matrix, which stores the number of check-ins of each user at each interest point, and a user social relationship matrix, which stores the social relationships among the users.
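The contents of one index node can be pictured with the following sketch. The class name `GIMNode`, its field names, and the `keyword_bounds` helper are hypothetical; in particular, it is only an assumption that the inverted file summarises a group of objects by the intersection and the union of their keyword sets.

```python
from dataclasses import dataclass, field

@dataclass
class GIMNode:
    """Hypothetical layout of one GIM tree index node."""
    subgraph: dict   # road network subgraph: node -> {neighbor: edge length}
    users: list      # ids of the users indexed by this node
    pois: list       # ids of the interest points indexed by this node
    keywords: dict   # object id -> set of keywords (text information)
    check_in: list   # check_in[i][j]: check-ins of users[i] at pois[j]
    social: list     # social[i][k]: 1 if users[i] and users[k] are friends
    children: list = field(default_factory=list)

    def keyword_bounds(self, ids):
        """Intersection and union of the keyword sets of the given objects."""
        sets = [self.keywords[i] for i in ids]
        if not sets:
            return set(), set()
        return set.intersection(*sets), set.union(*sets)

node = GIMNode(
    subgraph={"v1": {"v2": 1.5}, "v2": {"v1": 1.5}},
    users=["u1", "u2"],
    pois=["p1"],
    keywords={"u1": {"coffee", "wifi"}, "u2": {"coffee"}, "p1": {"coffee", "cake"}},
    check_in=[[3], [0]],
    social=[[0, 1], [1, 0]],
)
common, combined = node.keyword_bounds(["u1", "p1"])
print(common, combined)
```

Keeping both keyword summaries per node would let a traversal bound text similarity for a whole group without opening every object, which is the kind of bound the later pruning step relies on.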
Further, the minimum similarity count table and the maximum similarity count table in step (2) are calculated as follows: given a group of users and a group of interest points, the user check-in matrix and the user social relationship matrix of step (1) are multiplied to calculate the minimum value and the maximum value of the geo-social keyword similarity between the users and the interest points; the minimum similarity count table and the maximum similarity count table of the users are then constructed from these minimum and maximum values. To speed up the calculation of social similarity, the invention provides a matrix-based calculation method, which obtains the social similarity between a group of users and a group of interest points by multiplying the user social relationship matrix by the user check-in matrix.
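The matrix product can be sketched as follows. The text does not state how the product is normalised into a similarity score, so this sketch simply uses the raw count of friends' check-ins as the social score; the function names and the toy matrices are hypothetical.

```python
def matmul(a, b):
    """Plain-Python product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def social_score_bounds(social, check_in):
    """Friends' check-in counts per (user, interest point), plus min and max.

    social[i][k]   = 1 if user i and user k are friends, else 0
    check_in[k][j] = number of check-ins of user k at interest point j
    """
    friend_visits = matmul(social, check_in)
    values = [v for row in friend_visits for v in row]
    return friend_visits, min(values), max(values)

social = [
    [0, 1, 1],   # user 0 is friends with users 1 and 2
    [1, 0, 0],
    [1, 0, 0],
]
check_in = [
    [2, 0],      # user 0: two check-ins at interest point 0
    [1, 3],
    [0, 4],
]
friend_visits, lowest, highest = social_score_bounds(social, check_in)
print(friend_visits, lowest, highest)
```

One product covers every user and interest point pair of the two groups at once, so the minimum and maximum for the whole group fall out of a single pass over the result.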
For example: given two GIM tree nodes N1 and N2, take the user set U1 from node N1 and the interest point set O2 from node N2, and calculate the minimum and maximum values of the text similarity, spatial similarity, and social similarity between U1 and O2. For the user set U1, a minimum similarity count table is constructed from the minimum values of the similarity between U1 and the interest point sets, where each element of the count table comprises: an interest point set Oi, the number of interest points |Oi| in Oi, and the minimum similarity value between U1 and Oi. Similarly, a maximum similarity count table is constructed from the maximum values of the similarity between U1 and the interest point sets, where each element of the count table comprises: an interest point set Oi, the number of interest points |Oi| in Oi, and the maximum similarity value between U1 and Oi.
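A count table of this shape can be sketched as below. How a bound is read off the table is not spelled out in the text; the sketch assumes the usual count-list reading, in which entries are sorted by similarity in descending order and counts are accumulated until k interest points are covered (k = 1 for a plain nearest neighbor). The names `CountEntry`, `build_count_table`, and `kth_bound`, and all numbers, are hypothetical.

```python
from typing import NamedTuple

class CountEntry(NamedTuple):
    poi_set: str       # identifier of the interest point set Oi
    count: int         # |Oi|, the number of interest points in Oi
    similarity: float  # min (or max) similarity between U1 and Oi

def build_count_table(bounds):
    """bounds: {set name: (|Oi|, similarity)} -> entries, best similarity first."""
    entries = (CountEntry(name, n, s) for name, (n, s) in bounds.items())
    return sorted(entries, key=lambda e: e.similarity, reverse=True)

def kth_bound(table, k):
    """Similarity at which the accumulated interest point count reaches k."""
    seen = 0
    for entry in table:
        seen += entry.count
        if seen >= k:
            return entry.similarity
    return float("-inf")  # fewer than k interest points are known

# Toy bounds between user set U1 and two interest point sets.
min_table = build_count_table({"O1": (3, 0.2), "O2": (2, 0.5)})
max_table = build_count_table({"O1": (3, 0.6), "O2": (2, 0.9)})

lower = kth_bound(min_table, k=1)  # every user in U1 has a neighbor at least this similar
upper = kth_bound(max_table, k=1)  # no user in U1 has a neighbor more similar than this
print(lower, upper)
```

Under this reading, the value taken from the minimum table is a lower bound on the similarity of each user's k-th best interest point, and the value taken from the maximum table is an upper bound on it.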
Further, the pruning algorithm in step (3) is as follows:
given a query point, the minimum value and the maximum value of the similarity between the query point and the users are obtained by the calculation method of step (2), and the users are pruned by combining the minimum similarity count table and the maximum similarity count table obtained in step (2), wherein:
1) if the maximum value of the similarity between the query point and the user set is smaller than the lower bound value of the minimum similarity count table, the group of users is discarded;
2) if the minimum value of the similarity between the query point and the user set is larger than the upper bound value of the maximum similarity count table, the group of users is inserted into the final result set.
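The two rules reduce to a pair of comparisons, sketched here. The function name `prune_decision` and the three labels are hypothetical; the four inputs are assumed to have been computed already as described in step (2).

```python
def prune_decision(q_min, q_max, lower_bound, upper_bound):
    """Classify a user group against the query point.

    q_min, q_max : min and max similarity between the query point and the group
    lower_bound  : lower bound value taken from the minimum similarity count table
    upper_bound  : upper bound value taken from the maximum similarity count table
    """
    if q_max < lower_bound:
        return "discard"    # rule 1: the query point cannot beat the existing neighbors
    if q_min > upper_bound:
        return "result"     # rule 2: the query point beats them for every user
    return "candidate"      # undecided: left for the refining algorithm

print(prune_decision(0.1, 0.2, lower_bound=0.5, upper_bound=0.9))
print(prune_decision(0.95, 0.99, lower_bound=0.5, upper_bound=0.9))
print(prune_decision(0.4, 0.7, lower_bound=0.5, upper_bound=0.9))
```

Only groups that fall between the two bounds need any per-user work, which is where the saving in similarity calculations comes from.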
Further, the filtering process in the step (3) is as follows:
1) initializing a user queue and an interest point queue, putting a user data set of a GIM tree index root node into the user queue, and putting an interest point data set into the interest point queue;
2) initializing a candidate user set and a final result set, and respectively storing users which are not pruned and users confirmed as final results in the currently accessed GIM tree index node;
3) if the user queue is empty, returning the candidate user set and the final result set; otherwise, taking out the first element of the user queue and applying the pruning algorithm of step (3) to the child nodes of that element in the GIM tree index structure; a child node that satisfies the acceptance condition is inserted into the final result set, and a child node that is neither accepted nor discarded is inserted into the candidate user set.
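The filtering loop can be sketched as a queue-driven traversal. Several points are assumptions: each node carries precomputed bounds, undecided internal nodes are put back on the queue so their children are examined later, and the interest point queue is omitted for brevity. The helper names and the toy tree are hypothetical.

```python
from collections import deque

def make_node(users, q_min, q_max, lower, upper, children=()):
    """Toy index node with precomputed similarity bounds (assumed layout)."""
    return {"users": users, "q_min": q_min, "q_max": q_max,
            "lower": lower, "upper": upper, "children": list(children)}

def verdict(node):
    """Apply the two pruning rules to one node."""
    if node["q_max"] < node["lower"]:
        return "discard"
    if node["q_min"] > node["upper"]:
        return "result"
    return "undecided"

def filter_users(root):
    """Return (candidate users, confirmed result users)."""
    user_queue = deque([root])
    candidates, results = [], []
    while user_queue:
        element = user_queue.popleft()
        for child in element["children"]:
            v = verdict(child)
            if v == "discard":
                continue
            if v == "result":
                results.extend(child["users"])
            elif child["children"]:
                user_queue.append(child)          # assumed: descend further
            else:
                candidates.extend(child["users"])  # undecided leaf
    return candidates, results

c1 = make_node(["u5"], 0.4, 0.6, 0.5, 0.7)
c2 = make_node(["u6"], 0.1, 0.2, 0.5, 0.7)
a = make_node(["u1", "u2"], 0.0, 0.1, 0.3, 0.6)
b = make_node(["u3", "u4"], 0.8, 0.9, 0.4, 0.7)
c = make_node(["u5", "u6"], 0.1, 0.6, 0.5, 0.7, [c1, c2])
root = make_node(["u1", "u2", "u3", "u4", "u5", "u6"], 0.0, 1.0, 0.0, 1.0, [a, b, c])

candidates, results = filter_users(root)
print(candidates, results)
```

Each node is examined once, and a discarded or accepted subtree is never opened, which matches the stated goal of avoiding repeated access to the index.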
Further, the refining algorithm in the step (4) comprises the following specific steps:
1) taking out each user in the candidate user set in the step (3);
2) retrieving, in order of spatial distance, the result set of the user's geo-social keyword query under the road network;
3) if the query point is in the result set, inserting the user into the final result set; otherwise, discarding the user;
4) returning the final result set.
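The refining steps can be sketched as follows. The geo-social keyword query itself is stood in for by a caller-supplied `score` function and a brute-force top-k selection, which is an assumption made to keep the sketch short; an index-based search ordered by spatial distance would replace it in practice. All names and scores are hypothetical.

```python
import heapq

def refine(candidates, objects, query_id, score, k=1):
    """Keep the candidate users whose top-k result contains the query point.

    objects  : ids of all interest points, including the query point
    score    : function (user, object id) -> geo-social keyword similarity
    """
    final = []
    for user in candidates:
        top_k = heapq.nlargest(k, objects, key=lambda o: score(user, o))
        if query_id in top_k:
            final.append(user)   # the query point is among the user's best matches
    return final

# Toy similarity scores between two candidate users and two objects.
scores = {
    ("u1", "q"): 0.9, ("u1", "p1"): 0.4,
    ("u2", "q"): 0.3, ("u2", "p1"): 0.7,
}
final_result = refine(["u1", "u2"], ["q", "p1"], "q",
                      score=lambda u, o: scores[(u, o)])
print(final_result)
```

Users confirmed during filtering skip this step entirely; only the undecided candidates pay for an individual query.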

Claims (5)

1. A reverse nearest neighbor query processing method for geo-social keywords under a road network, characterized by comprising the following steps:
step (1): collecting users and interest points, and constructing a GIM tree index structure for the users and the interest points, wherein the GIM tree index structure is constructed as follows: dividing the whole road network into a plurality of subgraphs, and defining road network nodes belonging to the subgraphs as boundary points; calculating the road network distances between all boundary points in advance; each GIM tree index node comprises a road network subgraph, an intersection-union inverted file, and two matrices; the intersection-union inverted file describes the text information of the users and the interest points; the two matrices are a user check-in matrix and a user social relationship matrix, the user check-in matrix stores the number of check-ins of each user at each interest point, and the user social relationship matrix stores the social relationships among the users;
step (2): calculating a minimum similarity count table and a maximum similarity count table of the geo-social keywords for each node of the GIM tree index structure;
step (3): filtering the users and the interest points collected in step (1) by using a pruning algorithm;
step (4): according to the filtered result of step (3), rejecting users that do not meet the requirement through a refining algorithm to obtain a final result set.
2. The method of claim 1, wherein the minimum similarity count table and the maximum similarity count table in step (2) are calculated as follows:
given a group of users and a group of interest points, calculating the minimum value and the maximum value of the geo-social keyword similarity between the users and the interest points by multiplying the two matrices, namely the user check-in matrix and the user social relationship matrix; and constructing the minimum similarity count table and the maximum similarity count table of the users from the minimum value and the maximum value.
3. The method for processing reverse nearest neighbor queries of geo-social keywords under a road network of claim 2, wherein the pruning algorithm in step (3) is as follows:
given a query point, obtaining the minimum value and the maximum value of the similarity between the query point and the users by the calculation method of step (2), and pruning the users by combining the minimum similarity count table and the maximum similarity count table obtained in step (2), wherein:
1) if the maximum value of the similarity between the query point and the user set is smaller than the lower bound value of the minimum similarity count table, discarding the group of users;
2) if the minimum value of the similarity between the query point and the user set is larger than the upper bound value of the maximum similarity count table, inserting the group of users into the final result set.
4. The method for processing reverse nearest neighbor queries of geo-social keywords under a road network of claim 3, wherein the filtering process in step (3) is as follows:
(3.1) initializing a user queue and an interest point queue, putting the user set of the GIM tree index root node into the user queue, and putting the interest point set into the interest point queue;
(3.2) initializing a candidate user set and a final result set, which respectively store the users that are not pruned and the users confirmed as final results in the currently accessed GIM tree index node;
(3.3) if the user queue is empty, returning the candidate user set and the final result set; otherwise, taking out the first element of the user queue and applying the pruning algorithm of step (3) to the child nodes of that element in the GIM tree index structure; a child node that satisfies the acceptance condition is inserted into the final result set, and a child node that is neither accepted nor discarded is inserted into the candidate user set.
5. The method of claim 4, wherein the refining algorithm in step (4) comprises the following specific steps:
(4.1) taking out each user in the candidate user set of step (3);
(4.2) retrieving, in order of spatial distance, the result set of the user's geo-social keyword query under the road network;
(4.3) if the query point is in the result set, inserting the user into the final result set; otherwise, discarding the user;
(4.4) returning the final result set.
CN201710244072.4A 2017-04-14 2017-04-14 Reverse-nearest neighbor query processing method for geographic social keywords under road network Active CN107145526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710244072.4A CN107145526B (en) 2017-04-14 2017-04-14 Reverse-nearest neighbor query processing method for geographic social keywords under road network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710244072.4A CN107145526B (en) 2017-04-14 2017-04-14 Reverse-nearest neighbor query processing method for geographic social keywords under road network

Publications (2)

Publication Number Publication Date
CN107145526A CN107145526A (en) 2017-09-08
CN107145526B true CN107145526B (en) 2020-06-05

Family

ID=59774821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710244072.4A Active CN107145526B (en) 2017-04-14 2017-04-14 Reverse-nearest neighbor query processing method for geographic social keywords under road network

Country Status (1)

Country Link
CN (1) CN107145526B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908722B (en) * 2017-11-14 2021-10-12 华东师范大学 Reverse k ranking query method based on distance
CN108733803B (en) * 2018-05-18 2022-04-29 电子科技大学 Multi-user space keyword query method under road network
CN109408738B (en) * 2018-09-10 2021-04-06 中南民族大学 Method and system for querying space entity in traffic network
CN111813778B (en) * 2020-07-08 2024-03-29 安徽工业大学 Approximate keyword storage and query method for large-scale road network data
CN112883272B (en) * 2021-03-16 2022-04-29 山东大学 Method for determining recommended object
CN113868549B (en) * 2021-09-22 2024-05-17 浙江大学 Advertisement putting optimization method and device, electronic equipment and storage medium
CN114780875B (en) * 2022-06-22 2022-09-06 广东省智能机器人研究院 Dynamic group travel planning query method
CN117076726B (en) * 2023-09-14 2024-06-07 上海交通大学 Approximate neighbor searching method, system, medium and device based on ray tracing intersection

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104408117A (en) * 2014-11-26 2015-03-11 浙江大学 Best consumer real-time searching method based on road network continuous aggregation nearest neighbor query
CN103345509B (en) * 2013-07-04 2016-08-10 上海交通大学 Obtain the level partition tree method and system of the most farthest multiple neighbours on road network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345509B (en) * 2013-07-04 2016-08-10 上海交通大学 Obtain the level partition tree method and system of the most farthest multiple neighbours on road network
CN104408117A (en) * 2014-11-26 2015-03-11 浙江大学 Best consumer real-time searching method based on road network continuous aggregation nearest neighbor query

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Finding Top-k Local Users in Geo-Tagged Social;Jinling Jiang 等;《2015 IEEE 31st International Conference on Data Engineering》;20150601;第267-278页 *
Visible Reverse k-Nearest Neighbor Query Processing in Spatial Databases;Yunjun Gao 等;《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》;20090508;第21卷(第9期);第1314-1327页 *

Also Published As

Publication number Publication date
CN107145526A (en) 2017-09-08

Similar Documents

Publication Publication Date Title
CN107145526B (en) Reverse-nearest neighbor query processing method for geographic social keywords under road network
CN104346444B (en) Optimal site selection method based on reverse spatial keyword queries on road networks
CN105721279B (en) Relationship circle mining method and system for telecommunication network subscribers
CN108932347B (en) Spatial keyword query method based on social perception in distributed environment
CN104462190A (en) On-line position prediction method based on mass of space trajectory excavation
CN110059264B (en) Site retrieval method, equipment and computer storage medium based on knowledge graph
CN109977309B (en) Combined interest point query method based on multiple keywords and user preferences
Yuan et al. RSkNN: kNN search on road networks by incorporating social influence
CN102521364B (en) Method for inquiring shortest path between two points on map
CN102253961A (en) Method for querying road network k aggregation nearest neighboring node based on Voronoi graph
CN105719191A (en) System and method of discovering social group having unspecified behavior senses in multi-dimensional space
CN110275929B (en) Candidate road section screening method based on grid segmentation and grid segmentation method
CN103856462A (en) Method and system for managing sessions
CN105550332A (en) Dual-layer index structure based origin graph query method
CN104298669A (en) Person geographic information mining model based on social network
Fu et al. Mining frequent route patterns based on personal trajectory abstraction
CN109614521B (en) Efficient privacy protection sub-graph query processing method
CN104750860B (en) Data storage method for uncertain data
Wang et al. Top-k socially constrained spatial keyword search in large siot networks
CN112765288A (en) Knowledge graph construction method and system and information query method and system
CN116839613A (en) Multi-attribute-oriented dynamic group travel planning method and device
CN103345509A (en) Method and system for obtaining grading partition tree of dual-reverse furthest neighbors on road network
Sui et al. A privacy-preserving compression storage method for large trajectory data in road network
CN114691958A (en) Community retrieval method based on user geographical location diversity
CN113792206A (en) Data processing method and device, computer readable storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant