CN107145526B

CN107145526B - Reverse-nearest neighbor query processing method for geographic social keywords under road network

Info

Publication number: CN107145526B
Application number: CN201710244072.4A
Authority: CN
Inventors: 高云君; 赵靖文; 陈刚
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2017-04-14
Filing date: 2017-04-14
Publication date: 2020-06-05
Anticipated expiration: 2037-04-14
Also published as: CN107145526A

Abstract

The invention discloses a reverse nearest neighbor query processing method for geo-social keywords on a road network. A GIM tree is used to store the spatial road network, text and social data, and the index is traversed with a branch-and-bound method. During the traversal, the method first computes the minimum similarity count table and the maximum similarity count table of each index node, then prunes with these two tables, and accelerates query execution with a filter-and-refine algorithm. By building on existing spatial database techniques, the invention reduces the number of geo-social-textual similarity computations and thereby improves query performance.

Description

Reverse-nearest neighbor query processing method for geographic social keywords under road network

Technical Field

The invention relates to indexing and query techniques for spatial databases, and in particular to a method for processing reverse nearest neighbor queries over geo-social keywords on a road network.

Background

Spatial data refers to the application-related geospatial data that a geographic information system keeps on a physical storage medium in order to store, manage and retrieve it. Road network data has attracted growing attention as an important part of spatial databases. To access road network data quickly and efficiently, researchers have proposed a number of road network indexing methods, of which the G-tree is at present the most effective. The G-tree divides the road network into subgraphs and precomputes the road network distances between their boundary points, which reduces the cost of shortest-path computation.
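To illustrate why precomputed boundary distances help, the following Python sketch combines them into a distance between two vertices s and t that lie in different subgraphs: any path between them must leave the subgraph of s through one of its boundary points and enter the subgraph of t through one of its boundary points. The function name and the dictionary layout are hypothetical, and this flat, single-level version is a simplification of the hierarchical G-tree, not its actual implementation.

```python
import math
from typing import Dict, Tuple


def distance_via_boundaries(
    src_to_boundary: Dict[str, float],
    boundary_dist: Dict[Tuple[str, str], float],
    boundary_to_dst: Dict[str, float],
) -> float:
    """Shortest distance between vertices s and t in different subgraphs.

    src_to_boundary: distance from s to each boundary point of s's subgraph
    boundary_dist:   precomputed network distance between boundary points
    boundary_to_dst: distance from each boundary point of t's subgraph to t
    """
    best = math.inf
    for b1, d1 in src_to_boundary.items():
        for b2, d2 in boundary_to_dst.items():
            # A boundary point shared by both subgraphs needs no middle segment.
            middle = 0.0 if b1 == b2 else boundary_dist.get((b1, b2), math.inf)
            best = min(best, d1 + middle + d2)
    return best


# The subgraph of s has boundary points a, b; the subgraph of t has c, d.
dist = distance_via_boundaries(
    {"a": 2.0, "b": 5.0},
    {("a", "c"): 10.0, ("a", "d"): 3.0, ("b", "c"): 4.0, ("b", "d"): 9.0},
    {"c": 1.0, "d": 4.0},
)
print(dist)  # 9.0, via s -> a -> d -> t
```

Only the boundary-to-boundary table is precomputed; the short local segments at either end are cheap to compute inside a single subgraph.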

Reverse nearest neighbor queries have received extensive attention from academia because of their important applications in decision support and potential-user discovery. Among related work, the spatial keyword reverse nearest neighbor query on road networks is used to find an interest set, that is, a group of people interested in a given point of interest. However, this query considers only textual and spatial information when identifying the people most likely to become potential users.

As social networks evolve, the volume of social network data keeps growing. In a social network, users with social connections may share similar interests, so such data can support prediction and recommendation. Building on this, geo-social keyword queries have been studied. Given a geo-social keyword query and the user who issues it, the query returns points of interest that are spatially close and textually similar to the query and that have been visited most often by the user's friends.
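The document does not specify how spatial, textual and social similarity are combined into one geo-social score. Purely for illustration, the following sketch assumes a weighted linear combination, with spatial similarity taken as 1 - d/d_max, textual similarity as the Jaccard coefficient of the keyword sets, and social similarity as the friends' visit count divided by a maximum count. All names, weights and normalizations here are hypothetical.

```python
from typing import AbstractSet


def jaccard(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    """Jaccard coefficient of two keyword sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def geo_social_score(
    network_dist: float,
    max_dist: float,
    user_keywords: AbstractSet[str],
    poi_keywords: AbstractSet[str],
    friend_visits: int,
    max_visits: int,
    alpha: float = 1 / 3,
    beta: float = 1 / 3,
    gamma: float = 1 / 3,
) -> float:
    """Hypothetical weighted geo-social similarity; higher means more relevant."""
    spatial = max(0.0, 1.0 - network_dist / max_dist)  # closer -> larger
    textual = jaccard(user_keywords, poi_keywords)
    social = friend_visits / max_visits if max_visits > 0 else 0.0
    return alpha * spatial + beta * textual + gamma * social


score = geo_social_score(2.0, 10.0, {"coffee", "wifi"}, {"coffee"}, 3, 6, 0.5, 0.25, 0.25)
print(round(score, 3))  # 0.65
```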

Mature solutions already exist for spatial keyword reverse nearest neighbor queries on road networks and for geo-social keyword queries. In some application scenarios, however, a reverse nearest neighbor query must consider not only spatial and textual information but also the social relationships between users and the users' check-ins at points of interest. Existing query processing methods cannot handle such queries effectively.

Disclosure of Invention

The invention addresses the inability of existing methods to effectively process reverse nearest neighbor queries over geo-social keywords on road networks, and provides a reverse nearest neighbor query processing method for geo-social keywords on a road network.

The technical solution adopted by the invention is a reverse nearest neighbor query processing method for geo-social keywords on a road network, comprising the following steps:

Step (1): collect users and points of interest, and build a GIM tree index over them;

Step (2): compute the minimum and maximum geo-social keyword similarity count tables for each node of the GIM tree index;

Step (3): filter the users and points of interest collected in step (1) with a pruning algorithm;

Step (4): based on the filtering result of step (3), eliminate the users that do not satisfy the query with a refinement algorithm to obtain the final result set.

Further, the GIM tree index in step (1) is constructed as follows: the whole road network is divided into subgraphs, and the road network nodes on the borders between subgraphs are defined as boundary points; the road network distances between all boundary points are computed in advance. Each node of the GIM tree contains a road network subgraph, an intersection-union inverted file and two matrices. The intersection-union inverted file describes the textual information of the users and points of interest. The two matrices are the user check-in matrix, which stores how many times each user has checked in at each point of interest, and the user social relationship matrix, which stores the social relationships among users.
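A minimal sketch of the two preprocessing steps named above, assuming an undirected weighted graph stored as adjacency lists and a vertex-to-subgraph assignment that is already given: a vertex is treated as a boundary point if it has an edge into another subgraph, and the distances between all boundary points are precomputed with Dijkstra's algorithm. The partitioning itself is not shown, and the function names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # vertex -> [(neighbor, weight)]


def boundary_points(graph: Graph, part: Dict[int, int]) -> Set[int]:
    """Vertices with at least one edge into a different subgraph."""
    return {u for u, edges in graph.items() for v, _ in edges if part[u] != part[v]}


def dijkstra(graph: Graph, source: int) -> Dict[int, float]:
    """Single-source shortest network distances from source."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def precompute_boundary_distances(
    graph: Graph, part: Dict[int, int]
) -> Dict[Tuple[int, int], float]:
    """Network distance between every ordered pair of boundary points."""
    borders = boundary_points(graph, part)
    table: Dict[Tuple[int, int], float] = {}
    for b in borders:
        dist = dijkstra(graph, b)
        for c in borders:
            table[(b, c)] = dist.get(c, math.inf)
    return table


# Path 0 -(1)- 1 -(2)- 2 -(3)- 3, split into subgraphs {0, 1} and {2, 3}.
g = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 2.0)], 2: [(1, 2.0), (3, 3.0)], 3: [(2, 3.0)]}
p = {0: 0, 1: 0, 2: 1, 3: 1}
print(sorted(boundary_points(g, p)))  # [1, 2]
```

Running one Dijkstra search per boundary point keeps the precomputation proportional to the number of boundary points rather than to all vertices.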

Further, the minimum and maximum similarity count tables in step (2) are calculated as follows:

Given a group of users and a group of points of interest, the minimum and maximum geo-social keyword similarity between them is computed by multiplying the two matrices from step (1), the user check-in matrix and the user social relationship matrix; these minimum and maximum values are then used to build the users' minimum and maximum similarity count tables.
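Assuming the social relationship matrix F is a 0/1 user-by-user friendship matrix and the check-in matrix C is a user-by-POI count matrix, the product F·C gives, for every user u and point of interest o, the total number of check-ins at o by u's friends. The sketch below uses that product as the social similarity and reads the minimum and maximum over a user group and a POI group from it; this interpretation of the multiplication and the helper names are assumptions.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain product of an (n x m) and an (m x p) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def social_bounds(
    friends: Matrix, checkins: Matrix, users: Sequence[int], pois: Sequence[int]
) -> Tuple[int, int]:
    """Min and max friend check-in counts between a user group and a POI group."""
    fc = matmul(friends, checkins)  # fc[u][o] = check-ins at o by u's friends
    values = [fc[u][o] for u in users for o in pois]
    return min(values), max(values)


# 3 users, 2 points of interest; user 0 is friends with users 1 and 2.
F = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
C = [[2, 0], [1, 3], [0, 4]]
print(matmul(F, C))                          # [[1, 7], [2, 0], [2, 0]]
print(social_bounds(F, C, [0, 1], [0, 1]))   # (0, 7)
```

One multiplication yields the friend check-in counts for every user and point of interest at once, instead of walking each user's friend list separately.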

Further, the pruning algorithm in step (3) is as follows:

Given a query point, obtain the minimum and maximum similarity between the query point and the user group with the calculation method of step (2), and prune the users using the minimum and maximum similarity count tables obtained in step (2):

1) If the maximum similarity between the query point and the user group is smaller than the lower bound value of the minimum similarity count table, discard the group of users.

2) If the minimum similarity between the query point and the user group is larger than the upper bound value of the maximum similarity count table, insert the group of users into the final result set.
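The two pruning rules can be expressed as a three-way decision. In this hypothetical helper, `q_min`/`q_max` are the minimum and maximum similarity between the query point and a user group, and `table_lower`/`table_upper` stand for the lower bound value of the minimum similarity count table and the upper bound value of the maximum similarity count table; how those bounds are derived is not shown here.

```python
from enum import Enum


class Decision(Enum):
    PRUNE = "prune"          # rule 1): discard the whole user group
    RESULT = "result"        # rule 2): add the whole user group to the final result set
    UNDECIDED = "undecided"  # neither rule applies; the group needs further checks


def prune_user_group(
    q_min: float, q_max: float, table_lower: float, table_upper: float
) -> Decision:
    if q_max < table_lower:
        return Decision.PRUNE
    if q_min > table_upper:
        return Decision.RESULT
    return Decision.UNDECIDED


print(prune_user_group(0.1, 0.3, 0.5, 0.8))  # Decision.PRUNE
```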

Further, the filtering process in step (3) is as follows:

1) Initialize a user queue and a point-of-interest queue; put the user data set of the GIM tree root node into the user queue and the point-of-interest data set into the point-of-interest queue;

2) Initialize a candidate user set and a final result set, which store, respectively, the users of the currently accessed GIM tree node that are not pruned and the users confirmed as final results;

3) If the user queue is empty, return the candidate user set and the final result set. Otherwise, remove the first element of the user queue and apply the pruning algorithm of step (3) to its child nodes in the GIM tree: a child node that satisfies the result condition is inserted into the final result set, and a child node that is not pruned is inserted into the candidate user set.
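A sketch of the filtering loop under stated assumptions: the user side of the GIM tree is modeled by a hypothetical `UserEntry` tree, the pruning rules are passed in as a `decide` callback returning "prune", "result" or "undecided", and, because the text does not say how undecided internal nodes are handled, this sketch re-enqueues them and only places undecided individual users in the candidate set. The point-of-interest queue is omitted.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class UserEntry:
    """Hypothetical user-side tree entry; an entry without children is one user."""
    name: str
    children: List["UserEntry"] = field(default_factory=list)

    def is_user(self) -> bool:
        return not self.children

    def users(self) -> List[str]:
        if self.is_user():
            return [self.name]
        return [u for child in self.children for u in child.users()]


def filter_users(
    root: UserEntry, decide: Callable[[UserEntry], str]
) -> Tuple[Set[str], Set[str]]:
    """Return (candidate users, confirmed result users)."""
    queue = deque([root])            # 1) the user queue starts with the root
    candidates: Set[str] = set()     # 2) users that were not pruned
    results: Set[str] = set()        #    users confirmed as final results
    while queue:                     # 3) stop when the user queue is empty
        entry = queue.popleft()
        for child in entry.children:
            verdict = decide(child)  # "prune", "result" or "undecided"
            if verdict == "result":
                results.update(child.users())
            elif verdict == "undecided":
                if child.is_user():
                    candidates.add(child.name)
                else:
                    queue.append(child)  # assumption: expand undecided groups
    return candidates, results


g1 = UserEntry("g1", [UserEntry("u1"), UserEntry("u2")])
g2 = UserEntry("g2", [UserEntry("u3"), UserEntry("u4")])
verdicts = {"g1": "prune", "g2": "undecided", "u3": "result", "u4": "undecided"}
print(filter_users(UserEntry("root", [g1, g2]), lambda e: verdicts[e.name]))
# ({'u4'}, {'u3'})
```

Because g1 is pruned as a whole, its users u1 and u2 are never examined individually, which is where the savings of the filtering step come from.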

Further, the refinement algorithm in step (4) consists of the following steps:

1) Take each user from the candidate user set of step (3);

2) Find the user's geo-social keyword query results on the road network, in order of spatial distance;

3) If the query point is in that result set, insert the user into the final result set; otherwise, discard the user;

4) Return the final result set.
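The refinement step can be sketched as a brute-force check, assuming a reverse k-nearest-neighbor semantics (k = 1 for a plain reverse nearest neighbor query) and a `score` function giving the geo-social similarity of a user to a point of interest. Instead of the distance-ordered search described above, this sketch simply counts how many points of interest score strictly higher than the query point for each user; the names and the tie-breaking rule are assumptions.

```python
from typing import Callable, Hashable, Iterable, Set


def refine(
    candidates: Iterable[Hashable],
    pois: Iterable[Hashable],
    query: Hashable,
    score: Callable[[Hashable, Hashable], float],
    k: int = 1,
) -> Set[Hashable]:
    """Keep the users for whom the query point ranks among their top-k POIs."""
    poi_list = list(pois)
    results: Set[Hashable] = set()
    for user in candidates:
        q_score = score(user, query)
        # Number of ordinary POIs this user prefers strictly over the query point.
        better = sum(1 for o in poi_list if o != query and score(user, o) > q_score)
        if better < k:
            results.add(user)
    return results


scores = {("u1", "q"): 0.9, ("u1", "a"): 0.5, ("u1", "b"): 0.4,
          ("u2", "q"): 0.3, ("u2", "a"): 0.8, ("u2", "b"): 0.6}
s = lambda u, o: scores[(u, o)]
print(refine(["u1", "u2"], ["a", "b"], "q", s))               # {'u1'}
print(sorted(refine(["u1", "u2"], ["a", "b"], "q", s, k=3)))  # ['u1', 'u2']
```

A real implementation would stop the search for a user as soon as k better points of interest have been found, which is what expanding in order of spatial distance makes possible.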

The beneficial effects of the invention are as follows. The invention makes full use of existing spatial database techniques, namely indexing, reverse nearest neighbor queries and spatial keyword queries. It divides the road network into subnetworks and precomputes the shortest-path distances between them, which reduces the cost of shortest-path computation. It designs minimum and maximum count tables for pruning subnetworks, together with an efficient pruning algorithm, which greatly reduces the number of I/O operations and the CPU time. It computes social similarity by matrix multiplication, which lowers the computation cost. Finally, it uses a branch-and-bound algorithm that avoids repeated accesses to the index structure, improving query efficiency.

Drawings

FIG. 1 is a flow chart of the steps of the present invention.

Detailed Description

The technical solution of the present invention is further explained below with reference to the accompanying drawing and a specific implementation.

As shown in FIG. 1, the specific implementation process and working principle of the present invention are as follows:

Further, the information of each point of interest in step (1) includes location information, text information and check-in information: the location information is a geographic coordinate, the text information is a set of keywords, and the check-in information is a set of records, each indicating when a user visited the point of interest. The information of each user includes location information, text information and social information: the location information is the user's current location, the text information is a set of keywords, and the social information is the friendship between users. All of this information is stored in the GIM tree index, which is constructed as follows: the whole road network is divided into subgraphs, and the road network nodes on the borders between subgraphs are defined as boundary points; the road network distances between all boundary points are precomputed to speed up shortest-path distance computation. Each node of the GIM tree contains a road network subgraph, an intersection-union inverted file and two matrices. The intersection-union inverted file describes the textual information of the users and points of interest. The two matrices are the user check-in matrix, which stores how many times each user has checked in at each point of interest, and the user social relationship matrix, which stores the social relationships among users.
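Assuming the intersection-union inverted file keeps, for each node, the intersection and the union of the keyword sets beneath it, and assuming Jaccard similarity as the text measure (the document does not name one), these two sets are enough to bound the textual similarity between any user of one node and any point of interest of another. The sketch below shows that bound; the names are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

IU = Tuple[FrozenSet[str], FrozenSet[str]]  # (intersection, union) of keyword sets


def intersection_union(keyword_sets: Iterable[Iterable[str]]) -> IU:
    """Intersection and union of a non-empty collection of keyword sets."""
    sets = [frozenset(s) for s in keyword_sets]
    return frozenset.intersection(*sets), frozenset.union(*sets)


def jaccard_bounds(users: IU, pois: IU) -> Tuple[float, float]:
    """Bounds on Jaccard(u, o) for any user u and POI o summarized by the two IUs."""
    (u_inter, u_union), (o_inter, o_union) = users, pois
    # |u & o| >= |u_inter & o_inter| and |u | o| <= |u_union | o_union|
    lo_den = len(u_union | o_union)
    lower = len(u_inter & o_inter) / lo_den if lo_den else 0.0
    # |u & o| <= |u_union & o_union| and |u | o| >= |u_inter | o_inter|
    up_num, up_den = len(u_union & o_union), len(u_inter | o_inter)
    if up_den:
        upper = min(1.0, up_num / up_den)
    else:
        upper = 1.0 if up_num else 0.0
    return lower, upper


users = intersection_union([{"coffee", "wifi"}, {"coffee", "tea"}])
pois = intersection_union([{"coffee", "cake"}, {"coffee", "wifi", "cake"}])
print(jaccard_bounds(users, pois))  # (0.25, 1.0)
```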

Further, the minimum and maximum similarity count tables in step (2) are calculated as follows: given a group of users and a group of points of interest, the minimum and maximum geo-social keyword similarity between them is computed by multiplying the two matrices from step (1), the user check-in matrix and the user social relationship matrix, and these values are used to build the users' minimum and maximum similarity count tables. To speed up the computation of social similarity, the invention uses this matrix-based method, which obtains the social similarity between a group of users and a group of points of interest by multiplying the user social relationship matrix by the user check-in matrix.

For example, given two GIM tree nodes N₁ and N₂, take the user set U₁ of node N₁ and the point-of-interest set O₂ of node N₂, and compute the minimum and maximum text similarity, spatial similarity and social similarity between U₁ and O₂. For the user set U₁, a minimum similarity count table is built from the minimum similarities between U₁ and the point-of-interest sets; each entry of the table contains a point-of-interest set O_i, the number of points of interest |O_i| in O_i, and the minimum similarity between U₁ and O_i. Similarly, a maximum similarity count table is built from the maximum similarities between U₁ and the point-of-interest sets; each entry contains a point-of-interest set O_i, its size |O_i|, and the maximum similarity between U₁ and O_i.
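One way to turn a count table into the single bound used by the pruning rules, assuming a reverse k-nearest-neighbor query: sort the entries by similarity in descending order and accumulate |O_i| until at least k points of interest are covered. The similarity reached at that point bounds the k-th highest similarity of every user in U₁ from below (for the minimum table) or from above (for the maximum table). The `CountEntry` type and this reading of "lower/upper bound value" are assumptions, not confirmed by the text.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CountEntry:
    poi_set: str       # identifier of the point-of-interest set O_i
    size: int          # |O_i|
    similarity: float  # min (or max) similarity between U1 and O_i


def kth_similarity_bound(table: Iterable[CountEntry], k: int) -> float:
    """Similarity at which the k-th point of interest is reached, scanning entries
    from most to least similar; -inf if the table covers fewer than k POIs."""
    remaining = k
    for entry in sorted(table, key=lambda e: e.similarity, reverse=True):
        remaining -= entry.size
        if remaining <= 0:
            return entry.similarity
    return -math.inf


table = [CountEntry("O1", 2, 0.9), CountEntry("O2", 3, 0.6), CountEntry("O3", 1, 0.4)]
print([kth_similarity_bound(table, k) for k in (1, 3, 6, 7)])  # [0.9, 0.6, 0.4, -inf]
```

Returning -inf when fewer than k points of interest exist means the minimum table can never prune the group, while any query point passes the result test against the maximum table, which matches the case where every point of interest is automatically in each user's top k.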


Claims

1. A reverse nearest neighbor query processing method for geo-social keywords on a road network, characterized by comprising the following steps:

step (1): collecting users and points of interest and constructing a GIM tree index over them, wherein the GIM tree index is constructed as follows: dividing the whole road network into subgraphs and defining the road network nodes on the borders between subgraphs as boundary points; computing the road network distances between all boundary points in advance; each node of the GIM tree comprising a road network subgraph, an intersection-union inverted file and two matrices, wherein the intersection-union inverted file describes the textual information of the users and points of interest, and the two matrices are a user check-in matrix, storing the number of check-ins of each user at each point of interest, and a user social relationship matrix, storing the social relationships among users;

2. The method according to claim 1, wherein the minimum similarity count table and the maximum similarity count table in step (2) are calculated as follows:

given a group of users and a group of points of interest, calculating the minimum and maximum geo-social keyword similarity between them by multiplying the two matrices, namely the user check-in matrix and the user social relationship matrix; and constructing the users' minimum similarity count table and maximum similarity count table from the minimum and maximum values.

3. The reverse nearest neighbor query processing method for geo-social keywords on a road network according to claim 2, wherein the pruning algorithm in step (3) is as follows:

1) if the maximum value of the similarity between the query point and the user set is smaller than the lower bound value of the minimum similarity count table, discarding the group of users;

4. The reverse nearest neighbor query processing method for geo-social keywords on a road network according to claim 3, wherein the filtering process in step (3) is as follows:

(3.1) initializing a user queue and an interest point queue, putting a user set of a GIM tree index root node into the user queue, and putting an interest point set into the interest point queue;

(3.2) initializing a candidate user set and a final result set, and respectively storing the users which are not pruned and the users confirmed as the final results in the currently accessed GIM tree index node;

(3.3) if the user queue is empty, returning the candidate user set and the final result set; otherwise, taking out the first element of the user queue and applying the pruning algorithm of step (3) to its child nodes in the GIM tree, inserting a child node into the final result set if it satisfies the result condition, and inserting it into the candidate user set if it is not pruned.

5. The method according to claim 4, wherein the refinement algorithm in step (4) comprises the following steps:

(4.1) taking out each user in the candidate user set in the step (3);

(4.2) finding, in order of spatial distance, the results of the geo-social keyword query on the road network for the user;

(4.3) if the query point is in the result set, inserting the user into the final result set; otherwise, discarding the user;

(4.4) returning the final result set.