CN110263108B - Keyword Skyline fuzzy query method and system based on road network - Google Patents

Info

Publication number
CN110263108B
Authority
CN
China
Legal status
Active
Application number
CN201910388590.2A
Other languages
Chinese (zh)
Other versions
CN110263108A
Inventor
秦小麟
李星罗
王宁
鲍斌国
张彤
陈骏岭
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN201910388590.2A
Publication of CN110263108A
Application granted
Publication of CN110263108B

Classifications

    • G06F16/29 Geographical information databases
    • G06F16/325 Hash tables
    • G06F16/334 Query execution
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a road-network-based keyword Skyline fuzzy query method and system. The method first constructs a KR-Tree index, then converts the keywords input by the user into a machine-readable triplet <L, R, T>, then uses the KR-Tree index to search the database quickly according to the user's query conditions, and finally returns the search results to the user. The system comprises an index construction module for constructing the KR-Tree index, an input module for entering keywords, a conversion module for converting the keywords into the triplet form, a retrieval module for searching the database, and an output module for displaying the search results. The invention satisfies the preferences of the querying user, increases the fault tolerance of the query, and improves query efficiency by pruning irrelevant nodes more effectively during the query.

Description

Keyword Skyline fuzzy query method and system based on road network
Technical Field
The invention belongs to the technical field of spatial database queries, and particularly relates to a road-network-based keyword Skyline fuzzy query method and system.
Background
With the rapid development of GPS positioning technology and the spread of wireless terminal devices, location-based services and applications have generated vast amounts of spatial text data, for example merchant information in the Ele.me mobile app (including each merchant's spatial location and descriptive tags) and posts published by users on Sina Weibo (including the geographic location and the text keywords at the time of posting). For such massive spatial text data, answering a user's multi-preference query by selecting a group of results that are relatively good across several dimensions at once has become a hotspot of current research.
Skyline queries are often used to solve multi-objective decision-making problems. For objects in a multi-dimensional dataset, if object A is not weaker than object B in every dimension and is better than object B in at least one dimension, object A is said to dominate object B. The result set of a Skyline query is the set of objects that are not dominated by any other object. In practice, which attributes are used to decide dominance is usually determined by the preferences of the querying user.
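The dominance rule just stated can be illustrated with a small sketch. It assumes that larger values are better in every dimension and uses a naive quadratic scan to keep the objects that no other object dominates; the function names are illustrative and not part of the invention.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a dominates b, assuming larger is better in every dimension."""
    not_weaker = all(x >= y for x, y in zip(a, b))     # a is never worse than b
    strictly_better = any(x > y for x, y in zip(a, b))  # and better in at least one dimension
    return not_weaker and strictly_better


def skyline(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Naive O(n^2) Skyline: keep every point that no other point dominates.

    A point never dominates itself, so no self-exclusion is needed.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

With points given as (rating, another score), `skyline([(4.5, 3.0), (4.0, 5.0), (3.5, 2.0), (4.5, 2.5)])` keeps `(4.5, 3.0)` and `(4.0, 5.0)`: neither beats the other in both dimensions, and each of the remaining two is dominated by `(4.5, 3.0)`.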
Existing spatial keyword Skyline query methods use the shortest path between a point of interest (POI) and the query point as the distance measure, but as the road network grows more complex, the cost of computing road network distances rises and query efficiency drops sharply. In addition, a user may mistype a keyword such as "starbucks" during an actual query and thus fail to obtain the desired results. Conventional solutions, however, filter keyword similarity with a single uniform threshold, and for query keywords of different lengths it is difficult to measure similarity with one threshold. Secondly, conventional methods mainly address fuzzy matching of a single keyword and offer little support for fuzzy matching of multiple keywords.
Related patent applications:
[1] Ciphertext-based multi-keyword fuzzy query method in a cloud environment (application date: 2018-05-23; publication number: CN108710698A).
[2] A hybrid spatial index mechanism for processing geo-textual Skyline queries (application date: 2017-10-12; publication number: CN108052514A).
[3] Skyline query method for spatio-temporal sequence data stream applications (application date: 2016-12-14; publication number: CN106708989A).
Disclosure of Invention
Purpose: the invention provides a road-network-based keyword Skyline fuzzy query method and system to solve the problem in the prior art that fuzzy matching efficiency drops when erroneous keywords of different lengths are input.
Technical solution: the invention provides a road-network-based keyword Skyline fuzzy query method comprising the following steps:
Step 1: for a database stored on disk, construct a corresponding KR-Tree index in memory;
Step 2: convert the keywords input by the user into a machine-readable triplet <L, R, T>, where L is the user's spatial position, R is the radius of the user's query area, and T is the set of keywords input by the user;
Step 3: call the KR-Tree index and use it to search the database in depth-first order according to <L, R, T>;
Step 4: after the search finishes, return the search results to the user.
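A minimal sketch of step 2, assuming the user supplies coordinates, a radius and a free-text keyword string. The class `QueryTriplet`, the helper `to_triplet` and the tokenization rule (split on commas and whitespace, then lower-case) are hypothetical; the description does not specify how keywords are normalized.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class QueryTriplet:
    location: Tuple[float, float]  # L: the user's spatial position (x, y)
    radius: float                  # R: radius of the query area
    keywords: FrozenSet[str]       # T: set of query keywords


def to_triplet(x: float, y: float, radius: float, raw_keywords: str) -> QueryTriplet:
    """Hypothetical conversion of raw user input into <L, R, T>."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    tokens = raw_keywords.replace(",", " ").split()  # assumed tokenization rule
    return QueryTriplet((x, y), radius, frozenset(t.lower() for t in tokens))
```

Freezing the dataclass makes a triplet hashable, which is convenient if query results are cached per triplet.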
Further, the KR-Tree index is constructed as follows:
Step A: estimate the required memory size from the number of points of interest in the database, and request a memory space of that size from the computer;
Step B: initialize an index header at the start of the memory space to generate a root node, from which leaf nodes are then accessed top-down;
Step C: traverse all points of interest in the database, insert their spatial coordinates one by one into leaf nodes as key values, and introduce parent nodes where necessary, thereby completing the framework of the KR-Tree index;
Step D: traverse the keywords held by all points of interest and insert them into the corresponding positions of the AK-Table indexes of the corresponding leaf nodes, thereby constructing the leaf nodes' AK-Table indexes; the corresponding leaf node for a keyword is the leaf node containing the coordinates of the point of interest that holds that keyword;
Step E: after the leaf nodes' AK-Table indexes are built, build the AK-Table indexes of all parent nodes;
Step F: after the whole KR-Tree index is built, write the in-memory index to disk block by block, with the node as the basic unit.
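Step A does not say how the memory size is estimated. One rough approach, sketched here under stated assumptions, counts the nodes level by level for a given fan-out and fill rate and multiplies by a fixed node size. The fan-out, fill rate and node size are hypothetical inputs, and the AK-Table keyword storage is not counted.

```python
import math


def estimate_index_bytes(num_pois: int, fanout: int, node_bytes: int,
                         fill_percent: int = 70) -> int:
    """Rough step-A estimate: number of tree nodes times a fixed node size.

    Assumes every node is about fill_percent full (integer percent, so the
    per-node capacity is computed without floating-point rounding).
    """
    per_node = max(2, fanout * fill_percent // 100)
    if num_pois <= 0:
        return node_bytes  # an empty index still needs its root
    total, level = 0, num_pois
    while True:
        level = math.ceil(level / per_node)  # nodes needed on the level above
        total += level
        if level == 1:                       # reached the root
            return total * node_bytes
```

For example, 10,000 points of interest with a fan-out of 50, a 70% fill rate and 4,096-byte nodes give 286 + 9 + 1 = 296 nodes, or 1,212,416 bytes.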
Further, step C is carried out as follows:
Step C1: insert a key value into the current node. If the number of key values in the node exceeds the maximum F, split the node according to the neighbor principle into two new nodes. If a parent node points to the current node, make the parent point to the two new nodes instead and update the parent's information; if no parent node points to the current node, create a new parent node at the level above, point it to the two new nodes produced by the split, and then update the parent's information;
Step C2: starting again from the root node, insert the key values of the remaining points of interest into the existing leaf nodes according to the neighbor principle until the key values of all points of interest have been inserted into leaf nodes.
Further, the parent node's information is updated as follows: take the parent node as the current node and insert the coordinates of the spatial regions represented by the two new nodes into it as two key values.
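Steps C1, C2 and the parent update can be sketched as an R-tree-style insertion. The description does not define the "neighbor principle", so this sketch assumes that (1) a new key value descends to the child whose region center is nearest and (2) an overflowing node is split by sorting its entries along the longer side of its bounding rectangle and cutting the list in half. The capacity `F = 4` and all names are hypothetical. The key value a parent holds for each child is that child's bounding rectangle, computed on demand.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
F = 4  # hypothetical maximum number of key values per node


@dataclass(eq=False)  # identity equality, so list.remove() targets the exact node
class Node:
    leaf: bool
    points: List[Point] = field(default_factory=list)   # key values of a leaf
    children: List[Node] = field(default_factory=list)  # key values of an inner node
    parent: Optional[Node] = None

    def entries(self) -> int:
        return len(self.points) if self.leaf else len(self.children)

    def mbr(self) -> Box:
        """Region covered by this node: the key value its parent stores for it."""
        if self.leaf:
            xs, ys = [p[0] for p in self.points], [p[1] for p in self.points]
            return min(xs), min(ys), max(xs), max(ys)
        boxes = [c.mbr() for c in self.children]
        return (min(b[0] for b in boxes), min(b[1] for b in boxes),
                max(b[2] for b in boxes), max(b[3] for b in boxes))


def _center_gap(node: Node, p: Point) -> float:
    x0, y0, x1, y1 = node.mbr()
    return ((x0 + x1) / 2 - p[0]) ** 2 + ((y0 + y1) / 2 - p[1]) ** 2


def _split(node: Node) -> Tuple[Node, Node]:
    """Assumed split rule: sort along the longer side of the MBR and cut in half."""
    x0, y0, x1, y1 = node.mbr()
    axis = 0 if x1 - x0 >= y1 - y0 else 1
    if node.leaf:
        pts = sorted(node.points, key=lambda p: p[axis])
        mid = len(pts) // 2
        return Node(True, points=pts[:mid]), Node(True, points=pts[mid:])
    kids = sorted(node.children, key=lambda c: c.mbr()[axis] + c.mbr()[axis + 2])
    mid = len(kids) // 2
    halves = Node(False, children=kids[:mid]), Node(False, children=kids[mid:])
    for half in halves:
        for c in half.children:
            c.parent = half  # re-home the grandchildren
    return halves


def insert(root: Node, p: Point) -> Node:
    """Steps C1/C2: insert one key value and return the (possibly new) root."""
    node = root
    while not node.leaf:  # assumed "neighbor principle": descend to the nearest child
        node = min(node.children, key=lambda c: _center_gap(c, p))
    node.points.append(p)
    while node.entries() > F:           # overflow: split into two new nodes
        left, right = _split(node)
        parent = node.parent
        if parent is None:              # no parent points here: create one above
            parent = Node(False)
            root = parent
        else:
            parent.children.remove(node)
        for half in (left, right):      # the parent now points to both new nodes
            half.parent = parent
            parent.children.append(half)
        node = parent                   # the parent may overflow in turn
    return root
```

Starting from `root = Node(leaf=True)` and calling `root = insert(root, p)` for each point reproduces the C1/C2 loop; the tree grows upward only when the root itself splits.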
Further, step D is carried out as follows: feed the keywords held by the current point of interest into a hash function one by one to obtain the hash value of each keyword; insert each keyword into the position of the corresponding leaf node's AK-Table index given by its hash value and record the id of the point of interest; if that position already records the same keyword from another point of interest, append a successor linked-list node at that position and record the keyword and the id of the current point of interest in it; repeat until all keywords of all points of interest have been inserted into the AK-Table index.
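Step D describes a hash table with separate chaining. The sketch below assumes a fixed number of slots and a simple deterministic character-sum hash (the description does not name the hash function); `AKTable`, `Entry` and `build_leaf_table` are hypothetical names. For simplicity it appends a successor node whenever a slot is occupied, which also covers two different keywords that hash to the same slot.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Entry:
    keyword: str
    ref_id: int                     # id of the point of interest holding the keyword
    next: Optional["Entry"] = None  # successor linked-list node in the same slot


class AKTable:
    """Keyword hash table of one KR-Tree node, with separate chaining."""

    def __init__(self, slots: int = 16) -> None:
        self.slots: List[Optional[Entry]] = [None] * slots  # 16 slots is an assumption

    def _slot(self, keyword: str) -> int:
        # Deterministic character-sum hash; Python's built-in hash() of str is salted per run.
        return sum(map(ord, keyword)) % len(self.slots)

    def insert(self, keyword: str, ref_id: int) -> None:
        i = self._slot(keyword)
        if self.slots[i] is None:
            self.slots[i] = Entry(keyword, ref_id)
            return
        node = self.slots[i]
        while node.next is not None:        # walk to the end of the chain
            node = node.next
        node.next = Entry(keyword, ref_id)  # append a successor node

    def lookup(self, keyword: str) -> List[int]:
        """Ids of all points of interest in this node that hold the keyword."""
        ids: List[int] = []
        node = self.slots[self._slot(keyword)]
        while node is not None:
            if node.keyword == keyword:
                ids.append(node.ref_id)
            node = node.next
        return ids


def build_leaf_table(pois: Dict[int, Iterable[str]]) -> AKTable:
    """Step D for one leaf: insert every keyword of every point of interest it contains."""
    table = AKTable()
    for poi_id, keywords in pois.items():
        for kw in keywords:
            table.insert(kw, poi_id)
    return table
```

For a leaf holding `{1: ["coffee", "wifi"], 2: ["coffee"]}`, `lookup("coffee")` walks one chain and returns `[1, 2]`.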
Further, step E is carried out as follows: the keywords of each parent node are the union of the keywords of all next-level nodes connected to it. Feed all keywords held by each parent node into a hash function one by one to obtain each keyword's hash value, insert each keyword into the position of the parent node's AK-Table index given by its hash value, and record the id of the region in which the keyword is located; if that position already holds a record, append a successor linked-list node after it and record the keyword and the id of its region in that node.
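A compact sketch of step E, assuming each child region is identified by an integer id. Here a Python dict stands in for the hash slots and each list for the chain of successor nodes, so only the bookkeeping is shown: the parent's keyword set is the union of its children's, and each keyword records the ids of the regions that contain it. `build_parent_table` and `node_keywords` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_parent_table(child_keywords: Dict[int, Set[str]]) -> Dict[str, List[int]]:
    """Step E sketch for one parent node.

    child_keywords maps each child region id to the keyword set of that child.
    """
    table: Dict[str, List[int]] = defaultdict(list)
    for region_id in sorted(child_keywords):
        for kw in sorted(child_keywords[region_id]):
            table[kw].append(region_id)  # record the region where the keyword occurs
    return dict(table)


def node_keywords(table: Dict[str, List[int]]) -> Set[str]:
    """Keyword set of this node: the union of its children's keywords.

    This set is what the node's own parent consumes, so applying the two
    functions level by level builds the AK-Tables bottom-up.
    """
    return set(table)
```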
Further, the database is searched as follows:
Step 3.1: according to the query condition, start from the root node of the KR-Tree index and visit the index nodes top-down;
Step 3.2: determine whether the current node is a leaf node; if so, go to step 3.8, otherwise go to step 3.3;
Step 3.3: determine whether the region of the current node overlaps the query area; if so, go to step 3.4, otherwise go to step 3.6;
Step 3.4: determine whether the text similarity between the keyword set in the current node's AK-Table index and T is less than or equal to a threshold K; if so, go to step 3.5, otherwise go to step 3.6;
Step 3.5: following the depth-first principle, visit the child of the current node, i.e. a next-level node connected to it, and go to step 3.2;
Step 3.6: determine whether the current node has a sibling node that has not yet been visited; if so, visit that sibling and go to step 3.2; otherwise stop descending, return to the previous node and go to step 3.7;
Step 3.7: determine whether the current node is the root node; if so, go to step 3.9, otherwise go to step 3.6;
Step 3.8: compare all points of interest in the current leaf node with all points of interest in the candidate set, eliminate the points of interest in the candidate set and in the current leaf node that are dominated by other points of interest, and keep the remaining points of interest as the new candidate set; after the comparison, go to step 3.6;
Step 3.9: once all leaf nodes satisfying the query condition have been traversed, return the points of interest in the final candidate set as the query result.
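Steps 3.1 to 3.9 amount to a pruned depth-first traversal. The sketch below assumes the query area is the circle of radius R around L and that each node stores its bounding rectangle and keyword set. Because the similarity formula of step 3.4 is not reproduced in this text, the text test is passed in as the hypothetical callback `text_ok`, and the leaf processing of step 3.8 as `on_leaf`; `KRNode`, `overlaps` and `search` are likewise illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class KRNode:
    rect: Rect                                              # region covered by the node
    keywords: Set[str]                                      # keywords in its AK-Table
    children: List["KRNode"] = field(default_factory=list)
    pois: List[str] = field(default_factory=list)           # POI ids (leaves only)


def overlaps(rect: Rect, center: Tuple[float, float], radius: float) -> bool:
    """Step 3.3: does the node rectangle intersect the circular query area (L, R)?"""
    nx = min(max(center[0], rect[0]), rect[2])  # point of the rectangle nearest to L
    ny = min(max(center[1], rect[1]), rect[3])
    return (nx - center[0]) ** 2 + (ny - center[1]) ** 2 <= radius ** 2


def search(root: KRNode, center: Tuple[float, float], radius: float, query: Set[str],
           text_ok: Callable[[Set[str], Set[str]], bool],
           on_leaf: Callable[[KRNode], None]) -> None:
    """Steps 3.1-3.9 as a recursive depth-first walk.

    Returning from visit() plays the role of steps 3.6/3.7 (try the next sibling,
    otherwise back up to the parent); the walk ends when the root's loop finishes (3.9).
    """
    def visit(node: KRNode) -> None:
        if not node.children:                        # step 3.2: leaf -> step 3.8
            on_leaf(node)
            return
        if not overlaps(node.rect, center, radius):  # step 3.3
            return
        if not text_ok(node.keywords, query):        # step 3.4
            return
        for child in node.children:                  # step 3.5, depth first
            visit(child)

    visit(root)
```

For a quick trial, passing `lambda kws, q: bool(kws & q)` as `text_ok` descends only into nodes that share at least one query keyword; the actual threshold-K similarity test would replace it.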
Further, the text similarity in step 3.4 is calculated as follows:
If the user inputs a single keyword, the text similarity is calculated by the following formula:
[Formula for S(t_q, T_o): shown as an image in the original publication, not reproduced here]
where S(t_q, T_o) is the text similarity between the keyword t_q input by user q and the set T_o of all keywords in the region o covered by the current node; ED(t_q, t_o) is the minimum number of edit operations (insertions, deletions and substitutions) needed to turn t_q into keyword t_o; W(t_q) is the weight of keyword t_q; max is the largest keyword weight in the database; and W(t_q) = TF(t_q, T) × IDF(t_q, U), where TF(t_q, T) is the frequency of t_q in the keyword set T and IDF(t_q, U) is the inverse document frequency, i.e. the reciprocal of the frequency with which t_q occurs across all points of interest in the database (U);
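The two ingredients defined above can be computed as follows. `edit_distance` is the standard Levenshtein dynamic program for the insert/delete/substitute distance ED. `weight` implements W(t) = TF(t, T) × IDF(t, U), assuming TF is the share of query keywords equal to t and IDF is |U| divided by the number of points of interest holding t. Smoothing an unseen keyword's count to 1 is an added assumption that avoids division by zero, since a misspelled keyword may occur in no point of interest. How these values combine into S(t_q, T_o) is given by the formula image and is not reconstructed here.

```python
from collections import Counter
from typing import List, Sequence, Set


def edit_distance(a: str, b: str) -> int:
    """ED(a, b): minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def weight(t: str, query: Sequence[str], poi_keywords: List[Set[str]]) -> float:
    """W(t) = TF(t, T) * IDF(t, U) under the assumptions stated above."""
    tf = Counter(query)[t] / len(query)                # share of query keywords equal to t
    df = sum(1 for kws in poi_keywords if t in kws)    # POIs holding t
    idf = len(poi_keywords) / max(df, 1)               # smoothed for unseen keywords
    return tf * idf
```

For instance, `edit_distance("starbucks", "starbuck")` is 1, so a one-letter slip costs a single edit regardless of keyword length; normalizing by length or weight is what the similarity formula is for.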
If the user inputs multiple keywords, the text similarity is calculated by the following formula:
[Formula for S(T, T_o): shown as an image in the original publication, not reproduced here]
where |T| represents the number of query keywords.
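The multi-keyword formula is likewise not reproduced. Since it is normalized by |T|, one plausible reading, offered only as an assumption, is the mean of the single-keyword similarities over the query keywords. The hypothetical function below takes the single-keyword similarity as a callback so that the actual S(t_q, T_o) can be substituted.

```python
from typing import Callable, Sequence, Set


def multi_keyword_similarity(query: Sequence[str], region_keywords: Set[str],
                             single: Callable[[str, Set[str]], float]) -> float:
    """Hypothetical S(T, T_o): mean of single-keyword similarities over the |T| keywords."""
    if not query:
        raise ValueError("the query keyword set T must not be empty")
    return sum(single(t, region_keywords) for t in query) / len(query)
```

With an exact-match stand-in for `single`, a query `["coffee", "tea"]` against a region holding only `"coffee"` scores 0.5.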
Further, the dominance relation in step 3.8 is compared as follows: if point of interest i is not weaker than point of interest j in its non-spatial attributes, and the spatial-text distance from the user to i is smaller than that to j, then i dominates j. Here i and j are points of interest in leaf nodes, and the spatial-text distance D_t(q, i) between user q and point of interest i is calculated as:
D_t(q, i) = D_r(q, i) / S
where D_r(q, i) is the road network distance from user q to the queried point of interest i, and S is the text similarity between the keyword (or keyword set) input by the user and the keyword set T_i held by i: S = S(t_q, T_i) if the user inputs a single keyword, and S = S(T, T_i) if the user inputs multiple keywords.
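The dominance test and the candidate-set update of step 3.8 can be sketched as follows, assuming larger non-spatial attribute values are better (for example a rating), that S > 0, and that D_r has already been computed. `ScoredPOI`, `dominates` and `update_candidates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ScoredPOI:
    poi_id: str
    attrs: Tuple[float, ...]  # non-spatial attributes, larger is better (assumption)
    road_dist: float          # D_r(q, i)
    sim: float                # S, text similarity to the query (assumed > 0)

    @property
    def text_dist(self) -> float:
        """D_t(q, i) = D_r(q, i) / S."""
        return self.road_dist / self.sim


def dominates(i: ScoredPOI, j: ScoredPOI) -> bool:
    """i dominates j: not weaker on every non-spatial attribute and strictly closer in D_t."""
    return all(a >= b for a, b in zip(i.attrs, j.attrs)) and i.text_dist < j.text_dist


def update_candidates(candidates: List[ScoredPOI], leaf: List[ScoredPOI]) -> List[ScoredPOI]:
    """Step 3.8: merge a leaf's POIs into the candidate set and drop every dominated POI."""
    pool = candidates + leaf
    return [p for p in pool if not any(dominates(o, p) for o in pool)]
```

Note how S rescales distance: a POI 1,000 m away with S = 0.5 has D_t = 2,000, so it loses to an equally rated POI 1,500 m away with S = 1.0.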
Further, a road-network-based keyword Skyline fuzzy query system comprises an index construction module, an input module, a conversion module, a retrieval module and an output module;
the index construction module inserts the coordinates of the points of interest and the keywords they hold, one by one, into the leaf nodes of the in-memory index to construct the KR-Tree index;
the input module lets the user enter keywords and passes them to the conversion module;
the conversion module converts the user's keywords into a machine-readable form and passes them to the retrieval module;
the retrieval module performs retrieval on the user's keywords using the KR-Tree index built by the index construction module, and passes the results to the output module;
the output module displays the search results.
Beneficial effects:
(1) The invention provides a spatial text index structure, the KR-Tree, which stores spatial information and text information together in its index nodes. During a spatial region query, the query keywords can be used to prune the search space efficiently, further improving Skyline query efficiency.
(2) To handle possible input errors during a user's query, the invention provides a keyword similarity measure based on edit distance, and uses the TF-IDF model to assign an initial weight to each keyword, so that the measure better reflects user preferences and the fault tolerance of the query increases.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a flow chart of comparing the dominance relationship between points of interest according to the invention;
FIG. 3 is a schematic diagram of the KR-Tree index structure of the invention;
FIG. 4 is a flow chart of KR-Tree index construction according to the invention.
Detailed Description
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
To meet the multi-preference requirements of user queries, the invention provides a road-network-based keyword Skyline fuzzy query method, whose overall flow is shown in FIG. 1:
Step 1: for a database stored on disk, construct a corresponding KR-Tree index in memory;
Step 2: convert the keywords input by the user into a machine-readable triplet <L, R, T>, where L is the user's spatial position, R is the radius of the user's query area, and T is the set of keywords input by the user;
Step 3: call the KR-Tree index and use it to search the database in depth-first order according to <L, R, T>;
Step 4: after the search finishes, return the search results to the user.
Step 3 proceeds as follows:
Step 3.1: according to the query condition, start from the root node of the KR-Tree index and visit the index nodes top-down;
Step 3.2: determine whether the current node is a leaf node; if so, go to step 3.8, otherwise go to step 3.3;
Step 3.3: determine whether the region of the current node overlaps the query area; if so, go to step 3.4, otherwise go to step 3.6;
Step 3.4: determine whether the text similarity between the keyword set in the current node's AK-Table index and T is less than or equal to a threshold K; if so, go to step 3.5, otherwise go to step 3.6;
Step 3.5: following the depth-first principle, visit the child of the current node, i.e. a next-level node connected to it, and go to step 3.2;
Step 3.6: determine whether the current node has a sibling node that has not yet been visited; if so, visit that sibling and go to step 3.2; otherwise stop descending, return to the previous node and go to step 3.7;
Step 3.7: determine whether the current node is the root node; if so, go to step 3.9, otherwise go to step 3.6;
Step 3.8: compare all points of interest in the current leaf node with all points of interest in the candidate set, eliminate the points of interest in the candidate set and in the current leaf node that are dominated by other points of interest, and keep the remaining points of interest as the new candidate set; after the comparison, go to step 3.6;
Step 3.9: once all leaf nodes satisfying the query condition have been traversed, return the points of interest in the final candidate set as the query result.
The text similarity in step 3.4 is calculated as follows:
If the user inputs a single keyword, the text similarity is calculated by the following formula:
[Formula for S(t_q, T_o): shown as an image in the original publication, not reproduced here]
where S(t_q, T_o) is the text similarity between the keyword t_q input by user q and the set T_o of all keywords in the region o covered by the current node; ED(t_q, t_o) is the minimum number of edit operations (insertions, deletions and substitutions) needed to turn t_q into keyword t_o; W(t_q) is the weight of keyword t_q; max is the largest keyword weight in the database; and W(t_q) = TF(t_q, T) × IDF(t_q, U), where TF(t_q, T) is the frequency of t_q in the keyword set T and IDF(t_q, U) is the inverse document frequency, i.e. the reciprocal of the frequency with which t_q occurs across all points of interest in the database (U);
If the user inputs multiple keywords, the text similarity is calculated by the following formula:
[Formula for S(T, T_o): shown as an image in the original publication, not reproduced here]
where |T| represents the number of query keywords.
The invention mainly provides a keyword weight model based on the TF-IDF model, uses this model to implement a fuzzy keyword measure, and further provides a method for computing the spatial-text dominance relation. The flow, shown in FIG. 2, comprises the following key steps:
Step S1: for each keyword t_q input by the user, calculate its weight W(t_q) according to the TF-IDF model;
Step S2: calculate the edit distance ED(t_q, t_i), where t_i is a keyword of the point of interest, t_i ∈ T_i, and T_i is the keyword set held by point of interest i;
Step S3: calculate the text similarity S between the user's keyword or keyword set and the keyword set held by each point of interest i (all points of interest are in leaf nodes): S = S(t_q, T_i) if the user inputs a single keyword, and S = S(T, T_i) if the user inputs multiple keywords;
Step S4: calculate the spatial-text distance from user q to the queried point of interest i:
D_t(q, i) = D_r(q, i) / S
where D_r(q, i) is the road network distance from the querying user q to the queried point of interest i.
Step S5: perform the spatial-text dominance test between the points of interest in the current leaf node and those in the candidate set, using the following rule:
if point of interest i is not weaker than point of interest j in its non-spatial attributes (attributes other than location, such as a merchant's rating on Ele.me), and the spatial-text distance from the user to i is smaller than that to j, then i dominates j.
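The description does not say how D_r(q, i) is obtained; the background refers to the shortest path between the point of interest and the query point. A standard way to compute that is Dijkstra's algorithm, sketched here on a hypothetical adjacency-list graph whose edge weights are road segment lengths. Dividing the result by S gives D_t(q, i).

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # vertex -> [(neighbour, segment length)]


def road_distance(graph: Graph, source: str, target: str) -> float:
    """D_r(q, i): shortest-path length over the road network; inf if unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d                        # first pop of the target is final
        if d > dist.get(u, float("inf")):
            continue                        # stale heap entry, already improved
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")
```

On a small network where q reaches the POI via q to a (2), a to b (1), b to POI (2), the route q, a, b, POI gives D_r = 5 even though the direct edges look shorter; with S = 0.5 the spatial-text distance would be 10.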
Combining the characteristics of the IR-Tree index, the invention provides an efficient spatial text index structure, the KR-Tree, which improves keyword fuzzy matching and the pruning of spatial regions irrelevant to the query. The index is constructed as shown in FIG. 4:
Step A: estimate the required memory size from the number of points of interest in the database, and request a memory space of that size from the computer.
Step B: initialize an index header at the start of the memory space and generate a root node.
Step C: traverse the coordinates of all points of interest in the database, insert their spatial coordinates one by one into leaf nodes as key values, and introduce parent nodes where necessary, thereby completing the framework of the KR-Tree index.
Specifically: insert the key value into the current node; if the number of key values in the node exceeds the maximum F, split the node according to the neighbor principle into two new nodes. If a parent node points to the current node, make the parent point to the two new nodes and update its information; if no parent node points to the current node, create a new parent node at the level above, point it to the two new nodes produced by the split, and update its information;
starting again from the root node, insert the key values of the remaining points of interest into the existing leaf nodes according to the neighbor principle until the key values of all points of interest have been inserted into leaf nodes;
the parent node's information is updated by taking the parent node as the current node and inserting the coordinates of the spatial regions represented by the two new nodes into it as two key values.
Step D: traverse the keywords held by the current point of interest to construct the AK-Tables of the leaf nodes, as shown in FIG. 3.
Use each keyword held by the current point of interest as a key value, obtain its hash value through a hash function, and insert it at the corresponding position of the AK-Table index. If that position already stores a record, append a successor linked-list node and update its record information. A record is the id of the current point of interest and is used to locate the record quickly during a query.
Step E: once all points of interest have been inserted, build the AK-Tables of the non-leaf nodes bottom-up.
Specifically: the keywords of each parent node are the union of the keywords of all next-level nodes connected to it. Feed all keywords held by each parent node into a hash function one by one to obtain each keyword's hash value, insert the keyword into the position of the parent node's AK-Table index given by that value, and record the id of the region in which the keyword is located; if that position already holds a record, append a successor linked-list node after it and record the keyword and the id of its region in that node.
Step F: after the whole KR-Tree index is built, write the in-memory index to disk block by block, with the node as the basic unit.
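Step F can be illustrated by serializing each node into its own fixed-size block, so that the node in block k can be read back with a single seek to offset k × block size. The 4,096-byte block size, the JSON encoding and the names `write_nodes` and `read_node` are assumptions chosen for readability; a real index would more likely use a compact binary layout.

```python
import io
import json
from typing import BinaryIO, Dict, List

BLOCK = 4096  # hypothetical block size in bytes


def write_nodes(nodes: List[dict], out: BinaryIO) -> Dict[int, int]:
    """Step F sketch: one node per fixed-size block; returns node id -> block number."""
    block_of: Dict[int, int] = {}
    for block_no, node in enumerate(nodes):
        payload = json.dumps(node, sort_keys=True).encode("utf-8")
        if len(payload) > BLOCK:
            raise ValueError(f"node {node['id']} does not fit in one block")
        out.write(payload.ljust(BLOCK, b"\0"))  # pad so block k starts at k * BLOCK
        block_of[node["id"]] = block_no
    return block_of


def read_node(inp: BinaryIO, block_no: int) -> dict:
    """Load a single node back by seeking straight to its block."""
    inp.seek(block_no * BLOCK)
    return json.loads(inp.read(BLOCK).rstrip(b"\0").decode("utf-8"))


if __name__ == "__main__":
    buf = io.BytesIO()  # stands in for the index file on disk
    index = write_nodes([{"id": 7, "leaf": True, "pois": [1, 2]},
                         {"id": 3, "leaf": False, "children": [7]}], buf)
    print(index, read_node(buf, index[3]))
```

Because every node occupies exactly one block, a query only needs to read the blocks of the nodes it actually visits.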
A road-network-based keyword Skyline fuzzy query system comprises an index construction module, an input module, a conversion module, a retrieval module and an output module;
the index construction module inserts the coordinates of the points of interest and the keywords they hold, one by one, into the leaf nodes of the in-memory index to construct the KR-Tree index;
the input module lets the user enter keywords and passes them to the conversion module;
the conversion module converts the user's keywords into a form the system can recognize and passes them to the retrieval module;
the retrieval module performs retrieval on the user's keywords using the index built by the index construction module and passes the results to the output module;
the output module displays the search results.
The invention addresses the storage indexing of spatial text datasets and users' multi-preference queries. It provides a spatial-text dominance model that combines spatial and text attributes to satisfy the preferences of querying users. Because users may make input errors during a query, it provides a keyword fuzzy matching method that increases the fault tolerance of the query. Building on the characteristics of the IR-Tree index, it provides the spatial text index structure KR-Tree, which uses the text information in index nodes to prune irrelevant nodes more effectively during a query and thereby improves query efficiency. The method can be applied widely to Skyline query scenarios on road networks.
In addition, the specific features described in the above embodiments may be combined in any suitable manner provided there is no contradiction. To avoid unnecessary repetition, the possible combinations are not described separately.

Claims (8)

1. The keyword Skyline fuzzy query method based on the road network is characterized by comprising the following steps of:
step 1, constructing a corresponding KR-Tree index in a memory aiming at a database stored in a disk;
step 2, converting keywords input by a user into a triplet form < L, R, T > which can be identified by a computer, wherein L is the spatial position of the user, R is the radius of a query area of the user, and T is a keyword set input by the user;
step 3, calling a KR-Tree index, and searching a database by using the KR-Tree index according to the depth priority principle and < L, R, T >;
step 4, after the search is finished, returning a search result to the user;
the specific method for building the KR-Tree index is as follows:
step A, estimating the required memory size from the number of points of interest in the database, and requesting a memory block of that size from the computer;
step B, initializing an index header at the start of the memory block to generate a root node, from which the leaf nodes below it are accessed top-down;
step C, traversing all points of interest in the database, inserting their spatial coordinates into leaf nodes one by one as key values and introducing parent nodes as needed, thereby building the KR-Tree index framework;
step D, traversing the keywords held by every point of interest and inserting them into the AK-Table index of the corresponding leaf node, i.e., the leaf node that contains the coordinates of the point of interest holding the keyword, thereby building the AK-Table indexes of the leaf nodes;
step E, after the AK-Table indexes of the leaf nodes have been built, building the AK-Table indexes of all parent nodes;
step F, after the whole KR-Tree index has been built, writing the in-memory index to disk block by block, with the node as the basic unit.
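Step 2 of claim 1 turns the raw user input into the triplet <L, R, T>. A minimal Python sketch of that representation follows; the class name `Query`, the helper `to_triplet`, and the tokenisation rule (split on whitespace and commas, then lowercase) are illustrative assumptions, since the claim does not say how keywords are extracted. Modelling T as a frozenset also assumes that repeated keywords carry no extra meaning.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Query:
    """Machine-readable query triplet <L, R, T> (step 2 of claim 1)."""
    location: Tuple[float, float]   # L: the user's spatial position (x, y)
    radius: float                   # R: radius of the user's query area
    keywords: FrozenSet[str]        # T: the set of keywords entered by the user


def to_triplet(location: Tuple[float, float], radius: float, raw_text: str) -> Query:
    """Hypothetical parser: splits the raw text on whitespace and commas and lowercases it."""
    if radius <= 0:
        raise ValueError("query radius must be positive")
    tokens = raw_text.replace(",", " ").lower().split()
    return Query(location=location, radius=radius, keywords=frozenset(tokens))
```

For example, `to_triplet((3.0, 4.0), 2.5, "Coffee, wifi")` yields a query whose keyword set is `{"coffee", "wifi"}`.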
2. The method according to claim 1, wherein the specific method of step C is:
step C1, inserting a key value into the current node; if the number of key values in the node then exceeds the maximum F, splitting the node according to the neighbor principle into two new nodes. If a parent node points down to the current node, the parent node is made to point down to the two new nodes and its information is updated; if no parent node points down to the current node, a new parent node is created at the level above, its pointers are set to the two new nodes produced by the split, and its information is then updated;
step C2, starting again from the root node, inserting the key values of the remaining points of interest into the existing leaf nodes one by one according to the neighbor principle, until the key values of all points of interest have been inserted into leaf nodes.
3. The method according to claim 2, wherein updating the information of the parent node specifically comprises: taking the parent node as the current node, and inserting into it, as two key values, the coordinates of the spatial regions represented by the two new nodes.
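Claims 2 and 3 describe R-Tree-style insertion with node splitting. The sketch below is one possible reading under stated assumptions: the "neighbor principle" is interpreted as descending into the child whose region needs the least enlargement, and an overfull node is split by sorting its entries along its wider axis and cutting the list in half. `KRTree`, `Node` and both heuristics are hypothetical, and the AK-Table part of the index is omitted.

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


class Node:
    def __init__(self, leaf: bool):
        self.leaf = leaf
        self.entries: list = []             # leaf: (x, y, poi_id) key values; internal: child Nodes
        self.parent: Optional["Node"] = None
        self.rect: Optional[Rect] = None    # spatial region represented by this node

    def entry_rect(self, e) -> Rect:
        return (e[0], e[1], e[0], e[1]) if self.leaf else e.rect

    def refresh(self) -> None:
        """Recompute this node's region from its entries."""
        rects = [self.entry_rect(e) for e in self.entries]
        r = rects[0]
        for s in rects[1:]:
            r = union(r, s)
        self.rect = r


class KRTree:
    def __init__(self, max_entries: int = 4):
        if max_entries < 2:
            raise ValueError("F must be at least 2")
        self.F = max_entries                # maximum key values per node (F in claim 2)
        self.root = Node(leaf=True)

    def insert(self, x: float, y: float, poi_id: int) -> None:
        node, p = self.root, (x, y, x, y)
        while not node.leaf:                # step C2: descend from the root
            # "Neighbor principle", assumed here to mean least region enlargement.
            node = min(node.entries,
                       key=lambda c: (area(union(c.rect, p)) - area(c.rect), area(c.rect)))
        node.entries.append((x, y, poi_id)) # step C1: insert the key value
        while node is not None:             # walk upward, splitting or refreshing regions
            if len(node.entries) > self.F:
                node = self._split(node)    # claim 3: the parent becomes the current node
            else:
                node.refresh()
                node = node.parent

    def _split(self, node: Node) -> Node:
        node.refresh()
        r = node.rect
        axis = 0 if r[2] - r[0] >= r[3] - r[1] else 1  # split along the wider axis

        def centre(e) -> float:
            er = node.entry_rect(e)
            return (er[axis] + er[axis + 2]) / 2

        ordered = sorted(node.entries, key=centre)
        half = len(ordered) // 2
        a, b = Node(node.leaf), Node(node.leaf)
        a.entries, b.entries = ordered[:half], ordered[half:]
        for new in (a, b):
            if not new.leaf:
                for child in new.entries:
                    child.parent = new
            new.refresh()
        parent = node.parent
        if parent is None:                  # no parent: create one at the level above
            parent = Node(leaf=False)
            self.root = parent
        else:
            parent.entries.remove(node)     # the parent now points to the two new nodes
        parent.entries.extend([a, b])
        a.parent = b.parent = parent
        return parent
```

Because the parent is returned as the new current node, an overflow caused by the two added key values propagates upward exactly as claim 3 implies, and a split at the root grows the tree by one level.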
4. The method according to claim 1, wherein the specific method of step D is: feeding the keywords held by the current point of interest into a hash function one by one to obtain a hash value for each keyword; inserting each keyword into the AK-Table index of the corresponding leaf node at the position given by its hash value and recording the id of the point of interest; if that position already records the same keyword from another point of interest, appending a successor linked-list node at that position and recording the keyword and the id of the current point of interest in it; and repeating until all keywords of all points of interest have been inserted into the AK-Table index.
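Step D can be illustrated with a hashed table whose slots hold chains of (keyword, point-of-interest id) records. In this sketch the hash function, the bucket count and the names `AKTable`, `ChainNode` and `build_leaf_table` are assumptions. It appends to the chain on any slot collision, which covers the same-keyword case of claim 4 as well as two different keywords that happen to hash to the same slot.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class ChainNode:
    keyword: str
    poi_id: int
    next: Optional["ChainNode"] = None      # successor linked-list node


class AKTable:
    """Hypothetical AK-Table: a fixed-size bucket array with separate chaining."""

    def __init__(self, num_buckets: int = 64):
        self.buckets: List[Optional[ChainNode]] = [None] * num_buckets

    def _hash(self, keyword: str) -> int:
        # Deterministic polynomial string hash (an assumption; the claim names no function).
        h = 0
        for ch in keyword:
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return h

    def insert(self, keyword: str, poi_id: int) -> None:
        slot = self._hash(keyword)
        node = self.buckets[slot]
        if node is None:
            self.buckets[slot] = ChainNode(keyword, poi_id)
            return
        while node.next is not None:         # walk to the end of the chain
            node = node.next
        node.next = ChainNode(keyword, poi_id)  # append a successor node

    def lookup(self, keyword: str) -> List[int]:
        ids, node = [], self.buckets[self._hash(keyword)]
        while node is not None:
            if node.keyword == keyword:      # other keywords may share the slot
                ids.append(node.poi_id)
            node = node.next
        return ids


def build_leaf_table(pois: Dict[int, Iterable[str]]) -> AKTable:
    """pois maps poi_id -> keywords held by that point of interest (step D)."""
    table = AKTable()
    for poi_id, keywords in pois.items():
        for kw in keywords:
            table.insert(kw, poi_id)
    return table
```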
5. The method according to claim 1, wherein the specific method of step E is: the keywords of each parent node are the union of all keywords of the next-level nodes connected to it; the keywords held by each parent node are fed into a hash function one by one to obtain a hash value for each keyword; each keyword is inserted into the AK-Table index of the parent node at the position given by its hash value, and the id of the region in which the keyword is located is recorded; if that position is already occupied, a successor linked-list node is appended after it, and the keyword and the id of its region are recorded in that node.
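For step E, a parent's keyword set is the union of its children's keywords, and its AK-Table maps each keyword to the ids of the child regions that contain it. The bottom-up sketch below uses plain Python dicts in place of the hashed slots, with each list standing in for a chain; the dictionary-based tree shape and the name `build_tables` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical tree shape: a leaf is {"id": str, "keywords": set},
# an internal node is {"id": str, "children": [...]}.


def build_tables(node: dict, tables: Dict[str, Dict[str, List[str]]]) -> Set[str]:
    """Bottom-up step E: return the keyword set of `node` and record, for every
    internal node, an AK-Table mapping keyword -> ids of the child regions holding it."""
    if "children" not in node:
        return set(node["keywords"])        # leaf: keywords come from step D
    table: Dict[str, List[str]] = {}
    keywords: Set[str] = set()
    for child in node["children"]:
        child_kws = build_tables(child, tables)
        for kw in child_kws:
            table.setdefault(kw, []).append(child["id"])  # chain another region id
        keywords |= child_kws               # parent keywords = union of child keywords
    tables[node["id"]] = table
    return keywords
```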
6. The method according to claim 1, wherein the specific method for searching the database is as follows:
step 3.1, according to the query condition, accessing the nodes of the index top-down, starting from the root node of the KR-Tree;
step 3.2, judging whether the current node is a leaf node; if so, going to step 3.8, otherwise going to step 3.3;
step 3.3, judging whether the spatial region of the current node overlaps the query area; if so, going to step 3.4; otherwise, going to step 3.6;
step 3.4, judging whether the text similarity between the keyword set in the AK-Table index of the current node and T is less than or equal to a threshold K; if so, going to step 3.5; otherwise, going to step 3.6;
step 3.5, accessing a child of the current node, i.e., a next-level node connected to it, in depth-first order, and going to step 3.2;
step 3.6, judging whether the current node has a sibling that has not yet been accessed; if so, accessing that sibling and going to step 3.2; otherwise, stopping the downward access, returning to the parent node, and going to step 3.7;
step 3.7, judging whether the current node is the root node; if so, going to step 3.9; otherwise, going to step 3.6;
step 3.8, comparing every point of interest in the current leaf node with every point of interest in the candidate set, eliminating from both the candidate set and the current leaf node any point of interest that is dominated by another point of interest, keeping the remaining points of interest as the new candidate set, and going to step 3.6 when the comparison is finished;
step 3.9, once all leaf nodes satisfying the query condition have been traversed, taking the points of interest in the final candidate set as the query result.
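Steps 3.1 to 3.9 amount to a depth-first traversal that prunes internal nodes whose region misses the query circle or whose keywords fail the similarity test, and folds the contents of each reached leaf into a running skyline. The sketch below follows that control flow, with recursion handling the sibling iteration and backtracking of steps 3.6 and 3.7. `similarity` and `dominates` are passed in as functions because their definitions belong to claims 7 and 8; the `Node`/`POI` classes and the circle-rectangle overlap test are assumptions. As in the claim, leaf nodes are processed without a separate spatial or text check.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Tuple

Rect = Tuple[float, float, float, float]    # (min_x, min_y, max_x, max_y)


@dataclass
class POI:
    poi_id: int
    x: float
    y: float
    keywords: FrozenSet[str]
    attrs: Tuple[float, ...] = ()           # non-spatial attributes (e.g. a rating)


@dataclass
class Node:
    rect: Rect
    keywords: FrozenSet[str]                # keywords summarised by this node's AK-Table
    children: List["Node"] = field(default_factory=list)
    pois: List[POI] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def circle_overlaps_rect(cx: float, cy: float, r: float, rect: Rect) -> bool:
    """Step 3.3: does the query circle (L, R) intersect the node's region?"""
    nx = min(max(cx, rect[0]), rect[2])     # point of the rectangle closest to the centre
    ny = min(max(cy, rect[1]), rect[3])
    return math.hypot(cx - nx, cy - ny) <= r


def search(root: Node, location: Tuple[float, float], radius: float,
           query_kws: FrozenSet[str], k: float,
           similarity: Callable[[FrozenSet[str], FrozenSet[str]], float],
           dominates: Callable[[POI, POI], bool]) -> List[POI]:
    candidates: List[POI] = []

    def update_skyline(leaf_pois: List[POI]) -> List[POI]:   # step 3.8
        pool = candidates + leaf_pois
        return [p for p in pool if not any(dominates(o, p) for o in pool if o is not p)]

    def visit(node: Node) -> None:
        nonlocal candidates
        if node.is_leaf:                                      # step 3.2
            candidates = update_skyline(node.pois)
            return
        if not circle_overlaps_rect(location[0], location[1], radius, node.rect):  # 3.3
            return
        if similarity(query_kws, node.keywords) > k:          # 3.4: continue only if S <= K
            return
        for child in node.children:                           # 3.5-3.7: depth-first
            visit(child)

    visit(root)
    return candidates                                         # step 3.9
```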
7. The method according to claim 6, wherein the text similarity in step 3.4 is calculated as follows:
if the user enters a single keyword, the text similarity is calculated with the following formula:
Figure QLYQS_1
where $S(t_q, T_o)$ is the text similarity between the keyword $t_q$ entered by user $q$ and the set $T_o$ of all keywords in the region $o$ covered by the current node; $ED(t_q, t_o)$ is the minimum number of insert, delete and substitute edit operations needed to turn $t_q$ into the keyword $t_o$; $W(t_q)$ is the weight of $t_q$; $\max$ is the largest keyword weight in the database; and $W(t_q) = TF(t_q, T) \cdot IDF(t_q, U)$, where $TF(t_q, T)$ is the frequency with which $t_q$ occurs in the keyword set $T$, $IDF(t_q, U)$ is the inverse document frequency, i.e., the reciprocal of the frequency with which $t_q$ occurs among the points of interest of the database, and $U$ denotes all points of interest in the database;
if the user enters multiple keywords, the text similarity is calculated with the following formula:
Figure QLYQS_2
where |T| represents the number of query keywords.
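The formulas that combine these quantities into $S(t_q, T_o)$ and $S(T, T_o)$ appear only as images (QLYQS_1, QLYQS_2) and are not reconstructed here. The sketch below computes just the ingredients named in claim 7: the edit distance $ED$, implemented as the classic Levenshtein distance, and the weight $W(t_q) = TF(t_q, T) \cdot IDF(t_q, U)$. Normalising TF by the number of query keywords, reading IDF as $|U|$ divided by the number of points of interest holding the keyword, and returning 0.0 for a keyword that no point of interest holds are all assumptions.

```python
from collections import Counter
from typing import List, Sequence, Set


def edit_distance(a: str, b: str) -> int:
    """ED(t_q, t_o): minimum number of insertions, deletions and substitutions
    turning a into b (Levenshtein dynamic programme keeping two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def tf(term: str, query_terms: Sequence[str]) -> float:
    """TF(t_q, T): share of the query keywords equal to t_q (assumed normalisation)."""
    if not query_terms:
        return 0.0
    return Counter(query_terms)[term] / len(query_terms)


def idf(term: str, poi_keyword_sets: List[Set[str]]) -> float:
    """IDF(t_q, U): reciprocal of the fraction of POIs in U holding t_q (assumed reading)."""
    df = sum(1 for kws in poi_keyword_sets if term in kws)
    return len(poi_keyword_sets) / df if df else 0.0


def weight(term: str, query_terms: Sequence[str], poi_keyword_sets: List[Set[str]]) -> float:
    """W(t_q) = TF(t_q, T) * IDF(t_q, U), as stated in claim 7."""
    return tf(term, query_terms) * idf(term, poi_keyword_sets)
```

For instance, the misspelling "caffe" is at edit distance 1 from "cafe", which is the kind of input error the fuzzy matching is meant to tolerate.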
8. The method according to claim 7, characterized in that the dominance relation in step 3.8 is determined as follows: if point of interest i is no worse than point of interest j in its non-spatial attributes, and i is closer to the user than j in text-spatial distance, then i dominates j. The text-spatial distance $D_t(q, i)$ between user q and point of interest i is calculated as follows, where i and j are points of interest in the leaf nodes:
$D_t(q, i) = D_r(q, i) / S$
where $D_r(q, i)$ is the road-network distance from user q to the queried point of interest i, and S is the text similarity between the keyword or keyword set entered by the user and the keyword set $T_i$ held by i: if the user enters a single keyword, $S = S(t_q, T_i)$; if the user enters multiple keywords, $S = S(T, T_i)$.
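Claim 8 can be sketched as three small functions. Computing $D_r$ with Dijkstra's algorithm over an adjacency-list graph is an assumption, since the claim only speaks of "road network distance"; so is treating higher values of every non-spatial attribute as better. A zero similarity is mapped to an infinite text-spatial distance to avoid dividing by zero. The names `road_distance`, `text_spatial_distance` and `dominates` are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # road node -> [(neighbour, edge length)]


def road_distance(graph: Graph, source: str, target: str) -> float:
    """D_r(q, i): shortest-path length on the road network (Dijkstra; an assumed choice)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, math.inf):
            continue                        # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf                         # target unreachable


def text_spatial_distance(d_road: float, similarity: float) -> float:
    """D_t(q, i) = D_r(q, i) / S; a zero similarity is treated as infinitely far."""
    return d_road / similarity if similarity > 0 else math.inf


def dominates(attrs_i: Sequence[float], dt_i: float,
              attrs_j: Sequence[float], dt_j: float) -> bool:
    """i dominates j if i is no worse on every non-spatial attribute
    (higher assumed better) and strictly closer in text-spatial distance."""
    return all(a >= b for a, b in zip(attrs_i, attrs_j)) and dt_i < dt_j
```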
CN201910388590.2A 2019-05-10 2019-05-10 Keyword Skyline fuzzy query method and system based on road network Active CN110263108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910388590.2A CN110263108B (en) 2019-05-10 2019-05-10 Keyword Skyline fuzzy query method and system based on road network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910388590.2A CN110263108B (en) 2019-05-10 2019-05-10 Keyword Skyline fuzzy query method and system based on road network

Publications (2)

Publication Number Publication Date
CN110263108A CN110263108A (en) 2019-09-20
CN110263108B CN110263108B (en) 2023-07-11

Family

ID=67913003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910388590.2A Active CN110263108B (en) 2019-05-10 2019-05-10 Keyword Skyline fuzzy query method and system based on road network

Country Status (1)

Country Link
CN (1) CN110263108B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112883272B (en) * 2021-03-16 2022-04-29 Shandong University Method for determining recommended object
CN114064843B (en) * 2022-01-11 2022-05-17 Shenzhen University Method, device and equipment for querying interplanetary line position nodes in RDF data

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103544291A (en) * 2013-10-29 2014-01-29 Northeast Forestry University Mobile object continuous k-nearest neighbor (CKNN) query method based on road based road networks tree (RRN-Tree) in road network
CN107633024A (en) * 2017-08-30 2018-01-26 Tsinghua University The method for fast searching of multidimensional property optimum point group
CN108052514A (en) * 2017-10-12 2018-05-18 Nanjing University of Aeronautics and Astronautics A kind of blending space Indexing Mechanism for handling geographical text Skyline inquiries
CN108733803A (en) * 2018-05-18 2018-11-02 University of Electronic Science and Technology of China A kind of Multi-User Dimension keyword query method under road network

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8620577B2 (en) * 2011-12-21 2013-12-31 Navteq B.V. System and method for searching for points of interest along a route

Non-Patent Citations (2)

Title
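KSTR: Keyword-Aware Skyline Travel Route Recommendation; Yu-Ting Wen et al.; 2015 IEEE International Conference on Data Mining; 2016-01-07; pp. 1-10 *
Research on Multi-User Spatial Data Query Algorithms; Duan Xiaoran; China Master's Theses Full-text Database, Information Science and Technology; 2018-10-15; pp. I138-1049 *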

Also Published As

Publication number Publication date
CN110263108A (en) 2019-09-20

Similar Documents

Publication Publication Date Title
CN105868411B (en) A kind of non-relational and relevant database integration data querying method and system
JP4856627B2 (en) Partial query caching
US7634465B2 (en) Indexing and caching strategy for local queries
CN106933833B (en) Method for quickly querying position information based on spatial index technology
CN110059264B (en) Site retrieval method, equipment and computer storage medium based on knowledge graph
CN104391908B (en) Multiple key indexing means based on local sensitivity Hash on a kind of figure
CN106897374B (en) Personalized recommendation method based on track big data nearest neighbor query
EP2788896B1 (en) Fuzzy full text search
CN106874425B (en) Storm-based real-time keyword approximate search algorithm
CN107590123A (en) Vehicle-mounted middle place context reference resolution method and device
CN110928882B (en) Memory database indexing method and system based on improved red black tree
CN110263108B (en) Keyword Skyline fuzzy query method and system based on road network
CN101256579A (en) Method for inquesting data organization in database
CN109815232A (en) A kind of method and system of retrieval, the data processing of the data rank using binary search tree
JP2020123320A (en) Method, apparatus, device and storage medium for managing index
CN111078952B (en) Cross-modal variable-length hash retrieval method based on hierarchical structure
CN110334290B (en) MF-Octree-based spatio-temporal data rapid retrieval method
CN113704248B (en) Block chain query optimization method based on external index
Abbasifard et al. Efficient indexing for past and current position of moving objects on road networks
CN108241709A (en) A kind of data integrating method, device and system
CN107341221B (en) Index structure establishing and associated retrieving method, device, equipment and storage medium
CN110347676B (en) Uncertainty tense data management and query method based on relation R tree
CN111666302A (en) User ranking query method, device, equipment and storage medium
CN107291875B (en) Metadata organization management method and system based on metadata graph
CN113806376B (en) Index construction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant