CN110955827B - Method and system for solving the SKQ why-not problem using the AI³ index - Google Patents

Method and system for solving the SKQ why-not problem using the AI³ index

Info

Publication number
CN110955827B
Authority
CN
China
Prior art keywords
query
node
attribute
leaf node
objects
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911128644.8A
Other languages
Chinese (zh)
Other versions
CN110955827A (en
Inventor
李艳红
冯禹鹤
张望
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South Central Minzu University
Original Assignee
South Central University for Nationalities
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South Central University for Nationalities
Priority to CN201911128644.8A
Publication of CN110955827A
Application granted
Publication of CN110955827B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for solving the SKQ why-not problem using the AI³ index, and relates to the technical field of spatial keyword queries. The numerical attributes of objects are expressed as Boolean expressions, which is closer to practical application scenarios. An AI³ index is designed to organize object information, together with a corresponding query strategy, so that a refined query q' with the minimum modification cost makes all missing objects appear in the query result, thereby solving the why-not problem in spatial keyword queries.

Description

Method and system for solving the SKQ why-not problem using the AI³ index
Technical Field
The invention relates to the technical field of spatial keyword queries, and in particular to a method and a system for solving the SKQ why-not problem using the AI³ index.
Background
Spatial keyword queries (SKQ) have been proposed and extensively studied as more and more objects are associated with geographic locations and textual descriptions. In real life, objects typically have further numerical attributes, such as average price, rating and popularity. If these constraints are not taken into account in the query, it is often difficult or impossible to obtain the results the user wants. Therefore, to satisfy the user's constraints on these attributes and to refine the query process, the spatial keyword query needs to take numerical attributes into account.
This document is primarily concerned with top-k enhanced spatial keyword queries. When searching for the top-k objects, the query first retrieves the objects that meet the numerical attribute requirements of the query q, and then ranks them by a combined score of the spatial distance between the query point and the object and their text similarity. FIG. 1 shows an example of an enhanced spatial keyword query, and Table 1 shows the text information and related attribute information of the objects.
Table 1: Information about the objects in FIG. 1
[Table 1 is reproduced only as images in the source: Figure BDA0002277655980000011, Figure BDA0002277655980000021]
As shown in FIG. 1, a user issues a query with the keyword "cafe", requiring that the average price be below $42, the rating be higher than 4.3, and the popularity be greater than 700. These enhanced requirements can be expressed by the Boolean expression (avg-price < 42 ∧ Rating > 4.3 ∧ Popularity > 700). Objects o_3, o_5 and o_8 satisfy these requirements; they are then ranked by the selected ranking function according to how well they match the query q textually and spatially, and the top three objects are returned. Object o_1 is ignored because it shares no keyword with q; o_2, o_4, o_6 and o_7 are also ignored because none of them meets the attribute requirements of the query.
However, when objects that a user expects do not appear in the query result set, the user may wonder why they are absent and how the query could be changed so that they appear. For example, after issuing the query and obtaining the result o_3, o_5, o_8, the user may want to know why the familiar objects o_1 and o_6 are not in the result set: are o_3, o_5 and o_8 really better than o_1 and o_6? And how can the query be changed so that o_1 and o_6 appear in the result set?
After obtaining the query results, the user may find that some expected objects are not in the result set and may therefore doubt the entire result. The problem of explaining why these expected objects are missing, and of efficiently finding a query that returns the objects the user intended, is known as the why-not problem. However, no existing technique solves the why-not problem in the enhanced spatial keyword top-k query. A technical scheme that can solve the why-not problem in the enhanced spatial keyword top-k query is therefore needed.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a method and a system for solving the SKQ why-not problem using the AI³ index, which effectively solve the why-not problem in spatial keyword queries.
To achieve the above purpose, the technical scheme adopted by the invention is as follows: a method for solving the SKQ why-not problem using the AI³ index, comprising the following steps:
obtaining all objects o and constructing the AI³ index;
obtaining an initial query q = (q.loc, q.doc_0, q.B, k, α) and a missing object set M; constructing a candidate keyword list CKS in descending order of the frequency of the keywords of the missing objects, and constructing a candidate attribute-value pair list CAS in descending order of the similarity scores of the missing objects; setting the keyword set q'.doc and the attribute-value pairs q'.B' of the refined query q' to q.doc_0 and q.B, respectively;
extracting, in order, the keywords in CKS and the attribute-value pairs in CAS, and adding them to the keyword set q'.doc and the attribute-value pairs q'.B' of the query q', respectively, to form a new refined query q'; processing each refined query q' to find the best refined query, until both CKS and CAS are empty;
processing each refined query q' specifically comprises:
calculating the modification cost p' of q', and filtering out each query q' with p' ≥ p_c, where p_c is the modification cost of the query q_b that preserves the initial query keywords and attributes and in whose result all missing objects appear;
for each query q' with p' < p_c, determining, according to the frequency of each keyword, whether each keyword of the query q' is a frequent keyword:
if the keyword is a frequent keyword, adding the root node of the quadtree in the header file to a queue of non-leaf nodes to be processed, selecting the qualified leaf nodes according to a preset screening rule, and adding them to a queue of qualified leaf nodes; for each object in the disk page pointed to by a qualified leaf node, judging in turn whether the attribute-value pairs q'.B' of the query q' attribute-match the attribute-value pairs of the object, adding each matching object to the set of objects satisfying the query q', and calculating the similarity score between the query q' and the object;
if the keyword is an infrequent keyword, examining each object in the corresponding disk page; if the attribute-value pairs q'.B' of the query q' attribute-match the attribute-value pairs of the object, adding the object to the set of objects satisfying the query q', and calculating the similarity score between the query q' and the object;
ranking all objects in the set of objects satisfying the query q' from high to low by similarity score until all original result objects and all missing objects have appeared, thereby obtaining k' objects;
if k' ≤ k_m, where k_m is the size of the result set when the initial query keywords and attributes are preserved and all missing objects appear in the query result, calculating the modification cost p' of q'; if p' < p_c, taking the query q' as the current best refined query.
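The candidate keyword list used in the steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names, the alphabetical tie-break, the exclusion of keywords already in q.doc_0, and the keyword sets of the two missing objects are all assumptions made for the example. Frequency is counted here as the number of missing objects that contain a keyword.

```python
from collections import Counter


def build_cks(missing_docs, initial_keywords):
    """Return candidate keywords in descending order of frequency.

    missing_docs: iterable of keyword sets, one per missing object.
    initial_keywords: keywords already in q.doc_0 (assumed not to be candidates).
    """
    counts = Counter(kw for doc in missing_docs for kw in set(doc))
    candidates = [kw for kw in counts if kw not in initial_keywords]
    # Descending frequency; ties broken alphabetically (an assumption)
    # so that the order is deterministic.
    return sorted(candidates, key=lambda kw: (-counts[kw], kw))


def take_first(candidates):
    """Remove and return the first entry of a candidate list."""
    return candidates.pop(0)


# Hypothetical keyword sets for two missing objects.
missing_docs = [{"cafe", "center"}, {"center", "cosmic"}]
cks = build_cks(missing_docs, initial_keywords={"cafe"})
```

Each call to `take_first(cks)` then yields the next keyword to add to q'.doc, and refinement stops once the list is empty.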
On the basis of the above scheme, obtaining all objects o and constructing the AI³ index specifically comprises the following steps:
hierarchically dividing the data space into cells using a quadtree structure; taking the keyword cell as the basic storage unit, which stores the spatial position and attribute information of the objects containing the keyword;
creating three components: a lookup table used as the entry point, a header file containing summary information of dense keyword cells, and a data file storing the keyword cell tuples of all the inverted lists;
storing the attribute information of the basic keyword cells of frequent keywords in the leaf nodes of the quadtree;
each non-leaf node R_i of the quadtree contains three attributes, R_i.id, R_i.S and R_i.Address, where R_i.id is the node id, R_i.Address is the list of addresses of all child nodes of R_i, and R_i.S is the union of the attribute-value pairs of all child nodes of R_i;
each leaf node R_i of the quadtree contains three attributes, R_i.id, R_i.S and R_i.Address, where R_i.id is the node id, R_i.Address is the address of the disk page to which the node links, and R_i.S is the union of the attribute-value pairs of all objects in that disk page.
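The node layout described above can be illustrated with a small in-memory sketch. The class and function names are hypothetical, child nodes are held as Python references rather than disk addresses, and an attribute whose value is NULL is assumed to contribute nothing to the summary S. The two objects use the attribute values that the description gives for o_4 and o_6.

```python
from dataclasses import dataclass, field


@dataclass
class LeafNode:
    id: int
    address: int                                # disk page address (hypothetical integer)
    S: dict = field(default_factory=dict)       # attribute -> set of values


@dataclass
class InnerNode:
    id: int
    address: list                               # child nodes (stand-in for an address list)
    S: dict = field(default_factory=dict)       # attribute -> set of values


def summarize_page(objects):
    """Union of the attribute-value pairs of all objects in one disk page."""
    summary = {}
    for attrs in objects:
        for name, value in attrs.items():
            if value is not None:               # NULL values are skipped (assumption)
                summary.setdefault(name, set()).add(value)
    return summary


def summarize_children(children):
    """Union of the summaries S of all child nodes."""
    summary = {}
    for child in children:
        for name, values in child.S.items():
            summary.setdefault(name, set()).update(values)
    return summary


o4 = {"avg-price": 42, "Rating": 4.4, "Popularity": 900}
o6 = {"avg-price": 35, "Rating": 4.6, "Popularity": None}
leaf1 = LeafNode(1, 100, summarize_page([o4]))
leaf2 = LeafNode(2, 101, summarize_page([o6]))
root = InnerNode(0, [leaf1, leaf2], summarize_children([leaf1, leaf2]))
```

Because every non-leaf summary is the union of its children's summaries, a query can discard a whole subtree by looking at one node only.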
On the basis of the scheme, B is a Boolean expression:
Figure BDA0002277655980000051
Figure BDA0002277655980000052
is a predicate set, where i ∈ [1, n], i ∈ N*.
On the basis of the above scheme, the modification cost p' of q' is calculated by the following formula:
Figure BDA0002277655980000053
where β_1, β_2, β_3 and β_4 respectively represent the weights of the k value, the keywords, the attribute types and the attribute values in the cost function; β_i ≥ 0 and
Figure BDA0002277655980000054
k' is the size of the result set of the refined query q', k_0 is the size of the result set of the initial query q, and k_m is the size of the result set when the initial query keywords and attributes are preserved and all missing objects appear in the query result; k' − k_0 is normalized by k_m − k_0. Δdoc is the number of keywords that must be changed to go from q.doc_0 to q'.doc,
Figure BDA0002277655980000055
where the missing object set M = {m_1, m_2, ..., m_j}; Δdoc is normalized by |q.doc_0 ∪ M.doc|. ΔA_n is the number of attribute types that must be changed to go from the initial query to the refined query; ΔA_n is normalized by |q.B ∪ M.B|.
Figure BDA0002277655980000061
n is the total number of attributes contained in q.B and M.B; Δv_i is the maximum difference between the values of attribute A_i over all objects that contain A_i; |v_i' − v_i| is the absolute difference between the current query value v_i' and the initial query value v_i of attribute A_i, with |v_i' − v_i| ≤ Δv_i; |v_i' − v_i| is normalized by Δv_i.
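A cost function with the four normalized terms described above might look like the following Python sketch. The exact formula is only available as an image in the source, so the form used here is an assumption: a weighted sum of the four normalized terms, weights summing to 1, and the attribute-value term averaged over the listed attributes. The function name and parameter names are hypothetical.

```python
def modification_cost(k_prime, k0, km, d_doc, doc_union_size,
                      d_attr, attr_union_size, value_changes,
                      betas=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of four normalized change terms (assumed form).

    value_changes: list of (|v_i' - v_i|, delta_v_i) pairs, one per attribute.
    """
    if any(b < 0 for b in betas) or abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1 (assumption)")
    b1, b2, b3, b4 = betas
    # (k' - k_0) normalized by (k_m - k_0)
    k_term = (k_prime - k0) / (km - k0) if km > k0 else 0.0
    # Delta-doc normalized by |q.doc_0 U M.doc|
    doc_term = d_doc / doc_union_size if doc_union_size else 0.0
    # Delta-A_n normalized by |q.B U M.B|
    attr_term = d_attr / attr_union_size if attr_union_size else 0.0
    # |v_i' - v_i| normalized by delta_v_i, averaged over attributes (assumption)
    ratios = [change / span for change, span in value_changes if span > 0]
    value_term = sum(ratios) / len(value_changes) if value_changes else 0.0
    return b1 * k_term + b2 * doc_term + b3 * attr_term + b4 * value_term


# Baseline query q_b: only k grows from k_0 to k_m, nothing else changes.
p_c = modification_cost(7, 3, 7, 0, 4, 0, 4, [])
# A refined query that changes one keyword, one attribute type and one value.
p_refined = modification_cost(5, 3, 7, 1, 4, 1, 4, [(1.0, 2.0), (0.0, 5.0)])
```

Under these assumptions the baseline q_b costs exactly β_1, which gives a natural upper bound for pruning refined queries.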
On the basis of the scheme, the similarity score between the query q and the object o is calculated by the following formula:
Figure BDA0002277655980000062
where α is a variable between 0 and 1 that defines the relative importance of spatial proximity and text relevance, d(q.loc, o.loc) denotes the Euclidean distance between the query q and the object o, and d_max(q.loc, o.loc) denotes the maximum distance from the query point q to the objects in the object set O, taken in practice as the maximum distance between all objects in the object set O.
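The scoring idea can be sketched as below. The ranking formula itself is only present as an image in the source, so this is an assumed form: spatial proximity is taken as 1 − d/d_max, text relevance is approximated by Jaccard similarity of the keyword sets, and the two are blended with α. Both the Jaccard choice and the function name are assumptions made for illustration.

```python
import math


def rank(q_loc, q_doc, o_loc, o_doc, d_max, alpha):
    """Blend spatial proximity and text relevance (assumed form, higher is better)."""
    dist = math.dist(q_loc, o_loc)              # Euclidean distance
    spatial = 1.0 - dist / d_max                # closer objects score higher
    union = q_doc | o_doc
    text = len(q_doc & o_doc) / len(union) if union else 0.0   # Jaccard (assumption)
    return alpha * spatial + (1.0 - alpha) * text


score = rank((0, 0), {"cafe"}, (3, 4), {"cafe", "bar"}, d_max=10.0, alpha=0.5)
```

With α = 1 only distance matters, and with α = 0 only keyword overlap matters.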
On the basis of the above scheme, if the keyword is a frequent keyword, adding the root node of the quadtree in the header file to the queue of non-leaf nodes to be processed, selecting the qualified leaf nodes according to the preset screening rule, and adding them to the queue of qualified leaf nodes specifically comprises the following steps:
if the keyword is a frequent keyword, adding the root node of the quadtree in the header file to the queue of non-leaf nodes to be processed;
judging whether each child node of the current node in the queue of non-leaf nodes to be processed is a qualified node;
if not, filtering out the child node; if so, judging whether the child node is a non-leaf node or a leaf node;
if it is a non-leaf node, adding it to the queue of non-leaf nodes to be processed; if it is a leaf node, adding it to the queue of qualified leaf nodes.
On the basis of the scheme, whether the sub-node of the current node in the to-be-processed non-leaf node queue is a qualified node is judged, and the judgment standard is as follows:
a) all attribute classes of query q' are on this child node;
b) each attribute value range of query q' intersects the corresponding attribute value range of the child node.
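The queue-based screening and the two criteria a) and b) can be sketched as follows. This is a simplified illustration under stated assumptions: each node's summary S is reduced to a closed (min, max) range per attribute, query predicates are expressed as ranges with infinite bounds where open-ended, and strict versus non-strict comparison is ignored at node level (objects are checked exactly later). `Node`, `qualifies` and `qualified_leaves` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    ranges: dict                                   # attribute -> (min, max) summary of R_i.S
    children: list = field(default_factory=list)   # empty list for a leaf node


def qualifies(query_ranges, node):
    """Criteria a) and b): every query attribute is present and its range overlaps."""
    for attr, (q_lo, q_hi) in query_ranges.items():
        if attr not in node.ranges:                # a) attribute missing on this node
            return False
        n_lo, n_hi = node.ranges[attr]
        if q_hi < n_lo or n_hi < q_lo:             # b) ranges do not intersect
            return False
    return True


def qualified_leaves(root, query_ranges):
    """Breadth-first screening that starts from the root, as in the steps above."""
    pending = deque([root])                        # non-leaf nodes to be processed
    leaves = []                                    # qualified leaf nodes
    while pending:
        node = pending.popleft()
        for child in node.children:
            if not qualifies(query_ranges, child):
                continue                           # filter out the child
            if child.children:
                pending.append(child)
            else:
                leaves.append(child)
    return leaves


inf = float("inf")
leaf_a = Node(1, {"avg-price": (30, 40), "Rating": (4.4, 4.8)})
leaf_b = Node(2, {"avg-price": (50, 60), "Rating": (4.5, 4.9)})
leaf_c = Node(3, {"avg-price": (20, 35)})          # carries no Rating attribute
inner = Node(4, {"avg-price": (20, 40), "Rating": (4.4, 4.8)}, [leaf_a, leaf_c])
root = Node(0, {"avg-price": (20, 60), "Rating": (4.4, 4.9)}, [inner, leaf_b])
query = {"avg-price": (-inf, 42), "Rating": (4.3, inf)}
result = qualified_leaves(root, query)
```

Here `leaf_b` is rejected by criterion b) and `leaf_c` by criterion a), so only `leaf_a` reaches the object-level attribute check.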
The invention also provides a system for solving the SKQ why-not problem using the AI³ index, comprising:
an AI³ index building module, configured to: obtain all objects o and construct the AI³ index;
a candidate list construction module, configured to: obtain an initial query q = (q.loc, q.doc_0, q.B, k, α) and a missing object set M; construct a candidate keyword list CKS in descending order of the frequency of the keywords of the missing objects, and construct a candidate attribute-value pair list CAS in descending order of the similarity scores of the missing objects; set the keyword set q'.doc and the attribute-value pairs q'.B' of the refined query q' to q.doc_0 and q.B, respectively;
a refined query module, configured to: extract, in order, the keywords in CKS and the attribute-value pairs in CAS, and add them to the keyword set q'.doc and the attribute-value pairs q'.B' of the query q', respectively, to form a new refined query q'; process each refined query q' to find the best refined query, until both CKS and CAS are empty; processing each refined query q' specifically comprises:
calculating the modification cost p' of q', and filtering out each query q' with p' ≥ p_c, where p_c is the modification cost of the query q_b that preserves the initial query keywords and attributes and in whose result all missing objects appear;
for each query q' with p' < p_c, determining, according to the frequency of each keyword, whether each keyword of the query q' is a frequent keyword:
if the keyword is a frequent keyword, adding the root node of the quadtree in the header file to a queue of non-leaf nodes to be processed, selecting the qualified leaf nodes according to a preset screening rule, and adding them to a queue of qualified leaf nodes; for each object in the disk page pointed to by a qualified leaf node, judging in turn whether the attribute-value pairs q'.B' of the query q' attribute-match the attribute-value pairs of the object, adding each matching object to the set of objects satisfying the query q', and calculating the similarity score between the query q' and the object;
if the keyword is an infrequent keyword, examining each object in the corresponding disk page; if the attribute-value pairs q'.B' of the query q' attribute-match the attribute-value pairs of the object, adding the object to the set of objects satisfying the query q', and calculating the similarity score between the query q' and the object;
ranking all objects in the set of objects satisfying the query q' from high to low by similarity score until all original result objects and all missing objects have appeared, thereby obtaining k' objects;
if k' ≤ k_m, where k_m is the size of the result set when the initial query keywords and attributes are preserved and all missing objects appear in the query result, calculating the modification cost p' of q'; if p' < p_c, taking the query q' as the current best refined query.
On the basis of the above scheme, the AI³ index building module is specifically configured to:
hierarchically divide the data space into cells using a quadtree structure; take the keyword cell as the basic storage unit, which stores the spatial position and attribute information of the objects containing the keyword;
create three components: a lookup table used as the entry point, a header file containing summary information of dense keyword cells, and a data file storing the keyword cell tuples of all the inverted lists;
store the attribute information of the basic keyword cells of frequent keywords in the leaf nodes of the quadtree;
each non-leaf node R_i of the quadtree contains three attributes, R_i.id, R_i.S and R_i.Address, where R_i.id is the node id, R_i.Address is the list of addresses of all child nodes of R_i, and R_i.S is the union of the attribute-value pairs of all child nodes of R_i;
each leaf node R_i of the quadtree contains three attributes, R_i.id, R_i.S and R_i.Address, where R_i.id is the node id, R_i.Address is the address of the disk page to which the node links, and R_i.S is the union of the attribute-value pairs of all objects in that disk page.
On the basis of the scheme, B is a Boolean expression:
Figure BDA0002277655980000091
Figure BDA0002277655980000092
is a predicate set, where i ∈ [1, n], i ∈ N*.
On the basis of the above scheme, if the keyword is a frequent keyword, the refined query module adds the root node of the quadtree in the header file to the queue of non-leaf nodes to be processed, selects the qualified leaf nodes according to the preset screening rule, and adds them to the queue of qualified leaf nodes, which specifically comprises the following steps:
if the keyword is a frequent keyword, adding the root node of the quadtree in the header file to the queue of non-leaf nodes to be processed;
judging whether each child node of the current node in the queue of non-leaf nodes to be processed is a qualified node;
if not, filtering out the child node; if so, judging whether the child node is a non-leaf node or a leaf node;
if it is a non-leaf node, adding it to the queue of non-leaf nodes to be processed; if it is a leaf node, adding it to the queue of qualified leaf nodes.
On the basis of the above scheme, the refined query module judges whether a child node of the current node in the queue of non-leaf nodes to be processed is a qualified node according to the following criteria:
a) all attribute classes of query q' are on this child node;
b) each attribute value range of query q' intersects the corresponding attribute value range of the child node.
Compared with the prior art, the invention has the following advantages:
the numerical attributes of objects are expressed as Boolean expressions, which is closer to real application scenarios; the AI³ index is designed to organize object information, together with a corresponding query strategy, so that a refined query q' with the minimum modification cost makes all missing objects appear in the query result, thereby solving the why-not problem in spatial keyword queries.
Drawings
FIG. 1 is a diagram of an example object set from the background art;
FIG. 2 is a schematic diagram of how the AI³ index partitions objects according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the structure of an instance of the AI³ index according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the algorithm based on the AI³ index according to an embodiment of the invention.
Detailed Description
An embodiment of the invention provides a method for solving the SKQ why-not problem using the AI³ index, comprising the following steps:
obtaining all objects o and constructing the AI³ index;
obtaining an initial query q = (q.loc, q.doc_0, q.B, k, α) and a missing object set M, where q.loc represents the location of the query q, q.doc_0 represents the keyword set of the query q, q.B is a Boolean expression representing attribute-value pairs, k indicates that the top k ranked results are returned, and α is a variable between 0 and 1 that defines the relative importance of spatial proximity and text relevance; constructing a candidate keyword list CKS in descending order of the frequency of the keywords of the missing objects, and constructing a candidate attribute-value pair list CAS in descending order of the similarity scores of the missing objects; setting the keyword set q'.doc and the attribute-value pairs q'.B' of the refined query q' to q.doc_0 and q.B, respectively;
extracting, in order, the keywords in CKS and the attribute-value pairs in CAS, and adding them to the keyword set q'.doc and the attribute-value pairs q'.B' of the query q', respectively, to form a new refined query q'; processing each refined query q' to find the best refined query, until both CKS and CAS are empty;
processing each refined query q' specifically comprises:
calculating the modification cost p' of q', and filtering out each query q' with p' ≥ p_c, where p_c is the modification cost of the query q_b that preserves the initial query keywords and attributes and in whose result all missing objects appear;
for each query q' with p' < p_c, determining, according to the frequency of each keyword, whether each keyword of the query q' is a frequent keyword:
if the keyword is a frequent keyword, adding the root node of the quadtree in the header file to a queue of non-leaf nodes to be processed, selecting the qualified leaf nodes according to a preset screening rule, and adding them to a queue of qualified leaf nodes; for each object in the disk page pointed to by a qualified leaf node, judging in turn whether the attribute-value pairs q'.B' of the query q' attribute-match the attribute-value pairs of the object, adding each matching object to the set of objects satisfying the query q', and calculating the similarity score between the query q' and the object;
if the keyword is an infrequent keyword, examining each object in the corresponding disk page; if the attribute-value pairs q'.B' of the query q' attribute-match the attribute-value pairs of the object, adding the object to the set of objects satisfying the query q', and calculating the similarity score between the query q' and the object;
ranking all objects in the set of objects satisfying the query q' from high to low by similarity score until all original result objects and all missing objects have appeared, thereby obtaining k' objects;
if k' ≤ k_m, where k_m is the size of the result set when the initial query keywords and attributes are preserved and all missing objects appear in the query result, calculating the modification cost p' of q'; if p' < p_c, taking the query q' as the current best refined query.
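The overall prune-evaluate-compare loop of the steps above can be sketched as follows. This is an interpretation rather than the patented algorithm: the three callables are hypothetical stand-ins for the cost estimate before evaluation, the index lookup that yields k', and the final cost, and the threshold is assumed to tighten each time a better refined query is found.

```python
def refine(candidates, lower_bound_cost, evaluate, cost, p_c, k_m):
    """Return (best_query, best_cost); (None, p_c) if nothing beats the baseline q_b.

    lower_bound_cost(q): cost known before touching the index (hypothetical).
    evaluate(q): k', the rank at which all original results and all missing
                 objects have appeared, or None if they never do (hypothetical).
    cost(q, k_prime): final modification cost p' (hypothetical).
    """
    best, best_cost = None, p_c
    for q in candidates:
        if lower_bound_cost(q) >= best_cost:
            continue                       # pruned without evaluating the query
        k_prime = evaluate(q)
        if k_prime is None or k_prime > k_m:
            continue                       # result set would be larger than q_b's
        p = cost(q, k_prime)
        if p < best_cost:
            best, best_cost = q, p         # threshold tightens (assumption)
    return best, best_cost


# Toy lookup tables standing in for real cost and index evaluation.
bounds = {"a": 0.9, "b": 0.2, "c": 0.1}
ranks = {"a": 4, "b": 5, "c": 9}
costs = {"b": 0.4, "c": 0.3}
outcome = refine(["a", "b", "c"],
                 lower_bound_cost=bounds.get,
                 evaluate=ranks.get,
                 cost=lambda q, k_prime: costs[q],
                 p_c=0.5, k_m=6)
```

In the toy run, "a" is pruned by cost, "c" is rejected because k' exceeds k_m, and "b" becomes the best refined query.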
Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
I. Definition of the enhanced spatial keyword top-k query
Predicates are the basic components of a Boolean expression. Given a quadruple (A, f_opt, f_opd, x), where A is an attribute, f_opt is an operator, f_opd is an operand, and x is the input value, a predicate can be conveniently defined.
Definition 1: Predicate.
If a mapping function p satisfies
Figure BDA0002277655980000121
then
Figure BDA0002277655980000122
is a predicate. If the input value x is within the range specified by the predicate, the mapping function returns 1; otherwise, it returns 0.
Definition 2: Boolean expression.
Given a predicate set
Figure BDA0002277655980000123
where i ∈ [1, n], i ∈ N*, the Boolean expression B can be defined as follows:
Figure BDA0002277655980000124
Definition 3: Text-space object.
Given a spatial point o.loc, a keyword set o.doc and a set of attribute-value pairs <A_1, v_1>, ..., <A_i, v_i>, ..., <A_n, v_n>, a text-space object o can be represented as follows:
o = <o.loc, o.doc, o.S>, where o.S = {(A_1 = v_1) ∩ (A_2 = v_2) ∩ … ∩ (A_n = v_n)}
Definition 4: Enhanced spatial keyword query.
Given a spatial point q.loc, a keyword set q.doc_0 and a Boolean expression q.B, an enhanced spatial keyword query q can be expressed as:
q = <q.loc, q.doc_0, q.B>
Definition 5: Keyword matching.
For a query q and an object o, q and o are said to be keyword matched if and only if q.doc and o.doc share at least one keyword, i.e. q.doc ∩ o.doc ≠ ∅.
Herein,
Figure BDA0002277655980000131
denotes keyword matching.
Definition 6: Attribute matching.
For a query q and an object o, if and only if the following two conditions are satisfied: a) all attributes of q.B are contained in o.S; b)
Figure BDA0002277655980000132
(assuming that attribute A_i in q.B is the same as attribute A_i' in o.S),
Figure BDA0002277655980000133
where
Figure BDA0002277655980000134
(A_i' = v_i') ∈ o.S, then the query q and the object o are attribute matched.
Herein,
Figure BDA0002277655980000135
denotes attribute matching.
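Definitions 1, 2, 5 and 6 can be illustrated with a short Python sketch. The representation is an assumption made for the example: a predicate is built from an attribute name, a comparison operator and an operand and returns 1 or 0, a Boolean expression is a list of predicates joined by conjunction, and an attribute that is NULL in the object counts as absent. The query and the values of o_4 and o_6 are those given in the description; the third object is hypothetical.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq}


def predicate(attr, op, operand):
    """Build a predicate p(x) that returns 1 if x is within range, else 0."""
    compare = OPS[op]

    def p(x):
        return 1 if compare(x, operand) else 0

    p.attr = attr                      # remember which attribute p constrains
    return p


def keyword_match(q_doc, o_doc):
    """Definition 5: the two keyword sets share at least one keyword."""
    return bool(set(q_doc) & set(o_doc))


def attribute_match(preds, obj_attrs):
    """Definition 6: every query attribute exists on o and satisfies its predicate."""
    for p in preds:
        value = obj_attrs.get(p.attr)
        if value is None:              # condition a) fails: attribute not in o.S
            return False
        if p(value) == 0:              # condition b) fails: value out of range
            return False
    return True


q_B = [predicate("avg-price", "<", 42),
       predicate("Rating", ">", 4.3),
       predicate("Popularity", ">", 700)]
o4 = {"avg-price": 42, "Rating": 4.4, "Popularity": 900}
o6 = {"avg-price": 35, "Rating": 4.6, "Popularity": None}
ok = {"avg-price": 30, "Rating": 4.5, "Popularity": 800}   # hypothetical object
```

With these inputs o_4 fails on avg-price (42 is not below 42) and o_6 fails because it has no Popularity value, which agrees with both objects being filtered out in the FIG. 1 example.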
Definition 7: Comprehensive matching.
An enhanced spatial keyword query q and a text-space object o are a comprehensive match if and only if they satisfy both keyword matching and attribute matching, that is:
Figure BDA0002277655980000136
Herein,
Figure BDA0002277655980000141
denotes comprehensive matching.
A Rank function is now defined to measure the similarity score between the query q and the object o:
Figure BDA0002277655980000142
where α is a variable between 0 and 1 that defines the relative importance of spatial proximity and text relevance, d(q.loc, o.loc) denotes the Euclidean distance between the query q and the object o, and d_max(q.loc, o.loc) denotes the maximum distance from the query point q to the objects in the object set O, taken in practice as the maximum distance between all objects in the object set O.
Definition 8: Enhanced spatial keyword top-k query.
Given an object set O, the enhanced spatial keyword top-k query q = (loc, doc_0, B, k, α) retrieves a set of objects O',
Figure BDA0002277655980000144
which satisfies |O'| = k, and
Figure BDA0002277655980000143
o' ∈ O − O', Rank(q, o) > Rank(q, o').
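Definition 8 amounts to filtering by comprehensive matching and keeping the k highest-scoring objects, which can be sketched as below. The `matches` and `score` callables are hypothetical stand-ins for the comprehensive match test and the Rank function, and the tuples are toy data. Ties are resolved by input order here, whereas the definition assumes strictly greater scores.

```python
import heapq


def top_k(objects, k, matches, score):
    """Return the k best-scoring objects among those that match the query."""
    candidates = [o for o in objects if matches(o)]
    return heapq.nlargest(k, candidates, key=score)


# Toy objects: (name, precomputed score, comprehensive match flag).
objects = [("o1", 0.2, True), ("o2", 0.9, False), ("o3", 0.7, True), ("o4", 0.5, True)]
result = top_k(objects, 2, matches=lambda o: o[2], score=lambda o: o[1])
```

Note that "o2" has the highest score but is excluded because it does not match, which mirrors how attribute filtering precedes ranking in the enhanced query.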
II. The why-not problem in the enhanced spatial keyword top-k query
When a user issues an enhanced top-k spatial keyword query q = (loc, doc_0, B, k, α), unreasonable settings of query parameters such as the text description, the query attributes, the k value and α may cause one or more objects expected by the user to be unexpectedly missing. Such objects are called missing objects and are denoted by M = {m_1, m_2, ..., m_j}. The user then poses a why-not question about the missing object set M = {m_1, m_2, ..., m_j}, asking why these expected objects are missing, and seeks a refined query q' = (loc, doc', B', k', α') whose result set contains all the missing objects. Since the location of the query is usually fixed, the initial query can be refined by changing the query keyword set, the Boolean expression, the k value and the α value.
Since the result set of the refined query q' must contain all missing objects, q'.doc contains some or all of the keywords of the missing objects in addition to the original keyword set. CKS is an ordered list of the keywords of the missing objects, ordered by keyword frequency, and the function Out_List(CKS) takes the first keyword from CKS and returns it. For example, in example 1, the query q filters out o_1, o_2, o_4, o_6 and o_7. Suppose o_4 and o_6 are the missing objects and the keyword "center" has a higher frequency than the keyword "Cosmic"; then "center" is placed before "Cosmic" in CKS, and CKS = {"center", "Cosmic"}. Similarly, q'.B' must satisfy the requirements of the attribute-value pairs of all missing objects in addition to the original attribute-value pairs. CAS is an ordered list of the attribute-value pairs of the missing objects, ordered by object similarity score, and the function Out_List(CAS) takes the first attribute-value pair from CAS and returns it. Continuing the example, suppose the similarity score of o_4 ranks higher than that of o_6, so the attribute-value pairs of o_4 are placed before those of o_6. This is because higher-scoring objects are generally more desirable to users, so their attribute values better match the user's needs. Giving priority to the attribute-value pairs of o_4 therefore yields:
q'.B' = q.B ∪ Out_List(CAS) = q.B ∪ o_4.S
= (avg-price ≤ 42) ∧ (Rating > 4.3) ∧ (Popularity > 700)
where q.B = (avg-price < 42 ∧ Rating > 4.3 ∧ Popularity > 700) and o_4.S = (avg-price = 42 ∧ Rating = 4.4 ∧ Popularity = 900).
Since o_6 still does not satisfy this refined query, its attribute-value pairs are considered next, i.e. o_6.S = (avg-price = 35 ∧ Rating = 4.6 ∧ Popularity = NULL), so that q'.B' = (avg-price ≤ 42) ∧ (Rating > 4.3).
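The two relaxation steps in this example can be reproduced with a small Python sketch. The rule used is an interpretation of the example rather than a stated algorithm: a bound is loosened just enough for the missing object to satisfy it, and a predicate on an attribute that the object lacks (NULL) is dropped. The function name and the dictionary representation are hypothetical.

```python
def relax(preds, obj_attrs):
    """Return a relaxed copy of preds that the given object satisfies.

    preds: dict mapping attribute -> (operator, bound).
    obj_attrs: dict mapping attribute -> value (None means NULL).
    """
    relaxed = {}
    for attr, (op, bound) in preds.items():
        value = obj_attrs.get(attr)
        if value is None:
            continue                           # object lacks the attribute: drop it
        if op == "<" and value >= bound:
            op, bound = "<=", value            # loosen upper bound to include value
        elif op == "<=" and value > bound:
            bound = value
        elif op == ">" and value <= bound:
            op, bound = ">=", value            # loosen lower bound to include value
        elif op == ">=" and value < bound:
            bound = value
        relaxed[attr] = (op, bound)
    return relaxed


q_B = {"avg-price": ("<", 42), "Rating": (">", 4.3), "Popularity": (">", 700)}
o4 = {"avg-price": 42, "Rating": 4.4, "Popularity": 900}
o6 = {"avg-price": 35, "Rating": 4.6, "Popularity": None}

after_o4 = relax(q_B, o4)          # o_4 comes first in CAS
after_o6 = relax(after_o4, o6)     # then o_6
```

Applying the objects in CAS order gives first (avg-price ≤ 42) ∧ (Rating > 4.3) ∧ (Popularity > 700) and then (avg-price ≤ 42) ∧ (Rating > 4.3), matching the two expressions in the text.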
Considering that changing the values of different query parameters will have different effects on the optimization of the query, the modification cost between the refined query q' and the initial query q can be defined as follows:
Figure BDA0002277655980000151
where β_1, β_2, β_3 and β_4 denote the weights of the k value, the keywords, the attribute categories and the attribute values in the cost function, respectively; β_i ≥ 0 and
Figure BDA0002277655980000161
k′ is the size of the result set of the refined query q′, and k_0 is the size of the result set of the initial query q; k′ − k_0 is normalized by k_m − k_0. This is because many previous studies obtain a basic refined query q_b by keeping the initial query keywords and attributes and increasing k from k_0 to k_m until all missing objects appear in the query result set. In contrast, a better refined query may achieve a lower modification cost by modifying the k value, the keywords, the attribute categories and the attribute values together, where k′ − k_0 ≤ k_m − k_0. Δdoc is the number of keywords that must be changed to adjust q.doc_0 to q′.doc,
Figure BDA0002277655980000162
where the missing object set M = {m_1, m_2, ..., m_j}. Here Δdoc is normalized by |q.doc_0 ∪ M.doc|. ΔA_n is the number of attribute categories that must be changed to adjust the initial query to the refined query, and ΔA_n is normalized by |q.B ∪ M.B|; then
Figure BDA0002277655980000163
n is the total number of attributes contained in q.B and M.B. Δv_i is the maximum difference among the attribute values, for attribute A_i, of all objects containing A_i. |v_i′ − v_i| is the absolute difference between the current query value v_i′ and the initial query value v_i of attribute A_i, with |v_i′ − v_i| ≤ Δv_i. Here |v_i′ − v_i| is normalized by Δv_i.
ΔA_n and Δdoc can be calculated using the edit distance. In the example of FIG. 1, the initial query q is modified to a refined query q′, where q′.doc = {"cat", "cafe"} and q′.A = (avg-price < 42) ∧ (Rating > 4.5) ∧ (Popularity > 700); then ΔA_n = 1 and Δdoc = 1.
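The way the four normalized terms combine can be sketched in Python. Because the formula itself is only available as an image, the exact form is an assumption: the sketch uses a weighted sum of the four normalized terms, assumes the weights sum to 1, and averages the per-attribute value changes. The function name modification_cost and all parameter names are hypothetical.

```python
def modification_cost(k_new, k0, km, delta_doc, doc_union_size,
                      delta_attr, attr_union_size, value_changes,
                      betas=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of normalized changes (assumed form of the cost function).

    value_changes: list of (|v_i' - v_i|, delta_v_i) pairs, one per attribute.
    """
    if any(b < 0 for b in betas) or abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1 (assumption)")
    b1, b2, b3, b4 = betas
    # Each term is normalized by the quantity named in the text.
    k_term = (k_new - k0) / (km - k0) if km > k0 else 0.0
    doc_term = delta_doc / doc_union_size if doc_union_size else 0.0
    attr_term = delta_attr / attr_union_size if attr_union_size else 0.0
    ratios = [diff / span for diff, span in value_changes if span > 0]
    value_term = sum(ratios) / len(ratios) if ratios else 0.0  # averaging is an assumption
    return b1 * k_term + b2 * doc_term + b3 * attr_term + b4 * value_term


# Basic refined query q_b: only k grows from k0 to km, nothing else changes.
cost_basic = modification_cost(7, 3, 7, 0, 4, 0, 4, [])
# A refined query that changes a little of everything.
cost_refined = modification_cost(5, 3, 7, 1, 4, 1, 4, [(0.2, 1.0)])
```

Under these assumed weights the basic refined query costs exactly β_1, which is the threshold a better refined query has to beat.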
III. Solving the why-not problem in spatial keyword queries using the AI³ index
Depending on whether a query keyword is frequent or infrequent, the AI³ index is designed to improve query efficiency and to solve the why-not problem of the enhanced spatial keyword top-k query. The AI³ index is based on the I³ index and uses a quadtree structure to hierarchically divide the data space into cells for processing spatial and textual information. The index uses the keyword cell as its basic storage unit; a keyword cell captures the spatial position and attribute information of the objects containing the keyword.
FIG. 2 shows the keyword cells of the two keywords "cat" and "cafe" in FIG. 1. A cell whose number of objects does not exceed a given threshold is called a basic keyword cell; otherwise it is called a dense keyword cell. Assume the threshold is 2 objects per cell, so each basic keyword cell of a keyword contains at most two objects with that keyword. Among the cells for the keyword "cat" in FIG. 2, C_1, C_2, C_42, C_43 and C_44 are basic cells, and C_4 is a dense cell.
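The split into basic and dense keyword cells can be sketched as a recursive quadtree subdivision in Python. The function build_cells, the quadrant numbering, the depth limit and the sample coordinates are all assumptions for illustration and are not taken from FIG. 2; only the threshold rule comes from the text.

```python
def build_cells(points, bounds, threshold=2, name="C", max_depth=8):
    """Split a cell into quadrants until it holds at most `threshold` objects.

    points: {object_id: (x, y)}, all inside the half-open box `bounds`
    bounds: (xmin, ymin, xmax, ymax)
    Returns {cell_name: (kind, object_ids)} with kind "basic" or "dense".
    """
    ids = sorted(points)
    if len(ids) <= threshold or max_depth == 0:
        kind = "basic" if len(ids) <= threshold else "dense"
        return {name: (kind, ids)}
    cells = {name: ("dense", ids)}
    xmin, ymin, xmax, ymax = bounds
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    quadrants = [
        (xmin, ymin, xmid, ymid),  # quadrant 1 (numbering is an assumption)
        (xmid, ymin, xmax, ymid),  # quadrant 2
        (xmin, ymid, xmid, ymax),  # quadrant 3
        (xmid, ymid, xmax, ymax),  # quadrant 4
    ]
    for number, (x0, y0, x1, y1) in enumerate(quadrants, start=1):
        inside = {oid: p for oid, p in points.items()
                  if x0 <= p[0] < x1 and y0 <= p[1] < y1}
        if inside:  # empty quadrants produce no keyword cell
            cells.update(build_cells(inside, (x0, y0, x1, y1), threshold,
                                     f"{name}{number}", max_depth - 1))
    return cells


# Hypothetical positions of five objects that share one keyword.
points = {"o1": (1, 1), "o2": (6, 1), "o3": (5.5, 5.5), "o4": (7, 7), "o5": (5, 7)}
cells = build_cells(points, (0, 0, 8, 8), threshold=2)
```

With these sample points the fourth quadrant holds three objects, so it becomes a dense cell and is split again, while every other non-empty cell stays basic.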
Like I³, the AI³ index of this embodiment includes three main components: a lookup table that serves as the entry point, a header file that contains summary information for dense keyword cells, and a data file that stores the keyword cell tuples of all inverted lists. Unlike I³, AI³ not only uses textual and spatial information to retrieve the objects a user wants, but also uses attribute information in the header file to improve pruning efficiency. Specifically, AI³ introduces attribute information into the node summaries of the quadtree. If the attribute information of a non-leaf node does not "attribute-match" the query attributes, the node and all of its child nodes are pruned.
Each non-leaf node R_i of the quadtree contains a triplet (R_i.id, R_i.S, R_i.Address), where R_i.id is the node id, R_i.Address is the address list of all child nodes of R_i, and R_i.S is the union of the attribute-value pairs of all child nodes of R_i. Since the header file is stored in memory, in order to save query time, especially the time spent accessing disk pages, the attribute information of the basic keyword cells of frequent keywords is stored in the leaf nodes of the quadtree rather than in the disk pages. When the attribute information of a leaf node does not match the query attributes, the corresponding disk page is skipped. Each leaf node R_i of the quadtree also contains a triplet (R_i.id, R_i.S, R_i.Address), where R_i.id is the node id, R_i.Address is the address of the disk page it is linked to, and R_i.S is the union of the attribute-value pairs of all objects in that disk page.
Continuing with the example in FIG. 1, where
o_5.S = (avg-price = 37 ∧ Rating = 4.5 ∧ Popularity = 1400),
o_6.S = (avg-price = 35 ∧ Rating = 4.6), o_7.S = (Rating = 4.3 ∧ Popularity = 700),
o_8.S = (avg-price = 38 ∧ Rating = 4.6 ∧ Popularity = 1600). As shown in FIG. 3, R_7 contains the objects o_5 and o_6; therefore
R_7.S = Cover(o_5.S, o_6.S) = (avg-price ∈ [35, 37]) ∧ (Rating ∈ [4.5, 4.6]) ∧ (Popularity = 1400).
Here the function Cover(o_i.S, o_j.S) returns a list of value ranges, one for each attribute A_k ∈ o_i.S ∪ o_j.S, where each range covers the attribute values from o_i.S.A_k to o_j.S.A_k. Note that the function Cover() also applies to two non-leaf nodes R_i and R_j and to cases with more arguments. Then:
R_5.S = Cover(R_7.S, R_8.S, R_9.S) = (avg-price ∈ [35, 38]) ∧ (Rating ∈ [4.3, 4.6]) ∧ (Popularity ∈ [700, 1600])
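A minimal Python sketch of Cover() reproduces the two summaries given for R_7 and R_5. The dictionary representation of a summary and the function name cover are assumptions; the text also does not say which objects R_8 and R_9 hold, so the sketch assumes that R_8 holds o_7 and R_9 holds o_8.

```python
def cover(*summaries):
    """Merge attribute summaries into one (low, high) range per attribute.

    Each summary maps an attribute name to a single value or a (low, high) range.
    """
    merged = {}
    for summary in summaries:
        for attr, value in summary.items():
            low, high = value if isinstance(value, tuple) else (value, value)
            if attr in merged:
                low = min(low, merged[attr][0])
                high = max(high, merged[attr][1])
            merged[attr] = (low, high)
    return merged


o5 = {"avg-price": 37, "Rating": 4.5, "Popularity": 1400}
o6 = {"avg-price": 35, "Rating": 4.6}
o7 = {"Rating": 4.3, "Popularity": 700}
o8 = {"avg-price": 38, "Rating": 4.6, "Popularity": 1600}

r7 = cover(o5, o6)
r8 = cover(o7)          # assumption: R8 holds o7
r9 = cover(o8)          # assumption: R9 holds o8
r5 = cover(r7, r8, r9)  # Cover() applied to node summaries
```

An attribute that is missing from one argument, such as Popularity for o_6, simply does not widen the range, which is why R_7 keeps Popularity = 1400 as a single value.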
In I³, objects from different basic keyword cells of the data file may be stored in the same disk page to improve storage utilization. This means, however, that when the objects of one basic keyword cell in a disk page are loaded into memory for processing, the other basic keyword cells in that disk page are loaded as well, which costs time. In contrast, to improve query efficiency, a disk page in AI³ stores only the objects of a single basic keyword cell, so that no unrelated objects appear when the disk page is loaded into memory.
FIG. 3 illustrates the AI³ index constructed for the objects of FIG. 1. In FIG. 3, the keywords "cat" and "cafe" are both stored in the lookup table, and each of these two frequent keywords is linked to a quadtree in the header file. The attribute-value pairs of each quadtree node are used to prune ineligible tree branches. Each leaf node of a quadtree in the header file is linked to the related disk page of the data file, and the attribute-value pairs of the leaf nodes determine which disk pages need to be accessed.
Referring to FIG. 4, the algorithm shows the detailed implementation of the enhanced why-not spatial keyword top-k query processing method using AI³. The method takes as input the AI³ index, the initial query q, the missing object set M, the candidate keyword list CKS, the candidate attribute-value pair list CAS, the penalty P_c of the basic refined query q_b, and the number k_m of result objects of q_b. The output is the best refined query q′.
Specifically, CKS is an ordered list of the keywords of the missing objects, arranged in descending order of keyword frequency, while CAS is an ordered list of the attribute-value pairs of the missing objects, arranged in descending order of the similarity scores of the missing objects. The two lists are constructed in advance, and the order in which the candidate keywords and candidate attribute-value pairs are processed plays an important role in obtaining the refined query. P_c equals cost(q, q_b), calculated using equation (2), where q_b is the basic refined query discussed previously.
Queue D, queue D′, queue W, pointer TWord, pointer TNode and set RRS are initialized to empty (line 4). D stores the eligible non-leaf nodes of the quadtree in the header file, D′ stores the eligible leaf nodes, W stores the keywords of the refined query, TWord points to the keyword being processed, TNode points to the quadtree node being accessed, and RRS is the set of objects that satisfy the refined query. Next, q′.doc and q′.B′ are set to q.doc_0 and q.B, respectively (line 5). Then the keywords in CKS and the attribute-value pairs in CAS are taken out in order and added to q′.doc and q′.B′, respectively, to form a new refined query, which is processed to find the best refined query; this is repeated until both CKS and CAS are empty.
Lines 7-38 show the processing steps for each refined query q′. First, a refined query q′ is obtained by parameter modification. Specifically, the first keyword in CKS and the first attribute-value pair in CAS are taken out and added to q′.doc and q′.B′, respectively (lines 7-8); here the function Out_List(CKS) takes out the first keyword and returns it, and the function Out_List(CAS) works in the same way; k′ is set to k_0. The cost P′ of q′ is then calculated according to equation (2), so that refined queries that cost more than q_b are filtered out as early as possible. If P′ ≥ P_c, the loop is terminated (lines 10-11). Otherwise, the keywords of the refined query are enqueued to queue W (line 12) and the processing of q′ continues (lines 13-29): each keyword in W is classified as frequent or infrequent according to its frequency and is processed accordingly.
Lines 15-29 show the processing steps for frequent keywords. For the frequent keyword pointed to by TWord, the root node of its quadtree in the header file is pushed into queue D (line 16); its eligible non-leaf nodes are then pushed into D for processing, which yields queue D′, holding the eligible leaf nodes for further processing.
While queue D is not empty, its elements are processed as follows: 1) the head element of D is popped and pointed to by TNode (line 18); 2) each child node n_s of TNode is checked against the following requirements: a) all attribute categories of the refined query q′ can be found in n_s.S; b) each attribute value range of q′ intersects the corresponding attribute value range of n_s. A child node that meets both requirements may contain a result object; it is "eligible" and requires further processing (line 20). Then, if n_s is a non-leaf node, it is enqueued to queue D; if n_s is a leaf node, it is enqueued to queue D′ (lines 21-24).
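The eligibility test and the two queues can be sketched in Python as follows. Nodes are represented as plain dictionaries, strict and non-strict bounds are both treated as closed intervals for simplicity, and the tree layout (a root with one non-leaf child R5 whose leaves are R7, R8 and R9, with R8 holding o_7 and R9 holding o_8) is an assumption; the names is_eligible and collect_leaves are hypothetical.

```python
import math
from collections import deque


def is_eligible(query_ranges, node_summary):
    """Check requirements a) and b) for one child node."""
    for attr, (q_low, q_high) in query_ranges.items():
        if attr not in node_summary:
            return False  # a) the node lacks a query attribute category
        n_low, n_high = node_summary[attr]
        if q_high < n_low or n_high < q_low:
            return False  # b) the two value ranges do not intersect
    return True


def collect_leaves(root, query_ranges):
    """Traverse with queue D and return the eligible leaf nodes (queue D')."""
    D, D_prime = deque([root]), []
    while D:
        t_node = D.popleft()
        for child in t_node["children"]:
            if not is_eligible(query_ranges, child["S"]):
                continue  # the child and its whole subtree are pruned
            if "children" in child:
                D.append(child)        # non-leaf node: process later
            else:
                D_prime.append(child)  # leaf node: its disk page must be read
    return D_prime


r7 = {"id": "R7", "page": 0,
      "S": {"avg-price": (35, 37), "Rating": (4.5, 4.6), "Popularity": (1400, 1400)}}
r8 = {"id": "R8", "page": 1, "S": {"Rating": (4.3, 4.3), "Popularity": (700, 700)}}
r9 = {"id": "R9", "page": 2,
      "S": {"avg-price": (38, 38), "Rating": (4.6, 4.6), "Popularity": (1600, 1600)}}
r5 = {"id": "R5", "children": [r7, r8, r9],
      "S": {"avg-price": (35, 38), "Rating": (4.3, 4.6), "Popularity": (700, 1600)}}
root = {"id": "root", "children": [r5], "S": {}}

# q'.B' = (avg-price <= 42) and (Rating > 4.3) and (Popularity > 700)
query = {"avg-price": (-math.inf, 42), "Rating": (4.3, math.inf),
         "Popularity": (700, math.inf)}
eligible_ids = [leaf["id"] for leaf in collect_leaves(root, query)]
```

In this example R8 is pruned by requirement a), because its summary has no avg-price entry, so its disk page is never loaded.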
Next, queue D′ is processed to obtain the result of query q′. While queue D′ is not empty, its elements are processed as follows: 1) the head element (node) of D′ is popped and pointed to by TNode (line 26); 2) for each object o_i of TNode, if the refined query attributes q′.B′ and o_i.S attribute-match, the similarity score of o_i is calculated according to equation (1) and the object is added to RRS (lines 27-29).
Lines 31-33 show the processing steps for infrequent keywords. For each object o_i in the disk page linked by TWord, if the refined query attributes q′.B′ and o_i.S attribute-match, the similarity score of o_i is calculated according to equation (1) and the object is added to RRS (lines 31-33).
Next, all objects in RRS are ranked by similarity score. Objects are taken from the top of the ranking until all original result objects and all missing objects appear, which gives the k′ highest-scoring objects (line 34). If k′ ≤ k_m, the cost P′ of q′ is calculated (line 36); if P′ < P_c, P_c is updated to P′ and q′ is recorded as the current best refined query (lines 37-38). After all refined queries have been processed, the best refined query is obtained.
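The step that derives k′ from the ranked set RRS can be sketched in Python. The function name smallest_k, the object ids and the similarity scores are hypothetical; the sketch only shows how far down the ranking one has to go before every original result object and every missing object has appeared.

```python
def smallest_k(scored_objects, required_ids):
    """Return the smallest k' whose top-k' contains every required object.

    scored_objects: {object_id: similarity score} for the objects in RRS
    required_ids: original result objects plus missing objects
    Returns None if some required object is not in RRS at all.
    """
    remaining = set(required_ids)
    if not remaining:
        return 0
    # Highest score first; ties are broken by id to keep the order deterministic.
    ranking = sorted(scored_objects, key=lambda oid: (-scored_objects[oid], oid))
    for k, oid in enumerate(ranking, start=1):
        remaining.discard(oid)
        if not remaining:
            return k
    return None


# Hypothetical RRS scores; o3 is an original result, o4 and o6 are missing objects.
rrs = {"o3": 0.9, "o4": 0.8, "o5": 0.7, "o6": 0.6, "o8": 0.5}
k_prime = smallest_k(rrs, {"o3", "o4", "o6"})
k_unreachable = smallest_k(rrs, {"o3", "o7"})
```

A return value of None signals that the refined query cannot recover all required objects, so it would not be a valid candidate regardless of its cost.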
An embodiment of the invention also provides a system for solving the SKQ why-not problem using AI³, comprising:
an AI³ index building module, configured to: obtain all objects o and construct the AI³ index;
a candidate list construction module, configured to: obtain an initial query q = (q.loc, q.doc_0, q.B, k, α) and a missing object set M; construct a candidate keyword list CKS in descending order of the frequency of the keywords of the missing objects, and construct a candidate attribute-value pair list CAS in descending order of the similarity scores of the missing objects; and set the keyword set q′.doc and the attribute-value pairs q′.B′ of the refined query q′ to q.doc_0 and q.B, respectively;
a refined query module, configured to: take out the keywords in CKS and the attribute-value pairs in CAS in order, and add them to the keyword set q′.doc and the attribute-value pairs q′.B′ of the query q′, respectively, to form a new refined query q′; and process each refined query q′ to find the best refined query until both CKS and CAS are empty; wherein processing each refined query q′ specifically includes:
calculating the modification cost p′ of q′ and filtering out any query q′ with p′ ≥ p_c, where p_c is the modification cost of the query q_b that keeps the initial query keywords and attributes and whose query results contain all missing objects;
for a query q′ with p′ < p_c, determining, according to the frequency of each keyword, whether each keyword of the query q′ is a frequent keyword:
if the keywords are frequent keywords, adding the root nodes of the quadtree in the header file into a non-leaf node queue to be processed, selecting leaf nodes meeting the conditions according to a preset screening rule, and adding the leaf nodes meeting the conditions into a leaf node queue; for each object in the disk page pointed by the leaf node meeting the condition, sequentially judging whether the attribute value pair q '. B ' of the query q ' meets the attribute matching with the attribute value pair of the object, adding the matched object to an object set meeting the requirement of the query q ', and calculating the similarity score between the query q ' and the object;
if the key words are the infrequent key words, analyzing each object in the corresponding disk page, if the attribute value pair q '. B ' of the query q ' meets the attribute matching with the attribute value pair of the object, adding the object in the corresponding disk page into an object set meeting the requirement of the query q ', and calculating the similarity score between the query q ' and the object;
all the objects in the object set meeting the requirement of the query q 'are ranked from high to low according to the similarity scores of the objects until all original result objects and all missing objects appear, and k' objects are obtained;
if k′ ≤ k_m, where k_m is the size of the result set when the initial query keywords and attributes are kept and all missing objects appear in the query results, calculating the modification cost p′ of q′; and if p′ < p_c, taking the query q′ as the current best refined query.
As a preferred embodiment, the AI³ index building module is specifically configured to:
hierarchically dividing a data space into cells using a quadtree structure; taking the cell as a basic storage unit, and storing the spatial position and the attribute information of an object containing the keyword;
three components are created: a lookup table used as a portal, a header file containing summary information of dense key units, and a data file storing key unit tuples in all the posting tables;
storing the attribute information of the basic key word unit of the frequent key word in the leaf nodes of the quadtree;
each non-leaf node R_i of the quadtree contains three attributes: R_i.id, R_i.S and R_i.Address, where R_i.id is the node id, R_i.Address is the address list of all child nodes of R_i, and R_i.S is the union of the attribute-value pairs of all child nodes of R_i;
each leaf node R_i of the quadtree contains three attributes: R_i.id, R_i.S and R_i.Address, where R_i.id is the node id, R_i.Address is the address of the disk page it is linked to, and R_i.S is the union of the attribute-value pairs of all objects in that disk page.
As a preferred embodiment, B is a boolean expression:
Figure BDA0002277655980000231
Figure BDA0002277655980000232
is a set of predicates, where i ∈ [1, n], i ∈ N*.
As a preferred embodiment, if the keyword is a frequent keyword, the refined query module adds the root node of the quadtree in the header file to a to-be-processed non-leaf node queue, selects a leaf node meeting the condition according to a preset screening rule, and adds the leaf node to the leaf node queue meeting the condition, specifically including the following steps:
if the keywords are frequent keywords, adding the root nodes of the quadtree in the header file into a non-leaf node queue to be processed;
judging whether a sub node of a current node in a non-leaf node queue to be processed is a qualified node or not;
if not, filtering out the sub-node; if yes, judging whether the sub-node is a non-leaf node or a leaf node;
if the child node is a non-leaf node, adding it to the to-be-processed non-leaf node queue to wait for processing; if it is a leaf node, adding it to the queue of eligible leaf nodes.
As a preferred embodiment, the refined query module determines whether a child node of a current node in the to-be-processed non-leaf node queue is a qualified node, where the determination criterion is:
a) all attribute classes of query q' are on this child node;
b) each attribute value range of query q' intersects the corresponding attribute value range of the child node.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (12)

1. A method for solving the SKQ why-not problem using AI³, characterized by comprising the following steps:
obtaining all objects o and constructing the AI³ index;
obtaining an initial query q = (q.loc, q.doc_0, q.B, k, α) and a missing object set M; constructing a candidate keyword list CKS in descending order of the frequency of the keywords of the missing objects, and constructing a candidate attribute-value pair list CAS in descending order of the similarity scores of the missing objects; setting the keyword set q′.doc and the attribute-value pairs q′.B′ of the refined query q′ to q.doc_0 and q.B, respectively;
orderly extracting keywords in the CKS and attribute value pairs in the CAS, and respectively adding the keywords to a keyword set q '. doc of the query q' and the attribute value pairs q '. B' of the query q 'to form a new refined query q'; processing each refining query q' to find the best refining query until both CKS and CAS are empty;
processing each refined query q' respectively, specifically including:
calculating the modification cost p′ of q′ and filtering out any query q′ with p′ ≥ p_c, where p_c is the modification cost of the query q_b that keeps the initial query keywords and attributes and whose query results contain all missing objects;
for a query q′ with p′ < p_c, determining, according to the frequency of each keyword, whether each keyword of the query q′ is a frequent keyword:
if the keyword is a frequent keyword, adding the root node of its quadtree in the header file to a to-be-processed non-leaf node queue, selecting eligible leaf nodes according to a preset screening rule, and adding the eligible leaf nodes to a leaf node queue; for each object in the disk page pointed to by an eligible leaf node, judging in turn whether the attribute-value pairs q′.B′ of the query q′ attribute-match the attribute-value pairs of the object, adding each matching object to the set of objects satisfying the query q′, and calculating the similarity score between the query q′ and the object;
if the key words are the infrequent key words, analyzing each object in the corresponding disk page, if the attribute value pair q '. B ' of the query q ' meets the attribute matching with the attribute value pair of the object, adding the object in the corresponding disk page into an object set meeting the requirement of the query q ', and calculating the similarity score between the query q ' and the object;
all the objects in the object set meeting the requirement of the query q 'are ranked from high to low according to the similarity scores of the objects until all original result objects and all missing objects appear, and k' objects are obtained;
if k′ ≤ k_m, where k_m is the size of the result set when the initial query keywords and attributes are kept and all missing objects appear in the query results, calculating the modification cost p′ of q′; and if p′ < p_c, taking the query q′ as the current best refined query.
2. The method of claim 1, wherein obtaining all objects o and constructing the AI³ index specifically comprises the following steps:
hierarchically dividing a data space into cells using a quadtree structure; taking the cell as a basic storage unit, and storing the spatial position and the attribute information of an object containing the keyword;
three components are created: a lookup table used as a portal, a header file containing summary information of dense key units, and a data file storing key unit tuples in all the inverted lists;
storing the attribute information of the basic key word unit of the frequent key word in the leaf node of the quadtree;
each non-leaf node R_i of the quadtree contains three attributes: R_i.id, R_i.S and R_i.Address, where R_i.id is the node id, R_i.Address is the address list of all child nodes of R_i, and R_i.S is the union of the attribute-value pairs of all child nodes of R_i;
each leaf node R_i of the quadtree contains three attributes: R_i.id, R_i.S and R_i.Address, where R_i.id is the node id, R_i.Address is the address of the disk page it is linked to, and R_i.S is the union of the attribute-value pairs of all objects in that disk page.
3. The method of claim 1, wherein: b is a Boolean expression:
Figure FDA0002277655970000031
Figure FDA0002277655970000032
is a set of predicates, where i ∈ [1, n], i ∈ N*.
4. The method of claim 1, wherein: and calculating the modification cost p 'of q', wherein the calculation formula is as follows:
Figure FDA0002277655970000033
where β_1, β_2, β_3 and β_4 denote the weights of the k value, the keywords, the attribute categories and the attribute values in the cost function, respectively; β_i ≥ 0 and
Figure FDA0002277655970000034
k′ is the size of the result set of the refined query q′, k_0 is the size of the result set of the initial query q, and k_m is the size of the result set when the initial query keywords and attributes are kept and all missing objects appear in the query results; k′ − k_0 is normalized by k_m − k_0; Δdoc is the number of keywords that must be changed to adjust q.doc_0 to q′.doc,
Figure FDA0002277655970000035
where the missing object set M = {m_1, m_2, ..., m_j}, and Δdoc is normalized by |q.doc_0 ∪ M.doc|; ΔA_n is the number of attribute categories that must be changed to adjust the initial query to the refined query, and ΔA_n is normalized by |q.B ∪ M.B|;
Figure FDA0002277655970000036
n is the total number of attributes contained in q.B and M.B; Δv_i is the maximum difference among the attribute values, for attribute A_i, of all objects containing A_i; |v_i′ − v_i| is the absolute difference between the current query value v_i′ and the initial query value v_i of attribute A_i, with |v_i′ − v_i| ≤ Δv_i; and |v_i′ − v_i| is normalized by Δv_i.
5. The method of claim 1, wherein: calculating a similarity score between the query q and the object o, wherein the calculation formula is as follows:
Figure FDA0002277655970000041
where α is a parameter between 0 and 1 that defines the relative importance of spatial proximity and textual relevance, d(q.loc, o.loc) denotes the Euclidean distance between query q and object o, and d_max(q.loc, O.loc) denotes the maximum distance from the query point q to the objects in the object set O, taken as the maximum distance between all objects in O.
6. The method of claim 2, wherein: if the keyword is a frequent keyword, adding a root node of the quadtree in the header file into a to-be-processed non-leaf node queue, selecting a leaf node meeting the conditions according to a preset screening rule, and adding the leaf node meeting the conditions into the leaf node queue, wherein the method specifically comprises the following steps of:
if the keywords are frequent keywords, adding the root nodes of the quadtree in the header file into a non-leaf node queue to be processed;
judging whether the sub-node of the current node in the non-leaf node queue to be processed is a qualified node or not;
if not, filtering out the sub-node; if yes, judging whether the sub-node is a non-leaf node or a leaf node;
if the node is a non-leaf node, adding the non-leaf node into a to-be-processed non-leaf node queue to wait for processing; if the leaf node is the leaf node, adding the leaf node into the leaf node queue meeting the conditions.
7. The method of claim 6, wherein: judging whether the sub-node of the current node in the non-leaf node queue to be processed is a qualified node or not, wherein the judgment standard is as follows:
a) all attribute classes of query q' are on this child node;
b) each attribute value range of query q' intersects the corresponding attribute value range of the child node.
8. A system for solving the SKQ why-not problem using AI³, characterized by comprising:
an AI³ index building module, configured to: obtain all objects o and construct the AI³ index;
a candidate list construction module, configured to: obtain an initial query q = (q.loc, q.doc_0, q.B, k, α) and a missing object set M; construct a candidate keyword list CKS in descending order of the frequency of the keywords of the missing objects, and construct a candidate attribute-value pair list CAS in descending order of the similarity scores of the missing objects; and set the keyword set q′.doc and the attribute-value pairs q′.B′ of the refined query q′ to q.doc_0 and q.B, respectively;
a refined query module to: orderly extracting keywords in the CKS and attribute value pairs in the CAS, and respectively adding the keywords to a keyword set q '. doc of the query q' and the attribute value pairs q '. B' of the query q 'to form a new refined query q'; processing each refining query q' to find the best refining query until both CKS and CAS are empty; processing each refined query q' respectively, specifically including:
calculating the modification cost p′ of q′ and filtering out any query q′ with p′ ≥ p_c, where p_c is the modification cost of the query q_b that keeps the initial query keywords and attributes and whose query results contain all missing objects;
for a query q′ with p′ < p_c, determining, according to the frequency of each keyword, whether each keyword of the query q′ is a frequent keyword:
if the keyword is a frequent keyword, adding the root node of its quadtree in the header file to a to-be-processed non-leaf node queue, selecting eligible leaf nodes according to a preset screening rule, and adding the eligible leaf nodes to a leaf node queue; for each object in the disk page pointed to by an eligible leaf node, judging in turn whether the attribute-value pairs q′.B′ of the query q′ attribute-match the attribute-value pairs of the object, adding each matching object to the set of objects satisfying the query q′, and calculating the similarity score between the query q′ and the object;
if the keywords are the infrequent keywords, analyzing each object in the corresponding disk page, if the attribute value pair q '. B ' of the query q ' meets the attribute matching with the attribute value pair of the object, adding the object in the corresponding disk page into an object set meeting the requirement of the query q ', and calculating the similarity score between the query q ' and the object;
all the objects in the object set meeting the requirement of the query q 'are ranked from high to low according to the similarity scores of the objects until all original result objects and all missing objects appear, and k' objects are obtained;
if k′ ≤ k_m, where k_m is the size of the result set when the initial query keywords and attributes are kept and all missing objects appear in the query results, calculating the modification cost p′ of q′; and if p′ < p_c, taking the query q′ as the current best refined query.
9. The system of claim 8, wherein the AI³ index building module is specifically configured to:
hierarchically dividing a data space into cells using a quadtree structure; taking the cell as a basic storage unit, and storing the spatial position and the attribute information of an object containing the keyword;
three components are created: a lookup table used as a portal, a header file containing summary information of dense key units, and a data file storing key unit tuples in all the posting tables;
storing the attribute information of the basic key word unit of the frequent key word in the leaf nodes of the quadtree;
each non-leaf node R_i of the quadtree contains three attributes: R_i.id, R_i.S and R_i.Address, where R_i.id is the node id, R_i.Address is the address list of all child nodes of R_i, and R_i.S is the union of the attribute-value pairs of all child nodes of R_i;
each leaf node R_i of the quadtree contains three attributes: R_i.id, R_i.S and R_i.Address, where R_i.id is the node id, R_i.Address is the address of the disk page it is linked to, and R_i.S is the union of the attribute-value pairs of all objects in that disk page.
10. The system of claim 9, wherein: b is a Boolean expression:
Figure FDA0002277655970000071
Figure FDA0002277655970000072
is a set of predicates, where i ∈ [1, n], i ∈ N*.
11. The system of claim 9, wherein: if the keywords are frequent keywords, the refining query module adds the root nodes of the quadtree in the header file into a non-leaf node queue to be processed, selects leaf nodes meeting the conditions according to a preset screening rule, and adds the leaf nodes into the leaf node queue meeting the conditions, and the refining query module specifically comprises the following steps:
if the keywords are frequent keywords, adding the root nodes of the quadtree in the header file into a to-be-processed non-leaf node queue;
judging whether the sub-node of the current node in the non-leaf node queue to be processed is a qualified node or not;
if not, filtering out the sub-node; if yes, judging whether the sub-node is a non-leaf node or a leaf node;
if the child node is a non-leaf node, adding it to the to-be-processed non-leaf node queue to wait for processing; if it is a leaf node, adding it to the queue of eligible leaf nodes.
12. The system of claim 11, wherein: the refining query module judges whether the sub-node of the current node in the non-leaf node queue to be processed is a qualified node, and the judgment standard is as follows:
a) all attribute classes of query q' are on this child node;
b) each attribute value range of query q' intersects the corresponding attribute value range of the child node.
CN201911128644.8A 2019-11-18 2019-11-18 By using AI 3 Method and system for solving SKQwyy-not problem Active CN110955827B (en)

Publications (2)

Publication Number Publication Date
CN110955827A CN110955827A (en) 2020-04-03
CN110955827B true CN110955827B (en) 2022-09-30

Family

ID=69977671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911128644.8A Active CN110955827B (en) 2019-11-18 2019-11-18 By using AI 3 Method and system for solving SKQwyy-not problem

Country Status (1)

Country Link
CN (1) CN110955827B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343050B (en) * 2021-05-25 2022-11-29 South Central Minzu University Method and system for solving why-not problem based on time-aware objects

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193882A (en) * 2017-04-27 2017-09-22 Southeast University Why-not query answering method based on graph matching on RDF data
CN107391636A (en) * 2017-07-10 2017-11-24 Jiangsu Modern Enterprise Informatization Application Support Software Engineering Technology R&D Center Top-m reverse nearest neighbor spatial keyword query method
CN109992590A (en) * 2019-03-11 2019-07-09 South Central Minzu University Approximate spatial keyword query method and system for traffic networks with numerical attributes

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102084363B (en) * 2008-07-03 2014-11-12 The Regents of the University of California A method for efficiently supporting interactive, fuzzy search on structured data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Scalable top-k spatial keyword search;D Zhang, KL Tan, AKH Tung;《Proceedings of the 16th International Conference on Extending Database Technology》;20130318;full text *
SKQAI: A novel air index for spatial keyword query processing in road networks;Yanhong Li, Guohui Li, Jianjun Li, Kai Yao;《Information Sciences》;20180331;full text *
An efficient air index for keyword queries in spatial network databases;Li Yanhong, Li Guohui;《Journal of Huazhong University of Science and Technology (Natural Science Edition)》;20160817;full text *

Also Published As

Publication number Publication date
CN110955827A (en) 2020-04-03

Similar Documents

Publication Publication Date Title
De Felipe et al. Keyword search on spatial databases
CN108052514B (en) Mixed space indexing method for processing geographic text Skyline query
EP3314464B1 (en) Storage and retrieval of data from a bit vector search index
US20030014396A1 (en) Unified database and text retrieval system
EP3314468B1 (en) Matching documents using a bit vector search index
EP2788896B1 (en) Fuzzy full text search
CN106503223B (en) online house source searching method and device combining position and keyword information
EP3314465B1 (en) Match fix-up to remove matching documents
CN111026750B (en) Method and system for solving SKQwhy-not problem by AIR tree
CN112115227A (en) Data query method and device, electronic equipment and storage medium
CN111026710A (en) Data set retrieval method and system
CN110569289B (en) Column data processing method, equipment and medium based on big data
KR20180097120A (en) Method for searching electronic document and apparatus thereof
US7792826B2 (en) Method and system for providing ranked search results
Huang et al. Improving the relevancy of document search using the multi-term adjacency keyword-order model
CN110955827B (en) By using AI 3 Method and system for solving SKQwyy-not problem
Li et al. Aggregate nearest keyword search in spatial databases
CN111008270B (en) By A k C method and system for solving SKQwhy-not problem
EP3314467B1 (en) Bit vector search index
CN111506797B (en) Method and system for solving why-not problem in direction sensing SKQ
CN110147424B (en) Top-k combined space keyword query method and system
Georgoulas et al. User-centric similarity search
Wang et al. Efficient group-by reverse skyline computation
CN116680367B (en) Data matching method, data matching device and computer readable storage medium
Sun et al. A Point of Interest Intelligent Search Method based on Browsing History.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant