CN108304585B - Result data selection method based on space keyword search and related device - Google Patents



Publication number
CN108304585B
Authority
CN
China
Prior art keywords
candidate
text object
diversity
spatial
boundary cost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810184309.9A
Other languages
Chinese (zh)
Other versions
CN108304585A (en)
Inventor
钱志虎
许佳捷
郑凯
柳诚飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201810184309.9A
Publication of CN108304585A
Application granted
Publication of CN108304585B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Abstract

The application discloses a result data selection method based on spatial keyword search. First, the diversity of spatial text objects is measured by the number of diversity topics; then the boundary cost of each candidate spatial text object is determined from a distance coefficient and the number of diversity topics, and the candidate with the minimum boundary cost is selected into the result set. As a result, the objects in the result set stay close to the query object while the number of diversity topics remains high: the diversity of each search result is considered alongside the distance coefficient, which improves the diversity of the result set and meets users' diversified search requirements. The application also discloses a result data selection device based on spatial keyword search, a server, and a computer-readable storage medium, which have the same beneficial effects.

Description

Result data selection method based on space keyword search and related device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for selecting result data based on a spatial keyword search, a server, and a computer-readable storage medium.
Background
With the advent of location-based services, more and more applications associate real-world phenomena with spatial locations. This has given rise to the widely used spatial keyword query, a hybrid query that combines a spatial query with a text query to find the best results.
Generally, a spatial keyword query proceeds in three steps: formally representing the spatial text objects and related data, building an index structure over all spatial text objects, and executing the query with the received query keywords. Because the formal representation gives all spatial text objects a unified measurement standard, queries over them can be carried out efficiently with that standard. On this basis, spatial keyword queries can be executed and results relevant to the query keywords obtained.
However, the results returned by current spatial keyword query methods, i.e., the returned spatial text objects, are chosen mainly for their similarity to the query keywords. No requirement is placed on the relationships among the points of interest in the result set, so the returned points are usually very similar to each other and cannot meet users' need for diversity. For example, a user may want to see as many categories of nearby restaurants as possible in order to choose among them, whereas a search engine may return only nearby restaurants of a single type, which does not help the user choose.
Therefore, how to improve the diversity of spatial keyword search results and satisfy users' diversity requirements is a key issue for those skilled in the art.
Disclosure of Invention
The application aims to provide a result data selection method based on spatial keyword search, a result data selection device, a server, and a computer-readable storage medium, in which the boundary cost of each candidate spatial text object is determined from a distance coefficient and a diversity topic number, and the candidate with the minimum boundary cost is selected into the result set. The objects in the result set thus stay close to the query object while the diversity topic number remains high; the diversity of each search result is considered alongside the distance coefficient, which improves the diversity of the result set and meets users' diversified search requirements.
In order to solve the above technical problem, the present application provides a result data selection method based on spatial keyword search, including:
performing an index structure establishment operation on a plurality of spatial text objects to obtain an index structure;
selecting a plurality of candidate spatial text objects by using the index structure according to an obtained query object, to obtain a candidate set;
determining the distance between each candidate spatial text object and the query object, to obtain a distance coefficient between each candidate spatial text object and the query object;
determining, for each candidate spatial text object, the number of topics it covers that are not among the initialized topics, to obtain a first diversity topic number of each candidate spatial text object;
determining a first boundary cost of each candidate spatial text object according to the distance coefficients and the first diversity topic numbers, wherein the distance coefficient is directly proportional to the first boundary cost and the first diversity topic number is inversely proportional to the first boundary cost;
and selecting the candidate spatial text object with the minimum first boundary cost and adding it to a result set.
Optionally, the method further includes:
when the candidate spatial text object with the minimum first boundary cost has been added to the result set, determining, for each candidate spatial text object, the number of topics it covers outside all the topics added by the candidate spatial text object with the minimum first boundary cost, to obtain a second diversity topic number of each candidate spatial text object;
determining a second boundary cost of each corresponding candidate spatial text object according to the distance coefficients and the second diversity topic numbers, wherein the distance coefficient is directly proportional to the second boundary cost and the second diversity topic number is inversely proportional to the second boundary cost;
and selecting the candidate spatial text object with the minimum second boundary cost and adding it to the result set.
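The first-round and later-round selections above form a simple greedy loop. A minimal Python sketch follows, assuming the boundary cost is the distance coefficient divided by the diversity topic number (the text fixes only the direct and inverse proportionality, so this exact formula is an assumption), that a candidate adding no new topic gets infinite cost, and that later rounds count topics outside everything already in the result set. `boundary_cost` and `select_results` are hypothetical names.

```python
import math

def boundary_cost(distance_coef, diversity_topics):
    """Assumed cost form: grows with distance, shrinks with the number of new topics."""
    if diversity_topics == 0:
        return math.inf  # a candidate that adds no new topic is never preferred
    return distance_coef / diversity_topics

def select_results(candidates, k, initial_topics=frozenset()):
    """Greedily add the minimum-cost candidate, recomputing diversity each round.

    candidates: id -> (distance coefficient, set of topics).
    initial_topics: topics of the initialized result set (empty in the first round).
    """
    covered = set(initial_topics)
    remaining = dict(candidates)
    result = []
    while remaining and len(result) < k:
        # Diversity topic number: topics not yet covered by the result set.
        costs = {oid: boundary_cost(dist, len(topics - covered))
                 for oid, (dist, topics) in remaining.items()}
        best = min(costs, key=costs.get)
        if math.isinf(costs[best]):
            break  # assumption: stop once no candidate contributes a new topic
        result.append(best)
        covered |= remaining.pop(best)[1]
    return result

# Hypothetical candidates: (distance coefficient, topics).
candidates = {
    "a": (0.1, {"sichuan"}),
    "b": (0.2, {"sichuan"}),
    "c": (0.3, {"cantonese", "dim sum"}),
    "d": (0.5, {"hunan"}),
}
chosen = select_results(candidates, 3)
```

With these values the loop picks a, then c (cost 0.15), then d. Object b is nearer than c and d but is never chosen, because once a is selected it adds no new topic.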
Optionally, performing the index structure establishment operation on the plurality of spatial text objects to obtain an index structure includes:
determining the keyword occurrence frequency of each spatial text object;
organizing the spatial text objects whose keyword occurrence frequency is less than a preset frequency into block structures, to obtain a plurality of block structures;
organizing the spatial text objects whose keyword occurrence frequency is greater than or equal to the preset frequency into tree structures, to obtain a plurality of tree structures;
and using all the block structures and all the tree structures as the index structure.
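A minimal Python sketch of this hybrid index follows. It assumes that a keyword's occurrence frequency is the number of objects containing it, and it uses a 2-d tree over planar coordinates as a stand-in for the unspecified tree structure. Rare keywords keep a flat block that is scanned linearly. All names, including `preset_frequency`, are hypothetical.

```python
from collections import defaultdict

def build_kdtree(points, depth=0):
    """Build a 2-d tree over (x, y, oid) tuples, alternating the split axis."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def kdtree_range(node, lo, hi, out):
    """Append ids of points inside the box [lo, hi] to out."""
    if node is None:
        return out
    x, y, oid = node["point"]
    if lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]:
        out.append(oid)
    axis = node["axis"]
    coord = node["point"][axis]
    if lo[axis] <= coord:   # the left subtree may hold points >= lo
        kdtree_range(node["left"], lo, hi, out)
    if coord <= hi[axis]:   # the right subtree may hold points <= hi
        kdtree_range(node["right"], lo, hi, out)
    return out

def build_index(objects, preset_frequency):
    """objects: oid -> (x, y, keyword set). Rare keywords -> block, frequent -> tree."""
    postings = defaultdict(list)
    for oid, (x, y, terms) in objects.items():
        for kw in terms:
            postings[kw].append((x, y, oid))
    index = {}
    for kw, pts in postings.items():
        if len(pts) < preset_frequency:
            index[kw] = ("block", pts)                # flat list, scanned linearly
        else:
            index[kw] = ("tree", build_kdtree(pts))   # pruned range search
    return index

def range_search(index, keyword, lo, hi):
    """Sorted ids of objects containing keyword whose location lies in [lo, hi]."""
    if keyword not in index:
        return []
    kind, data = index[keyword]
    if kind == "block":
        hits = [oid for x, y, oid in data
                if lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]]
    else:
        hits = kdtree_range(data, lo, hi, [])
    return sorted(hits)

# Hypothetical objects: id -> (x, y, keywords).
objects = {
    1: (0.0, 0.0, {"restaurant", "sichuan"}),
    2: (1.0, 1.0, {"restaurant"}),
    3: (2.0, 2.0, {"restaurant", "hunan"}),
    4: (5.0, 5.0, {"restaurant"}),
}
index = build_index(objects, preset_frequency=3)
```

Flat blocks avoid tree overhead for short posting lists, while long lists benefit from the pruning a tree provides.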
Optionally, selecting a plurality of candidate spatial text objects by using the index structure according to the obtained query object to obtain a candidate set includes:
selecting, according to the obtained query object, a plurality of candidate spatial text objects from all the spatial text objects by using the index structure and a greedy algorithm, to obtain the candidate set.
The present application further provides a result data selection device based on spatial keyword search, including:
an index establishing module, configured to perform an index structure establishment operation on a plurality of spatial text objects to obtain an index structure;
a candidate set acquisition module, configured to select a plurality of candidate spatial text objects by using the index structure according to an obtained query object, to obtain a candidate set;
a distance coefficient acquisition module, configured to determine the distance between each candidate spatial text object and the query object, to obtain a distance coefficient between each candidate spatial text object and the query object;
a first diversity topic number acquisition module, configured to determine, for each candidate spatial text object, the number of topics it covers that are not among the initialized topics, to obtain a first diversity topic number of each candidate spatial text object;
a first boundary cost acquisition module, configured to determine a first boundary cost of each candidate spatial text object according to the distance coefficients and the first diversity topic numbers, wherein the distance coefficient is directly proportional to the first boundary cost and the first diversity topic number is inversely proportional to the first boundary cost;
and a first result data selection module, configured to select the candidate spatial text object with the minimum first boundary cost and add it to a result set.
Optionally, the device further includes:
a second diversity topic number acquisition module, configured to, when the candidate spatial text object with the minimum first boundary cost has been added to the result set, determine for each candidate spatial text object the number of topics it covers outside all the topics added by that object, to obtain a second diversity topic number of each candidate spatial text object;
a second boundary cost acquisition module, configured to determine a second boundary cost of each corresponding candidate spatial text object according to the distance coefficients and the second diversity topic numbers, wherein the distance coefficient is directly proportional to the second boundary cost and the second diversity topic number is inversely proportional to the second boundary cost;
and a second result data selection module, configured to select the candidate spatial text object with the minimum second boundary cost and add it to the result set.
Optionally, the index establishing module includes:
a keyword occurrence frequency acquisition unit, configured to determine the keyword occurrence frequency of each spatial text object;
a block structure acquisition unit, configured to organize the spatial text objects whose keyword occurrence frequency is less than the preset frequency into block structures, to obtain a plurality of block structures;
a tree structure acquisition unit, configured to organize the spatial text objects whose keyword occurrence frequency is greater than or equal to the preset frequency into tree structures, to obtain a plurality of tree structures;
and an index structure acquisition unit, configured to use all the block structures and all the tree structures as the index structure.
Optionally, the candidate set acquisition module includes:
a candidate set acquisition unit, configured to select, according to the obtained query object, a plurality of candidate spatial text objects from all the spatial text objects by using the index structure and a greedy algorithm, to obtain the candidate set.
The present application further provides a server, comprising:
a memory for storing a computer program;
and a processor configured to implement the above result data selection method when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the result data selection method as described above.
The application provides a result data selection method based on spatial keyword search, comprising: performing an index structure establishment operation on a plurality of spatial text objects to obtain an index structure; selecting a plurality of candidate spatial text objects by using the index structure according to an obtained query object, to obtain a candidate set; determining the distance between each candidate spatial text object and the query object, to obtain a distance coefficient for each candidate; determining, for each candidate spatial text object, the number of topics it covers that are not among the initialized topics, to obtain its first diversity topic number; determining a first boundary cost of each candidate spatial text object according to the distance coefficients and the first diversity topic numbers, wherein the distance coefficient is directly proportional to the first boundary cost and the first diversity topic number is inversely proportional to it; and selecting the candidate spatial text object with the minimum first boundary cost and adding it to a result set.
Thus, the diversity of spatial text objects is measured by the diversity topic number, and the boundary cost of each candidate spatial text object is determined from the distance coefficient and the diversity topic number. Selecting the candidate with the minimum boundary cost into the result set keeps the objects in the result set close to the query object while keeping the diversity topic number high; that is, the diversity of each search result is considered alongside the distance coefficient, which improves the diversity of the result set and meets users' diversified search requirements.
The application also provides a result data selection device based on spatial keyword search, a server, and a computer-readable storage medium, which have the same beneficial effects and are not described in detail here.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a result data selection method based on spatial keyword search according to an embodiment of the present application;
Fig. 2 is a flowchart of the subsequent selection process of a result data selection method based on spatial keyword search according to an embodiment of the present application;
Fig. 3 is a flowchart of index establishment in a result data selection method based on spatial keyword search according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a result data selection device based on spatial keyword search according to an embodiment of the present application.
Detailed Description
The core of the application is to provide a result data selection method based on spatial keyword search, a result data selection device, a server, and a computer-readable storage medium, in which the boundary cost of each candidate spatial text object is determined from a distance coefficient and a diversity topic number, and the candidate with the minimum boundary cost is selected into the result set. The objects in the result set thus stay close to the query object while the diversity topic number remains high; that is, the diversity of each search result is considered alongside the distance coefficient, which improves the diversity of the result set and meets users' diversified search requirements.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
Referring to Fig. 1, Fig. 1 is a flowchart of a result data selection method based on spatial keyword search according to an embodiment of the present disclosure.
This embodiment provides a result data selection method based on spatial keyword search, which can improve the diversity of search results and may include:
S101: performing an index structure establishment operation on a plurality of spatial text objects to obtain an index structure;
This step builds an index structure over all the spatial text objects. When a query is executed according to query conditions, the index structure must already exist so that the spatial text objects can be searched. The method for building the index structure is not limited here; any method that supports searching the spatial text objects may be used.
An appropriate index structure can be chosen according to the search environment and query characteristics, so as to speed up search, simplify the index structure, and reduce the cost of maintaining and updating it.
A spatial text object is a formal representation of the spatial text data being searched; the formalization is described in subsequent paragraphs.
S102: selecting a plurality of candidate spatial text objects by using the index structure according to the obtained query object, to obtain a candidate set;
On the basis of step S101, this step uses the index structure and the query object to select a plurality of candidate spatial text objects, forming a candidate set. The purpose is to filter all spatial text objects by the query object through the index, so that every candidate spatial text object satisfies the constraints of the query object.
Like the spatial text objects in the previous step, the query object is obtained by formally representing data; the formalization is explained in subsequent paragraphs.
S103: determining the distance between each candidate spatial text object and the query object, to obtain a distance coefficient between each candidate spatial text object and the query object;
On the basis of step S102, this step determines the distance between each candidate spatial text object and the query object, yielding the corresponding distance coefficient. The distance can be computed with a general spatial distance calculation method; its main purpose is to measure how far each object is from the query object, so that the subsequent selection can screen results by distance.
Of course, the distance calculation may also vary with how the candidate spatial text objects and the query object are formally represented; the specific form should be chosen according to the actual application environment and is not limited here.
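One common choice of spatial distance for longitude/latitude coordinates is the haversine great-circle distance. A minimal Python sketch follows, assuming (lon, lat) pairs in degrees, a mean Earth radius of 6371 km, and normalization by the farthest object in the dataset so that the coefficients fall in [0, 1]. The function names are hypothetical, and the text does not mandate this metric.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def distance_coefficients(query_loc, candidate_locs, dataset_locs):
    """Distance of each candidate to the query, divided by the farthest object in D."""
    dist_max = max(haversine_km(query_loc, loc) for loc in dataset_locs)
    if dist_max == 0:
        return {oid: 0.0 for oid in candidate_locs}
    return {oid: haversine_km(query_loc, loc) / dist_max
            for oid, loc in candidate_locs.items()}

# Hypothetical objects spaced along the prime meridian, (lon, lat).
dataset = {"a": (0.0, 1.0), "b": (0.0, 2.0), "c": (0.0, 4.0)}
query = (0.0, 0.0)
coefs = distance_coefficients(query, {"a": dataset["a"], "b": dataset["b"]},
                              dataset.values())
```

Here object c is the farthest at 4 degrees of latitude, so a and b receive coefficients of 0.25 and 0.5.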
S104: determining, for each candidate spatial text object, the number of topics it covers that are not among the initialized topics, to obtain a first diversity topic number of each candidate spatial text object;
On the basis of step S102, this step counts, for each candidate spatial text object, the topics it covers beyond the initialized topics, yielding its first diversity topic number.
In general spatial keyword search, selection considers only the distance between candidate spatial text objects and the query object; that is, the final result data consists of the candidates closest to the query object, so the results are all very similar to the query. Such results cannot meet users' demand for diversity: subject to a certain degree of similarity to the query object, the query results should be as diverse as possible, i.e., the result data should include multiple types of points of interest related to a given keyword topic.
Accordingly, in this step, the diversity topic number of a candidate spatial text object is the number of topics it covers outside all the initialized topics. A topic is an attribute shared by the candidate spatial text objects and the query object, and is a general attribute used in spatial text search.
All initialized topics are the topics covered by the spatial text objects in the result set produced by the current initialization, so the topic diversity of a candidate spatial text object can be measured by the number of its topics outside that range. In practice, result data can then be selected by diversity topic number during the search, improving the diversity of the search results so that they meet users' diversity requirements.
In this embodiment, the result set is obtained by an initialization process, and data are added to it by repeatedly selecting from the candidate set in a loop. In the first round of the loop, the result set produced by initialization may contain no spatial text object and therefore cover no topics. Alternatively, the initialization may add some result data using another search method before more spatial text objects are added by the method of this embodiment; in that case the initialized result set already covers a range of topics, and the first diversity topic number of each candidate spatial text object is determined relative to that range.
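A minimal Python sketch of how the first diversity topic number depends on the initialization follows, assuming the initialized result set is a mapping from object ids to topic sets. The names `initialized_topics` and `diversity_topic_numbers` are hypothetical. With an empty initialization every topic of a candidate counts; with a seed result from another search method, only topics outside the seed's range count.

```python
def initialized_topics(result_set):
    """All topics covered by the objects already in the initialized result set."""
    covered = set()
    for topics in result_set.values():
        covered |= topics
    return covered

def diversity_topic_numbers(candidates, result_set):
    """Number of each candidate's topics lying outside the initialized topics."""
    covered = initialized_topics(result_set)
    return {oid: len(topics - covered) for oid, topics in candidates.items()}

# Hypothetical candidates: id -> topics.
candidates = {
    "x": {"sichuan", "noodles"},
    "y": {"hotpot"},
    "z": {"cantonese", "dim sum"},
}
first_round = diversity_topic_numbers(candidates, {})       # empty initialization
seeded = {"seed": {"sichuan", "hotpot"}}                     # seed from another method
after_seed = diversity_topic_numbers(candidates, seeded)
```

In the seeded case, y contributes nothing new and x contributes only "noodles", so z becomes the most diverse candidate.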
Conceivably, the initialization in this embodiment may itself be another spatial keyword search process; that is, this embodiment can be applied after other spatial keyword search processing to improve its diversity. Conversely, other search methods may be applied after the spatial keyword search of this embodiment to select more suitable search results.
It should be noted that this step and step S103 may be executed in either order.
S105: determining a first boundary cost of each candidate spatial text object according to the distance coefficients and the first diversity topic numbers, wherein the distance coefficient is directly proportional to the first boundary cost and the first diversity topic number is inversely proportional to the first boundary cost;
On the basis of steps S103 and S104, this step determines the first boundary cost of each candidate spatial text object from its distance coefficient and first diversity topic number. The boundary cost is introduced as a single measure that combines an object's similarity (distance) and diversity. Because the distance coefficient is directly proportional to the first boundary cost and the first diversity topic number is inversely proportional to it, a smaller boundary cost corresponds to a smaller distance coefficient and a larger first diversity topic number.
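The text fixes only the direction of the two relationships, not the formula. One cost consistent with both, assumed here purely for illustration, divides the distance coefficient by the diversity topic number, where $d(o)$ is the distance coefficient of candidate $o$, $T$ is the set of topics already covered, and $n(o) = |o.topic \setminus T|$:

$$\mathrm{cost}(o) = \frac{d(o)}{n(o)}, \qquad n(o) > 0$$

Worked example with hypothetical values: candidate A with $d = 0.2$ and $n = 1$ has cost $0.2/1 = 0.2$; candidate B with $d = 0.3$ and $n = 2$ has cost $0.3/2 = 0.15$. Under this assumed form B is selected even though it is farther away, because it contributes twice as many new topics.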
S106: selecting the candidate spatial text object with the minimum first boundary cost and adding it to the result set.
On the basis of step S105, this step adds the candidate spatial text object with the smallest first boundary cost to the result set. That is, the candidate in the candidate set with the best combination of distance coefficient and first diversity topic number is selected as one item of result data, which improves the diversity of the result data and meets users' diversity requirements.
In summary, in this embodiment the boundary cost of each candidate spatial text object is determined from the distance coefficient and the diversity topic number, and the candidate with the minimum boundary cost is selected into the result set. The objects in the result set are therefore close to the query object while the diversity topic number stays high; that is, diversity is considered alongside distance for each search result, which improves the diversity of the result set and meets users' diversified search requirements.
Optionally, in this embodiment, a plurality of candidate spatial text objects may be selected from all the spatial text objects according to the obtained query object by using the index structure and a greedy algorithm, to obtain the candidate set.
This alternative selects candidate spatial text objects from all spatial text objects to form the candidate set, choosing suitable candidates with a greedy algorithm. The motivation is that the boundary-cost-based selection alone cannot guarantee that the diversity and the distance of the result data both meet given requirements; to improve the quality of the query result, this alternative uses a greedy algorithm to select the candidate spatial text objects that form the candidate set.
The core idea of this alternative is to divide the spatial text objects into layers according to their spatial distance from the query object, and then select a specific number of spatial text objects in each layer such that the selected objects cover more not-yet-covered topics than the other spatial text objects in the same layer.
Specifically, given a set of spatial objects D and a spatial keyword query q, assume that k spatial objects satisfying the diversification requirement (covering enough topics while minimizing the distance function) can be found in D, and that the sum of their spatial distances from q is M. Since M is likely to be small, the search can start from a circular range of small radius centred on the location of q; if the results found in this range do not cover enough topics, the range is enlarged (allowing a larger M) and a larger area is searched. For each circular search range, the objects within it are divided into layers by their distance from the query point, and suitable objects are then selected in each layer.
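A minimal Python sketch of this layered greedy search follows. It assumes planar coordinates, `num_layers` rings of equal width, `per_layer` picks per ring, `r0 > 0` and `growth > 1`; these parameter names are hypothetical, since the text does not fix the number of layers or how fast the radius grows. Objects are assumed to have already passed the keyword constraint.

```python
import math

def _layer_index(d, radius, num_layers):
    """Ring i holds distances in [i*w, (i+1)*w); the outer edge joins the last ring."""
    w = radius / num_layers
    return min(int(d / w), num_layers - 1)

def layered_candidates(q, objects, k, thre, r0=1.0, growth=2.0,
                       num_layers=3, per_layer=1):
    """Greedy layered selection inside an expanding circle around q.

    objects: id -> ((x, y), topic set), already filtered by the keyword constraint.
    Returns (chosen ids, covered topics). If the circle already covers every object
    and the requirement is still unmet, the best effort found is returned.
    """
    dist = {oid: math.dist(q, loc) for oid, (loc, _) in objects.items()}
    farthest = max(dist.values(), default=0.0)
    radius = r0
    while True:
        layers = [[] for _ in range(num_layers)]
        for oid, d in dist.items():
            if d <= radius:
                layers[_layer_index(d, radius, num_layers)].append(oid)
        chosen, covered = [], set()
        for layer in layers:                      # nearest ring first
            pool = list(layer)
            for _ in range(per_layer):
                if not pool or len(chosen) == k:
                    break
                # Prefer the most uncovered topics; the nearer object wins ties.
                best = max(pool, key=lambda o: (len(objects[o][1] - covered), -dist[o]))
                pool.remove(best)
                chosen.append(best)
                covered |= objects[best][1]
        if (len(chosen) == k and len(covered) >= thre) or radius >= farthest:
            return chosen, covered
        radius *= growth                          # not enough: widen the circle

# Hypothetical objects: id -> ((x, y), topics).
objects = {
    "a": ((0.3, 0.0), {"sichuan"}),
    "b": ((0.4, 0.0), {"sichuan", "hotpot"}),
    "c": ((1.2, 0.0), {"cantonese"}),
    "d": ((1.6, 0.0), {"hunan"}),
    "e": ((3.0, 0.0), {"sichuan"}),
}
chosen, covered = layered_candidates((0.0, 0.0), objects, k=3, thre=4)
```

With k = 3 and Thre = 4, the first circle (radius 1) yields only a and b, so the radius doubles. In the second circle the nearest ring contributes b rather than a because b covers two topics, and c and d fill the outer rings, giving four distinct topics.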
Furthermore, the greedy algorithm of this alternative can also be applied after result data have been obtained; that is, the result set obtained in this embodiment can be re-screened with the greedy algorithm to improve its accuracy and bring it closer to the query object.
Based on the above embodiments, the spatial text objects, the query object, and related concepts can be formally represented as follows.
1) Formal representation of spatial text objects
A spatial text object is represented as a point o = {loc, term, topic} in 2-dimensional space with location coordinates and a text description, where loc consists of longitude and latitude and gives the position of o; term is the set of keywords describing o; and topic is the set of topics covered by o.
For example, in a map application, a spatial text object corresponds to a point of interest, i.e., a business or an organization. The system records its location and text description, and the topics it covers can be obtained by manual labeling or by analyzing its review comments with natural language processing techniques. For convenience, a spatial text object may also be called a spatial object.
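The tuple o = {loc, term, topic} maps naturally onto an immutable record. A minimal Python sketch follows; the `oid` field, the class name `SpatialObject`, the helper `topics_of`, and the sample data are all hypothetical additions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class SpatialObject:
    """A spatial text object o = {loc, term, topic}."""
    oid: str                   # hypothetical identifier, not part of the formal tuple
    loc: tuple[float, float]   # (longitude, latitude)
    term: frozenset[str]       # keywords describing the object
    topic: frozenset[str]      # topics covered by the object

# Hypothetical dataset D with two points of interest.
D = [
    SpatialObject("p1", (10.00, 20.00),
                  frozenset({"chinese restaurant"}), frozenset({"sichuan", "spicy"})),
    SpatialObject("p2", (10.01, 20.01),
                  frozenset({"chinese restaurant"}), frozenset({"cantonese"})),
]

def topics_of(objects):
    """Union of the topics covered by a collection of spatial objects."""
    covered = set()
    for o in objects:
        covered |= o.topic
    return covered
```

Freezing the record and using frozensets keeps objects hashable, so they can be placed in sets such as a result set.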
Based on the above definition, the set of all spatial text objects in the database is denoted by D, namely:
$D = \{o_1, o_2, \ldots, o_n\}$
2) Formal representation of spatial keyword queries
A spatial keyword query is formalized as q = {loc, term}, where q is the query object; loc is the location of the query point, i.e., the user, expressed as longitude and latitude coordinates in two-dimensional space; and term is the set of keywords entered by the user, such as "Chinese restaurant", describing the user's query intent.
For a given query object q, the search engine selects from the dataset D the k spatial text objects that best match q and returns them as the result data. The result data is a set of spatial text objects that are close in distance, highly relevant in text, and highly diverse among themselves.
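For contrast with the diversified selection, a minimal Python sketch follows of the query object and of a conventional distance-only top-k selection, the behaviour criticized in the Background. It assumes planar coordinates for brevity; `Query`, `nearest_k_matching`, and the sample data are hypothetical.

```python
from __future__ import annotations

import heapq
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Query:
    """A spatial keyword query q = {loc, term}."""
    loc: tuple[float, float]   # planar (x, y) here, for brevity
    term: frozenset[str]       # keywords entered by the user

def nearest_k_matching(q, objects, k):
    """Baseline: keep objects containing all query keywords, return the k nearest.

    objects maps an id to (loc, term set). Topic diversity is ignored entirely.
    """
    matching = [(oid, loc) for oid, (loc, term) in objects.items() if q.term <= term]
    nearest = heapq.nsmallest(k, matching, key=lambda item: math.dist(q.loc, item[1]))
    return [oid for oid, _ in nearest]

# Hypothetical objects: id -> (location, keywords).
objects = {
    "r1": ((0.1, 0.0), frozenset({"chinese restaurant"})),
    "r2": ((0.2, 0.0), frozenset({"chinese restaurant"})),
    "r3": ((0.9, 0.0), frozenset({"chinese restaurant"})),
    "m1": ((0.05, 0.0), frozenset({"museum"})),
}
q = Query((0.0, 0.0), frozenset({"chinese restaurant"}))
baseline = nearest_k_matching(q, objects, 2)
```

The museum is nearest but fails the keyword constraint, so the baseline returns the two closest restaurants regardless of whether they cover different topics.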
3) Formal representation of candidate sets
Given a spatial text object database D, a spatial keyword query object q = {loc, term}, and a threshold Thre, a subset S of D ($S \subseteq D$) is called a candidate set if and only if it satisfies the following two conditions:
Keyword constraint: each spatial text object o in S contains all query keywords, i.e., $\forall o \in S,\ q.term \subseteq o.term$;
Diversity constraint: the number of distinct topics covered by all spatial text objects in S is not less than Thre, i.e., $\left|\bigcup_{o \in S} o.topic\right| \geq Thre$.
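The two candidate-set conditions can be checked directly. The following is a minimal sketch that assumes each object exposes `term` and `topic` as sets; the type `Obj` and the function `is_candidate_set` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Obj:
    loc: Tuple[float, float]
    term: FrozenSet[str]
    topic: FrozenSet[str]


def is_candidate_set(S: Iterable[Obj], query_terms: FrozenSet[str], thre: int) -> bool:
    """Return True if S satisfies the keyword and diversity constraints."""
    S = list(S)
    # Keyword constraint: every object must contain all query keywords.
    if not all(query_terms <= o.term for o in S):
        return False
    # Diversity constraint: count distinct topics covered by S.
    covered = set().union(*(o.topic for o in S))
    return len(covered) >= thre
```

The diversity check counts the *distinct* topics in the union. As a result, two objects that share a topic contribute it only once.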
4) Formal representation of the distance function for spatial text objects
Given a spatial text object database D and a spatial keyword query object q = {loc, term}, for a subset $R \subseteq D$ with $|R| = k$, we define the distance function between the set R and q as:
Figure BDA0001589800010000111
where Dist(q, o) denotes the distance between spatial text object o and query object q, and $Dist_{max}$ denotes the largest distance from any spatial text object in dataset D to the query object. As shown above, the distance between the query object and the set of spatial text objects is normalized, i.e., it takes values in the interval [0, 1].
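The formula image for f(q, R) is not reproduced in this text. For that reason, the sketch below only illustrates the normalization Dist(q, o) / Dist_max that the definition relies on. Three parts of it are assumptions: planar Euclidean distance on (longitude, latitude), the mean as the aggregation over R, and the helper names `distance_coefficients` and `f_mean`, which are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    # Assumption: planar Euclidean distance on (longitude, latitude);
    # a real system might use great-circle distance instead.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def distance_coefficients(q: Point, D: Sequence[Point]) -> List[float]:
    """Dist(q, o) / Dist_max for every object location in D, each in [0, 1]."""
    dist_max = max(dist(q, o) for o in D)
    if dist_max == 0:
        return [0.0] * len(D)
    return [dist(q, o) / dist_max for o in D]


def f_mean(q: Point, R: Sequence[Point], D: Sequence[Point]) -> float:
    # Hypothetical aggregation: mean normalized distance of R's members,
    # which also stays within [0, 1].
    dist_max = max(dist(q, o) for o in D)
    if dist_max == 0:
        return 0.0
    return sum(dist(q, o) for o in R) / (len(R) * dist_max)
```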
5) Formal definition of the search problem
Given a spatial text object dataset D, a spatial keyword query q = {loc, term}, a distance function f, and a threshold Thre, we consider three factors: the spatial distance between the objects and the query, the keyword constraint, and topic coverage. We aim to return k spatial text objects that satisfy the following two conditions:
1. The k spatial text objects form a candidate set R, i.e., R is a candidate set with
$R \subseteq D$
and
$|R| = k$;
2. f(q, R) is minimized.
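To make the problem statement concrete, the sketch below solves it exactly by enumerating every k-subset. This is feasible only for tiny inputs, which is why the embodiments below use a greedy selection instead. The Euclidean `dist` and the mean-distance form of f(q, R) are assumptions, since the formula for f is not reproduced here. The name `exact_search` is hypothetical.

```python
import itertools
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Obj:
    loc: Tuple[float, float]
    term: FrozenSet[str]
    topic: FrozenSet[str]


def dist(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    # Assumption: planar Euclidean distance.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def exact_search(q_loc: Tuple[float, float], q_terms: FrozenSet[str],
                 D: Sequence[Obj], k: int, thre: int):
    """Brute force over all k-subsets; returns (best subset or None, cost)."""
    dist_max = max(dist(q_loc, o.loc) for o in D) or 1.0  # avoid division by zero
    eligible = [o for o in D if q_terms <= o.term]        # keyword constraint
    best: Optional[List[Obj]] = None
    best_cost = math.inf
    for R in itertools.combinations(eligible, k):
        covered = set().union(*(o.topic for o in R))
        if len(covered) < thre:                           # diversity constraint
            continue
        # Hypothetical f(q, R): mean normalized distance of R's members.
        cost = sum(dist(q_loc, o.loc) for o in R) / (k * dist_max)
        if cost < best_cost:
            best, best_cost = list(R), cost
    return best, best_cost
```

The search returns `None` when no k-subset meets the threshold. This mirrors the fact that a candidate set of size k may not exist.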
Referring to fig. 2, fig. 2 is a flowchart illustrating the subsequent selection process of a result data selection method based on spatial keyword search according to an embodiment of the present application.
This embodiment builds on the previous one. It mainly extends the description to what happens after the candidate spatial text object with the smallest first boundary cost has been added to the result set. The preceding steps are substantially the same as in the previous embodiment, and the common parts are not repeated here.
The embodiment may include:
S201: after the candidate spatial text object with the smallest first boundary cost has been added to the result set, determine, for each candidate spatial text object, the number of its topics that lie outside all the topics covered after that addition, to obtain the second diversity topic number of each candidate spatial text object;
In this step, adding the candidate spatial text object with the smallest first boundary cost to the result set (as in the previous embodiment) changes the objects in the result set. The range of topics covered by the result set therefore changes accordingly. To keep improving the diversity of the objects selected from the candidate set, it is necessary to determine, for each candidate spatial text object, the number of topics it covers beyond all the topics covered after that addition. This yields the second diversity topic number.
Because the selected candidate spatial text object has been added to the result set, the range of topics covered by the result set, i.e., the range against which topics are measured, has changed. The purpose of this step is therefore to recalculate the diversity topic number of every candidate spatial text object (the second diversity topic number) on the basis of the updated result set.
S202: determine the second boundary cost of each candidate spatial text object according to all the distance coefficients and all the second diversity topic numbers. The second boundary cost is directly proportional to the distance coefficient and inversely proportional to the second diversity topic number;
On the basis of step S201, this step determines the second boundary cost from the distance coefficients of the previous embodiment and the second diversity topic numbers. The details are substantially the same as in the previous embodiment, to which reference may be made, and are not repeated here.
S203: select the candidate spatial text object with the smallest second boundary cost and add it to the result set.
On the basis of step S202, this step adds the candidate spatial text object with the smallest second boundary cost to the result set.
In a spatial keyword search, every addition to the result data changes the range of topics the result data covers. This embodiment therefore explains how subsequent result data are added while the diversity of the result data is maintained. The steps of this embodiment can be repeated multiple times with only adaptive modifications, which are not described in detail here.
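The repeated selection described by the first embodiment and S201 to S203 can be sketched as a greedy loop. The patent only states that the boundary cost is directly proportional to the distance coefficient and inversely proportional to the diversity topic number. The exact form used here, coefficient / new_topics with infinite cost when no new topics are added, is an assumption. So is the choice of an empty "initialized topics" set. The names `boundary_cost` and `greedy_select` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Obj:
    loc: Tuple[float, float]
    topic: FrozenSet[str]


def dist(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def boundary_cost(q_loc, o: Obj, covered: Set[str], dist_max: float) -> Tuple[float, float]:
    """Return (cost, distance coefficient); the coefficient breaks ties."""
    coef = dist(q_loc, o.loc) / dist_max          # distance coefficient in [0, 1]
    new_topics = len(o.topic - covered)           # diversity topic number
    # Assumed form: cost rises with the coefficient and falls as the
    # number of new topics rises; no new topics means infinite cost.
    cost = coef / new_topics if new_topics else math.inf
    return cost, coef


def greedy_select(q_loc, candidates: Sequence[Obj], k: int) -> List[Obj]:
    # Simplification: Dist_max is taken over the candidates rather than all of D.
    dist_max = max(dist(q_loc, o.loc) for o in candidates) or 1.0
    covered: Set[str] = set()                     # initialized topics (assumed empty)
    remaining = list(candidates)
    result: List[Obj] = []
    while remaining and len(result) < k:
        # Pass 1 uses the first boundary cost; later passes use the second, third, ...
        best = min(remaining, key=lambda o: boundary_cost(q_loc, o, covered, dist_max))
        result.append(best)
        covered |= best.topic                     # the result set's topic range changes
        remaining.remove(best)
    return result
```

In the accompanying check, the farther object covering three topics is chosen first. After that, an object that adds no new topic loses to one that adds a new topic, even though it is closer.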
Referring to fig. 3, fig. 3 is a flowchart illustrating index establishment of a result data selection method based on a spatial keyword search according to an embodiment of the present application.
This embodiment builds on the previous one and mainly describes in detail how the index structure of the previous embodiment is established. For the other parts, refer to the previous embodiment; they are not repeated here.
The embodiment may include:
S301: determine the number of keyword occurrences of each spatial text object;
S302: set the spatial text objects whose number of keyword occurrences is less than a preset number as block structures, to obtain a plurality of block structures;
S303: set the spatial text objects whose number of keyword occurrences is greater than or equal to the preset number as tree structures, to obtain a plurality of tree structures;
S304: take all the block structures and all the tree structures as the index structure.
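Steps S301 to S304 leave the exact counting rule and tree type open. The sketch below is therefore one possible reading, offered only as an illustration. It counts, for each keyword, how many objects contain it. It then stores the objects of rare keywords in a flat block list and the objects of frequent keywords in a tree. A simple 2-d tree is used as a stand-in, since the source names no specific tree. `build_index`, `build_kdtree`, and `KDNode` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Obj:
    name: str
    loc: Tuple[float, float]
    term: FrozenSet[str]


@dataclass
class KDNode:
    obj: Obj
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build_kdtree(objs: List[Obj], depth: int = 0) -> Optional[KDNode]:
    """Stand-in tree structure: a 2-d tree split alternately on x and y."""
    if not objs:
        return None
    axis = depth % 2
    objs = sorted(objs, key=lambda o: o.loc[axis])
    mid = len(objs) // 2
    return KDNode(objs[mid],
                  build_kdtree(objs[:mid], depth + 1),
                  build_kdtree(objs[mid + 1:], depth + 1))


def build_index(D: List[Obj], preset: int) -> Dict[str, tuple]:
    # S301 (as interpreted here): count how many objects each keyword occurs in.
    postings: Dict[str, List[Obj]] = defaultdict(list)
    for o in D:
        for kw in o.term:
            postings[kw].append(o)
    index: Dict[str, tuple] = {}
    for kw, objs in postings.items():
        if len(objs) < preset:
            index[kw] = ("block", objs)               # S302: flat block structure
        else:
            index[kw] = ("tree", build_kdtree(objs))  # S303: tree structure
    return index                                      # S304: blocks + trees together
```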
In existing spatial keyword query systems, index structures fall into three categories: spatial-first index structures, text-first index structures, and tightly coupled index structures. Spatial-first index structures can be further divided into those based on R-trees, grids, and space-filling curves. Text-first index structures are mainly based on inverted files and bitmaps. Tightly coupled index structures combine spatial and textual structures to filter out, more effectively, the spatial text objects that do not satisfy the query. However, as the data volume grows, these index structures become extremely large. The space occupied by the index then grows sharply and updates slow down, which degrades the experience in practical applications.
Therefore, in this embodiment, a different index structure is assigned to each spatial text object according to the number of times its keywords occur; that is, the index structures of the spatial text objects are organized hierarchically. Objects whose keywords occur infrequently are placed in block structures, which can hold a large volume of object data. Objects whose keywords occur frequently are placed in tree structures, so that related objects can be found easily during search.
During a search, the different tree structures and block structures can be searched to complete the corresponding search operation. For tree structures that satisfy the query conditions, their nodes may be accessed in increasing order of minimum boundary cost, where the minimum boundary cost may be defined as:
Figure BDA0001589800010000131
where N denotes a node of the tree structure, Dist(q, N.mbr) is the spatial distance between the query and the minimum bounding rectangle of N, and $|Occur_{i1}|$ is the number of occurrences in the index structure of N (i.e., the number of topics covered by N).
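The formula image for the minimum boundary cost is not reproduced here, so the following sketch assumes the form Dist(q, N.mbr) / (number of topics covered by N). This is consistent with distance raising the cost and topic coverage lowering it. Under this form, a child's cost is never lower than its parent's. Its rectangle is no closer to the query, and it covers no more topics than the parent. A best-first traversal with a priority queue therefore visits leaves in non-decreasing cost order. `Node`, `mindist`, and `best_first` are hypothetical names.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple

MBR = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    mbr: MBR
    topics: FrozenSet[str]                # topics covered by the subtree
    children: List["Node"] = field(default_factory=list)
    label: Optional[str] = None           # set on leaf entries


def mindist(q: Tuple[float, float], mbr: MBR) -> float:
    """Distance from q to the nearest point of the rectangle (0 if inside)."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)


def min_boundary_cost(q, node: Node) -> float:
    n = len(node.topics)                  # assumed stand-in for |Occur|
    return mindist(q, node.mbr) / n if n else math.inf


def best_first(q, root: Node) -> List[Optional[str]]:
    """Return leaf labels in increasing order of minimum boundary cost."""
    tie = itertools.count()               # unique tie-breaker; Nodes never compared
    heap = [(min_boundary_cost(q, root), next(tie), root)]
    order = []
    while heap:
        _, _, node = heapq.heappop(heap)
        if not node.children:
            order.append(node.label)
            continue
        for child in node.children:
            heapq.heappush(heap, (min_boundary_cost(q, child), next(tie), child))
    return order
```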
The embodiments of the present application provide a result data selection method based on spatial keyword search. The method determines the boundary cost of each candidate spatial text object from its distance coefficient and diversity topic number, and adds the candidate spatial text object with the smallest boundary cost to the result set. As a result, the objects in the result set are close to the query object while the diversity topic number remains high. Selection thus accounts for both the distance coefficient and the diversity of each search result, which improves the diversity of the result set and meets users' diversified search needs.
The following introduces a result data selection device based on spatial keyword search provided by an embodiment of the present application. The device described below and the method described above may be cross-referenced.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a result data selecting device based on a spatial keyword search according to an embodiment of the present disclosure.
The embodiment provides a result data selecting device based on a spatial keyword search, which may include:
an index establishing module 100, configured to perform an index structure establishing operation on a plurality of spatial text objects to obtain an index structure;
a candidate set obtaining module 200, configured to select multiple candidate spatial text objects according to the obtained query object by using an index structure, so as to obtain a candidate set;
a distance coefficient obtaining module 300, configured to determine a distance between each candidate spatial text object and the query object, and obtain a distance coefficient between each candidate spatial text object and the query object;
a first diversity topic number obtaining module 400, configured to determine, for each candidate spatial text object, the number of topics it covers beyond all the initialized topics, to obtain a first diversity topic number of each candidate spatial text object;
a first boundary cost obtaining module 500, configured to determine a first boundary cost of each candidate spatial text object according to all distance coefficients and all first diversity topic numbers, where the first boundary cost is directly proportional to the distance coefficient and inversely proportional to the first diversity topic number;
and a first result data selecting module 600, configured to select a candidate spatial text object with the smallest first boundary cost to add to the result set.
Based on the above embodiment, the device may further include:
a second diversity topic number acquisition module, configured to, after the candidate spatial text object with the smallest first boundary cost has been added to the result set, determine for each candidate spatial text object the number of topics it covers beyond all the topics covered after that addition, to obtain the second diversity topic number of each candidate spatial text object;
a second boundary cost acquisition module, configured to determine the second boundary cost of each candidate spatial text object according to all the distance coefficients and all the second diversity topic numbers, where the second boundary cost is directly proportional to the distance coefficient and inversely proportional to the second diversity topic number;
and a second result data selection module, configured to select the candidate spatial text object with the smallest second boundary cost and add it to the result set.
The index creating module 100 may include:
a keyword occurrence count acquisition unit, configured to determine the number of keyword occurrences of each spatial text object;
a block structure acquisition unit, configured to set the spatial text objects whose number of keyword occurrences is less than a preset number as block structures, to obtain a plurality of block structures;
a tree structure acquisition unit, configured to set the spatial text objects whose number of keyword occurrences is greater than or equal to the preset number as tree structures, to obtain a plurality of tree structures;
and an index structure acquisition unit, configured to take all the block structures and all the tree structures as the index structure.
The candidate set obtaining module 200 may include:
a candidate set acquisition unit, configured to select, according to the obtained query object, a plurality of candidate spatial text objects from all the spatial text objects by using the index structure and a greedy algorithm, to obtain a candidate set.
An embodiment of the present application further provides a server, including:
a memory for storing a computer program;
and a processor for implementing the result data selecting method of the above embodiment when executing the computer program.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for selecting result data according to the above embodiment is implemented.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the embodiments may be cross-referenced for their identical or similar parts. Because the device disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for relevant details, refer to the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing details a method for selecting result data based on a spatial keyword search, a device for selecting result data, a server, and a computer-readable storage medium provided by the present application. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

Claims (8)

1. A result data selection method based on spatial keyword search, characterized by comprising the following steps:
performing an index structure establishment operation on a plurality of spatial text objects to obtain an index structure;
selecting a plurality of candidate spatial text objects by using the index structure according to an obtained query object, to obtain a candidate set;
determining the distance between each candidate spatial text object and the query object, to obtain a distance coefficient between each candidate spatial text object and the query object;
determining, for each candidate spatial text object, the number of topics it covers beyond all initialized topics, to obtain a first diversity topic number of each candidate spatial text object;
determining a first boundary cost of each candidate spatial text object according to all the distance coefficients and all the first diversity topic numbers, wherein the first boundary cost is directly proportional to the distance coefficient and inversely proportional to the first diversity topic number;
selecting the candidate spatial text object with the smallest first boundary cost and adding it to a result set;
when the candidate spatial text object with the smallest first boundary cost has been added to the result set, determining, for each candidate spatial text object, the number of topics it covers beyond all the topics covered after that addition, to obtain a second diversity topic number of each candidate spatial text object;
determining a second boundary cost of each candidate spatial text object according to all the distance coefficients and all the second diversity topic numbers, wherein the second boundary cost is directly proportional to the distance coefficient and inversely proportional to the second diversity topic number;
and selecting the candidate spatial text object with the smallest second boundary cost and adding it to the result set.
2. The method of claim 1, wherein performing the index structure establishment operation on the plurality of spatial text objects to obtain the index structure comprises:
determining the number of keyword occurrences of each spatial text object;
setting the spatial text objects whose number of keyword occurrences is less than a preset number as block structures, to obtain a plurality of block structures;
setting the spatial text objects whose number of keyword occurrences is greater than or equal to the preset number as tree structures, to obtain a plurality of tree structures;
and taking all the block structures and all the tree structures as the index structure.
3. The method of claim 2, wherein selecting the plurality of candidate spatial text objects by using the index structure according to the obtained query object to obtain the candidate set comprises:
selecting, according to the obtained query object, a plurality of candidate spatial text objects from all the spatial text objects by using the index structure and a greedy algorithm, to obtain the candidate set.
4. A result data selection device based on spatial keyword search, characterized by comprising:
an index establishing module, configured to perform an index structure establishment operation on a plurality of spatial text objects to obtain an index structure;
a candidate set acquisition module, configured to select a plurality of candidate spatial text objects by using the index structure according to an obtained query object, to obtain a candidate set;
a distance coefficient obtaining module, configured to determine the distance between each candidate spatial text object and the query object, to obtain a distance coefficient between each candidate spatial text object and the query object;
a first diversity topic number acquisition module, configured to determine, for each candidate spatial text object, the number of topics it covers beyond all initialized topics, to obtain a first diversity topic number of each candidate spatial text object;
a first boundary cost obtaining module, configured to determine a first boundary cost of each candidate spatial text object according to all the distance coefficients and all the first diversity topic numbers, wherein the first boundary cost is directly proportional to the distance coefficient and inversely proportional to the first diversity topic number;
a first result data selecting module, configured to select the candidate spatial text object with the smallest first boundary cost and add it to a result set;
a second diversity topic number acquisition module, configured to, when the candidate spatial text object with the smallest first boundary cost has been added to the result set, determine, for each candidate spatial text object, the number of topics it covers beyond all the topics covered after that addition, to obtain a second diversity topic number of each candidate spatial text object;
a second boundary cost obtaining module, configured to determine a second boundary cost of each candidate spatial text object according to all the distance coefficients and all the second diversity topic numbers, wherein the second boundary cost is directly proportional to the distance coefficient and inversely proportional to the second diversity topic number;
and a second result data selection module, configured to select the candidate spatial text object with the smallest second boundary cost and add it to the result set.
5. The apparatus of claim 4, wherein the index establishing module comprises:
a keyword occurrence count acquisition unit, configured to determine the number of keyword occurrences of each spatial text object;
a block structure acquisition unit, configured to set the spatial text objects whose number of keyword occurrences is less than a preset number as block structures, to obtain a plurality of block structures;
a tree structure acquisition unit, configured to set the spatial text objects whose number of keyword occurrences is greater than or equal to the preset number as tree structures, to obtain a plurality of tree structures;
and an index structure acquisition unit, configured to take all the block structures and all the tree structures as the index structure.
6. The apparatus of claim 5, wherein the candidate set acquisition module comprises:
a candidate set acquisition unit, configured to select, according to the obtained query object, a plurality of candidate spatial text objects from all the spatial text objects by using the index structure and a greedy algorithm, to obtain the candidate set.
7. A server, comprising:
a memory for storing a computer program;
a processor for implementing the method of selecting result data according to any one of claims 1 to 3 when executing the computer program.
8. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out a method of selecting result data according to any one of claims 1 to 3.
CN201810184309.9A 2018-03-06 2018-03-06 Result data selection method based on space keyword search and related device Active CN108304585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810184309.9A CN108304585B (en) 2018-03-06 2018-03-06 Result data selection method based on space keyword search and related device

Publications (2)

Publication Number Publication Date
CN108304585A CN108304585A (en) 2018-07-20
CN108304585B CN108304585B (en) 2022-05-17

Family

ID=62849191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810184309.9A Active CN108304585B (en) 2018-03-06 2018-03-06 Result data selection method based on space keyword search and related device

Country Status (1)

Country Link
CN (1) CN108304585B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149005B (en) * 2019-06-27 2023-09-01 腾讯科技(深圳)有限公司 Method, apparatus, device and readable storage medium for determining search results
CN112632267B (en) * 2020-12-04 2023-05-02 中国人民大学 Global interaction and greedy selection combined search result diversification system
CN113065036B (en) * 2021-04-14 2021-11-16 深圳大学 Method and device for measuring performance of space supporting point and related components

Citations (4)

Publication number Priority date Publication date Assignee Title
CN104834679A (en) * 2015-04-14 2015-08-12 苏州大学 Representation and inquiry method of behavior track and device therefor
CN105069094A (en) * 2015-08-06 2015-11-18 苏州大学 Semantic understanding based space keyword indexing method
CN106503223A (en) * 2016-11-04 2017-03-15 华东师范大学 Online housing-listing search method and device combining location and keyword information
CN107145545A (en) * 2017-04-18 2017-09-08 东北大学 Top-k regional user text data recommendation method in location-based social networks

Non-Patent Citations (4)

Title
Interactive Spatial Keyword Querying with Semantics; Jiabao Sun et al.; Proceedings of the 2017 ACM on Conference on Information and Knowledge Management; 2017-11-30; pp. 1-10 *
On Efficient Spatial Keyword Querying with Semantics; Zhihu Qian et al.; International Conference on Database Systems for Advanced Applications; 2016-08-19; pp. 149-163 *
Semantic-aware top-k spatial keyword queries; Zhihu Qian et al.; World Wide Web; 2017-06-17; pp. 573-594 *
Spatial keyword queries based on object sets; Liang Yin et al.; Journal of Computer Applications; 2014-07-10; Vol. 34, No. 7; pp. 1992-1996 *

Also Published As

Publication number Publication date
CN108304585A (en) 2018-07-20

Similar Documents

Publication Publication Date Title
US9501577B2 (en) Recommending points of interests in a region
CN107122376B (en) Method and apparatus for map-based selection of query components
CN108304585B (en) Result data selection method based on space keyword search and related device
US10769140B2 (en) Concept expansion using tables
WO2017181866A1 (en) Making graph pattern queries bounded in big graphs
CN110298687B (en) Regional attraction assessment method and device
CN107315833A (en) Method and apparatus of the retrieval with downloading based on application program
CN113177058A (en) Geographic position information retrieval method and system based on composite condition
CN109508361A (en) Method and apparatus for output information
CN109460398A (en) Complementing method, device and the electronic equipment of time series data
CN109791545A (en) The contextual information of resource for the display including image
CN109828984B (en) Analysis processing method and device, computer storage medium and terminal
CN111930891A (en) Retrieval text expansion method based on knowledge graph and related device
CN112970011A (en) Recording pedigrees in query optimization
EP4053713A1 (en) Question and answer method and apparatus based on knowledge graph
CN110321435B (en) Data source dividing method, device, equipment and storage medium
US11386143B2 (en) Searching for analogue subsurface structures based on topological knowledge representation (TKR)
US9886520B2 (en) Exposing relationships between universe objects
Wang et al. Interactive multiple-user location-based keyword queries on road networks
JP6333306B2 (en) SEARCH DATA MANAGEMENT DEVICE, SEARCH DATA MANAGEMENT METHOD, AND SEARCH DATA MANAGEMENT PROGRAM
JP6167531B2 (en) Region search method, region index construction method, and region search device
CN113254724B (en) Network space discovery method and device, electronic equipment and storage medium
CN116150304B (en) Data query method, electronic device and storage medium
Vargas Martin Enhancing hyperlink structure for improving Web performance.
JP6065708B2 (en) Information processing method, apparatus and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant