CN107247800B - Top-k keyword search method/system, readable storage medium and terminal

Publication number
CN107247800B
Authority
CN
China
Legal status
Active
Application number
CN201710508847.4A
Other languages
Chinese (zh)
Other versions
CN107247800A (en)
Inventor
许延伟
Current Assignee
Shanghai Broadband Technology and Application Engineering Research Center
Original Assignee
Shanghai Broadband Technology and Application Engineering Research Center
Application filed by Shanghai Broadband Technology and Application Engineering Research Center
Priority to CN201710508847.4A
Publication of CN107247800A
Application granted
Publication of CN107247800B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing

Abstract

The invention provides a Top-k keyword search method/system, a readable storage medium and a terminal. The search method comprises: generating a grid (lattice-based) relational database; calculating, for the tuples on each node of the grid relational database, the upper limit of the relevance of the search results they can form, and sorting the tuples in non-increasing order of that upper limit; judging whether the upper limit of an unprocessed tuple is greater than the kth highest relevance among the search results found so far; if not, stopping and outputting the k most relevant search results found so far; if so, processing that unprocessed tuple on its node of the grid relational database, storing the search results generated by processing it, and updating the upper limits of the unprocessed tuples on all nodes of the grid relational database; and repeating until all unprocessed tuples have been processed. The invention improves the performance of top-k keyword search over relational databases and reduces both frequent accesses to the relational database and the memory consumption of the server during search processing.

Description

Top-k keyword search method/system, readable storage medium and terminal
Technical Field
The invention belongs to the fields of computer databases and information retrieval. It relates to search methods and search systems, and in particular to a Top-k keyword search method/system, a readable storage medium and a terminal.
Background
Big data is currently one of the hottest terms in the IT industry, and exploiting its commercial value through data warehousing, data security, data analysis, data mining and similar activities has become a profit focus increasingly pursued by the industry. Relational databases, as an important tool for long-term data storage, hold a large amount of information, and mining useful information from large relational databases is an important challenge in the big-data era. Keyword query, being simple and efficient, is an important research direction.
Currently, many mainstream relational database systems (e.g., Microsoft SQL Server, Oracle, MySQL and IBM DB2) support an extension for searching text documents: the full-text index (Fulltext Index). It has two disadvantages. (1) Full-text indexes on attributes of different tables are built separately, although some systems support indexing several attributes at once. Even where a system supports Boolean expressions over keywords using AND, OR and so on, the result of such a query is essentially a set of tuples from a single relational table; it cannot be a complex result composed of tuples from different tables. (2) A full-text index is typically a stand-alone engine that is not truly integrated with the database system. Text query predicates follow concepts and syntax different from SQL, and some systems even require the user to use a special syntax to direct the query processor to optimize the query. Such indexes also have difficulty providing flexible scoring and ranking strategies. It is therefore difficult to build an efficient keyword search system on such a platform.
At present, the keyword search technology used by Internet search engines provides only limited query capability over structured data. To allow limited queries of back-end databases, many websites either export the data in the database as static HTML documents or query the database through forms. The former incurs significant maintenance overhead whenever the database changes, and loses the semantic information contained in the database schema. The latter causes considerable inconvenience to users and developers, and its flexibility is limited.
Therefore, over more than a decade, keyword query over relational databases has become a research hotspot in the database field, and many related papers are published every year at the top international conferences and journals of the field. By supporting keyword query, enterprises can build fast and convenient information publishing and search systems for large-scale databases on top of their existing relational databases, integrating database technology and information retrieval technology on the same platform. This realizes the vision of seamlessly integrating structured data with text documents and brings good economic and social benefits to enterprises.
Existing database search methods cannot effectively avoid massive repeated accesses to the database or the computation of large numbers of low-relevance query results. Top-k keyword query methods take a long time (x seconds to x minutes) to process a single query, most existing results are academic research, and there is no concrete scheme for applying top-k keyword query in practice, which limits the practical application of keyword query systems.
How to provide a Top-k keyword search method/system, a readable storage medium and a terminal that overcome these defects of the prior art (massive repeated database accesses, computation of many low-relevance query results, and long single-query times, which together limit the practical application of keyword query systems) has therefore become a technical problem urgently to be solved by those skilled in the art.
Disclosure of Invention
In view of the above shortcomings of the prior art, an object of the present invention is to provide a Top-k keyword search method/system, a readable storage medium and a terminal that solve the problems of the prior art, namely repeated accesses to the database, computation of many low-relevance query results, and long single-query times, which limit the practical application of keyword query systems.
To achieve the above and other related objects, one aspect of the present invention provides a Top-k keyword search method, comprising: generating a grid relational database according to the keyword set of the Top-k keyword query, which includes searching according to the keyword set, representing each candidate network generated during the search as a rooted tree, and generating the grid relational database by letting all candidate networks share their common subtrees; calculating, for the tuples on each node of the grid relational database, the upper limit of the relevance of the search results they can form, and sorting the tuples in non-increasing order of that upper limit; finding an unprocessed tuple on a node of the grid relational database and judging whether the upper limit of the relevance of the search results it can form is greater than the kth highest relevance among the search results found so far; if not, stopping the search and directly outputting the k most relevant search results found so far; if so, proceeding to the next step, where k is a positive integer greater than 1; processing the unprocessed tuple found on that node, storing the search results generated by processing it, and updating the upper limits of the relevance of the unprocessed tuples on all nodes of the grid relational database; and repeating the judging step and the processing step until all unprocessed tuples on all nodes of the grid relational database have been processed.
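As an illustration only, the judging/processing loop can be sketched in Python as a threshold-style top-k search. The tuple ids, the `process` callback (a hypothetical stand-in for the filtering and connection steps that returns scored search results) and the scores are assumptions, and the sketch keeps each tuple's upper bound fixed instead of updating the bounds of all unprocessed tuples after every step as the method describes.

```python
import heapq
from typing import Callable, Iterable

Result = tuple[float, str]  # (relevance score, hypothetical result id)


def topk_search(
    pending: list[tuple[float, str]],
    process: Callable[[str], Iterable[Result]],
    k: int,
) -> list[Result]:
    """Process tuples in non-increasing order of their relevance upper bound and
    stop as soon as no unprocessed tuple can beat the k-th best result found."""
    # heapq is a min-heap, so bounds are negated to pop the largest bound first.
    queue = [(-bound, tid) for bound, tid in pending]
    heapq.heapify(queue)
    top: list[Result] = []  # min-heap holding the k best results so far
    while queue:
        neg_bound, tid = heapq.heappop(queue)
        kth_best = top[0][0] if len(top) == k else float("-inf")
        if -neg_bound <= kth_best:
            break  # early termination: this bound cannot exceed the k-th best
        for result in process(tid):  # stand-in for filtering + connection
            if len(top) < k:
                heapq.heappush(top, result)
            elif result[0] > top[0][0]:
                heapq.heapreplace(top, result)
    return sorted(top, reverse=True)
```

For example, with k = 2 and bounds 0.9, 0.5 and 0.2, once the first tuple yields results scoring 0.8 and 0.6, the next bound (0.5) no longer exceeds the 2nd-best score, so the loop stops without processing the remaining tuples.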
In an embodiment of the present invention, the upper limit of the relevance of the search results that the tuples on each node of the grid relational database can form is calculated as follows:
[Formula image not reproduced: upper limit of the relevance of the search results that an unprocessed tuple t on node $V_i^Q$ of candidate network C can form]
where C is a candidate network, Q is the keyword set of the Top-k keyword query, $V_i^Q$ is a node belonging to candidate network C, and t is a tuple in C;
[Symbol image not reproduced] denotes the upper limit of the relevance of any tuple connection tree that the unprocessed tuple t could form in candidate network C. Each node $V_i$ has an output buffer containing every tuple of $V_i$ that can be joined with at least one tuple in the output buffer of each child node of $V_i$.
In an embodiment of the present invention, the value given by the formula is 0 in the initial stage, and the upper limit of the relevance increases dynamically as the search processing proceeds.
In an embodiment of the present invention, processing the unprocessed tuple found on a node of the grid relational database comprises a filtering step and a connection step. Filtering step: judge whether the unprocessed tuple can be joined with at least one output tuple of a child node of the node; if so, further judge whether it can be joined with at least one output tuple of the parent node of the node; if so, proceed to the connection step; otherwise, filter out the tuple. Connection step: starting from each output tuple of the parent node that joins with the tuple, search top-down for all tuple connection trees containing the tuple, and generate the search results of processing the tuple.
Another aspect of the present invention provides a Top-k keyword search system, comprising: a database generation module, configured to generate a grid relational database according to the keyword set of the Top-k keyword query, by searching according to the keyword set, representing each candidate network generated during the search as a rooted tree, and letting all candidate networks share their common subtrees; a calculation module, configured to calculate, for the tuples on each node of the grid relational database, the upper limit of the relevance of the search results they can form, and to sort the tuples in non-increasing order of that upper limit; a first processing module, configured to find an unprocessed tuple on a node of the grid relational database and judge whether the upper limit of the relevance of the search results it can form is greater than the kth highest relevance among the search results found so far; if not, it stops the search and directly outputs the k most relevant search results found so far; if so, it calls a second processing module, which is configured to process the unprocessed tuple found on that node, store the search results generated by processing it, and update the upper limits of the relevance of the unprocessed tuples on all nodes of the grid relational database; k is a positive integer greater than 1; and a circulation module, configured to run the first processing module and the second processing module repeatedly until all unprocessed tuples on all nodes of the grid relational database have been processed.
In an embodiment of the present invention, the second processing module comprises a filtering unit and a connecting unit. The filtering unit is configured to judge whether the unprocessed tuple can be joined with at least one output tuple of a child node of its node; if so, it further judges whether the tuple can be joined with at least one output tuple of the parent node; if so, it calls the connecting unit; otherwise, it filters out the tuple. The connecting unit is configured to search top-down, starting from each output tuple of the parent node that joins with the tuple, for all tuple connection trees containing the tuple, and to generate the search results of processing the tuple.
Still another aspect of the present invention provides a readable storage medium storing a computer program which, when executed by a processor, implements the Top-k keyword search method.
A final aspect of the present invention provides a terminal, including: a processor and a memory; the memory is used for storing computer programs, and the processor is used for executing the computer programs stored by the memory so as to enable the terminal to execute the Top-k keyword search method.
As described above, the Top-k keyword search method, system, readable storage medium and terminal of the present invention have the following advantages.
By calculating the upper limit of the relevance of search results not yet seen, they avoid fully processing every candidate network, and they stop the search in time once that upper limit does not exceed the kth highest relevance among the current search results. Consequently:
first, the performance of top-k keyword search over relational databases is improved;
second, frequent accesses to the relational database during search processing are reduced;
third, the memory consumption of the server during search processing is reduced.
Drawings
FIG. 1 is a flowchart illustrating a Top-k keyword searching method according to an embodiment of the present invention.
FIG. 2 is an example diagram of a candidate network.
FIG. 3 is a schematic structural diagram of the Top-k keyword search system of the present invention in one embodiment.
Description of the element reference numerals
3 Top-k keyword search system
31 database generation module
32 calculation module
33 first processing module
34 second processing module
35 circulation module
341 filter unit
342 connecting unit
S11-S15
Detailed Description
The embodiments of the present invention are described below by way of specific examples; those skilled in the art can readily understand other advantages and effects of the invention from the disclosure of this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit and scope of the invention. It should be noted that the following embodiments and the features in them may be combined with each other where they do not conflict.
It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the present invention schematically. They show only the components related to the invention rather than the number, shape and size of the components in an actual implementation, in which the type, quantity and proportion of the components may vary freely and the layout of the components may be more complex.
Example one
This embodiment provides a Top-k keyword search method, which comprises:
generating a grid relational database according to the keyword set of the Top-k keyword query;
calculating the upper limit of the relevance of the search results that the tuples on each node of the grid relational database can form, and sorting the tuples in non-increasing order of that upper limit;
finding an unprocessed tuple on a node of the grid relational database, and judging whether the upper limit of the relevance of the search results it can form is greater than the kth highest relevance among the search results found so far; if not, stopping the search and directly outputting the k most relevant search results found so far; if so, proceeding to the next step; k is a positive integer greater than 1;
processing the unprocessed tuple found on that node, storing the search results generated by processing it, and updating the upper limits of the relevance of the unprocessed tuples on all nodes of the grid relational database;
and repeating the judging and processing steps until all unprocessed tuples on all nodes of the grid relational database have been processed.
The Top-k keyword search method of this embodiment is described in detail below with reference to the drawings. Refer to FIG. 1, which is a flowchart of the Top-k keyword search method in an embodiment. As shown in FIG. 1, the method specifically comprises the following steps:
S11: generate the grid relational database according to the keyword set of the Top-k keyword query. In this embodiment, the grid relational database is a lattice-based relational database. The result of a keyword search in the lattice-based relational database is a set of tuple connection trees (joined tuple trees, JTTs): directed trees with no cycles, no multiple edges and no designated root, formed by joining tuples that contain the keywords according to the primary-key/foreign-key references in the relational database. Each tuple connection tree (JTT) is a result of a relational algebra expression, and this expression is called a candidate network (CN). Referring to FIG. 2, an example diagram of a candidate network is shown. The candidate networks generated depend on the relational database schema and on the actual distribution of the keywords in the relational tables, and the efficiency of Top-k keyword search in a relational database depends on how quickly and efficiently the candidate networks can be processed to find the k JTTs with the highest relevance as the search results. In this embodiment, S11 specifically comprises: searching according to the keyword set, and representing each candidate network CN generated during the search as a rooted tree; and generating the grid relational database, i.e., the lattice-structured relational database, by letting all candidate networks CN share their common subtrees, so that different candidate networks can share the intermediate results of their processing.
In this embodiment, in the Lattice, each node $V_i$ has an output buffer, denoted $V_i$.output, which contains every tuple of $V_i$ that can be joined with at least one tuple in the output buffer of each child node of $V_i$. The tuples in $V_i$.output are also called the output tuples of $V_i$. Consequently, each output tuple of the root node of each candidate network CN in the Lattice can form a tuple connection tree with output tuples of its child nodes.
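The subtree sharing in S11 can be illustrated with a minimal Python sketch. It assumes that each candidate-network node is identified by a label (for example, a relation name plus the keyword subset it must contain) and that two subtrees are shared exactly when their order-independent canonical encodings match; the names `CNNode`, `canonical` and `build_lattice` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNNode:
    label: str                           # hypothetical label, e.g. "Paper{k1}"
    children: tuple["CNNode", ...] = ()


def canonical(node: CNNode) -> str:
    """Order-independent string encoding of a rooted tree."""
    kids = sorted(canonical(c) for c in node.children)
    return node.label + (f"({','.join(kids)})" if kids else "")


def build_lattice(cns: list[CNNode]) -> dict[str, list[str]]:
    """Merge rooted-tree CNs so that identical subtrees become one lattice node.

    Returns a map from each lattice node's canonical key to its children's keys."""
    lattice: dict[str, list[str]] = {}

    def add(node: CNNode) -> str:
        key = canonical(node)
        if key not in lattice:  # a shared subtree is stored only once
            lattice[key] = [add(c) for c in node.children]
        return key

    for cn in cns:
        add(cn)
    return lattice
```

Because each shared subtree maps to a single lattice node, work done on that node (such as filling its output buffer) is reused by every candidate network that contains it.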
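The output-buffer definition can be illustrated with a small bottom-up computation. This Python sketch assumes tuples are plain dicts and that each parent/child edge carries a hypothetical join predicate; it recomputes shared children instead of memoizing them, which a real lattice would avoid.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, int]


@dataclass
class LatticeNode:
    name: str
    tuples: list[Row]
    # Each child is paired with a predicate telling whether a parent row joins a child row.
    children: list[tuple["LatticeNode", Callable[[Row, Row], bool]]] = field(default_factory=list)
    output: list[Row] = field(default_factory=list)


def fill_output(node: LatticeNode) -> list[Row]:
    """Bottom-up: keep the rows that join with at least one output row of every child.

    A leaf has no children, so all of its rows become output tuples."""
    child_outputs = [(fill_output(c), joins) for c, joins in node.children]
    node.output = [
        t for t in node.tuples
        if all(any(joins(t, u) for u in out) for out, joins in child_outputs)
    ]
    return node.output
```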
S12: calculate, for each tuple t on each node $V_i$ of the grid relational database, the upper limit of the relevance of the search results that t can form, and sort the tuples in non-increasing order of this upper limit.
In this embodiment, the upper limit of the relevance of the search results that the tuples on each node of the grid relational database can form is calculated as follows.
For a tuple t on node $V_i^Q$ of the grid relational database:
[Formula image not reproduced: upper limit of the relevance of the search results that tuple t on node $V_i^Q$ can form]
In this embodiment, the above formula can equivalently be expressed as:
[Formula image not reproduced]
where C is a candidate network, Q is the keyword set of the Top-k keyword query, $V_i^Q$ is a node belonging to candidate network C, and t is a tuple in C;
[Symbol image not reproduced]
denotes the upper limit of the relevance of any tuple connection tree that the unprocessed tuple t could form in candidate network C. Its value is 0 in the initial stage and grows dynamically as the search processing proceeds. This property ensures that the tuples in the Lattice are processed bottom-up.
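Because the patent's formula survives only as an image, the following Python sketch substitutes an assumed scoring function to show what an upper bound of this kind looks like: a JTT's score is taken to be the sum of its tuple scores divided by the number of tuples. This score, the per-node maxima and the names `upper_bound` and `order_by_bound` are assumptions, not the patent's formula, and the static bound here does not model the "starts at 0 and grows" behaviour described above.

```python
def upper_bound(t_score: float, best_other: list[float], cn_size: int) -> float:
    """Assumed scoring: score(T) = (sum of tuple scores in T) / |T|.

    Any JTT of this CN containing t scores at most t's own score plus the best
    tuple score available on every other CN node, divided by the CN size."""
    return (t_score + sum(best_other)) / cn_size


def order_by_bound(
    scores: dict[str, float], best_other: list[float], cn_size: int
) -> list[tuple[float, str]]:
    """Return (bound, tuple id) pairs in non-increasing order of bound, as in S12."""
    return sorted(
        ((upper_bound(s, best_other, cn_size), tid) for tid, s in scores.items()),
        reverse=True,
    )
```

The bound is valid for this assumed score because every JTT of a CN with `cn_size` nodes contains exactly one tuple per node, and no tuple on another node can score above that node's maximum.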
S13: find an unprocessed tuple t on a node $V_i^Q$ of the grid relational database, and judge whether the upper limit of the relevance of t is greater than the kth highest relevance among the search results found so far. If so, go to S14; if not, go to S15, i.e., stop the search and directly output the k most relevant search results found so far. Here k is a positive integer greater than 1.
S14: process the unprocessed tuple t on node $V_i^Q$ of the grid relational database, store the search results generated after processing the tuple, and update the upper limits of the relevance of the unprocessed tuples on all nodes of the grid relational database. S14 comprises the following filtering step and connection step.
Filtering step: when a tuple t is processed at node $V_i$, it is checked using selection and semi-join query operations.
Specifically, judge whether the unprocessed tuple t can be joined with at least one output tuple of a child node of $V_i^Q$; if so, further judge whether t can be joined with at least one output tuple of the parent node of $V_i^Q$; if so, proceed to the connection step; otherwise, filter out the tuple t.
Connection step: starting from each output tuple of the parent node that joins with t, search top-down for all tuple connection trees containing t, and generate the search results of processing t. Because every tuple that cannot be part of a tuple connection tree has already been filtered out, the connection process starting from the parent node always generates search results.
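The filtering step can be sketched in Python as two semi-join checks performed in order, children first and then parents. The sketch assumes each neighbouring node is represented by its output buffer plus a hypothetical join predicate, that t must join every child's output buffer (matching the output-buffer definition), and that joining any one parent suffices; a CN root with no parent passes the second check. All names are hypothetical.

```python
from typing import Callable

Row = dict[str, int]
Join = Callable[[Row, Row], bool]
Link = tuple[list[Row], Join]  # (output buffer of a neighbouring node, join predicate)


def semi_join(t: Row, link: Link) -> bool:
    """Semi-join: does t join with at least one output tuple of the neighbour?"""
    outputs, joins = link
    return any(joins(t, u) for u in outputs)


def passes_filter(t: Row, child_links: list[Link], parent_links: list[Link]) -> bool:
    # Children first: t must join with every child's output buffer.
    if not all(semi_join(t, link) for link in child_links):
        return False
    # A CN root has no parent; otherwise at least one parent must accept t.
    return not parent_links or any(semi_join(t, link) for link in parent_links)
```

In a database implementation the same checks would typically be expressed as a selection on t's key combined with `EXISTS` subqueries against the neighbouring output buffers.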
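The connection step can be illustrated by enumerating every JTT of one candidate network that contains the new tuple t. This Python sketch is a simplification: it enumerates top-down from the CN root with t pinned to its node, whereas the described method starts from the output tuples of t's parent; it also uses nested-loop joins and assumes node names are unique within the CN. `Node`, `expand` and `jtts_containing` are hypothetical names.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable

Row = dict[str, int]


@dataclass
class Node:
    name: str
    output: list[Row]  # output buffer of this CN node
    children: list[tuple["Node", Callable[[Row, Row], bool]]] = field(default_factory=list)


def expand(node: Node, row: Row, pin: tuple[str, Row]) -> list[dict[str, Row]]:
    """All ways to complete the subtree below `node` once `row` is placed on it.

    The node named pin[0] may only take the new tuple pin[1]."""
    per_child = []
    for child, joins in node.children:
        candidates = [pin[1]] if child.name == pin[0] else child.output
        per_child.append(
            [sub for u in candidates if joins(row, u) for sub in expand(child, u, pin)]
        )
    trees = []
    # One tree per combination of child subtrees; a leaf yields exactly one.
    for combo in product(*per_child):
        tree = {node.name: row}
        for part in combo:
            tree.update(part)
        trees.append(tree)
    return trees


def jtts_containing(root: Node, node_name: str, t: Row) -> list[dict[str, Row]]:
    """Enumerate every JTT of this CN that places tuple t on node `node_name`."""
    starts = [t] if root.name == node_name else root.output
    return [tree for r in starts for tree in expand(root, r, (node_name, t))]
```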
S15: stop the search processing in time when the upper limit of the relevance of the unseen search results does not exceed the kth highest relevance among the search results found so far.
S16: repeat S13 and S14 until all unprocessed tuples on all nodes of the grid relational database have been processed.
This embodiment also provides a readable storage medium (computer-readable storage medium) storing a computer program which, when executed by a processor, implements the Top-k keyword search method. Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware under the control of a computer program. The computer program may be stored in a computer-readable storage medium and, when executed, performs steps including those of the above method embodiments; the storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
By calculating the upper limit of the relevance of the unseen search results, the Top-k keyword search method and readable storage medium of this embodiment avoid fully processing every candidate network, and stop the search in time once that upper limit does not exceed the kth highest relevance among the existing search results. This has the following beneficial effects:
first, the performance of top-k keyword search over relational databases is improved;
second, frequent accesses to the relational database during search processing are reduced;
third, the memory consumption of the server during search processing is reduced.
Example two
The embodiment provides a Top-k keyword search system, which includes:
the database generation module, configured to generate a grid relational database according to the keyword set of the Top-k keyword query;
the calculation module, configured to calculate the upper limit of the relevance of the search results that the tuples on each node of the grid relational database can form, and to sort the tuples in non-increasing order of that upper limit;
the first processing module, configured to find an unprocessed tuple on a node of the grid relational database and judge whether the upper limit of the relevance of the search results it can form is greater than the kth highest relevance among the search results found so far; if not, it stops the search and directly outputs the k most relevant search results found so far; if so, it calls the second processing module, which processes the unprocessed tuple found on that node, stores the search results generated by processing it, and updates the upper limits of the relevance of the unprocessed tuples on all nodes of the grid relational database; k is a positive integer greater than 1;
and the circulation module, configured to run the first processing module and the second processing module repeatedly until all unprocessed tuples on all nodes of the grid relational database have been processed.
The Top-k keyword search system of this embodiment is described in detail below with reference to the drawings. It should be noted that the division of the above apparatus into modules is only a logical division; in an actual implementation the modules may be wholly or partially integrated into one physical entity, or may be physically separate. These modules may all be implemented as software invoked by a processing element, may all be implemented in hardware, or some may be implemented as software invoked by a processing element and others in hardware. For example, the x module may be a separately established processing element, may be integrated in a chip of the apparatus, or may be stored in the memory of the apparatus as program code that a processing element of the apparatus calls to perform the function of the x module. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). As another example, when one of the above modules is implemented as a processing element calling program code, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of calling program code. As a further example, these modules may be integrated together and implemented as a system-on-a-chip (SOC).
Please refer to fig. 3, which shows a schematic structural diagram of the Top-k keyword search system in an embodiment. As shown in fig. 3, the Top-k keyword search system 3 includes a database generation module 31, a calculation module 32, a first processing module 33, a second processing module 34, and a circulation module 35.
The database generation module 31 is configured to generate a grid relational database according to the keyword set formed by the Top-k keywords. In this embodiment, the grid relational database is a lattic-based relational database.
Specifically, the database generation module 31 is configured to search according to a keyword set formed by Top-k keywords, and represent an alternative network CN generated in a search process in a root tree manner; and generating the grid relational database, namely the relational database of the Lattice structural formula by sharing the public subtree by all the alternative networks CN. Wherein the alternative networks may share intermediate results of different alternative network processes.
In this embodiment, in Lattice, each node ViAll have an Output Buffer (Output Buffer) marked as ViOutput, comprising ViCan neutralize ViAt least one tuple in the output cache of each child node of (1) is connected to a tuple. ViIs also called ViThe output tuple of (2). Therefore, each output tuple of the root node of each alternative network CN in Lattice can form a tuple connection tree with the output tuples in its child nodes.
The calculation module 32, coupled to the database generation module 31, calculates, for each tuple t on each node $V_i$ of the grid relational database, the upper bound of the relevance of the search results that t can form, and sorts the tuples in non-increasing order of this bound.
In this embodiment, the calculation module 32 computes the relevance upper bound of the search results that can be formed by a tuple t on a node $V_i^Q$ of the grid relational database as follows:
Figure GDA0002525721830000091
In the present embodiment, the above calculation formula can be expressed as:
Figure GDA0002525721830000092
where C is an alternative network, Q is the keyword set consisting of the Top-k keywords, $V_i^Q$ is a node belonging to the alternative network C, and $t \in C$;
Figure GDA0002525721830000101
denotes the upper bound of the relevance of the tuple connection trees that the unprocessed tuple t may form in the alternative network CN. The value given by the formula is 0 in the initial stage and grows dynamically as the search proceeds. This property ensures that tuples in the lattice are processed bottom-up.
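The formula itself appears only in the figures, so the sketch below does not compute the bound. It shows one way to keep unprocessed tuples in non-increasing order of a bound that starts at 0 and grows during the search. The class `UpperBoundQueue` and its methods are hypothetical. It uses Python's `heapq` (a min-heap) with negated keys and lazy deletion: each update pushes a new entry, and outdated entries are skipped when they are popped.

```python
import heapq
import itertools
from typing import Dict, Hashable, List, Optional, Tuple


class UpperBoundQueue:
    """Unprocessed tuples ordered by non-increasing relevance upper bound."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Hashable]] = []
        self._bound: Dict[Hashable, float] = {}  # latest bound per unprocessed tuple
        self._seq = itertools.count()  # tie-breaker: equal bounds pop in insertion order

    def update(self, tid: Hashable, bound: float) -> None:
        """Insert a tuple or record its new (e.g. grown) upper bound."""
        self._bound[tid] = bound
        heapq.heappush(self._heap, (-bound, next(self._seq), tid))  # negate: max-heap

    def pop_max(self) -> Optional[Tuple[Hashable, float]]:
        """Remove and return (tuple id, bound) with the largest current bound."""
        while self._heap:
            neg_bound, _, tid = heapq.heappop(self._heap)
            if self._bound.get(tid) == -neg_bound:  # skip outdated entries
                del self._bound[tid]
                return tid, -neg_bound
        return None

    def __len__(self) -> int:
        return len(self._bound)
```

Lazy deletion avoids searching the heap whenever a bound changes, at the cost of leaving some stale entries in the heap until they surface.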
The first processing module 33, coupled to the database generation module 31 and the calculation module 32, finds an unprocessed tuple t on a node $V_i^Q$ of the grid relational database and judges whether the relevance upper bound of t is greater than the k-th largest relevance among the search results found so far. If so, it invokes the second processing module 34. If not, it stops the search and directly outputs the search results found so far, down to the k-th largest relevance. Here k is a positive integer greater than 1.
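The stopping test of the first processing module needs the k-th largest relevance among the results found so far. The minimal sketch below assumes each result is represented by its relevance score plus an opaque tree object, and keeps the k best results in a size-k min-heap. The class name `TopKResults` is hypothetical. While fewer than k results exist, the threshold is treated as negative infinity so the search always continues. This is an assumption, because the text does not cover that case.

```python
import heapq
import itertools
from typing import List, Tuple


class TopKResults:
    """Keeps the k most relevant search results found so far."""

    def __init__(self, k: int) -> None:
        self.k = k
        self._heap: List[Tuple[float, int, object]] = []  # min-heap on score
        self._seq = itertools.count()  # tie-breaker; trees are never compared

    def add(self, score: float, tree: object) -> None:
        item = (score, next(self._seq), tree)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, item)
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)  # evict the current k-th best

    def kth_largest(self) -> float:
        # With fewer than k results, any tuple might still reach the top k.
        return self._heap[0][0] if len(self._heap) == self.k else float("-inf")

    def should_continue(self, upper_bound: float) -> bool:
        """Continue only if the tuple's bound can beat the current k-th result."""
        return upper_bound > self.kth_largest()

    def ranked(self) -> List[Tuple[float, object]]:
        return [(s, t) for s, _, t in sorted(self._heap, key=lambda e: (-e[0], e[1]))]
```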
The second processing module 34, coupled to the database generation module 31 and the first processing module 33, processes the unprocessed tuple t found on node $V_i^Q$ of the grid relational database. It stores the search results generated once the tuple has been processed and updates the relevance upper bounds of the unprocessed tuples on all nodes of the grid relational database. The second processing module 34 includes a filtering unit 341 and a connection unit 342.
When a tuple t is processed at node $V_i$, the filtering unit 341 performs a filtering step, which is implemented with selection and semi-join query operations.
Specifically, the filtering unit 341 judges whether the unprocessed tuple t can be joined with at least one output tuple of each child node of $V_i^Q$. If not, t is filtered out. If so, it further judges whether t can be joined with at least one output tuple of the parent node of $V_i^Q$. If yes, the connection step is executed; if not, t is filtered out.
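The filtering step can be sketched as a selection followed by semi-join tests. This illustrative Python version assumes in-memory output buffers and a caller-supplied join predicate for each neighbouring node. `passes_filter`, `semi_join_nonempty` and `pred_for` are hypothetical names. Two further assumptions are labelled in the code: a node with no parent passes the parent test, and when a node has several parents one match is enough. Against a real relational database, the same semi-join tests could be expressed as SQL `EXISTS` subqueries.

```python
from typing import Callable, Dict, List

# Join predicate between t and a tuple u of a neighbouring node.
JoinPred = Callable[[dict, dict], bool]


def semi_join_nonempty(t: dict, buffer: List[dict], pred: JoinPred) -> bool:
    """Semi-join test: does t match at least one tuple of `buffer`?"""
    return any(pred(t, u) for u in buffer)


def passes_filter(
    t: dict,
    select: Callable[[dict], bool],
    child_outputs: Dict[str, List[dict]],
    parent_outputs: Dict[str, List[dict]],
    pred_for: Callable[[str], JoinPred],
) -> bool:
    # Selection: t must satisfy the node's own condition (e.g. keyword match).
    if not select(t):
        return False
    # Semi-join with every child's output buffer; failing any child filters t out.
    for child, buffer in child_outputs.items():
        if not semi_join_nonempty(t, buffer, pred_for(child)):
            return False
    # Semi-join with the parent's output buffer. Assumption: a node with no
    # parent (a CN root) passes, and with several parents one match suffices.
    if not parent_outputs:
        return True
    return any(
        semi_join_nonempty(t, buffer, pred_for(parent))
        for parent, buffer in parent_outputs.items()
    )
```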
In the connection step, the connection unit 342 starts from each output tuple of the parent node that joins with t and searches top-down for all tuple connection trees containing t. The trees found in this way are the search results generated by processing t.
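Below is a minimal sketch of the top-down connection step. It assumes that the relevant part of the alternative network is a tree described by a `children` map, that the output buffers are already filled, and that the parent node is the root of that tree, so the search does not continue above it. `expand` and `connect` are hypothetical names. Each returned tree is a nested (node, tuple, child subtrees) triple, with tuple t pinned into its own slot.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# joins(parent_node, parent_tuple, child_node, child_tuple) -> bool
Joins = Callable[[str, dict, str, dict], bool]
# A tuple connection tree: (node id, tuple, tuple of child subtrees)
Tree = Tuple[str, dict, tuple]


def expand(node: str, row: dict, children: Dict[str, List[str]],
           output: Dict[str, List[dict]], joins: Joins,
           pinned: Tuple[str, dict]) -> List[Tree]:
    """Top-down: all trees rooted at `row` on `node`; `pinned` forces one tuple into its slot."""
    per_child = []
    for c in children.get(node, []):
        candidates = [pinned[1]] if c == pinned[0] else output[c]
        subtrees = [
            st
            for u in candidates
            if joins(node, row, c, u)
            for st in expand(c, u, children, output, joins, pinned)
        ]
        if not subtrees:  # this child slot cannot be filled: no tree from here
            return []
        per_child.append(subtrees)
    # One tree per combination of child subtrees (a leaf yields exactly one).
    return [(node, row, combo) for combo in product(*per_child)]


def connect(t: dict, node: str, parent: str, children: Dict[str, List[str]],
            output: Dict[str, List[dict]], joins: Joins) -> List[Tree]:
    """Connection step for tuple t on `node`, starting from `parent`'s output tuples."""
    trees: List[Tree] = []
    for p in output[parent]:
        if joins(parent, p, node, t):
            trees.extend(expand(parent, p, children, output, joins, pinned=(node, t)))
    return trees
```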
The loop module 35, coupled to the first processing module 33 and the second processing module 34, runs the first processing module 33 and the second processing module 34 repeatedly until the unprocessed tuples on all nodes of the grid relational database have been processed.
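Taken together, the first processing module, the second processing module and the loop module work much like a threshold-style top-k loop. The sketch below is an illustration under stated assumptions. `upper_bound` returns a tuple's current bound; it stands in for the formula given in the figures and may change as results accumulate. `process` stands in for the filtering and connection steps and returns (score, tree) pairs. `topk_search` is a hypothetical name.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Tuple


def topk_search(
    tuple_ids: Iterable[Hashable],
    upper_bound: Callable[[Hashable], float],
    process: Callable[[Hashable], Iterable[Tuple[float, object]]],
    k: int,
) -> List[Tuple[float, object]]:
    unprocessed = set(tuple_ids)
    top: List[Tuple[float, int, object]] = []  # min-heap of the k best results
    seq = 0  # tie-breaker so that trees themselves are never compared
    while unprocessed:
        # Bounds may have changed, so re-read them for every remaining tuple.
        tid = max(unprocessed, key=upper_bound)
        kth = top[0][0] if len(top) == k else float("-inf")
        if upper_bound(tid) <= kth:
            break  # no unprocessed tuple can produce a better result: stop early
        unprocessed.remove(tid)
        for score, tree in process(tid):  # filtering + connection for this tuple
            item = (score, seq, tree)
            seq += 1
            if len(top) < k:
                heapq.heappush(top, item)
            elif score > top[0][0]:
                heapq.heapreplace(top, item)
    # Highest relevance first.
    return [(s, tree) for s, _, tree in sorted(top, key=lambda e: (-e[0], e[1]))]
```

The loop re-reads every remaining bound on each iteration, so any bound updates made after a tuple is processed are picked up automatically. A priority queue with lazy updates would avoid this linear scan.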
EXAMPLE III
The terminal provided by this embodiment comprises a processor, a memory, a transceiver, a communication interface, and a system bus. The memory stores a computer program, and the communication interface communicates with other devices. Both are connected to the processor and the transceiver through the system bus, over which these components communicate with one another. The processor and the transceiver run the computer program so that the terminal performs steps S11 to S15 of the Top-k keyword search method described above.
The system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single thick line, which does not mean that there is only one bus or only one type of bus. The communication interface enables communication between the database access device and other equipment (such as a client, a read-write database, and a read-only database). The memory may include Random Access Memory (RAM) and may further include non-volatile memory, such as at least one disk memory.
The processor may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP). It may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In summary, the Top-k keyword search method, system, readable storage medium and terminal of the present invention calculate the relevance upper bound of search results that have not yet been generated, which avoids fully processing every alternative network. The search stops promptly once this upper bound no longer exceeds the k-th largest relevance among the existing search results. This yields the following beneficial effects:
first, the performance of top-k keyword search over relational databases is improved;
second, frequent accesses to the relational database during search processing are reduced;
third, the memory consumption of the server during search processing is reduced. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial value.
The foregoing embodiments merely illustrate the principles and utility of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas disclosed by the present invention shall be covered by the claims of the present invention.

Claims (8)

1. A Top-k keyword search method is characterized by comprising the following steps:
generating a grid relational database from a keyword set consisting of the Top-k keywords, which comprises: searching according to the keyword set formed by the Top-k keywords, and representing the alternative networks generated during the search as rooted trees; and generating the grid relational database by having all the alternative networks share their common subtrees;
calculating the relevance upper bound of the search results that can be formed by tuples on each node of the grid relational database, and sorting the tuples in non-increasing order of the relevance;
finding an unprocessed tuple on a node of the grid relational database, and judging whether the relevance upper bound of the search results that the tuple can form is greater than the k-th largest relevance among the search results found so far; if not, stopping the search and directly outputting the search results found so far, down to the k-th largest relevance; if so, continuing with the next step; k being a positive integer greater than 1;
processing the unprocessed tuple found on the node of the grid relational database, storing the search results generated after the tuple is processed, and updating the relevance upper bounds of the unprocessed tuples on all nodes of the grid relational database;
and cyclically executing the judging step and the processing step until the unprocessed tuples on all nodes of the grid relational database have been processed.
2. The Top-k keyword search method of claim 1, wherein the relevance upper bound of the search results that can be formed by tuples on each node of the grid relational database is calculated by the following formula:
Figure FDA0002939022860000011
where C is an alternative network, Q is the keyword set consisting of the Top-k keywords, $V_i^Q$ is a node belonging to the alternative network C, and $t \in C$;
Figure FDA0002939022860000012
denotes the upper bound of the relevance of the tuple connection trees that the unprocessed tuple t may form in the alternative network CN; each node $V_i$ has an output buffer containing the tuples of $V_i$ that can be joined with at least one tuple in the output buffer of each child node of $V_i$; $V_i^Q.T$ denotes the tuples t on node $V_i^Q$ of the grid relational database; and $V_i^Q.CN$ denotes the set of alternative networks CN containing node $V_i^Q$.
3. The Top-k keyword search method of claim 2, wherein the value of the calculation formula is 0 in the initial stage, and the relevance upper bound increases dynamically as the search proceeds.
4. The Top-k keyword search method according to claim 1, wherein the step of processing the unprocessed tuple found on a node of the grid relational database comprises:
a filtering step: judging whether the unprocessed tuple can be joined with at least one output tuple of the child node of the node; if not, filtering out the tuple; if so, further judging whether the unprocessed tuple can be joined with at least one output tuple of the parent node of the node; if so, executing the connection step; if not, filtering out the tuple;
a connection step: starting from each output tuple of the parent node that joins with the tuple, searching top-down for all tuple connection trees containing the tuple, thereby generating the search results produced by processing the tuple.
5. A Top-k keyword search system, comprising:
the database generation module, configured to generate a grid relational database from a keyword set consisting of the Top-k keywords, wherein the database generation module searches according to the keyword set formed by the Top-k keywords, represents the alternative networks generated during the search as rooted trees, and generates the grid relational database by having all the alternative networks share their common subtrees;
the calculation module, configured to calculate the relevance upper bound of the search results that can be formed by tuples on each node of the grid relational database and to sort the tuples in non-increasing order of the relevance;
the first processing module, configured to find an unprocessed tuple on a node of the grid relational database and judge whether the relevance upper bound of the search results that the tuple can form is greater than the k-th largest relevance among the search results found so far; if not, to stop the search and directly output the search results found so far, down to the k-th largest relevance; if so, to invoke a second processing module, which processes the unprocessed tuple found on the node of the grid relational database, stores the search results generated after the tuple is processed, and updates the relevance upper bounds of the unprocessed tuples on all nodes of the grid relational database; k being a positive integer greater than 1;
and the loop module, configured to run the first processing module and the second processing module cyclically until the unprocessed tuples on all nodes of the grid relational database have been processed.
6. The Top-k keyword search system of claim 5, wherein the second processing module comprises a filtering unit and a connection unit;
the filtering unit is configured to judge whether the unprocessed tuple can be joined with at least one output tuple of the child node of the node, and if so, to further judge whether the unprocessed tuple can be joined with at least one output tuple of the parent node of the node; if so, the connection unit is invoked; if not, the tuple is filtered out;
the connection unit is configured to start from each output tuple of the parent node that joins with the tuple, search top-down for all tuple connection trees containing the tuple, and thereby generate the search results produced by processing the tuple.
7. A readable storage medium on which a computer program is stored, the program, when being executed by a processor, implementing the Top-k keyword search method according to any one of claims 1 to 4.
8. A terminal, comprising: a processor and a memory;
the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory to cause the terminal to execute the Top-k keyword search method according to any one of claims 1 to 4.
CN201710508847.4A 2017-06-28 2017-06-28 Top-k keyword search method/system, readable storage medium and terminal Active CN107247800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710508847.4A CN107247800B (en) 2017-06-28 2017-06-28 Top-k keyword search method/system, readable storage medium and terminal

Publications (2)

Publication Number Publication Date
CN107247800A CN107247800A (en) 2017-10-13
CN107247800B true CN107247800B (en) 2021-04-09

Family

ID=60013653

Country Status (1)

Country Link
CN (1) CN107247800B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant