CN110750704B

CN110750704B - Method and device for automatically completing query

Info

Publication number: CN110750704B
Application number: CN201911014061.2A
Authority: CN
Inventors: 秦建斌; 王尧舒; 毛睿
Original assignee: Shenzhen Institute of Computing Sciences
Current assignee: Shenzhen Institute of Computing Sciences
Priority date: 2019-10-23
Filing date: 2019-10-23
Publication date: 2022-03-11
Anticipated expiration: 2039-10-23
Also published as: CN110750704A; WO2021077585A1

Abstract

An embodiment of the invention provides a method and a device for query autocompletion. The method comprises the following steps: receiving a query prefix from a user side; matching character results of the query prefix based on a nested dictionary tree (trie) structure; adding the character results to an interval list according to the nodes of the nested dictionary tree; and sorting the interval list according to an analysis of the user's target character string to obtain a result set. The nested dictionary tree locates the character string intervals matched by the prefix more accurately and supports query autocompletion with abbreviated keywords, thereby greatly reducing the query length that a user needs to input and improving the user experience.

Description

Method and device for automatically completing query

Technical Field

The invention relates to the technical field of search, and in particular to a method and a device for query autocompletion.

Background

Query autocompletion is an important technique for guiding users to enter queries correctly and for reducing the number of characters they need to type. In search engines (e.g., Google, Baidu), users often want to enter a small amount of information and still get their desired results. For example, a user who enters the query "MJ" may expect the search engine to return results about Michael Jordan. When a user enters a query in a search box, query autocompletion gives appropriate suggestions that take the characters entered so far as a prefix.

To better enhance human-computer interaction experience, query autocompletion is often used in various error-prone applications that require a lot of human input, such as command lines, desktop searches, mobile devices, and so on. Because of its importance, the query autocomplete technology has been widely regarded and applied to information extraction and database search.

Existing query autocompletion methods require the user to manually separate the keywords of a query, and they perform matching by using the query characters as prefixes of the keywords. These methods are ineffective when the user is unwilling, or finds it inconvenient, to separate the keywords of a query manually.

Disclosure of Invention

In view of the above problems, embodiments of the present invention provide a method for query autocompletion and a corresponding apparatus for query autocompletion that overcome or at least partially solve the above problems.

In order to solve the above problems, an embodiment of the present invention discloses a method for query automatic completion, including:

receiving a query prefix from a user side;

matching the character result of the query prefix based on a nested dictionary tree structure;

adding the character result into an interval list according to the nested dictionary tree nodes;

and sequencing the interval list according to the analysis of the user target character string to obtain a result set.

Further, after the step of sorting the interval list according to the analysis of the user target character string to obtain a result set, the method further includes:

and returning a target result set by adopting a Top-K algorithm according to the user requirement.

Further, before the step of receiving the query prefix from the user side, the method includes:

and establishing the nested dictionary tree structure.

Further, the step of establishing the nested trie structure includes:

dividing the keywords and establishing a dictionary tree;

the dictionary trees are linked together to form a nested dictionary tree structure.

Further, the dictionary tree includes an internal dictionary tree and an external dictionary tree, and the step of dividing the keywords and establishing the dictionary tree includes:

the first letter of the keyword is added to the external dictionary tree and the other letters of the corresponding keyword are added to the internal dictionary tree.

Further, the step of linking the tries together to form a nested trie structure includes:

linking the outer dictionary tree and the inner dictionary tree together to form a nested dictionary tree.

Further, the step of sorting the interval list according to the analysis of the user target character string to obtain a result set includes:

calculating the segmentation matching probability of the target character string by using Bayes theorem and a Gaussian mixture model;

and sequencing the interval list according to the descending mode of the segmentation matching probability.

The embodiment of the invention discloses a device for automatically completing inquiry, which comprises:

the receiving module is used for receiving the query prefix from the user side;

the matching module is used for matching the character result of the query prefix based on a nested dictionary tree structure;

the interval list merging module is used for adding the character result into an interval list according to the nested dictionary tree nodes;

and the interval result sorting module is used for sorting the interval list according to the analysis of the user target character string to obtain a result set.

The embodiment of the invention discloses electronic equipment, which comprises a processor, a memory and a computer program which is stored on the memory and can run on the processor, wherein the computer program realizes the steps of the method for automatically completing inquiry when being executed by the processor.

The embodiment of the invention discloses a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for automatically completing the query are realized.

The embodiment of the invention has the following advantages: the nested dictionary tree locates the character string intervals matched by the prefix more accurately and supports query autocompletion with abbreviated keywords, thereby greatly reducing the query length that a user needs to input and improving the user experience.

Drawings

FIG. 1 is a diagram illustrating a nested trie structure in accordance with an embodiment of the present invention;

FIG. 2 is a block diagram of a fast query dictionary tree algorithm in accordance with an embodiment of the present invention;

FIG. 3 is a flow chart of steps in an embodiment of a method for query autocomplete of the present invention;

FIG. 4 is a flow chart of steps of another embodiment of a method for query autocomplete of the present invention;

FIG. 5 is a block diagram illustrating an embodiment of an apparatus for query autocomplete according to the present invention;

FIG. 6 is a block diagram of another embodiment of an apparatus for query autocomplete according to the present invention.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

One of the core concepts of the embodiments of the present invention is to provide a method and a device for query autocompletion, where the method includes: receiving a query prefix from a user side; matching character results of the query prefix based on the nested dictionary tree structure; adding the character results to the interval list according to the nodes of the nested dictionary tree; and sorting the interval list according to an analysis of the user's target character string to obtain a result set. The nested dictionary tree locates the character string intervals matched by the prefix more accurately and supports query autocompletion with abbreviated keywords, thereby greatly reducing the query length that a user needs to input and improving the user experience.

Referring to fig. 1 to 4, a flowchart illustrating steps of an embodiment of a method for query autocomplete of the present invention is shown, which may specifically include the following steps:

S100, receiving a query prefix from a user side;

In this embodiment, $\Sigma$ is a finite set of characters, and a string $s$ is an ordered sequence of characters drawn from $\Sigma$. $|s|$ denotes the length of the string $s$, $s[i]$ denotes the $i$-th character of $s$, and $s[i..j]$ denotes the substring from the $i$-th character to the $j$-th character of $s$. Given two strings $s$ and $t$, $s$ being a prefix of $t$ is written $s \le t$, which holds if and only if $s[1..i] = t[1..i]$ for $1 \le i \le |s|$. The concatenation of $s$ and $t$, in that order, is denoted $st$. A sequence of strings $[s_1, s_2, \ldots, s_n]$ ($n > 1$) is called a segmentation of $s$ if $s = s_1 s_2 \cdots s_n$. $s^{<}$ denotes any prefix of $s$.

Let $S$ be a string data set, and assume that $\Sigma$ contains the English letters. Each string $s \in S$ can be segmented into a sequence of keywords. The delimiter can be a space, a punctuation mark, a capital letter, and so on. For example, "AddNextValue" is divided into three keywords: "Add", "Next", and "Value". Consider a string $s$ that is segmented into the keywords $[s_1, \ldots, s_n]$. Given a query string $q$, $q$ is a prefix abbreviation match for $s$ if and only if

$$q = s_1^{<} s_2^{<} \cdots s_i^{<}, \quad 1 \le i \le n,$$

that is, $q$ is the concatenation of prefix abbreviations of the first $i$ keywords of $s$. For example, "gene" is a prefix abbreviation match for the string "GetNextValue" because "ge" and "ne" are prefixes of "Get" and "Next". Prefix abbreviation matching is abbreviated as PAM. Given a string data set $S$ and a query string $q$, query autocompletion with prefix abbreviation (QACA) finds all strings $s \in S$ for which $q$ is a prefix abbreviation match.

The output results are computed incrementally from the characters the user has entered so far.
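The definition of prefix abbreviation matching can be illustrated with a brute-force check. This is only a sketch of the definition, not the indexed algorithm of the embodiment; the function names `split_keywords` and `is_pam` are hypothetical, matching is assumed to be case-insensitive (as the "gene" example suggests), and each abbreviated keyword is assumed to contribute a non-empty prefix.

```python
import re
from functools import lru_cache

def split_keywords(s):
    """Segment a string at spaces, punctuation, and capital letters."""
    return re.findall(r"[A-Z][a-z0-9]*|[a-z0-9]+", s)

def is_pam(q, keywords):
    """Return True if q concatenates non-empty prefixes of the first i keywords."""
    q = q.lower()
    kws = [k.lower() for k in keywords]

    @lru_cache(maxsize=None)
    def match(pos, k):
        # pos: number of query characters consumed so far
        # k:   index of the next keyword that must be abbreviated
        if pos == len(q):
            return True
        if k == len(kws):
            return False
        kw = kws[k]
        # Try every non-empty prefix of keyword k that agrees with the query.
        for length in range(1, len(kw) + 1):
            if q[pos:pos + length] != kw[:length]:
                break
            if match(pos + length, k + 1):
                return True
        return False

    return len(q) > 0 and match(0, 0)

keywords = split_keywords("GetNextValue")   # ['Get', 'Next', 'Value']
print(is_pam("gene", keywords))             # "ge" + "ne"
print(is_pam("gv", keywords))               # skips "Next", so not a match
```

Because the keywords must be abbreviated in order and without gaps, "gv" does not match "GetNextValue" even though "g" and "v" are prefixes of its first and third keywords.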

The method for query autocompletion allows the user to enter a concatenation of abbreviated keyword prefixes as the query, which improves the user experience. For this keyword-prefix concatenation scenario, an index structure and a query method are designed to implement the method. A ranking algorithm is also proposed and incorporated into query processing to ensure the quality of the ranking, i.e., that the top-ranked results are the ones the user most likely wants. A Top-K method returns a small number K of high-quality results.

This embodiment comprises a nested dictionary tree index structure, a query algorithm, an interval list merging method, an interval result sorting method, and an interval Top-K algorithm. Offline, after the original data is given, the data is preprocessed according to different requirements (for example, removing noise and dirty data) and the index structure is built. Online, when the user issues a query, the query algorithm is executed until the output result is presented to the user.

The index data structure in this embodiment is a nested trie structure, which includes a plurality of internal tries nested within an external trie. Referring to FIG. 1, a diagram of a nested dictionary tree structure is shown. To build a nested dictionary tree, for each input string s in S, the first letter of each keyword of the string is added to the external dictionary tree. Then, at the external node where each first letter is located, the other letters of the corresponding keyword are added to the internal dictionary tree. Nodes and edges of the external dictionary tree are called external nodes and edges, and nodes and edges of the internal dictionary tree are called internal nodes and edges. The root node of the nested trie is the root node of the external trie. Links from internal nodes to external nodes are also added between the nodes of the tree. For an internal node n, the root node of the internal trie containing n is called its initial node. For any internal node reached by a non-initial character of a data string, if another keyword immediately follows in that data string, a shortcut link is added from the internal node to an external node reachable from its initial node. The label of this shortcut link is the first letter of the next keyword.

To reduce the space used by shortcut links, most of the links do not need to be stored physically. The targets of a node's links are always a subset of the targets of the outer edges. Based on this observation, one bit is stored per outer edge, i.e., a bit vector. The i-th bit indicates that the node has a link whose destination is the same as the destination of the i-th outer edge. This avoids storing duplicate edges that serve the same function. Compared with the traditional dictionary tree, the nested dictionary tree merges the keywords that share the same initial. As the following description of the algorithm shows, this data structure effectively reduces the number of active nodes, and the active nodes can also be found quickly.
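The bit-vector storage of shortcut links can be sketched as follows, assuming the outer edges of an initial node are kept in a fixed order. The helper names `encode_links` and `decode_links` and the node labels such as `"ext:gn"` are hypothetical and only serve the illustration.

```python
def encode_links(link_labels, outer_labels):
    """Return an int whose i-th bit is set when the node has a shortcut link
    with the same destination as the i-th outer edge of its initial node."""
    bits = 0
    for i, label in enumerate(outer_labels):
        if label in link_labels:
            bits |= 1 << i
    return bits

def decode_links(bits, outer_edges):
    """Recover {label: target} from the bit vector and the ordered outer edges."""
    return {label: target
            for i, (label, target) in enumerate(outer_edges)
            if (bits >> i) & 1}

# Ordered outer edges of one initial node: (label, target external node).
outer_edges = [("a", "ext:ga"), ("n", "ext:gn"), ("v", "ext:gv")]

# An internal node whose data strings continue with keywords starting in n or v.
bits = encode_links({"n", "v"}, [label for label, _ in outer_edges])
print(bin(bits))
print(decode_links(bits, outer_edges))
```

Each internal node then stores a single small integer instead of a separate edge object per link.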

S200, matching character results of the query prefix based on a nested dictionary tree structure;

In the nested dictionary tree structure, an active node n is a node for which at least one path (through edges or links) from the root node to n exactly matches the query string input by the user. The algorithm starts from the external root node and, for each character input by the user, finds the new active nodes from the existing active nodes. An input character may match either the first character of a keyword or a non-first character, and the nested trie supports both cases well. For a non-first character, a new active node is found by following an internal edge. For a first character, a new active node is found by following an outer edge. In addition, a new active node can be produced by jumping from an internal node to an external node through a shortcut link.
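A minimal sketch of the nested trie and the active-node search, under simplifying assumptions: keywords are already segmented and lower-cased, each node keeps a plain set of string IDs instead of an interval list, and shortcut links are stored as labels that are resolved through the outer edges of the initial node. The names `Node`, `insert`, and `search` are hypothetical.

```python
class Node:
    """One node of the nested trie (external when self.initial is self)."""
    def __init__(self, initial=None):
        self.inner = {}      # internal edges: next letter inside the same keyword
        self.outer = {}      # external edges: first letter of the next keyword
        self.links = set()   # shortcut-link labels, resolved via initial.outer
        self.initial = initial if initial is not None else self
        self.ids = set()     # IDs of the data strings passing through this node

def insert(root, sid, keywords):
    root.ids.add(sid)
    ext = root
    for i, kw in enumerate(keywords):
        if kw[0] not in ext.outer:            # first letter: external trie
            ext.outer[kw[0]] = Node()
        ext = ext.outer[kw[0]]
        ext.ids.add(sid)
        node = ext
        for ch in kw[1:]:                     # other letters: internal trie
            if ch not in node.inner:
                node.inner[ch] = Node(initial=ext)
            node = node.inner[ch]
            node.ids.add(sid)
            if i + 1 < len(keywords):         # a keyword follows: add a link
                node.links.add(keywords[i + 1][0])

def search(root, query):
    """Return the IDs of all strings for which query is a prefix abbreviation match."""
    active = {root: set(root.ids)}            # active node -> candidate IDs
    for c in query:
        nxt = {}
        for node, ids in active.items():
            targets = []
            if c in node.inner:                                 # internal edge
                targets.append(node.inner[c])
            if node.initial is node and c in node.outer:        # outer edge
                targets.append(node.outer[c])
            if node.initial is not node and c in node.links:    # shortcut link
                targets.append(node.initial.outer[c])
            for t in targets:
                cand = ids & t.ids            # drop strings that left the path
                if cand:
                    nxt.setdefault(t, set()).update(cand)
        active = nxt
    result = set()
    for ids in active.values():
        result |= ids
    return result

data = [["get", "next", "value"], ["get", "value"], ["go", "now"], ["add", "next", "value"]]
root = Node()
for sid, kws in enumerate(data):
    insert(root, sid, kws)
print(search(root, "gene"))   # only "get next value" survives
print(search(root, "gone"))   # path exists in the trie, but no string matches
```

The query "gone" reaches a trie node through "go" and the shared "n" edge, yet no data string lies on that whole path. Intersecting the ID sets along the path removes such false positives, which is the role the interval lists play in the embodiment.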

In this embodiment, not all of the data under a node is a desired result. Strings that are not results are removed by list merging. $I_n$ is defined as the ordered sequence of intervals stored at node $n$, and a merge operation is defined that combines two such interval sequences, where $x_i$ and $y_j$ denote intervals of the two sequences. Property 1: given a path $n_1, \ldots, n_k$ from the root node to $n$, the results of query $q$ exist only in the merge of the interval lists of the nodes on this path. Based on Property 1, the complexity of the fast query dictionary tree algorithm in the present embodiment is $O(\log |I_{n'}|)$. The specific algorithm is shown in FIG. 2.

S300, adding the character result into an interval list according to the nodes of the nested dictionary tree;

In this embodiment, a query in the nested trie algorithm may not match all of the strings below an active node. In order not to report non-result data, each node in the trie is associated with an ordered list of intervals describing the strings that match the path to that node in the trie. To compute the intervals in the lists, for each given string the corresponding nodes in the dictionary tree are traversed, and the ID of the string is added to the interval list of each such node. A basic method is to merge interval lists with a sweep-line algorithm, whose time complexity is $O(|I_n| + |I_{n'}|)$, where $|\cdot|$ denotes the number of intervals in a list. Because of the merge operation, $|I_n|$ is generally very small in practice and much smaller than $|I_{n'}|$. If $|I_n|$ is regarded as a constant, the time complexity becomes $O(|I_{n'}|)$. When deep nodes of the nested trie are traversed, the intervals can become very fragmented, and as $|I_{n'}|$ grows a large merging cost is incurred. To address this problem, the present embodiment provides a list merging algorithm: for an interval $[u, v]$ in the list, binary search with $u$ as the key is used to find the first interval in $I_{n'}$ that intersects $[u, v]$.
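The two merging strategies can be compared in a short sketch. It assumes that merging means intersecting two sorted lists of disjoint closed intervals of string IDs, which is how the later property relating the merged list to the original list reads; the function names `merge_sweep` and `merge_bsearch` are hypothetical.

```python
from bisect import bisect_left

def merge_sweep(a, b):
    """Sweep-line intersection of two sorted interval lists, O(|a| + |b|)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance the list whose current interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def merge_bsearch(small, large):
    """For each [u, v] in small, binary-search large for the first
    intersecting interval, then scan only while intervals still intersect."""
    # In an index these end points would be stored with the node; they are
    # rebuilt here only to keep the sketch self-contained.
    ends = [v for _, v in large]
    out = []
    for u, v in small:
        k = bisect_left(ends, u)          # first interval ending at or after u
        while k < len(large) and large[k][0] <= v:
            out.append((max(u, large[k][0]), min(v, large[k][1])))
            k += 1
    return out

small = [(2, 9)]
large = [(0, 1), (3, 4), (6, 6), (8, 12), (15, 20)]
print(merge_sweep(small, large))
print(merge_bsearch(small, large))
```

When the short list holds only a few intervals, the binary-search variant touches a logarithmic number of entries of the long list plus the intervals it actually outputs, instead of scanning the long list from its start.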

S400, sorting the interval list according to the analysis of the user target character string to obtain a result set.

In this embodiment, the output results are sorted according to an estimate of the user's target string, based on an analysis of the user's needs.

In this embodiment, before the step of receiving the query prefix from the user side, S100 includes:

and establishing a nested dictionary tree structure.

In this embodiment, the step of establishing the nested trie structure includes:

dividing the keywords and establishing a dictionary tree;

the tries are linked together to form a nested trie structure.

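In this embodiment, the trie includes an internal trie and an external trie, and the step of dividing the keyword and establishing the trie includes:

adding the first letter of the keyword to the external dictionary tree and adding the other letters of the corresponding keyword to the internal dictionary tree.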

In this embodiment, the step of linking the tries together to form a nested trie structure includes:

the outer dictionary tree and the inner dictionary tree are linked together to form a nested dictionary tree.

In this embodiment, step S400 of sorting the interval list according to the analysis of the user target character string to obtain a result set includes:

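calculating the segmentation matching probability of the target character string by using Bayes' theorem and a Gaussian mixture model;

and sorting the interval list in descending order of the segmentation matching probability.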

In the present embodiment, a given data string $s$ is segmented into $[s_1, \ldots, s_n]$. Assume that the first $m$ keywords have been abbreviated in the query and that the remaining $(n - m)$ keywords have not been entered. Thus, $q$ can be segmented into $[q_1, \ldots, q_m]$ such that $q_i \le s_i$ for $1 \le i \le m \le n$. Then $(n - m)$ empty strings, denoted $q_{m+1}, \ldots, q_n$, are appended so that $q$ and $s$ have the same number of segments. The score for ranking $s$ is defined as the probability that $s$ is the target string given the segmentations $[q_1, \ldots, q_n]$ and $[s_1, \ldots, s_n]$, written $score(s, q) = P(s_1 \ldots s_n \mid q_1 \ldots q_n)$. If there are multiple possible segmentations, the one that yields the maximum score is chosen. All PAM results of $q$ are sorted by $score(s, q)$ to obtain a result set in descending order.

To calculate $score(s, q)$, Bayes' theorem is applied:

$$
\begin{aligned}
score(s, q) &= P(s_1 \ldots s_n \mid q_1 \ldots q_n) \\
&= P(q_1 \ldots q_n \mid s_1 \ldots s_n) \cdot P(s_1 \ldots s_n) / P(q_1 \ldots q_n) \\
&\propto P(q_1 \ldots q_n \mid s_1 \ldots s_n) \cdot P(s_1 \ldots s_n) \\
&= P(q_1 \ldots q_n \mid s_1 \ldots s_n) \cdot P(s)
\end{aligned}
$$

The denominator $P(q_1 \ldots q_n)$ in the above formula can be safely ignored because $P(q_1 \ldots q_n) = P(q)$, which is the same value for all strings that are PA matches of $q$. $P(s)$ characterizes the popularity of $s$. To calculate $P(q_1 \ldots q_n \mid s_1 \ldots s_n)$, the probabilities $P(q_i \mid s_i)$, $1 \le i \le n$, are assumed to be mutually independent. Thus $P(q_1 \ldots q_n \mid s_1 \ldots s_n) = P(q_1 \mid s_1) \cdots P(q_n \mid s_n)$, and the following formula is obtained:

$$score(s, q) \propto P(q_1 \mid s_1) \cdots P(q_n \mid s_n) \cdot P(s)$$

Each $P(q_i \mid s_i)$ describes the probability that the user inputs the query segment $q_i$ as the prefix of the keyword $s_i$. For keywords that have not been entered yet, $P(q_i \mid s_i) = 1$ is assumed for $m < i \le n$, because these keywords may still be entered by the user later. These probabilities are set to 1 so that the score of $s$ does not become low merely because of the repeated multiplication, especially when $n$ is much larger than $m$.
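The factorized score can be sketched as follows. The function `score` follows the product form described above; the per-keyword probability `toy_prefix_prob` and the popularity values are placeholders invented for the example and stand in for the Gaussian mixture model and real popularity statistics.

```python
def score(q_parts, s_parts, popularity, prefix_prob):
    """score(s, q) proportional to P(q_1|s_1) * ... * P(q_n|s_n) * P(s).

    q_parts holds the m typed segments; the remaining n - m keywords of s
    have not been entered and contribute a factor of 1.
    """
    result = popularity
    for i, s_i in enumerate(s_parts):
        if i < len(q_parts):
            result *= prefix_prob(q_parts[i], s_i, i + 1)
        # else: P(q_i | s_i) = 1, nothing to multiply
    return result

def toy_prefix_prob(q_i, s_i, position):
    """Placeholder for P(q_i | s_i); NOT the model of the embodiment."""
    return len(q_i) / len(s_i)

candidates = {
    "GetNextValue": (["get", "next", "value"], 0.5),
    "GenerateNewEntry": (["generate", "new", "entry"], 0.3),
}
q_parts = ["ge", "ne"]
ranked = sorted(
    candidates,
    key=lambda name: score(q_parts, candidates[name][0], candidates[name][1], toy_prefix_prob),
    reverse=True,
)
print(ranked)
```

Since untyped keywords contribute a factor of 1, a long data string is not penalized merely for having many keywords that the user has not reached yet.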

To better calculate $P(q_i \mid s_i)$, it is observed that users habitually abbreviate certain character sequences, for example by omitting some consonants, and that such omissions follow certain patterns. The features are therefore described by a vector: (1) the length of $q_i$; (2) the number of vowels in $q_i$; (3) the number of consonants in $q_i$; (4) whether $q_i$ ends with a consonant; (5) the value of $i$, i.e., the position of the keyword $s_i$ in the string. The features are thus represented by a 5-dimensional vector. Note that $s_i$ is not fully encoded in the vector, for the following reason. Let $p_i$ denote the pattern vector of the user abbreviating $s_i$ to $q_i$. The way a keyword is abbreviated is taken to be independent of the keyword itself, i.e., $P(q_i, s_i) = P(p_i) \cdot P(s_i)$. Because $P(q_i, s_i) = P(q_i \mid s_i) \cdot P(s_i)$, it follows that $P(p_i) = P(q_i \mid s_i)$.
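A minimal sketch of the 5-dimensional pattern vector. The function name `pattern_vector` is hypothetical, and treating only a, e, i, o, u as vowels (so that "y" counts as a consonant) is an assumption of this example.

```python
VOWELS = set("aeiou")

def pattern_vector(q_i, i):
    """Features of the typed segment q_i for the i-th keyword (1-based)."""
    q = q_i.lower()
    vowels = sum(ch in VOWELS for ch in q)
    consonants = sum(ch.isalpha() and ch not in VOWELS for ch in q)
    ends_with_consonant = bool(q) and q[-1].isalpha() and q[-1] not in VOWELS
    return [
        float(len(q)),               # (1) length of q_i
        float(vowels),               # (2) number of vowels
        float(consonants),           # (3) number of consonants
        float(ends_with_consonant),  # (4) 1.0 if q_i ends with a consonant
        float(i),                    # (5) position of the keyword in the string
    ]

print(pattern_vector("ge", 1))
print(pattern_vector("nex", 2))
```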

Given a pattern vector, the value of $P(p_i)$ is calculated with a Gaussian mixture model (GMM). The Gaussian mixture model uses unknown parameters to compute the density function of $p$, which gives the probability as follows:

$$P(p) = \sum_{i=1}^{l} w_i \, N(p \mid \mu_i, \Sigma_i)$$

where $l$ is the number of Gaussian distributions, $w_i$ is the weight of each Gaussian distribution, and $N(p \mid \mu_i, \Sigma_i)$ is the probability density function of $p$ with mean $\mu_i$ and covariance matrix $\Sigma_i$. The parameter $l$ can be tuned during training. The other parameters can be learned by clustering and by using the EM algorithm: users provide a series of data strings, after which all prefixes of their data are collected and converted into pairs of keyword and prefix, which serve as the features of the training data.
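A sketch of evaluating a Gaussian mixture density for a pattern vector, assuming the standard weighted-sum form of the mixture and, for brevity, diagonal covariance matrices (the text allows a full matrix). The parameter values below are made up for illustration and are not trained values; the function names are hypothetical.

```python
import math

def gaussian_pdf_diag(p, mean, var):
    """Density of a Gaussian with diagonal covariance, evaluated at p."""
    log_pdf = 0.0
    for x, m, v in zip(p, mean, var):
        log_pdf += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return math.exp(log_pdf)

def gmm_density(p, weights, means, variances):
    """Weighted sum of the component densities."""
    return sum(w * gaussian_pdf_diag(p, m, v)
               for w, m, v in zip(weights, means, variances))

# Two illustrative components over the 5-dimensional pattern vector.
weights = [0.6, 0.4]
means = [[2.0, 1.0, 1.0, 0.0, 1.0], [3.0, 1.0, 2.0, 1.0, 2.0]]
variances = [[1.0] * 5, [1.0] * 5]

print(gmm_density([2.0, 1.0, 1.0, 0.0, 1.0], weights, means, variances))
```

In practice the weights, means, and covariances would come from EM training on the collected keyword and prefix pairs rather than being written by hand.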

In this embodiment, after the step of sorting the interval list according to the analysis of the user target character string to obtain the result set, S400 further includes:

and S500, returning a target result set by adopting a Top-K algorithm according to the user requirement.

In this embodiment, while entering a query the user may not be interested in all of the results, but usually only in the top K. Under this assumption, results that are unlikely to reach the top K can be filtered out early: an upper bound on the score is estimated for each active node, and the node is filtered out in advance if this upper bound is lower than the lowest score among the current top K results. In the interval list algorithm, one merged interval list is obtained in each active node as the candidate set. To obtain the Top-K results, the interval list of each active node is traversed, a score is calculated for each string in the intervals, and the strings are then sorted by score to extract the Top-K results. The greatest cost in this implementation is computing the probability $P(q_i \mid s_i)$ with the Gaussian mixture model. Because the number of strings in the intervals is large in practice, especially for short query strings, an efficient Top-K algorithm is needed to reduce the number of Gaussian mixture model computations.

In a specific embodiment, the maximum possible score in the merged interval list is defined. Merged lists have the following property: for each interval $[u, v] \in J_n$, there always exists an interval $[u', v'] \in I_n$ such that $u' \le u$ and $v' \ge v$. Thus, the maximum possible score of the strings in list $J_n$ is bounded above by that of list $I_n$. To calculate the score bound for each interval, consider the root node $n$ of a dictionary tree and let $d$ denote its depth. It can be deduced that every string in list $I_n$ has at least $d$ keywords and that, when $n$ becomes an active node, the query $q$ has exactly $d$ non-empty segments. Thus, for each interval $[u, v] \in I_n$, the strings $s^u, \ldots, s^v$ can be processed offline, and the maximum value is used to bound the online query. Given a string $s^i$, for each of its first $d$ keywords (indexed by $j$, $1 \le j \le d$), all possible prefixes of the keyword are enumerated and the probability of each prefix is calculated. Note that when $j = d$ there is only one possible prefix, since a match is made on node $n$. The maximum probability obtained in this way is the score upper bound of the string $s^i$; the maximum over the strings of the interval is taken and stored with the interval $[u, v]$ in the dictionary tree.

The embodiment discloses an online Top-K result extraction algorithm. At the very beginning, a priority queue R is initialized for storing the Top-K results. First, for each active node $n$, the intervals in list $J_n$ are sorted in descending order of their maximum score. Second, for each interval $[u, v]$ of $J_n$, the score of each string is computed in turn and the priority queue is updated. If an interval is reached whose maximum score is not greater than the score of the current K-th result, the processing of $n$ can safely end.
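The early-terminating extraction can be sketched for a single active node as follows. The function name `top_k`, the representation of an interval as a pair of (upper bound, list of string IDs), and the score table are assumptions of this example; the scores stand in for the expensive Gaussian mixture model evaluation.

```python
import heapq

def top_k(intervals, score_fn, k):
    """intervals: list of (score_upper_bound, [string IDs]) of one active node.
    Returns the k best (score, sid) pairs in descending order."""
    heap = []   # min-heap holding the current best k results
    for bound, sids in sorted(intervals, key=lambda t: t[0], reverse=True):
        if len(heap) == k and bound <= heap[0][0]:
            break   # no string here or in later intervals can enter the top k
        for sid in sids:
            sc = score_fn(sid)
            if len(heap) < k:
                heapq.heappush(heap, (sc, sid))
            elif sc > heap[0][0]:
                heapq.heapreplace(heap, (sc, sid))
    return sorted(heap, reverse=True)

scores = {0: 0.9, 1: 0.2, 2: 0.5, 3: 0.1, 4: 0.05}
calls = []

def counted_score(sid):
    calls.append(sid)        # record how many strings were actually scored
    return scores[sid]

intervals = [(0.1, [3, 4]), (0.9, [0, 1]), (0.5, [2])]
best = top_k(intervals, counted_score, k=2)
print(best, len(calls))
```

With several active nodes, the same priority queue would be shared across nodes so that a good result found at one node also prunes intervals at the others.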

In another embodiment, some Gaussian mixture model computations are skipped. With high probability, the strings in the same interval share some keywords and therefore have the same probability $P(q_i \mid s_i)$ for those keywords. For two adjacent strings $s^i$ and $s^{i+1}$ in an interval $[u, v] \in I_n$, the number of keywords they share as a prefix is computed offline and recorded with $s^{i+1}$, denoted $s^{i+1}.spr$. During online query processing, if $s^i$ and $s^{i+1}$ are in the same interval of $J_n$, the Gaussian mixture model computation for the first $s^{i+1}.spr$ keywords of $s^{i+1}$ can be skipped because it has already been done for $s^i$. To make better use of keyword sharing, the strings in S are sorted in the order of the earliest points.
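The reuse of per-keyword probabilities between adjacent strings can be sketched as follows. It assumes that all strings of one interval are matched against the same query segmentation; the names `compute_spr` and `interval_scores` are hypothetical, and the probability function is a counting placeholder rather than the Gaussian mixture model.

```python
def compute_spr(strings):
    """Offline: spr[i] = number of leading keywords that string i shares
    with string i - 1 (0 for the first string)."""
    spr = [0]
    for prev, cur in zip(strings, strings[1:]):
        n = 0
        for a, b in zip(prev, cur):
            if a != b:
                break
            n += 1
        spr.append(n)
    return spr

def interval_scores(strings, spr, q_parts, prob):
    """Online: product of P(q_j | s_j) per string, reusing the factors of
    the first spr[i] keywords computed for the previous string."""
    scores, cached = [], []
    for idx, kws in enumerate(strings):
        probs = cached[:spr[idx]]                      # reused factors
        for j in range(len(probs), min(len(q_parts), len(kws))):
            probs.append(prob(q_parts[j], kws[j], j + 1))
        cached = probs
        total = 1.0
        for p in probs:
            total *= p
        scores.append(total)
    return scores

calls = []

def counting_prob(q_j, s_j, position):
    calls.append((q_j, s_j))       # each entry is one model evaluation
    return len(q_j) / len(s_j)     # placeholder probability

strings = [["get", "next", "value"], ["get", "next", "item"], ["get", "name"]]
spr = compute_spr(strings)
print(spr)
print(interval_scores(strings, spr, ["ge", "ne"], counting_prob), len(calls))
```

In this example three model evaluations are performed instead of six, because the second string shares both typed keywords with the first and the third shares one with the second.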

The application allows the user to decide the number of results. If the user wants to obtain all results and screen them one by one, step S500 of returning the target result set by a Top-K algorithm according to the user requirement can be skipped. If the user only wants a limited number of high-quality results, step S500 is performed to return the K results the user most likely wants.

The application discloses a method for query autocompletion that is based on a prefix abbreviation matching model, which is a new model in autocompletion technology. Compared with the prior art, the method and the device take a wider range of scenarios into account, in particular the case where the user does not explicitly enter separators indicating the keywords. The method and the device can save 20% of the number of characters input by a user. The nested dictionary tree is a new data structure for supporting autocompletion. Compared with a traditional dictionary tree index structure, the nested dictionary tree locates the character string intervals matched by the prefix more accurately. To return more meaningful results, a ranking algorithm is designed that uses the probability that the query string matches the data string with respect to their segmentations, and computes this probability with Bayes' formula and a Gaussian mixture model. The ranking algorithm returns results that are closer to what the user wants. Considering that the user is interested in only a few results, two Top-K optimizations are designed: a score upper bound for each interval list, and skipping computations of the more expensive Gaussian mixture model. Compared with existing algorithms, the Top-K optimization algorithm has higher efficiency and accuracy.

Applications of the method include, but are not limited to, input hints for database queries, search box optimization in search engines, code completion in integrated development environments, query suggestion systems in the field of biochemistry and medicine, quick input interfaces of input methods, and input interfaces of constrained terminals.

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.

Referring to fig. 5 to 6, there are shown block diagrams of the structural embodiments of an apparatus for query autocomplete according to the present invention, which may specifically include the following modules:

a receiving module 100, configured to receive a query prefix from a user side;

a matching module 200, configured to match the character result of the query prefix based on the nested trie structure;

the interval list merging module 300 is used for adding the character result into the interval list according to the nested dictionary tree nodes;

and the interval result sorting module 400 is configured to sort the interval list according to analysis of the user target character string to obtain a result set.

In this embodiment, the apparatus further includes:

and the result screening module 500 is used for returning the target result set by adopting a Top-K algorithm according to the user requirements.

In this embodiment, the apparatus further includes:

and the structure establishing module is used for establishing a nested dictionary tree structure.

In this embodiment, the structure building module includes:

the splitting unit is used for dividing the keywords and establishing a dictionary tree;

and the linking unit is used for linking the dictionary trees together to form a nested dictionary tree structure.

In this embodiment, the splitting unit includes:

and the splitting subunit is used for adding the first letter of the keyword to the external dictionary tree and adding other letters of the corresponding keyword to the internal dictionary tree.

In the present embodiment, the link unit includes:

and the link subunit is used for linking the external dictionary tree and the internal dictionary tree together to form a nested dictionary tree.

In this embodiment, the interval result sorting module includes:

the segmentation probability calculation unit is used for calculating the segmentation matching probability of the target character string by using Bayes theorem and a Gaussian mixture model;

and the sorting unit is used for sorting the interval list in a mode of descending the segmentation matching probability.

For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.

The embodiment of the invention discloses electronic equipment, which comprises a processor, a memory and a computer program which is stored on the memory and can run on the processor, wherein the computer program realizes the steps of the query automatic completion method when being executed by the processor.

The embodiment of the invention discloses a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to realize the steps of the query automatic completion method.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Finally, it should also be noted that, herein, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or terminal that comprises the element.

The method for query autocompletion and the corresponding device for query autocompletion provided by the invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and its core idea. A person skilled in the art may, following the idea of the invention, vary the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. A method for query autocompletion, comprising:

building a nested dictionary tree structure; specifically, adding the first letter of a keyword to an external dictionary tree, and adding the remaining letters of the corresponding keyword to an internal dictionary tree; linking the external dictionary tree and the internal dictionary tree together to form a nested dictionary tree; adding a link from an internal node to an external node in the nested dictionary tree, wherein, if a keyword immediately follows the character string in which a non-initial character is located, a link to the external node is added for the initial node of the internal node corresponding to the non-initial character, and the label of the link is the initial letter of the immediately following keyword; wherein the internal nodes are nodes of the internal dictionary tree, the external nodes are nodes of the external dictionary tree, and the initial nodes are root nodes containing internal fields of the internal nodes;

receiving a query prefix from a user side; wherein the query prefix is a concatenation of prefix abbreviations of any number of leading keywords of a character string formed by sequentially concatenating a plurality of keywords;
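The structure and the abbreviated query prefix recited above can be illustrated with a simplified sketch. This is one possible reading, not the claimed implementation: each outer node (a keyword's first letter) is the root of an inner trie holding the remaining letters, and every node of a keyword carries a link, labeled with the next keyword's first letter, to the outer node of that next keyword. Sets of string ids stand in for the interval list, and all names (`Node`, `build_nested_trie`, `complete`) are hypothetical. Strings are assumed to be lowercase with keywords separated by spaces.

```python
class Node:
    def __init__(self):
        self.children = {}  # next character inside the same keyword (inner trie)
        self.links = {}     # first letter of the following keyword -> outer node
        self.ids = set()    # ids of the strings that pass through this node


def build_nested_trie(strings):
    root = Node()  # virtual root; its links lead to the first keywords
    for sid, text in enumerate(strings):
        owners = [root]  # nodes of the previous keyword, outer node first
        for keyword in text.split():
            # Outer node: the keyword's first letter, in the context of the
            # first letters of the keywords before it.
            outer = owners[0].links.setdefault(keyword[0], Node())
            outer.ids.add(sid)
            # Every node of the previous keyword may jump to this outer node.
            for prev in owners:
                prev.links[keyword[0]] = outer
            # Inner trie: the remaining letters of the keyword.
            current, owners = outer, [outer]
            for ch in keyword[1:]:
                current = current.children.setdefault(ch, Node())
                current.ids.add(sid)
                owners.append(current)
    return root


def complete(root, strings, prefix):
    # A state is (node, ids still consistent with the characters typed so far).
    states = [(root, set(range(len(strings))))]
    for ch in prefix:
        next_states = []
        for node, ids in states:
            # Either continue the current keyword or jump to the next one.
            for target in (node.children.get(ch), node.links.get(ch)):
                if target is not None and ids & target.ids:
                    next_states.append((target, ids & target.ids))
        states = next_states
    matched = set().union(*(ids for _, ids in states)) if states else set()
    return sorted(strings[i] for i in matched)


names = ["michael jordan", "michael jackson", "mary jones"]
trie = build_nested_trie(names)
print(complete(trie, names, "mj"))    # ['mary jones', 'michael jackson', 'michael jordan']
print(complete(trie, names, "mijo"))  # ['michael jordan']
```

Intersecting the id sets at each step is a design choice of this sketch: it keeps a jump such as "mi" followed by "j" from also returning "mary jones", whose second keyword shares the same outer node.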

2. The method of claim 1, wherein after the step of sorting the interval list according to the analysis of the user target string to obtain a result set, further comprising:

3. The method of claim 1, wherein the step of sorting the interval list according to the analysis of the user target string to obtain a result set comprises:

4. An apparatus for query autocomplete, comprising:

a structure building module, configured to build a nested dictionary tree structure; specifically, adding the first letter of a keyword to an external dictionary tree, and adding the remaining letters of the corresponding keyword to an internal dictionary tree; linking the external dictionary tree and the internal dictionary tree together to form a nested dictionary tree; adding a link from an internal node to an external node in the nested dictionary tree, wherein, if a keyword immediately follows the character string in which a non-initial character is located, a link to the external node is added for the initial node of the internal node corresponding to the non-initial character, and the label of the link is the initial letter of the immediately following keyword; wherein the internal nodes are nodes of the internal dictionary tree, the external nodes are nodes of the external dictionary tree, and the initial nodes are root nodes containing internal fields of the internal nodes;

a receiving module, configured to receive the query prefix from the user side; wherein the query prefix is a concatenation of prefix abbreviations of any number of leading keywords of a character string formed by sequentially concatenating a plurality of keywords;

5. An electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method for query autocompletion according to any one of claims 1 to 3.

6. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for query autocompletion according to any one of claims 1 to 3.