CN113239258B - Method, device, electronic equipment and storage medium for providing query suggestion

Info

Publication number: CN113239258B
Application number: CN202110547368.XA
Authority: CN
Inventors: 周丽芳; 张谦; 陈国梁; 王岗
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-05-19
Filing date: 2021-05-19
Publication date: 2023-06-27
Anticipated expiration: 2041-05-19
Also published as: CN113239258A

Abstract

The present disclosure provides a method, apparatus, electronic device, and storage medium for providing query suggestions. It relates to data processing, and in particular to the fields of search engines and content recommendation. A method of providing query suggestions comprises: acquiring a first character string input by a user; querying a preconfigured database using the first character string as an index, wherein the database is an inverted index database in which a second character string is stored with the first character string as its index, and the first character string represents an intermediate character sequence that arises when the second character string, or a part of the second character string, is being input; and outputting the second character string as a query suggestion.

Description

Method, device, electronic equipment and storage medium for providing query suggestion

Technical Field

The present disclosure relates to the field of data processing technologies, in particular to search engines and content recommendation, and more particularly to a method, an apparatus, an electronic device, and a storage medium for providing query suggestions.

Background

In the field of search engines, it is desirable to offer the user possible hint phrases while the user is entering a query term, or to complete the partial search term the user has entered so far. Such a hint phrase or completion is referred to as a query suggestion ("Sug" for short) or "hint term". A method that can provide query suggestions with better real-time performance is desired.

Disclosure of Invention

The present disclosure provides a method, apparatus, electronic device, computer-readable storage medium, and computer program product for providing query suggestions.

According to an aspect of the present disclosure, there is provided a method of providing query suggestions, comprising: acquiring a first character string input by a user; querying a preconfigured database using the first character string as an index, wherein the database is an inverted index database in which a second character string is stored with the first character string as its index, and the first character string represents an intermediate character sequence that arises when the second character string, or a part of the second character string, is being input; and outputting the second character string as a query suggestion.

According to another aspect of the present disclosure, there is provided an apparatus for providing query suggestions, comprising: a character string input unit configured to acquire a first character string input by a user; a character string query unit configured to query a preconfigured database using the first character string as an index, wherein the database is an inverted index database in which a second character string is stored with the first character string as an index, the first character string representing an intermediate character sequence of the second character string or a part of the second character string when inputted; and a character string output unit configured to output the second character string as a query suggestion.

According to still another aspect of the present disclosure, there is provided a database construction method, including: processing a term string to obtain one or more fragment strings, each of the one or more fragment strings being an intermediate character sequence that arises when the term string, or a portion of the term string, is being input; and storing, for each of the one or more fragment strings, the term string with the fragment string as an index.

According to still another aspect of the present disclosure, there is provided a database construction apparatus including: a character string processing unit configured to process a vocabulary term character string to obtain one or more fragment character strings, each of the one or more fragment character strings being an intermediate character sequence of the vocabulary term character string or a portion of the vocabulary term character string when input; and a character string storage unit configured to store, for each of the one or more fragment character strings, the fragment character string as an index, the term character string.

According to still another aspect of the present disclosure, there is provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method for providing query suggestions according to embodiments of the present disclosure.

According to yet another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method for providing query suggestions according to an embodiment of the present disclosure.

According to yet another aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein the computer program, when executed by a processor, implements a method for providing query suggestions according to embodiments of the present disclosure.

In accordance with one or more embodiments of the present disclosure, query suggestions may be provided in real-time.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.

FIG. 1 is a schematic diagram illustrating an exemplary system in which various methods described herein may be implemented, according to an embodiment of the present disclosure;

FIG. 2 is a flow chart illustrating a method for providing query suggestions according to an embodiment of the present disclosure;

FIG. 3A is an example application scenario diagram illustrating a method for providing query suggestions according to an embodiment of the present disclosure;

FIGS. 3B and 3C are schematic diagrams illustrating data stored in a database according to an embodiment of the present disclosure;

FIGS. 4A-4F are flowcharts illustrating methods for providing query suggestions and database pre-configuration methods according to embodiments of the present disclosure;

FIG. 5A is a block diagram illustrating an apparatus for providing query suggestions according to an embodiment of the present disclosure;

FIG. 5B is a block diagram illustrating an apparatus for building a database according to an embodiment of the present disclosure;

FIG. 6 is a block diagram illustrating an exemplary electronic device that can be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the present disclosure, the use of the terms "first," "second," and the like to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, unless otherwise indicated, and such terms are merely used to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, they may also refer to different instances based on the description of the context.

The terminology used in the description of the various illustrated examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, the elements may be one or more if the number of the elements is not specifically limited. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.

Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented, in accordance with an embodiment of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120.

Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.

In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable execution of methods for providing query suggestions.

In some embodiments, server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.

In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof that are executable by one or more processors. A user operating client devices 101, 102, 103, 104, 105, and/or 106 may in turn utilize one or more client applications to interact with server 120 to utilize the services provided by these components. It should be appreciated that a variety of different system configurations are possible, which may differ from system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.

The user may use client devices 101, 102, 103, 104, 105, and/or 106, for example, to conduct searches, type in search terms, receive query suggestions, view query results, and so forth. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that the present disclosure may support any number of client devices.

Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and the like. These computer devices may run various types and versions of software applications and operating systems, such as Microsoft Windows, Apple iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., Google Chrome OS); or include various mobile operating systems such as Microsoft Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablet computers, personal digital assistants (PDAs), and the like. Wearable devices may include head mounted displays and other devices. The gaming system may include various handheld gaming devices, Internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.

Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. For example only, the one or more networks 110 may be a Local Area Network (LAN), an Ethernet-based network, a Token Ring network, a Wide Area Network (WAN), the Internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, Wi-Fi), and/or any combination of these and/or other networks.

The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture that involves virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices of the server). In various embodiments, server 120 may run one or more services or software applications that provide the functionality described below.

The computing units in server 120 may run one or more operating systems including any of the operating systems described above as well as any commercially available server operating systems. Server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, etc.

In some implementations, server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.

In some implementations, the server 120 may be a server of a distributed system or a server that incorporates a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that addresses the management difficulty and weak service scalability of traditional physical hosts and virtual private server (VPS) services.

The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of databases 130 may be used to store information such as audio files and video files. Database 130 may reside in various locations. For example, the data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and communicate with the server 120 via a network-based or dedicated connection. Database 130 may be of different types. In some embodiments, the data store used by server 120 may be a database, such as a relational database. One or more of these databases may store, update, and retrieve data in response to commands.

In some embodiments, one or more of databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key value stores, object stores, or conventional stores supported by the file system.

The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.

A method 200 of providing query suggestions in accordance with an embodiment of the present disclosure is described below with reference to fig. 2.

At step 210, a first string entered by a user is obtained.

At step 220, a preconfigured database is queried using the first string as an index. The database may be an inverted index database. In the database, a second string is stored with the first string as its index, and the first string represents an intermediate character sequence that arises while the second string, or a part of the second string, is being input. For example, for a second string (e.g., a potential search term) "physics" (理学, pinyin "lixue"), the first string may be an intermediate character sequence produced while typing the whole term, such as "lix", "理xue", or "lx", or an intermediate character sequence produced while typing only a part of the term (理), such as "li" or "理".

An inverted index is an index table in which each entry includes an attribute value and the addresses of the records having that attribute value. Because the attribute value is used to locate the records, rather than a record being used to find its attribute value, the structure is called an inverted index.

At step 230, the second string is output as a query suggestion.

An example application scenario of a method according to an exemplary embodiment of the present disclosure is described with reference to fig. 3A. In the course of implementing the method, the first string "lix" entered by the user in the input area 310 is acquired, the second string "Li Xue" is read from the database, and the second string is returned to the user as a query suggestion in the query suggestion area 320. Outputting the second string as the query suggestion may include causing the terminal device to input or present the query suggestion to the user. Query suggestions can thus be provided accurately and in real time, particularly when the user's input is incomplete or consists only of partial fragments.

In the prior art, when a user inputs a character string (for example, pinyin letters or a part of Chinese characters), the character string input by the user is analyzed online, and then subjected to word segmentation, pinyin matching, semantic analysis and other processes, and then a query suggestion is searched from a library. Such a process is relatively poor in real-time performance and requires an online computing process.

In contrast, according to the scheme of the present disclosure, a database mapping segment strings to complete strings can be constructed in advance, with the segment strings serving as indexes. This applies the idea of an inverted index: an attribute value (a segment string, i.e., a first string) is used as the index, and the corresponding complete strings are stored in association with it. Continuing the example above, "lix" may be used as an index under which "physics", "idealized", and so on are stored. During real-time searching, the first string "lix" input by the user is received. Because "physics" is stored in the database under the index "lix", it can be read directly from the database as a query suggestion; the input "lix" does not need to be analyzed or matched online, and only a read is required. For example, referring to FIG. 3B, data 321 is one example of data stored in the inverted index, where "physics" (and optionally "Li Xue", "academy of science", etc.) is stored under the index "lix".
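The read-only lookup described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the in-memory dictionary `INDEX`, the function name `suggest`, and the stored entries are all assumptions chosen to mirror the "lix" example.

```python
# Hypothetical preconfigured inverted index: fragment string -> complete strings.
# The entries are placeholders that mirror the "lix" example in the text.
INDEX = {
    "lix": ["physics", "idealized"],
    "lixue": ["physics"],
}


def suggest(first_string, index=INDEX):
    """Return the complete strings stored under the user's input.

    The lookup is a direct read: no word segmentation, pinyin matching,
    or semantic analysis is performed at query time.
    """
    return list(index.get(first_string, []))
```

Calling `suggest("lix")` reads the two stored strings directly, and an input with no entry yields an empty list.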

The method provided by the present disclosure is particularly suitable for, and easily extended to, text-based retrieval services in specific professional fields, such as trade name retrieval and paper retrieval. It is to be understood that the methods of the present disclosure are not limited to such fields.

Some modified examples of the method 200 and some examples of the pre-configuration process of the database according to embodiments of the present disclosure are described below with reference to fig. 4A-4F.

Fig. 4A illustrates a pre-configuration process 410 of a database according to some embodiments.

At step 411, the second string is processed to obtain one or more segment strings, each of which represents an intermediate character sequence that arises while the second string, or a part of the second string, is being input. The one or more segment strings include the first string. For example, in the pre-configuration process, the segment strings obtained for the second string "physics" (理学, pinyin "lixue") may include "lix", "lixue", "lx", "理xue", "理x", and so on. The second string may also be referred to herein as a complete string or a term string, as it is usually a complete, meaningful hint word.

At step 412, for each of the one or more segment strings, the second string is stored with that segment string as its index. For example, "physics" may be stored under each of the indexes "lix", "lixue", and "lx", so that when the user inputs any one of "lix", "lixue", or "lx" as the first string, the query suggestion "physics" is found. With continued reference to FIG. 3B, data 322 shows that "physics" may also be stored under the index "lixue".
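Steps 411 and 412 amount to writing each complete string under every one of its segment strings. The sketch below assumes the segment strings have already been produced and uses an in-memory dictionary; the name `build_index` and the sample data are hypothetical.

```python
from collections import defaultdict


def build_index(terms_to_fragments):
    """Store each complete string under every one of its fragment strings.

    `terms_to_fragments` maps a complete (second) string to the fragment
    strings already derived from it in step 411.
    """
    index = defaultdict(list)
    for term, fragments in terms_to_fragments.items():
        for fragment in fragments:
            # Avoid storing the same complete string twice under one index.
            if term not in index[fragment]:
                index[fragment].append(term)
    return dict(index)


# Placeholder data mirroring the example in the text.
index = build_index({"physics": ["lix", "lixue", "lx"]})
```

Because this work happens offline, the cost of generating and storing many fragments is paid once per update rather than once per query.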

Fig. 4A further defines how the inverted index database is preconfigured. In particular, it specifies that the database is configured in a step performed in advance or offline. For example, this step may occur when the thesaurus is updated, so that no online computation is needed when query suggestions are provided in real time. In some embodiments, the pre-configuration process may occur when the second string is written into the data store. The index can therefore be ingested and updated at the same frequency as the underlying data, and the online hint-word service can take effect within seconds, ensuring that the user obtains the latest material information. In scenarios where the user's query terms are strongly tied to real-time material data, particularly scenarios requiring high-frequency updates of that data, the query suggestion service can be provided with high timeliness.

Step 411 in fig. 4A is further described with reference to fig. 4B, according to some embodiments. According to some embodiments, the second string may be a kanji string. In such an embodiment, step 411, i.e., the step of processing the second string to obtain one or more fragment strings, may be implemented by steps 421-423.

At step 421, one or more kanji substrings of the second string are obtained, each kanji substring comprising one or more kanji characters arranged consecutively in the second string. For example, for the second string "college of science" (理学院), the kanji substrings may include 理学院, 理学, 学院, 理, 院, etc.

At step 422, for each of the one or more kanji substrings, one or more hybrid substrings are generated by replacing at least one kanji in the kanji substring with a corresponding pinyin representation. For example, for a kanji substring "college," a hybrid substring may include: y, xuey, xyuan … …, etc.

At step 423, one or more of the generated kanji substrings and hybrid substrings are used as the one or more segment strings for the second string. The generated segment strings may include substrings of pure Chinese characters, substrings of Chinese characters mixed with letters, or substrings of pure letters. For example, the segment strings for "college of science" (理学院) may include 理学院, 理学, 学院, 理 ……, 理xuey, lxyuan ……, and the like. That is, the first string in method 200 may be any one of these. It is to be appreciated that when the database is fully preconfigured, almost any string may serve as the first string in the query suggestion providing method of the present disclosure. That is, for any possible user input, e.g., any combination of Chinese characters and pinyin up to a certain length, the database may already hold the corresponding complete strings indexed by that input.

The steps described with reference to fig. 4B enable simple, efficient and comprehensive generation of segment strings, such as strings including pure kanji and mixed pinyin, for a second string, i.e., a complete string.

Referring to fig. 4C, step 421 in fig. 4B is further described, according to some embodiments. According to some embodiments, step 421, i.e., the step of obtaining one or more kanji substrings of the second string, may be further implemented by steps 431-432.

At step 431, one or more right-side substrings of the second string are generated. Each right-side substring is a run of consecutive characters in the second string whose last character is the last character of the second string. For example, for the second string "college of science" (理学院), the right-side substrings generated may be 理学院, 学院, and 院; the last character of each is the same as the last character of the second string. Note that a "substring" need not contain fewer characters than the string: a substring generated for a string may be the string itself.

At step 432, for each right-side substring, left-side substrings of that right-side substring are generated as substrings of the second string. Each left-side substring is a run of consecutive characters in the right-side substring whose first character is the first character of the right-side substring. For example, for the right-side substring "college" (学院), the left-side substrings generated may be 学 and 学院; the first character of each is the same as the first character of the right-side substring.

Through the steps shown in fig. 4C, substrings are generated by splitting first from the right side and then from the left side, so that the fragments a user might input are covered as fully as possible.
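Steps 431 and 432 can be read as taking every suffix of the string and then every prefix of each suffix, which together enumerate all contiguous substrings. The sketch below illustrates that reading; the function names are hypothetical, and the stand-in string "abc" is used in place of a Chinese term.

```python
def right_substrings(s):
    """Step 431: suffixes of s, each ending with the last character of s."""
    return [s[i:] for i in range(len(s))]


def left_substrings(s):
    """Step 432: prefixes of s, each starting with the first character of s."""
    return [s[:j] for j in range(1, len(s) + 1)]


def kanji_substrings(s):
    """All contiguous substrings of s, as prefixes of suffixes, deduplicated."""
    seen = []
    for right in right_substrings(s):
        for left in left_substrings(right):
            if left not in seen:
                seen.append(left)
    return seen


# Stand-in example: a three-character string yields six distinct substrings.
print(kanji_substrings("abc"))
```

A string of n distinct characters yields n(n+1)/2 substrings, so a length limit on the source terms keeps the index size manageable.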

Step 422 in fig. 4B is further described with reference to fig. 4D, according to some embodiments. According to some embodiments, step 422, i.e., the step of generating one or more mixed substrings, may be implemented by steps 441-444 for each of the one or more kanji substrings.

At step 441, the first k characters of the kanji substring are taken as a first splice portion, where k is a non-negative integer and k ≤ n, n being the number of characters in the kanji substring. Here, k is the number of kanji characters to be kept unconverted. The value of k can be preset or modified for different situations. It constrains the size of the generated set of substrings: a smaller k means fewer Chinese characters must be kept, so more substrings are formed, the association capability is stronger, and more storage space is required; and vice versa. In addition, k may be selected based on the desired matching accuracy of query suggestions: with a smaller k, query suggestions containing more Chinese characters are offered for shorter user input fragments, whereas with a larger k, a query suggestion is offered only once the user has input more Chinese characters. It will be appreciated that, during configuration of the database, the value of k or the ratio of k to n may differ for different strings or different substrings.

As one example, the kanji substring is "university physical course" (大学物理课程), the number of characters is n = 6, and k = 3 is set. The first 3 characters are then not converted to pinyin and are kept directly as the first splice portion, 大学物. It will be appreciated that when k equals n, all Chinese characters are kept as the first splice portion and output as the mixed substring, and the second and third splice portions may be empty in this case. That is, the term "mixed substring" herein does not require that the mixed substring include pinyin characters, nor that it contain fewer kanji characters than the kanji substring.

At step 442, one or more second splice portions are generated, each being a string formed, in order, from either the full pinyin spellings or the pinyin initials of the (k+1)-th through (n-1)-th characters of the kanji substring. In view of the variability of user input habits and the ambiguity of pinyin, this portion uses either the full pinyin of every character or the abbreviated pinyin that keeps only each character's first letter.

For example, continuing with the example above, for the kanji substring "university physical course" (大学物理课程) with n = 6 and k = 3, the (k+1)-th through (n-1)-th characters are 理课. Their pinyin is converted into two forms, full spelling and initials, so the second splice portions may be "like" and "lk".

At step 443, one or more third splice portions are generated, each being the full pinyin spelling of the n-th character of the kanji substring or a left-anchored prefix of that spelling. For example, continuing with the example above, for the last character 程 of the kanji substring "university physical course", the third splice portions may be "cheng" (the full spelling) and "chen", "che", "ch", "c" (prefixes of the full spelling).

At step 444, one or more hybrid substrings are generated, each being a string formed by concatenating, in order, the first splice portion, one of the one or more second splice portions, and one of the one or more third splice portions. That is, a mixed substring may be expressed as: first splice portion + second splice portion (any one of them) + third splice portion (any one of them). Continuing the "university physical course" example with k = 3, the mixed substring may be 大学物 + ("like" or "lk") + ("cheng", "chen", "che", "ch", or "c"). One specific example is "大学物likechen". It will be appreciated that, depending on certain criteria (e.g., character length limits or user habit preferences), only some of these may be generated or selected as mixed substrings, while the others are not generated or are discarded. Alternatively, all possible mixed substrings may be kept and stored as mixed substrings for the kanji substring. In the above example there are at most 2×5=10 possible mixed substrings, but the mixed substrings finally generated do not necessarily include all 10. It is to be understood that the present disclosure is not limited thereto.

According to the embodiment of fig. 4D, for a given kanji substring, corresponding mixed substrings can be generated in a way that is simple to compute, broad in coverage and matched to users' input habits. A database configured in this way better reflects the possible query suggestions and thus makes the query suggestion provision method of the present disclosure more accurate and efficient.

A variation of the database pre-configuration process 450 according to some embodiments is described below in connection with fig. 4E. According to some embodiments, the database pre-configuration process 450 may be implemented by steps 451-453.

At step 451, the second string is processed to obtain one or more fragment strings. Step 451 may be similar to step 411, and duplicate description is omitted herein.

Then, at step 452, for each of the one or more segment strings, an association value between the segment string and the second string is determined. The association value may also be referred to as a degree of association, an association score, a suggestion score, and the like, and the present disclosure is not limited thereto.

At step 453, for each of the one or more segment strings, the segment string is used as an index under which the second string is stored together with the corresponding association value. In step 453, the process of storing the second string with the segment string as an index may be similar to step 412, and a repetitive description is omitted here. Referring to FIG. 3C, data 331 is shown in which a plurality of strings ("second strings") and their corresponding association values are stored under the index "lix". It is understood that the brackets in fig. 3C are merely illustrative. Those skilled in the art will appreciate that the second string and the association value may be stored under the corresponding index in any data format and storage form, including but not limited to an associated pair.

According to such an embodiment, where the association value has been stored in the database in advance, step 230 may further comprise: in response to determining that the association value between the first string and the second string satisfies a threshold condition, outputting the second string. For example, the threshold condition may be that the association value is above a predetermined value, or that the second string is among the top few strings when ranked by association value.

According to the embodiment depicted in fig. 4E, the association value is also calculated during the pre-configuration process. Thus, the matching score between the first string (segment string) and the second string can be calculated offline, and suggestions can be output directly according to that score when they are generated online, without online calculation or ranking. In this way, the process of generating query suggestions can be made more real-time. During online recall, ranking can use the offline association values without designing an additional complex fine-ranking algorithm.
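As a rough illustration of steps 453 and 230, the following Python sketch keeps each posting list sorted when it is written, so that a lookup only reads and filters. The class name `SuggestionIndex`, its method names and the in-memory dictionary are assumptions for illustration, not the storage form used by the disclosure. The values 0.95 and 0.92 follow the fig. 3C example discussed in this description; the value for "ideal" is hypothetical.

```python
from collections import defaultdict


class SuggestionIndex:
    """Toy inverted index: segment string -> [(association value, suggestion), ...]."""

    def __init__(self):
        self._postings = defaultdict(list)

    def store(self, fragment, suggestion, score):
        # Ranking work is done at write time (offline), not at query time.
        postings = self._postings[fragment]
        postings.append((score, suggestion))
        postings.sort(key=lambda entry: entry[0], reverse=True)

    def suggest(self, first_string, min_score=0.0, top_n=None):
        # The user's input is used directly as the index key.
        hits = [s for score, s in self._postings.get(first_string, [])
                if score >= min_score]
        return hits[:top_n]


index = SuggestionIndex()
index.store("lix", "physics", 0.92)
index.store("lix", "Li Xue", 0.95)
index.store("lix", "ideal", 0.40)   # hypothetical value, for illustration only
```

Both forms of threshold condition are covered: `min_score` keeps suggestions above a predetermined value, and `top_n` keeps the first few suggestions in score order.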

Step 452 in fig. 4E is further described with reference to fig. 4F, in accordance with some embodiments. According to some embodiments, step 452, i.e. the step of determining the association value between the segment string and the second string, may be implemented by steps 461-463.

At step 461, the degree of offset of the segment string relative to the second string is determined. The degree of offset may indicate a position of a character in the second string that corresponds to a first character in the segment string. The degree of offset may reflect the degree of prefix priority matching.

For example, the offset is calculated first, i.e. the position in the original string of the segment string's first character or initial, counted starting from 1. For example, for the second string "college of theory", if the first string is "yy", the offset is 1; if the first string is "chemical", the offset is 2, because its first character "chemical" is located at position 2.

Next, the degree of offset, e.g. a normalized offset score, may be calculated as:

offset_score = [10-min(offset, L)] / L

where L reflects the mode or average of the character length of user queries, i.e. the typical character length of a query term (query). L is adjustable. For example, L may be 10, in which case the constant 10 in the numerator coincides with L.

L is introduced so that if the offset is greater than the configured mode or average L, this term becomes zero. That is, for strings with many characters, matches that begin at later positions receive a lower score. For example, when the user is detected to have input the characters "xx", the content corresponding to "xx" should be located as close to the beginning as possible, and should not be located after the L-th (for example, 10th) character.
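A minimal Python sketch of the normalized offset score follows. It assumes that the constant 10 in the formula above is the example value of L, so that the score reaches zero once the offset is at least L, as the text describes; the function name and argument checks are illustrative.

```python
def offset_score(offset: int, L: int = 10) -> float:
    """Normalized offset score; offset is 1-based, L is the typical query length.

    Assumption: the numerator constant 10 in the formula is the example value of L.
    """
    if offset < 1 or L < 1:
        raise ValueError("offset and L must be positive")
    return (L - min(offset, L)) / L
```

Under this assumption, a match that starts at the first character scores (10 - 1) / 10 = 0.9 for L = 10, and any match starting at or beyond position L scores 0.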

At step 462, a degree of overlap of the segment string with the second string is determined, the degree of overlap indicating the degree of content matching between the segment string and the second string. The degree of overlap may characterize how many characters the segment string shares with the original string (e.g., as a proportion of the original string). For example, the degree of overlap can be calculated according to the following formula:

overlap = mixed length of segment string / mixed length of second string

The "mixed length" here uses the following logic: where the segment string contains a Chinese character, that position is counted as one Chinese character in both the numerator and the denominator; where the corresponding position is pinyin, it is counted by pinyin letters in both the numerator and the denominator. That is, if the current segment string is composed of i Chinese characters plus pinyin, the second string (containing n Chinese characters) is counted as i Chinese characters plus the full-pinyin letter count of the remaining (n-i) Chinese characters.

For example, taking the segment string "theory xue" and the second string "college of theory" as an example, since the position of the second character "school" is occupied by pinyin, the degree of overlap is calculated as: overlap = mixed length of "theory xue" / mixed length of "theory xueyuan" = (1+3)/(1+7) = 0.5.

At step 463, an association value between the segment string and the second string is determined based on the degree of offset and the degree of overlap. For example, the association value may be calculated as:

score = offset_weight * offset_score + ctr_weight * ctr_score

where offset_weight is the offset weight, offset_score is the degree of offset (offset score), ctr_weight is the overlap weight, and ctr_score is the degree of overlap. The offset weight and the overlap weight each default to, for example, 0.5, and can be configured to express how much the business values prefix-priority matching and relevance matching, respectively.
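The weighted combination of step 463 is small enough to show directly. In the sketch below, the function name `association_value` is illustrative, the default weights of 0.5 follow the text, and the input scores in the usage line are illustrative values only.

```python
def association_value(offset_score: float, ctr_score: float,
                      offset_weight: float = 0.5,
                      ctr_weight: float = 0.5) -> float:
    """score = offset_weight * offset_score + ctr_weight * ctr_score."""
    return offset_weight * offset_score + ctr_weight * ctr_score


# Illustrative inputs: an offset score of 0.9 and a degree of overlap of 0.5.
score = association_value(0.9, 0.5)   # 0.5 * 0.9 + 0.5 * 0.5
```

Raising `offset_weight` relative to `ctr_weight` would favor suggestions whose match starts near the beginning of the string, which is one way a business could express a preference for prefix-priority matching.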

Through this association calculation, the relevance between a string segment and the complete string can be scored offline. The global scoring and ranking algorithm based on prefix-priority matching is simple and can quickly score the relevance of the two items globally.

With continued reference back to FIG. 4E, according to some other embodiments, the database provisioning process 450 may also include steps 454-456.

At step 454, the third string is processed to obtain one or more fragment strings, each fragment string of the one or more fragment strings representing an intermediate string of the third string or a portion thereof, and the one or more fragment strings including the first string. For example, the process of processing the third string may be similar to steps 411 or 451, and will not be described again here. Referring back to fig. 3B or 3C, for example, the third string may be "Li Xue" or "ideal" or the like.

At step 455, for each of the one or more segment strings, an association value between the segment string and the third string is determined.

At step 456, each of the one or more fragment strings is used as an index, and a third string and a corresponding association value are stored in the database under the corresponding index. Step 455 may be similar to step 452 and step 456 may be similar to step 453. Specific details are not described here.

According to such an embodiment, where the association values associated with the plurality of character strings have been stored in the database in advance, step 230 may further include: in response to determining that the association value of the first string with the second string is greater than the association value of the first string with the third string, causing the second string to be output in preference to the third string. For example, referring to fig. 3C, upon receiving "lix", since the association value of "lix" with "Li Xue" is 0.95 and its association value with "physics" is 0.92, "Li Xue" can be presented to the user at a higher position, without complicated calculation or even without any online calculation.

After the query suggestions are obtained, the related art often requires the ranking of the query suggestions to be computed online. A typical query suggestion service recalls a batch of related query suggestions online in real time according to the user's input query terms, and then applies a relevance algorithm to measure how well each suggestion in the list matches the query terms, so the ranking must be calculated and adjusted online in real time. In contrast, according to the present embodiment, the ranking can be read directly online from the association values calculated offline. Thus, higher real-time performance can be obtained. In particular, the scoring algorithm according to embodiments of the present disclosure can score query suggestions globally in advance, without knowing the global set of query suggestions.

The query suggestion service construction method provided by the present disclosure is more real-time. With this method, a suggestion service for query text can be provided in real time based on the latest application data; that is, the text to be used for queries is constructed and shown to users in real time. The mechanism also supports custom strategy optimization of the query suggestion set according to the service's own policy requirements, which improves the relevance of the suggestions. Because query suggestions are updated at the same frequency as the indexed data, they are served to users in real time, improving user satisfaction. For example, when data is stored, the method can generate the query suggestion sets for the fields that users are likely to search and, at the same time, score and rank each query suggestion globally to build the index store, so that the list of top query suggestions for the most likely queries can be displayed effectively during online search.

Conventional retrieval systems generally include an inverted index module. The real-time query suggestion construction service can rely on the inverted index module of the retrieval system: a global score is obtained when data is constructed, stored and indexed, the inverted index is updated in real time, and the latest query suggestion data can be obtained in real time through retrieval. The whole process is built on the real-time data retrieval system and completes the updating and retrieval of query suggestions at the same frequency. Further, when the inverted index is constructed, according to embodiments of the present disclosure, other numerical information relating the query suggestion to the original text, such as the offset, the Chinese character length and the letter character length, can optionally be retained, so that fine ranking can be performed after recall according to business requirements and additional strategies, further improving the search suggestion experience.

In the related art, after a user inputs a query term, the query term is usually preprocessed. In a scenario according to an embodiment of the disclosure, however, if the user inputs "lixuey", the query term is itself an inverted-index key and is sent directly to the database for lookup without processing; if an index entry for the query term exists, the entire linked list under it is retrieved. In addition, the related art typically requires further fine-ranking processing on the linked document information after the linked list is obtained, such as precisely calculating metrics of the documents (e.g. ctr, cqr) and preferring the documents that match the prefix. According to embodiments of the present disclosure, after the query suggestions are recalled, the output may be ranked directly using the offline global score (association value).

According to embodiments of the disclosure, materials can be updated, stored and deleted in real time. Deletion ensures that users no longer see deleted query suggestions. This mechanism is particularly suited to scenarios in which the text of the materials changes frequently, ensuring that users obtain the latest query suggestion information for the materials when searching.
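The effect of real-time storage and deletion on what a user sees can be sketched with a plain in-memory dictionary. The function names, the dictionary layout and the fragment scores below are hypothetical; a production retrieval system would apply the same updates to its inverted index module rather than to a Python dict, and would keep postings pre-sorted instead of sorting at lookup as this short sketch does.

```python
def add_suggestion(index, suggestion, scored_fragments):
    """Store (or update) a suggestion under each of its fragment keys.

    scored_fragments maps fragment -> association value computed at write time.
    """
    for fragment, score in scored_fragments.items():
        index.setdefault(fragment, {})[suggestion] = score


def delete_suggestion(index, suggestion):
    """Remove a suggestion from every posting so it can no longer be recalled."""
    for fragment in list(index):
        index[fragment].pop(suggestion, None)
        if not index[fragment]:
            del index[fragment]


def lookup(index, first_string):
    """Return suggestions for the user's input, highest association value first."""
    entries = index.get(first_string, {})
    return sorted(entries, key=entries.get, reverse=True)


index = {}
add_suggestion(index, "Li Xue", {"lix": 0.95, "li": 0.90})    # hypothetical scores
add_suggestion(index, "physics", {"lix": 0.92})
before = lookup(index, "lix")
delete_suggestion(index, "Li Xue")
after = lookup(index, "lix")
```

Immediately after the deletion, a lookup for "lix" no longer returns the deleted suggestion, which is the behavior the paragraph above describes.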

Common query suggestion services often rely on building the index through offline batch updates and on reloading the service on a timer. According to embodiments of the disclosure, the index can be built and updated in real time, at the same frequency as the indexed material data, so that the online query suggestion service achieves second-level data updates based on the real-time performance of the retrieval system.

According to some embodiments of the present disclosure, the provided association calculation formula fuses the offset and the degree of overlap, so that it both emphasizes prefix-priority matching and takes text relevance into account. Such a relevance score is globally comparable in the offline phase.

In addition, the global scoring algorithm can be customized for the business: by assigning and adjusting the weight terms, it flexibly supports the business's custom ranking requirements. Because the relevance score is based on literal relevance, it also supports a quick cold start when no user click data is available.

Furthermore, the relevance score can be flexibly extended. For example, other relevance strategies may be added as the business develops, and the terms of the formula may be adjusted using accumulated click log data. After a certain weight is assigned to the click information, the click information can be added as a ranking criterion. The updated relevance score may be:

Score = offset_weight * offset_score + ctr_weight * ctr_score + click_rate * click_weight

Wherein the click_rate may be defined by the business itself, such as click_num/default_max_click_num.
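A minimal sketch of the extended score follows. The text does not specify a value for `click_weight` or how the three weights relate, so every weight is passed explicitly; the function name and the numbers in the usage line are hypothetical.

```python
def extended_score(offset_score, ctr_score, click_num, default_max_click_num,
                   offset_weight, ctr_weight, click_weight):
    """Score = offset_weight*offset_score + ctr_weight*ctr_score + click_rate*click_weight."""
    if default_max_click_num <= 0:
        raise ValueError("default_max_click_num must be positive")
    # One business-defined choice of click_rate, as suggested in the text.
    click_rate = click_num / default_max_click_num
    return (offset_weight * offset_score
            + ctr_weight * ctr_score
            + click_rate * click_weight)


# Hypothetical inputs and weights, for illustration only.
score = extended_score(0.9, 0.5, click_num=50, default_max_click_num=100,
                       offset_weight=0.4, ctr_weight=0.4, click_weight=0.2)
```

With no click data (`click_num = 0`) the click term vanishes and the score falls back to the literal-relevance terms, which is consistent with the cold-start behavior described earlier.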

Therefore, according to the embodiment of the disclosure, flexible service customization capability and expansibility can be realized.

Fig. 5A illustrates an apparatus 500 for providing query suggestions according to an embodiment of the present disclosure. The apparatus 500 may include a character string input unit 510, a character string query unit 520, and a character string output unit 530. The character string input unit 510 may be configured to acquire a first character string input by a user. The character string query unit 520 may be configured to query a preconfigured database using the first character string as an index. The database may be an inverted index database. In the database, a second character string may be stored with the first character string as an index, the first character string representing an intermediate character sequence produced when the second character string or a part of the second character string is being input. The character string output unit 530 may be configured to output the second character string as a query suggestion.

Fig. 5B illustrates an apparatus 550 for constructing a preconfigured database according to an embodiment of the present disclosure. The apparatus 550 may include a character string processing unit 560 and a character string storage unit 570. The character string processing unit 560 may be configured to process a term string to obtain one or more fragment strings. Each of the one or more fragment strings is an intermediate character sequence produced when the term string or a part of the term string is being input.

The character string storage unit 570 may be configured to store, for each of the one or more fragment strings, the term string with the fragment string as an index.

According to some embodiments, the term string is a kanji string. In such an embodiment, the character string processing unit 560 may further include: a unit configured to obtain one or more kanji substrings of the term string, each kanji substring comprising one or more kanji characters arranged consecutively in the term string; a unit configured to generate, for each of the one or more kanji substrings, one or more mixed substrings by replacing at least one kanji of the kanji substring with a corresponding pinyin representation; and a unit configured to use one or more of the generated kanji substrings and mixed substrings as the one or more fragment strings for the term string.

According to some embodiments, the apparatus 550 may further include: a unit configured to determine, for each of the one or more fragment strings, an association value between the fragment string and the term string after the term string has been processed to obtain the one or more fragment strings. The character string storage unit 570 may include: a unit configured to store, for each of the one or more fragment strings, the term string and the corresponding association value with the fragment string as an index.

According to a further embodiment, the unit configured to determine, for each of the one or more segment strings, an association value between the segment string and the term string after processing the term string to obtain the one or more segment strings may comprise: a unit configured to determine a degree of offset of the segment string relative to the term string, the degree of offset indicating a position of a character in the term string corresponding to a first character in the segment string; a unit configured to determine a degree of coincidence of the segment string with the term string, the degree of coincidence indicating a degree of content matching of the segment string with the term string; and a unit configured to determine an association value between the segment string and the term string based on the degree of offset and the degree of overlap.

According to embodiments of the present disclosure, there is also provided an electronic device, a readable storage medium and a computer program product.

Referring to fig. 6, a block diagram of an electronic device 600 that may be a server or a client of the present disclosure, which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606, an output unit 607, a storage unit 608, and a communication unit 609. The input unit 606 may be any type of device capable of inputting information to the device 600; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 607 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 608 may include, but is not limited to, magnetic disks and optical disks. The communication unit 609 allows the device 600 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.

The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the various methods and processes described above, such as method 200 and methods 410, 450, and so on. For example, in some embodiments, the method 200 and methods 410, 450, etc. may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by computing unit 601, one or more of the steps of method 200 and methods 410, 450, etc., described above, may be performed. Alternatively, in other embodiments, computing unit 601 may be configured to perform method 200 and methods 410, 450, etc. in any other suitable manner (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.

Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the foregoing methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced with equivalent elements thereof. Furthermore, the steps may be performed in a different order than described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the disclosure.

Claims

1. A method of providing query suggestions, comprising:

acquiring a first character string input by a user;

querying a preconfigured database by using the first character string as an index, wherein the database is an inverted index database, a second character string is stored in the database with the first character string as an index, and the first character string is an intermediate character sequence produced when the second character string or a part of the second character string is being input; and

outputting the second character string as a query suggestion,

wherein the database is preconfigured by the following steps:

processing the second string to obtain one or more fragment strings, each of the one or more fragment strings being an intermediate string of the second string or a portion of the second string when entered, and the one or more fragment strings comprising the first string; and

for each of the one or more fragment strings, storing the second string with the fragment string as an index,

wherein the second string is a kanji string, and wherein processing the second string to obtain one or more segment strings comprises:

acquiring one or more Chinese character sub-strings of the second character string, wherein each Chinese character sub-string comprises one or more Chinese characters which are arranged in succession in the second character string;

for each of the one or more Chinese character sub-strings, generating one or more mixed substrings by replacing at least one Chinese character in the Chinese character sub-string with a corresponding pinyin representation; and

using one or more of the generated kanji substrings and mixed substrings as the one or more segment strings for the second string,

wherein generating one or more mixed substrings for each of the one or more kanji substrings comprises:

taking the first k characters of the Chinese character sub-string as a first splicing part, wherein k is a non-negative integer and k is less than or equal to n, and n is the character length of the Chinese character sub-string;

generating one or more second splicing parts, each of the one or more second splicing parts being a character string formed by the full pinyin or the initials of the (k+1)th to (n-1)th characters of the kanji substring in sequence;

generating one or more third splicing parts, each of the one or more third splicing parts being a left ordered subset of the full pinyin of the nth character of the kanji substring;

generating the one or more mixed sub-strings, wherein each mixed sub-string is a string formed by sequentially splicing a first splicing part, one second splicing part of the one or more second splicing parts and one third splicing part of the one or more third splicing parts.

2. The method of claim 1, wherein obtaining one or more Chinese character substrings of the second string comprises:

generating one or more right-side substrings of the second string, each right-side substring being a consecutively arranged substring in the second string, and a last character of the right-side substring being identical to a last character of the second string; and

for each right-side substring, generating one or more left-side substrings of the right-side substring as Chinese character substrings of the second string, each left-side substring being a consecutively arranged substring in the right-side substring, and a first character of the left-side substring being identical to a first character of the right-side substring.
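By way of non-limiting illustration, the right-side and left-side substring generation may be sketched in Python as follows. The function names and the de-duplication of repeated substrings are assumptions of this sketch. Taking every suffix and then every prefix of each suffix enumerates all contiguous substrings of the input.

```python
def right_substrings(s: str) -> list:
    """Substrings that end at the last character of s (its suffixes)."""
    return [s[i:] for i in range(len(s))]


def left_substrings(s: str) -> list:
    """Substrings that start at the first character of s (its prefixes)."""
    return [s[:j] for j in range(1, len(s) + 1)]


def chinese_character_substrings(s: str) -> list:
    """All contiguous substrings of s, in generation order, without repeats."""
    seen = set()
    result = []
    for right in right_substrings(s):
        for left in left_substrings(right):
            if left not in seen:
                seen.add(left)
                result.append(left)
    return result
```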

3. The method according to claim 1 or 2, wherein the database is further preconfigured by:

after processing the second string to obtain one or more fragment strings, for each fragment string of the one or more fragment strings, determining an association value between the fragment string and the second string, and

wherein storing the second string with the fragment string as an index comprises: storing the second string and the corresponding association value with the fragment string as an index;

and wherein outputting the second string as a query suggestion comprises:

causing the second string to be output to the user in response to determining that the association value between the first string and the second string meets a threshold condition.
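By way of non-limiting illustration, a query against such a database may be sketched in Python as follows. The in-memory dictionary `INDEX`, the sample association values, the function name `suggest`, and the reading of the threshold condition as "greater than or equal to 0.5" are assumptions of this sketch.

```python
# Hypothetical inverted index: fragment string -> [(full string, association value)].
INDEX = {
    "百d": [("百度", 0.9), ("百度地图", 0.4)],
}


def suggest(first_string: str, index: dict, threshold: float = 0.5) -> list:
    """Return stored strings whose association value meets the threshold."""
    entries = index.get(first_string, [])
    # Assumption: the threshold condition is "value >= threshold".
    return [term for term, value in entries if value >= threshold]
```

With the sample data, `suggest("百d", INDEX)` returns only 百度, because the association value stored for 百度地图 falls below the assumed threshold.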

4. A method according to claim 3, wherein determining an association value between the segment string and the second string comprises:

determining a degree of offset of the segment string relative to the second string, the degree of offset indicating a position of a character in the second string corresponding to a first character in the segment string;

determining a degree of coincidence of the segment string with the second string, the degree of coincidence indicating a degree of content matching between the segment string and the second string; and

determining the association value between the segment string and the second string based on the degree of offset and the degree of coincidence.
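By way of non-limiting illustration, one possible way of combining the two quantities is sketched in Python below. The claims do not recite a formula; the length ratio used as the degree of coincidence, the division by (1 + offset), and the restriction to segment strings that occur literally in the full string (that is, without pinyin replacement) are assumptions of this sketch.

```python
def association_value(segment: str, full: str) -> float:
    """Hypothetical association value: higher for earlier, longer matches."""
    if not segment:
        raise ValueError("segment string must be non-empty")
    # Degree of offset: position in `full` of the character matching the
    # first character of `segment` (0 means the match starts at the front).
    offset = full.find(segment)
    if offset < 0:
        raise ValueError("segment string does not occur in the full string")
    # Degree of coincidence: share of the full string covered by the segment.
    coincidence = len(segment) / len(full)
    # Assumed combination: penalize matches that start later in the string.
    return coincidence / (1 + offset)
```

Under this assumed formula, the segment 百度 scores higher against 百度地图 than the segment 地图 does, since both cover half of the string but 百度 starts at the front.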

5. A method according to claim 3, wherein the database is further preconfigured by:

processing a third string to obtain one or more fragment strings, each of the one or more fragment strings representing an intermediate string of the third string or a portion of the third string when entered, and the one or more fragment strings comprising the first string;

determining, for each of the one or more segment strings, an association value between the segment string and the third string; and

using each of the one or more fragment strings as an index, storing the third string and corresponding association value in the database under the corresponding index, and

wherein outputting the second string as a query suggestion comprises:

outputting the second string preferentially over the third string in response to determining that the association value of the first string with the second string is greater than the association value of the first string with the third string.
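By way of non-limiting illustration, the preferential output of candidates stored under the same index may be sketched in Python as follows. The sample entries, their association values, and the function name `ranked_suggestions` are assumptions of this sketch.

```python
def ranked_suggestions(first_string: str, index: dict) -> list:
    """Return candidates under one index, highest association value first."""
    entries = index.get(first_string, [])
    ordered = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [term for term, _ in ordered]


# Hypothetical data: two full strings share the fragment "bd".
SAMPLE_INDEX = {"bd": [("北斗", 0.3), ("百度", 0.8)]}
```

Here `ranked_suggestions("bd", SAMPLE_INDEX)` places 百度 ahead of 北斗 because its stored association value is greater.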

6. A database construction method, comprising:

processing a vocabulary term string to obtain one or more fragment strings, each of the one or more fragment strings being an intermediate character sequence of the vocabulary term string or a portion of the vocabulary term string when entered; and

for each of the one or more fragment strings, storing the vocabulary term string with the fragment string as an index,

wherein the term string is a Chinese character string, and wherein processing the term string to obtain one or more segment strings comprises:

acquiring one or more Chinese character sub-strings of the term character string, wherein each Chinese character sub-string comprises one or more Chinese characters which are arranged in succession in the term character string;

using one or more of the generated Chinese character substrings and mixed substrings as the one or more segment strings for the term string,

7. The method of claim 6, further comprising:

after processing the term string to obtain one or more segment strings, for each segment string of the one or more segment strings, determining an association value between the segment string and the term string, and

wherein storing the term string with the fragment string as an index comprises: storing the term string and the corresponding association value with the fragment string as an index.
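By way of non-limiting illustration, construction of such a database may be sketched in Python as follows. The in-memory dictionary standing in for the database, the helper names `contiguous_substrings` and `length_ratio`, and the use of a plain length ratio as the association value are assumptions of this sketch; pinyin-based mixed substrings are omitted for brevity.

```python
from collections import defaultdict


def contiguous_substrings(term: str) -> list:
    """Hypothetical fragmenter: every distinct contiguous substring of term."""
    found = {term[i:j] for i in range(len(term)) for j in range(i + 1, len(term) + 1)}
    return sorted(found)


def length_ratio(fragment: str, term: str) -> float:
    """Hypothetical association value: share of the term the fragment covers."""
    return len(fragment) / len(term)


def build_index(terms, fragmenter=contiguous_substrings, scorer=length_ratio) -> dict:
    """Map each fragment string to the (term, association value) pairs it indexes."""
    index = defaultdict(list)
    for term in terms:
        for fragment in fragmenter(term):
            index[fragment].append((term, scorer(fragment, term)))
    return dict(index)
```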

8. The method of claim 7, wherein determining an association value between the segment string and the term string comprises:

determining a degree of offset of the segment string relative to the term string, the degree of offset indicating a position of a character in the term string corresponding to a first character in the segment string;

determining a degree of coincidence of the segment string with the term string, the degree of coincidence indicating a degree of content matching between the segment string and the term string; and

determining the association value between the fragment string and the term string based on the degree of offset and the degree of coincidence.

9. An apparatus for providing query suggestions, comprising:

a character string input unit configured to acquire a first character string input by a user;

a character string query unit configured to query a pre-configured database using the first character string as an index, wherein the database is an inverted index database in which a second character string is stored with the first character string as an index, the first character string representing an intermediate character sequence of the second character string or a part of the second character string when input; and

a character string output unit configured to output the second character string as a query suggestion,

wherein the database is preconfigured by the following steps:

10. A database construction apparatus comprising:

a character string processing unit configured to process a vocabulary term character string to obtain one or more fragment character strings, each of the one or more fragment character strings being an intermediate character sequence of the vocabulary term character string or a portion of the vocabulary term character string when input; and

a character string storage unit configured to store, for each of the one or more fragment character strings, the vocabulary term character string with the fragment character string as an index,

wherein the term string is a Chinese character string, and wherein the string processing unit further comprises:

a unit configured to obtain one or more Chinese character substrings of the term string, each Chinese character substring comprising one or more Chinese characters arranged consecutively in the term string;

a unit configured to generate, for each of the one or more Chinese character substrings, one or more mixed substrings by replacing at least one Chinese character in the Chinese character substring with a corresponding pinyin representation; and

a unit configured to use one or more of the generated Chinese character substrings and mixed substrings as the one or more segment strings for the term string,

11. The apparatus of claim 10, further comprising:

a unit configured to determine, for each of the one or more segment strings, an association value between the segment string and the term string after processing the term string to obtain one or more segment strings, and

wherein the character string storage unit comprises: a unit configured to store, for each of the one or more fragment strings, the term string and the corresponding association value with the fragment string as an index.

12. The apparatus of claim 11, wherein the unit configured to determine, for each of the one or more segment strings, an association value between the segment string and the term string after processing the term string to obtain the one or more segment strings comprises:

a unit configured to determine a degree of offset of the segment string relative to the term string, the degree of offset indicating a position of a character in the term string corresponding to a first character in the segment string;

a unit configured to determine a degree of coincidence of the segment string with the term string, the degree of coincidence being indicative of a degree of content matching of the segment string with the term string; and

a unit configured to determine the association value between the segment string and the term string based on the degree of offset and the degree of coincidence.

13. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor, wherein

the memory stores instructions for execution by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5 or 6-8.

14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5 or 6-8.