CN111831876B - Query method, device and storage medium - Google Patents

Query method, device and storage medium

Info

Publication number
CN111831876B
Authority
CN
China
Prior art keywords
index
query
words
word
list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910297346.5A
Other languages
Chinese (zh)
Other versions
CN111831876A (en)
Inventor
李世峰
李中男
朱宏波
于严
赵帅领
王鹏
郭艳民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Navinfo Co Ltd
Original Assignee
Navinfo Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Navinfo Co Ltd
Priority to CN201910297346.5A
Publication of CN111831876A
Application granted
Publication of CN111831876B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a query method, a query device and a storage medium, wherein the query method comprises the following steps: acquiring query words; if the query list contains target index words with the same coding values as the query words, determining index identifiers of the target index words and the number of candidate index words comprising the query words according to the query list, wherein the query list is used for indicating the corresponding relation among the coding values of the target index words, the index identifiers of the target index words and the number of the candidate index words; acquiring all candidate index words according to the index identifications of the target index words and the number of the candidate index words, wherein the index identifications of the adjacent candidate index words are continuous; pushing all candidate index words. According to the invention, through pre-establishing the corresponding relation among the coding value of the target index word, the index identification of the target index word and the number of the candidate index words, all the candidate index words can be rapidly determined, and the index efficiency is improved regardless of the number of the index words.

Description

Query method, device and storage medium
Technical Field
The present invention relates to the field of computer science and technology, and in particular, to a query method, device, and storage medium.
Background
With the rapid expansion of information on the internet, users increasingly rely on search engines, whose commercial value is inestimable. Typically, when a user enters the characters to be queried on a mobile terminal or a PC, the search engine needs to quickly and intelligently suggest the remaining part of the input; this is the associative word indexing function.
In the prior art, the associative word indexing function generally uses either a sequential table index or a hash table index. A sequential table index stores all index words in order and finds the index words by binary search at query time. A hash table index uses a hash function to build a mapping between each query word and its list of index words, and obtains the index words through that mapping.
The query time complexity of the sequential table index is O(log₂ n), where n is the number of index words; its search efficiency is low, and becomes lower still when the number of index words is very large. The query time complexity of the hash table index is O(1), so its search efficiency is high, but when the amount of data is large the hash function requires more computation in order to avoid hash collisions.
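For comparison, the sequential table approach described above can be sketched as a binary search over a sorted word list. This is only an illustration of the prior-art idea, not the method of the invention; the function name `prefix_matches` and the sample words (including the extra word 上海市) are assumptions made for the example.

```python
from bisect import bisect_left

def prefix_matches(sorted_words, prefix):
    """Return all words starting with `prefix` from a sorted list.

    The start position is found by binary search (O(log n) comparisons);
    the matching words then follow contiguously.
    """
    start = bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches

# Hypothetical sample data, sorted by Unicode code point.
words = sorted(["北京大学", "北京市", "北京市政府", "北京图书大厦", "上海市"])
print(prefix_matches(words, "北京市"))
```

The binary search step is what makes the lookup cost grow with the number of index words, which is the drawback the background section points out.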
Disclosure of Invention
The invention provides a query method, a query device and a storage medium whose index efficiency is improved and does not depend on the number of index words.
A first aspect of the present invention provides a query method, including:
Acquiring query words;
if the query list contains target index words with the same coding values as the query words, determining index identifiers of the target index words and the number of candidate index words comprising the query words according to the query list, wherein the query list is used for indicating the corresponding relation among the coding values of the target index words, the index identifiers of the target index words and the number of candidate index words;
acquiring all the candidate index words according to the index identifications of the target index words and the number of the candidate index words, wherein the index identifications of the adjacent candidate index words are continuous;
pushing all the candidate index words.
A second aspect of the present invention provides a querying device comprising:
the query term acquisition module is used for acquiring query terms;
The determining module is used for determining index identifiers of the target index words and the number of candidate index words comprising the query words according to the query list if the query list contains the target index words with the same coding values as the query words, wherein the query list is used for indicating the corresponding relation among the coding values of the target index words, the index identifiers of the target index words and the number of candidate index words;
the candidate index word acquisition module is used for acquiring all the candidate index words according to the index identifications of the target index words and the number of the candidate index words, wherein the index identifications of the adjacent candidate index words are continuous;
and the pushing module is used for pushing all the candidate index words.
A third aspect of the present invention provides a querying device, comprising: at least one processor and memory;
The memory stores computer-executable instructions;
The at least one processor executes the computer-executable instructions stored by the memory to cause the querying device to perform the querying method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement the above-described query method.
The invention provides a query method, a query device and a storage medium, wherein the query method comprises the following steps: acquiring query words; if the query list contains target index words with the same coding values as the query words, determining index identifiers of the target index words and the number of candidate index words comprising the query words according to the query list, wherein the query list is used for indicating the corresponding relation among the coding values of the target index words, the index identifiers of the target index words and the number of the candidate index words; acquiring all candidate index words according to the index identifications of the target index words and the number of the candidate index words, wherein the index identifications of the adjacent candidate index words are continuous; pushing all candidate index words. According to the invention, through pre-establishing the corresponding relation among the coding value of the target index word, the index identification of the target index word and the number of the candidate index words, all the candidate index words can be rapidly determined, and the index efficiency is improved regardless of the number of the index words.
Drawings
FIG. 1 is a schematic diagram of a system architecture to which the query method of the present invention is applicable;
FIG. 2 is a flow chart of the query method provided by the present invention;
FIG. 3 is a schematic diagram of the interface changes of a terminal corresponding to the query method provided by the present invention;
FIG. 4 is a second flow chart of the query method provided by the present invention;
FIG. 5 is a flow chart of establishing the query list, the dictionary file and the query file in the query method provided by the present invention;
FIG. 6 is an exemplary diagram of an index tree provided by the present invention;
FIG. 7 is an exemplary diagram of a query file provided by the present invention;
FIG. 8 is an exemplary diagram of a dictionary file provided by the present invention;
FIG. 9 is a schematic structural diagram of the query device provided by the present invention;
FIG. 10 is a second schematic structural diagram of the query device provided by the present invention;
FIG. 11 is a schematic structural diagram of a query device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described in the following in conjunction with the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a schematic diagram of a system architecture to which the query method provided by the present invention is applicable. The query method is suited to the query scenario shown in fig. 1, which includes a terminal and a query device; the query device may be a server. The terminal is communicatively connected to the query device in a wired or wireless manner. A user inputs a query word through the terminal; after obtaining the query word, the terminal sends it to the query device, and the query device returns to the terminal the associative words (index words) corresponding to the query word. In the following embodiments, the query method provided by the present invention is described taking a server as an example of the query device.
The terminal in the present invention may be, but is not limited to, a mobile terminal or a fixed terminal. The mobile terminal may be a smartphone, a tablet (PAD) or a similar device that allows a user to input a query word and displays the query word. The fixed terminal may be a stationary device, such as a desktop computer, that allows a user to input a query word and displays the query word.
Fig. 2 is a schematic flow diagram of a query method provided in the present invention, and an execution subject of the flow of the method shown in fig. 2 may be a query device, and the query device may be a server as described above. As shown in fig. 2, the query method provided in this embodiment may include:
S101, acquiring query words.
In this embodiment, the terminal may obtain the query term, and the manner in which the terminal obtains the query term may be: the terminal displays an interface for acquiring the query words, specifically, the terminal can acquire the query words through the query words input by the user on the interface, and can acquire the query words through collecting the audio input by the user. After the terminal acquires the query term, the query term may be sent to the server, so that the server acquires the query term.
S102, if the query list contains target index words with the same coding values as the query words, determining index identifiers of the target index words and the number of candidate index words comprising the query words according to the query list, wherein the query list is used for indicating the corresponding relation among the coding values of the target index words, the index identifiers of the target index words and the number of the candidate index words.
In this embodiment, the query list may have a code value of the index word stored in advance. After the server obtains the query word, the query word can be encoded according to the agreed encoding mode, the encoding value of the query word is obtained, and whether the target index word which is the same as the encoding value of the query word is contained in the query list is further determined. It should be appreciated that the contracted encoding mode is the same as the encoding mode that encodes the index words in the query list.
The query list is used for indicating the corresponding relation among the coding value of the target index word, the index identification of the target index word and the number of candidate index words. When the query list contains a target index word with the same coding value as the query word, the index identification of the target index word and the number of candidate index words containing the query word can be determined according to the query list. The target index word is the candidate index word with the smallest index identification among all candidate index words. The number of candidate index words is the number of index words that contain the query word. For example, if the query word is "Beijing" and there are 4 index words containing "Beijing", the number of candidate index words is 4.
Specifically, all index words in the query list in the server are arranged in a fixed order. For example, the index words in the query database may be arranged according to the order of the Unicode codes of the characters in each index word, with the index identifiers of adjacent index words being continuous. Table one below shows an example list of index words, their coding values and their index identifiers. The coding manner of the index words is not limited in this embodiment; in table one the coding values are represented by letters.
For example, if the index words include the four index words "Beijing City", "Beijing University", "Beijing Book Building" and "Beijing City Government", the four index words are sorted according to the Unicode codes of the characters in each index word, and the resulting list of index words and corresponding index identifiers is as follows:
Table one
Index identification    Index word                 Coded value
1000                    Beijing University         A
1001                    Beijing City               B
1002                    Beijing City Government    C
1003                    Beijing Book Building      D
For example, if the query word is "North" (北, the first character of "Beijing"), the candidate index words containing the query word are the four index words "Beijing University", "Beijing City", "Beijing City Government" and "Beijing Book Building"; the number of candidate index words is 4, and the target index word is the candidate with the smallest index identifier among them, so the index identifier of the target index word is 1000. Similarly, if the query word is "Beijing City", the candidate index words containing "Beijing City" are the two index words "Beijing City" and "Beijing City Government"; the number of candidate index words is 2, and the index identifier of the target index word is 1001.
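The two quantities defined above can be checked with a minimal sketch over the four example index words. It assumes that an index word "containing" the query word means an index word that begins with it, as the examples suggest; the helper name `target_and_count` is hypothetical, and the brute-force scan is for illustration only (the method itself reads these values from a precomputed query list).

```python
# The four example index words with their consecutive index identifiers.
INDEX_WORDS = {
    1000: "北京大学",      # Beijing University
    1001: "北京市",        # Beijing City
    1002: "北京市政府",    # Beijing City Government
    1003: "北京图书大厦",  # Beijing Book Building
}

def target_and_count(query):
    """Return (smallest matching identifier, number of matches), or None."""
    ids = [i for i, word in INDEX_WORDS.items() if word.startswith(query)]
    if not ids:
        return None
    return min(ids), len(ids)

print(target_and_count("北"))      # (1000, 4)
print(target_and_count("北京市"))  # (1001, 2)
```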
The form of the query list stored in the server may specifically be as shown in the following table two, where the query list is used to indicate the corresponding relationship among the target index word, the index identifier of the target index word, and the number of all candidate index words:
Table two
In this embodiment, in order to obtain the index words corresponding to the query word more accurately, the query word corresponding to each index word may also be stored in table two; the resulting query list may take the form shown in table three:
Table three
Because the server in this embodiment stores the query list in advance, after the server obtains the query word, the encoding value of the query word and the encoding value of the index word in the query list may be matched to obtain the target index word, and the index identifier of the target index word and the number of candidate index words including the query word are obtained in the query list.
S103, obtaining all candidate index words according to index identifiers of target index words and the number of candidate index words, wherein index identifiers of adjacent candidate index words are continuous.
The query database in this embodiment stores a plurality of index words in advance, wherein index identifiers of adjacent candidate index words are continuous, and index identifiers of each index word are the same as index identifiers used in the query list. After the server obtains the index identifiers of the target index words and the number of candidate index words, the server can use the index identifiers of the target index words as initial identifiers in the query database, and the continuous multiple index words are all candidate index words, wherein the number of the multiple index words is the same as the number of the determined candidate index words.
Illustratively, when the query word is "Beijing", the number of candidate index words containing the query word is 4 and the index identifier of the target index word is 1000, so the four index words with index identifiers 1000 to 1003 are all candidate index words; that is, the candidate index words are "Beijing University", "Beijing City", "Beijing City Government" and "Beijing Book Building".
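Because the identifiers of the candidate index words are consecutive, step S103 amounts to reading a contiguous range. The sketch below assumes the query database can be modelled as a mapping from index identifier to index word; `QUERY_DATABASE` and `fetch_candidates` are illustrative names, not part of the described system.

```python
# Hypothetical in-memory stand-in for the query database.
QUERY_DATABASE = {
    1000: "北京大学",      # Beijing University
    1001: "北京市",        # Beijing City
    1002: "北京市政府",    # Beijing City Government
    1003: "北京图书大厦",  # Beijing Book Building
}

def fetch_candidates(first_id, count):
    """Return the `count` index words whose identifiers start at `first_id`."""
    return [QUERY_DATABASE[i] for i in range(first_id, first_id + count)]

print(fetch_candidates(1000, 4))  # identifiers 1000 through 1003
print(fetch_candidates(1001, 2))  # identifiers 1001 and 1002
```

The cost of this step depends only on the number of candidates returned, not on the total number of index words stored.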
S104, pushing all candidate index words.
After the server acquires all candidate index words in the database, pushing all candidate index words to the terminal.
For example, fig. 3 is a schematic diagram of the interface changes of a terminal corresponding to the query method provided by the present invention. As shown in fig. 3, in interface 201 the query word input by the user in the query box is "North" (北), and the server pushes all candidate index words to the terminal, namely "Beijing University", "Beijing City", "Beijing City Government" and "Beijing Book Building"; in interface 202 the query word is "Beijing", and the server pushes the same four candidate index words; in interface 203 the query word is "Beijing City", and the server pushes the candidate index words "Beijing City" and "Beijing City Government".
The embodiment provides a query method, which includes: acquiring query words; if the query list contains target index words with the same coding values as the query words, determining index identifiers of the target index words and the number of candidate index words comprising the query words according to the query list, wherein the query list is used for indicating the corresponding relation among the coding values of the target index words, the index identifiers of the target index words and the number of the candidate index words; acquiring all candidate index words according to the index identifications of the target index words and the number of the candidate index words, wherein the index identifications of the adjacent candidate index words are continuous; pushing all candidate index words. According to the method and the device, through the preset corresponding relation between each index word and the target index word identification and the number of candidate index words comprising the index word, after the query word is obtained, all candidate index words corresponding to the query word can be obtained quickly, and the index efficiency is improved.
On the basis of the foregoing embodiment, the query method provided by the present invention and the query list thereof are further described below with reference to fig. 4, and fig. 4 is a second schematic flow chart of the query method provided by the present invention, as shown in fig. 4, where the query method provided by the present embodiment may include:
S301, a query list, a dictionary file and a query file are established.
The index database in this embodiment stores a pre-established query file, where the query file includes a query index area and a query content area; the query index area is used for storing a query list, the query index area is also used for indicating the storage position of the target index words in the query content area, and the query content area is used for storing all index words in the index database.
The index database also stores a pre-established dictionary file, which comprises a dictionary element area and a dictionary content area. The dictionary element area is used for storing the coding values of all index words, together with the storage positions, in the dictionary content area, of the double-array coding base value and the double-array coding check value corresponding to the coding value of each index word. The dictionary content area is used for storing the base value and the check value corresponding to the coding value of each index word. It should be appreciated that after the index words are double-array encoded, a base array and a check array are obtained. The base array holds, for the coding value of each index word, the corresponding base value; similarly, the check array holds, for the coding value of each index word, the corresponding check value.
The specific process of creating the query list, dictionary file, and query file in this embodiment is described with reference to fig. 5. Fig. 5 is a schematic flow chart of establishing a query list, a dictionary file and a query file in the query method provided by the present invention, as shown in fig. 5, the specific process of S301 in this embodiment is as follows:
S3011, sorting all index words according to Unicode codes of each character in each index word and the length of each index word, and generating an index word list.
In this embodiment, the server first orders the plurality of index words stored in the query database, specifically in the following manner: traversing the Unicode codes of each character in each index word in turn, and acquiring the first order of the index words in the index word list to be generated according to the sequence of the Unicode codes of each character in each index word.
For example, the database includes four index words of "beijing university", "beijing city government" and "beijing book mansion", and the first order of obtaining the index words in the list of index words to be generated may be "beijing university", "beijing city government" and "beijing book mansion" sequentially according to the order of obtaining the index words from the Unicode code of the first character, the second character, … … to the last character of the index words.
In this embodiment, if the sequence of Unicode codes of part of characters in at least two index words is the same, the sequence of at least two index words is adjusted in the first order according to the lengths of at least two index words, so as to obtain an adjusted first order. The length of the index word may be the number of characters contained in the index word. For example, if the index word is "Beijing", the length of the index word is 2; the index word is "Beijing city", and the length of the index word is 3.
For example, "Beijing City" and "Beijing City Government" in the first order are index words in which the Unicode codes of part of the characters are in the same order (the former is a prefix of the latter). The order of these two index words is adjusted according to their lengths, from shortest to longest, so that the adjusted first order is "Beijing University", "Beijing City", "Beijing City Government" and "Beijing Book Building".
After the server obtains the sequences of the index words, an index word list is generated according to the adjusted first sequence and the index identification of each index word. Specifically, the index identifier of the index word may be the sequence number of the index word obtained according to the above-mentioned sequence among all the index words.
For example, the index word list may be as shown in table one of the above embodiment. "Beijing University" is the index word ranked 1000th among all index words, so the index identifier of "Beijing University" is set to 1000.
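The ordering and numbering of step S3011 can be sketched as follows. The sketch assumes that comparing words code point by code point, with a shorter word placed before any longer word it prefixes, is an acceptable reading of the rule above; Python's built-in string comparison behaves this way. The function name, the starting identifier and the romanized sample words are all hypothetical.

```python
def build_index_word_list(words, first_id=1000):
    """Sort index words and assign consecutive index identifiers.

    Strings are compared code point by code point; when one word is a
    prefix of another, the shorter word sorts first, which corresponds to
    the length-based adjustment described above.
    """
    ordered = sorted(set(words))
    return {first_id + offset: word for offset, word in enumerate(ordered)}

# Hypothetical romanized sample words, chosen only to show the ordering.
sample = ["beijing city government", "beijing", "beijing city", "beihai"]
for index_id, word in build_index_word_list(sample).items():
    print(index_id, word)
```

Any ordering of this kind keeps all words that share a prefix next to each other, which is the property the later range lookup relies on.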
S3012, obtaining an index tree corresponding to the index word with the same initial character according to the index word list.
In this embodiment, the server obtains, according to the index word list, an index tree corresponding to the index words that share the same first character. Fig. 6 is an exemplary diagram of an index tree provided by the present invention; it illustrates the index tree formed when the index words are "Beijing University", "Beijing City", "Beijing City Government" and "Beijing Book Building".
The index tree in this embodiment includes a plurality of nodes. Each node contains one character of the index words that share the same first character, and the characters corresponding to the nodes are arranged in the order of the Unicode codes of the characters in the index word. For example, the index words "Beijing University" (北京大学), "Beijing City" (北京市), "Beijing City Government" (北京市政府) and "Beijing Book Building" (北京图书大厦) share the first character 北 ("north"), and the nodes corresponding to the index word "Beijing University" are, in order, 北, 京, 大 and 学; likewise, the characters in the nodes corresponding to "Beijing City", "Beijing City Government" and "Beijing Book Building" are arranged in the order of the characters in each index word.
The index identifier corresponding to each node is the index identifier, in the index word list, of the index word formed by the characters of all parent nodes of the node followed by the character in the node itself. Illustratively, for the node 市 ("city"), the characters of its parent nodes, 北 and 京, together with 市 form the index word "Beijing City" (北京市), whose index identifier in the index word list is 1001; the index identifier of the node 市 is therefore 1001.
The number corresponding to each node is the number of index words in the index word list that begin with the index word formed in this way. Illustratively, for the node 市, the index words in the index word list beginning with "Beijing City" (北京市) are "Beijing City" and "Beijing City Government", so the number corresponding to the node 市 is 2.
According to the mode, the server traverses each node in the index tree in turn, and acquires index identifiers corresponding to index words formed by characters in each node and the parent node of each node and the number of the index words.
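The traversal described above can be sketched without an explicit tree structure by treating every prefix of every index word as one node path. The sketch assumes that a node whose prefix is not itself an index word takes the smallest identifier among the index words beginning with that prefix; `build_prefix_table` is a hypothetical helper name.

```python
def build_prefix_table(index_word_list):
    """Map each prefix (one node path of the index tree) to
    (smallest index identifier, number of index words with that prefix)."""
    table = {}
    for word_id in sorted(index_word_list):
        word = index_word_list[word_id]
        for end in range(1, len(word) + 1):
            prefix = word[:end]
            # Identifiers are visited in ascending order, so the first
            # identifier seen for a prefix is the smallest one.
            first_id, count = table.get(prefix, (word_id, 0))
            table[prefix] = (first_id, count + 1)
    return table

index_word_list = {
    1000: "北京大学",      # Beijing University
    1001: "北京市",        # Beijing City
    1002: "北京市政府",    # Beijing City Government
    1003: "北京图书大厦",  # Beijing Book Building
}
table = build_prefix_table(index_word_list)
print(table["北京市"])  # (1001, 2)
print(table["北京大"])  # (1000, 1)
```

For the four example words this yields eleven entries, one per node of the example index tree.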
S3013, performing double-array coding on the index word corresponding to each node of the index tree to obtain the coding value of each index word, together with the double-array coding base value and the double-array coding check value corresponding to the coding value of each index word.
It should be understood that the index word corresponding to each node is an index word composed of characters in each node and its parent node. In this embodiment, the server performs double-array encoding on each index word of each index tree to obtain a base array and a check array. The base array and the check array comprise the coding value of each index word, and the double-array coding base value and the double-array coding check value corresponding to the coding value of each index word.
Specifically, the server first encodes the single character of each node as its Unicode code. For example, the index tree shown in fig. 6 contains 10 distinct characters, whose Unicode codes are: 北 "north" (21271), 京 "jing" (20140), 大 "big" (22823), 市 "city" (24066), 图 "diagram" (22270), 学 "school" (23398), 政 "government" (25919), 书 "book" (20070), 府 "house" (24220) and 厦 "mansion" (21414). Double-array coding is then performed on the index word of each node in the index tree, and the resulting coding values are:
北 (21271)
北京 (40907 = 20767 + 20140)
北京大 (41461 = 18638 + 22823)
北京市 (42704 = 18638 + 24066)
北京图 (40908 = 18638 + 22270)
北京大学 (40909 = 17511 + 23398)
北京市政 (40910 = 14991 + 25919)
北京图书 (40911 = 20841 + 20070)
北京市政府 (40912 = 16692 + 24220)
北京图书大 (40913 = 18090 + 22823)
北京图书大厦 (40914 = 19500 + 21414)
In each sum the second term is the Unicode code of the last character, and the first term is the same for every word that extends the same parent (for example, 18638 for every extension of 北京), which is consistent with the first term being the base value associated with the parent's coding value.
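The arithmetic of the coding values listed above can be reproduced with a short sketch. It assumes that the first term of each sum is the base value stored for the parent's coding value, so that the code of a word is the parent's base value plus the Unicode code of the appended character; the `BASE` mapping below is read off those sums. The check array is omitted because its values are not reproduced in the text, so this sketch does not reject words outside the example tree the way a full double array would.

```python
# Base values inferred from the sums above: parent coding value -> base value.
BASE = {
    21271: 20767,  # 北
    40907: 18638,  # 北京
    41461: 17511,  # 北京大
    42704: 14991,  # 北京市
    40908: 20841,  # 北京图
    40910: 16692,  # 北京市政
    40911: 18090,  # 北京图书
    40913: 19500,  # 北京图书大
}

def encode(word):
    """Return the coding value of `word`, or None if a base value is missing."""
    if not word:
        return None
    code = ord(word[0])  # the first character is coded as its Unicode code
    for ch in word[1:]:
        if code not in BASE:
            return None
        code = BASE[code] + ord(ch)
    return code

print(encode("北京市政府"))    # 40912
print(encode("北京图书大厦"))  # 40914
```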
During double-array coding of each index word, it is necessary to check whether the current coding value exceeds the length of the base and check arrays, in order to avoid array subscripts going out of bounds. To this end, 40907 slots are reserved after the maximum coding value in the double array, trading a small amount of wasted space for high efficiency.
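One way to picture this bounds check is a helper that grows both arrays before a coding value is written. This is a sketch under stated assumptions: the arrays are modelled as Python lists of equal length, 0 is used as the marker for an empty slot, and the reserve is applied relative to the coding value being written; the name `ensure_capacity` is hypothetical.

```python
RESERVE = 40907  # spare slots kept after the largest coding value

def ensure_capacity(base, check, code):
    """Grow both arrays so that index `code`, plus the reserve, is valid."""
    needed = code + 1 + RESERVE
    if len(base) < needed:
        grow = needed - len(base)
        base.extend([0] * grow)   # 0 marks an empty slot (assumption)
        check.extend([0] * grow)

base, check = [], []
ensure_capacity(base, check, 42704)
base[42704] = 14991  # safe: the index is now within bounds
print(len(base))
```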
The specific base array for double array encoding is shown in table four below, and the check array for double array encoding is shown in table five below:
Table four
Table five
As shown in the above table four and table five, the lower regions in the corresponding tables of the base array and the check array are base values and check values respectively; the upper regions in the corresponding tables of the base array and the check array are the encoded values of the index words.
S3014, establishing a query list and a query file according to the coding value of the index words corresponding to each node in the index tree and the index identification and the number corresponding to each node.
After obtaining the code value corresponding to each index word, the server can build a mapping relation between each index word of the index tree, the code value of each index word of each index tree, and index identifiers and numbers corresponding to nodes to which each index word of each index tree belongs, so as to obtain a query list. It should be understood that the index identifier corresponding to the node to which each index word of each index tree belongs, that is, the index identifier of the target index word when the index word is the target index word.
Illustratively, the query list may be as shown in Table six below:
Table six
Index word | Code value | Index identifier | Number
北 | 21271 | 1000 | 4
北京 | 40907 | 1000 | 4
北京市 | 42704 | 1001 | 2
北京大 | 41461 | 1000 | 1
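For illustration, the query list of table six can be held as a mapping from code value to index identifier and number. The names `QueryEntry`, `QUERY_LIST`, and `lookup` are hypothetical; the values are those of table six.

```python
from typing import NamedTuple, Optional


class QueryEntry(NamedTuple):
    index_id: int  # index identifier of the target index word
    count: int     # number of candidate index words


# Rows of table six, keyed by the code value of the index word.
QUERY_LIST = {
    21271: QueryEntry(1000, 4),  # 北
    40907: QueryEntry(1000, 4),  # 北京
    42704: QueryEntry(1001, 2),  # 北京市
    41461: QueryEntry(1000, 1),  # 北京大
}


def lookup(code: int) -> Optional[QueryEntry]:
    """Return the entry for a code value, or None if it is not listed."""
    return QUERY_LIST.get(code)
```

A call such as `lookup(40907)` yields the identifier 1000 and the number 4 in a single step, without scanning the index words themselves.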
Similarly, the server establishes a query file according to the code value of the index word corresponding to each node in the index tree, and the index identifier and number corresponding to each node. The query file can be constructed in the query database and is divided into a query index area and a query content area. The established query list is stored in the query index area of the query file, and all index words in the index database are stored in the query content area. The storage position of each index word in the query content area is also stored in the query index area, thereby constructing the query file.
Optionally, after the server generates the query list, the query list may also be stored in the form of a double array. Correspondingly, the resulting double array includes a from array (i.e. the index identifier) and a length array (i.e. the number of index words included) corresponding to each index word.
Specifically, the from array is shown in table seven below, and the length array is shown in table eight below:
Table seven
Table eight
As shown in the above table seven and the table eight, the upper area in the corresponding tables of the from array and the length array is an array index, that is, the coded values of the corresponding index words respectively, and the lower area in the tables is the array content, that is, the corresponding index identifiers and the number of the index words respectively.
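A minimal sketch of the from array and length array follows, assuming plain Python lists subscripted by code value. The function name `to_double_array` and the fillers -1 and 0 for unused slots are assumptions; the contents of table seven and table eight are not reproduced.

```python
def to_double_array(query_list):
    """Turn {code: (index_id, count)} into two arrays subscripted by code."""
    size = max(query_list) + 1
    from_arr = [-1] * size    # -1 marks a code value with no index word
    length_arr = [0] * size   # 0 candidates for unused code values
    for code, (index_id, count) in query_list.items():
        from_arr[code] = index_id
        length_arr[code] = count
    return from_arr, length_arr


# Rows of table six: code value -> (index identifier, number).
from_arr, length_arr = to_double_array({
    21271: (1000, 4),
    40907: (1000, 4),
    42704: (1001, 2),
    41461: (1000, 1),
})
```

Because the code value is used directly as the array subscript, reading `from_arr[40907]` and `length_arr[40907]` takes constant time regardless of how many index words exist.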
Fig. 7 is an exemplary diagram of a query file provided in the present invention. As shown in fig. 7, the query index area stores the code value of each index word and the storage location of each index word in the query content area; for simplicity, fig. 7 shows the number of index words in the query index area. The query content area stores, for each index word of each index tree, the index identifier of the corresponding target index word and the number of index words. Illustratively, from21271 and length21271 correspond to the index identifier and the number of index words of the index word 北 ("north"), respectively, and from40907 and length40907 correspond to the index identifier and the number of index words of the index word 北京 ("Beijing"), respectively. In fig. 7, the index identifier of the target index word is abbreviated as f, and the number of index words is abbreviated as l.
S3015, establishing a dictionary file according to the coding value of the index word corresponding to each node in the index tree, and the base value and the check value corresponding to the coding value of each index word.
In the same manner as the query file, a dictionary file may be created in the query database and divided into a dictionary element area and a dictionary content area. The code value of the index word corresponding to each node in the index tree is stored in the dictionary element area, and the base value and check value corresponding to the code value of each index word are stored in the dictionary content area. The storage position, in the dictionary content area, of the base value and check value corresponding to each code value is also stored in the dictionary element area. The dictionary file is thereby constructed, which facilitates looking up the base value and check value corresponding to the code value of each index word.
Fig. 8 is an exemplary diagram of a dictionary file provided in the present invention. As shown in fig. 8, the dictionary element area stores the code value of each index word and the storage position, in the dictionary content area, of the base value and check value corresponding to that code value; for simplicity, fig. 8 shows the number of index words in the dictionary element area. The dictionary content area stores the base value and check value corresponding to each index word of each index tree. Illustratively, base21271 and check21271 correspond to the base value and the check value of the index word 北 ("north"), respectively, and base40907 and check40907 correspond to the base value and the check value of the index word 北京 ("Beijing"), respectively. In fig. 8, the base value is denoted by b, and the check value is denoted by c.
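One possible byte layout for such a dictionary file is sketched below. The layout itself (a 4-byte entry count, then fixed-size element records, then fixed-size content records), the little-endian field widths, and the names `pack_dictionary` and `read_entry` are all assumptions of this sketch and are not taken from fig. 8. The check value 0 for the single-character word is likewise a placeholder.

```python
import struct

META = struct.Struct("<II")     # element record: code value, content offset
CONTENT = struct.Struct("<ii")  # content record: base value, check value


def pack_dictionary(entries):
    """Serialize {code: (base, check)} into element area + content area."""
    codes = sorted(entries)
    header = struct.pack("<I", len(codes))
    content_start = len(header) + META.size * len(codes)
    meta = b"".join(
        META.pack(code, content_start + i * CONTENT.size)
        for i, code in enumerate(codes)
    )
    content = b"".join(CONTENT.pack(*entries[code]) for code in codes)
    return header + meta + content


def read_entry(blob, code):
    """Find `code` in the element area, then read its base/check record."""
    (n,) = struct.unpack_from("<I", blob, 0)
    for i in range(n):
        c, offset = META.unpack_from(blob, 4 + i * META.size)
        if c == code:
            return CONTENT.unpack_from(blob, offset)
    return None


blob = pack_dictionary({21271: (20767, 0), 40907: (18638, 21271)})
```

Keeping the offsets in the element area lets a reader jump straight to the base and check values of a code value without decoding the whole content area.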
It is to be understood that S3014 and S3015 need not be performed in any particular order, and the two may be performed simultaneously.
S302, acquiring query words.
S303, determining that the query list contains target index words which are the same as the code values of the query words according to the query words, the base value corresponding to the code value of each index word and the check value.
If the query word contains 1 character, the Unicode code of the query word is the code value of the query word, and the target index word which is the same as the code value of the query word is determined to be contained in the query list according to the code value of the query word and the code values of a plurality of index words stored in the query list.
If the query word contains i characters, i is an integer greater than 1, that is, if the query word contains at least two characters, the coding value of the second character string can be obtained from the dictionary element area according to the Unicode code of the i-th character and the base value corresponding to the coding value of the first character string. Wherein, the first character string is: a character string formed from the first character of the query term to the i-1 th character, wherein the second character string is as follows: a character string formed from the first character to the i-th character.
If the check value corresponding to the code value of the second character string is the Unicode code of the i-1 th character, determining that the query list contains the target index word which is the same as the code value of the query word, wherein the code value of the second character string is the code value of the query word.
For example, suppose the query word is 北京 ("Beijing"). The Unicode code of its first character 北 ("north") is 21271, and the Unicode code of its second character 京 is 20140. The code value of the preset index word 北京, whose first character is 北, is 40907, so the base value corresponding to 北 is 20767, namely the difference between the code value of 北京 and the Unicode code 20140 of 京. The server judges that the query word contains a second character 京 after the first character, and obtains the code value of the character string 北京, formed from the first character to the second character, from the Unicode code of the second character and the base value corresponding to the first character. The server then judges whether the check value corresponding to this string code value equals the Unicode code of the first character. If it does, the server determines that the query list contains the target index word with the same code value as the query word, and the string code value is the code value of the query word. Optionally, if the check value corresponding to the string code value is not equal to the Unicode code of the first character, the server determines that the query list does not contain the target index word and stops the query.
It should be understood that in this scenario the server may further judge whether the query word contains a third character. If so, it judges in the same manner whether the check value corresponding to the code value of the three-character string equals the Unicode code of the second character; if so, it goes on to the fourth character, and so on. Once the last character of the query word has been checked in this manner, the server determines that the query list contains the target index word with the same code value as the query word.
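The character-by-character procedure above can be sketched as follows. The function name `encode_query` and the sparse dictionaries standing in for the base and check arrays are assumptions; the base values are derived from the sums given in the coding example, and the check values follow the rule stated above (the Unicode code of the preceding character).

```python
# Sparse stand-ins for the base and check arrays, keyed by code value.
BASE = {21271: 20767, 40907: 18638, 42704: 14991, 40910: 16692}
CHECK = {40907: 21271, 42704: 20140, 40910: 24066, 40912: 25919}


def encode_query(word, base, check):
    """Return the code value of `word`, or None if the walk fails."""
    if not word:
        return None
    code = ord(word[0])  # a single character is its own Unicode code
    for i in range(1, len(word)):
        b = base.get(code)
        if b is None:
            return None  # the current prefix has no continuation
        nxt = b + ord(word[i])
        # The check value must equal the Unicode code of character i-1.
        if check.get(nxt) != ord(word[i - 1]):
            return None
        code = nxt
    return code
```

For the query word 北京 this returns 40907, and for 北京市政府 it returns 40912, in each case after one addition and one comparison per character. For a one-character query word the function returns the Unicode code itself, and membership is then decided against the code values stored in the query list, as described above.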
S304, determining the storage position of the target index word in the query content area in the query index area according to the index identification of the target index word; and acquiring all candidate index words in the query content area according to the storage position of the target index word in the query content area and the number of the candidate index words.
In the above steps, the server obtains the code value of the query word, and determines that the query list includes the target index word identical to the code value of the query word, and then determines the index identifier of the target index word identical to the code value of the query word and the number of candidate index words including the query word in the query list. Because the query index area contains the storage position of each index word in the query content area, the storage position of the target index word in the query content area can be obtained according to the index identification of the target index word.
Optionally, the ordering of the index words in the query content area may be the same as the ordering in the query list, i.e., the index identifiers of adjacent index words are consecutive. The server may then obtain all candidate index words in the query content area according to the storage location of the target index word in the query content area and the number of candidate index words including the query word. For example, if the number of candidate index words including the query word is 3 and the index identifier of the target index word is 1000, the candidate index words are the index words with index identifiers 1000, 1001, and 1002.
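Because the index identifiers of the candidates are consecutive, fetching them reduces to reading one contiguous run. In the sketch below the function name `fetch_candidates`, the in-memory list standing in for the query content area, and the three sample index words are hypothetical.

```python
def fetch_candidates(content, first_id, index_id, count):
    """Read `count` consecutive index words starting at `index_id`.

    `content` holds the index words ordered by index identifier, and
    `first_id` is the identifier of content[0].
    """
    start = index_id - first_id
    return content[start:start + count]


# Hypothetical content area holding identifiers 1000, 1001 and 1002.
CONTENT = ["北京大学", "北京市", "北京市政府"]

all_three = fetch_candidates(CONTENT, 1000, 1000, 3)
city_only = fetch_candidates(CONTENT, 1000, 1001, 2)
```

Here `all_three` holds the words with identifiers 1000 to 1002, as in the example above, and `city_only` holds the two words that begin with 北京市.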
S305, pushing all candidate index words.
Specific implementations of S302 and S305 in this embodiment may refer to the descriptions related to S101 and S104 in the foregoing embodiments, which are not described herein.
In this embodiment, an index tree is used so that the index word corresponding to each node, the index identifier corresponding to each index word, and the number of index words can be determined quickly, and the query list and the query file are established from them. Further, double-array coding is performed on the index words in the index tree, which facilitates establishing the dictionary file. In this embodiment, once it is determined from the dictionary file that the query list contains the target index word, all candidate index words are obtained from the query file. The dictionary file and the query file thus facilitate both determining that the query list contains the target index word and obtaining all candidate index words. The lookup speed of the query method provided in this embodiment is independent of the number of index words, so query efficiency can be improved.
Fig. 9 is a schematic structural diagram of a query device according to the present invention. As shown in fig. 9, the querying device 400 includes: a query term acquisition module 401, a determination module 402, a candidate index term acquisition module 403, and a push module 404.
The query term acquisition module 401 is configured to acquire a query term.
The determining module 402 is configured to determine, according to the query list, an index identifier of the target index word and a number of candidate index words including the query word if the query list includes the target index word having the same code value as the code value of the query word, where the query list is used to indicate a correspondence between the code value of the target index word, the index identifier of the target index word, and the number of candidate index words.
The candidate index word obtaining module 403 is configured to obtain all candidate index words according to the index identifier of the target index word and the number of candidate index words, where the index identifiers of the adjacent candidate index words are continuous.
And the pushing module 404 is configured to push all candidate index words.
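Purely as an illustration of how modules 401 to 404 could cooperate, the sketch below wires them together as methods of one class. The class name `QueryDevice`, the method names, the injected `encode` callable, and the sample data are all hypothetical and are not part of the described device.

```python
class QueryDevice:
    """Hypothetical composition of modules 401 to 404."""

    def __init__(self, encode, query_list, content, first_id):
        self.encode = encode          # word -> code value, or None
        self.query_list = query_list  # code value -> (index id, count)
        self.content = content        # index words ordered by index id
        self.first_id = first_id      # index identifier of content[0]

    def acquire(self, raw):                 # query term acquisition, 401
        return raw.strip()

    def determine(self, word):              # determining module, 402
        return self.query_list.get(self.encode(word))

    def candidates(self, index_id, count):  # candidate acquisition, 403
        start = index_id - self.first_id
        return self.content[start:start + count]

    def push(self, words):                  # pushing module, 404
        return list(words)

    def query(self, raw):
        hit = self.determine(self.acquire(raw))
        return self.push(self.candidates(*hit)) if hit else []


codes = {"北京": 40907, "北京市": 42704}  # stand-in for double-array encoding
device = QueryDevice(
    codes.get,
    {40907: (1000, 3), 42704: (1001, 2)},
    ["北京大学", "北京市", "北京市政府"],
    1000,
)
```

In this sketch a query word that has no code value falls through to an empty result, which corresponds to the case where the query list does not contain a target index word.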
The principle and technical effects of the query device provided in this embodiment are similar to those of the foregoing query method and are not described again here.
Optionally, fig. 10 is another schematic structural diagram of the query device provided by the present invention. As shown in fig. 10, the querying device 400 further includes an establishing module 405.
Optionally, the index database stores query files, wherein the query files comprise a query index area and a query content area; the query index area is used for storing a query list, the query index area is also used for indicating the storage position of the target index words in the query content area, and the query content area is used for storing all index words in the index database.
Optionally, the index database further stores a dictionary file, where the dictionary file includes: the dictionary element area is used for storing the coding values of all index words, and the base value and the check value corresponding to the coding value of each index word are stored in the storage position of the dictionary content area, and the dictionary content area is used for storing the base value and the check value corresponding to the coding value of each index word.
The determining module 402 is further configured to determine that the query list includes a target index word that is the same as the code value of the query word according to the query word, the base value and the check value corresponding to the code value of each index word.
Optionally, the query term contains i characters.
The determining module 402 is specifically configured to obtain, in the dictionary element area, a code value of a second string according to the Unicode code of the ith character and a base value corresponding to the code value of the first string, where the first string is: a character string formed from the first character of the query term to the i-1 th character, wherein the second character string is as follows: a character string formed from the first character to the i-th character;
If the check value corresponding to the code value of the second character string is the Unicode code of the i-1 th character, determining that the query list contains a target index word which is the same as the code value of the query word, wherein the code value of the second character string is the code value of the query word, and i is an integer greater than 1;
when i is equal to 1, the Unicode code of the query word is the code value of the query word.
The candidate index word obtaining module 403 is specifically configured to determine, in the query index area, a storage location of the target index word in the query content area according to an index identifier of the target index word; and acquiring all candidate index words in the query content area according to the storage position of the target index word in the query content area and the number of the candidate index words.
The establishing module 405 is configured to sort all the index words according to the Unicode code of each character in each index word and the length of each index word, and generate an index word list; and establishing a query list, a dictionary file and a query file according to the index word list.
Optionally, the establishing module 405 is specifically configured to traverse Unicode codes of each character in each index word in sequence, and obtain a first order of index words in the index word list to be generated according to an order of Unicode codes of each character in each index word; if the Unicode codes of at least one character in the at least two index words are in the same sequence, adjusting the sequence of the at least two index words in the first ordering according to the lengths of the at least two index words to obtain an adjusted first ordering; and generating an index word list according to the adjusted first order and the index identification of each index word.
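The ordering rule above can be sketched with an ordinary sort. The function name `build_index_word_list`, the starting identifier 1000, and the sample words are assumptions made for illustration.

```python
def build_index_word_list(words, first_id=1000):
    """Order index words by per-character Unicode code, then by length."""

    def sort_key(word):
        # Python compares the code lists element by element and already
        # places a prefix before any longer word that extends it; the
        # explicit length is kept only to mirror the rule in the text.
        return ([ord(ch) for ch in word], len(word))

    ordered = sorted(set(words), key=sort_key)
    # Consecutive index identifiers follow the sorted order.
    return [(first_id + i, word) for i, word in enumerate(ordered)]


INDEX_WORD_LIST = build_index_word_list(["北京市政府", "北京大学", "北京市"])
```

With these sample words the list is 北京大学 (1000), 北京市 (1001), 北京市政府 (1002): 大 (22823) sorts before 市 (24066), and 北京市 precedes the longer 北京市政府 that shares all of its characters.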
Optionally, the establishing module 405 is specifically configured to obtain, according to the index word list, an index tree corresponding to an index word having the same first character, where the index tree includes a plurality of nodes, and each node includes one character of the index word having the same first character; wherein, the index identifier corresponding to each node is: in the index word list, index marks of index words formed by characters in the nodes and all characters of parent nodes of the nodes are used, and the corresponding number of each node is the number of candidate index words comprising the index words in the index word list;
Performing double-array coding on index words corresponding to each node of the index tree to obtain a coding value of each index word, and a base value and a check value corresponding to the coding value of each index word;
establishing a query list and a query file according to the coding value of the index word corresponding to each node in the index tree and the index identifier and the number corresponding to each node;
and establishing a dictionary file according to the coding value of the index word corresponding to each node in the index tree, and the base value and the check value corresponding to the coding value of each index word.
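As a rough sketch of the per-node index identifier and number, the tree can be flattened into a mapping keyed by prefix. The function name `build_nodes`, the flattened representation, and the sample list are assumptions; the input is assumed to be already sorted so that words sharing a prefix are adjacent.

```python
def build_nodes(index_word_list):
    """Map every prefix (tree node) to (index identifier, number).

    The identifier is that of the first index word carrying the prefix,
    and the number counts all index words that carry it.
    """
    nodes = {}
    for index_id, word in index_word_list:
        for end in range(1, len(word) + 1):
            node = nodes.setdefault(word[:end], [index_id, 0])
            node[1] += 1
    return {prefix: tuple(value) for prefix, value in nodes.items()}


# Hypothetical sorted index word list with consecutive identifiers.
NODES = build_nodes([(1000, "北京大学"), (1001, "北京市"), (1002, "北京市政府")])
```

For this sample, the node 北京市 maps to identifier 1001 with number 2, and the node 北京大 maps to identifier 1000 with number 1, which agrees with the corresponding rows of the query list example.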
Fig. 11 is a schematic diagram of a third structure of the query device provided by the present invention, as shown in fig. 11, the query device 500 includes: a memory 501 and at least one processor 502.
Memory 501 for storing program instructions.
The processor 502 is configured to implement the query method in this embodiment when the program instruction is executed, and the specific implementation principle can be seen from the above embodiment, which is not described herein again.
The query device 500 may also include an input/output interface 503.
The input/output interface 503 may include a separate output interface and input interface, or may be an integrated interface that combines input and output. The output interface is used for outputting data, and the input interface is used for acquiring input data; the output data is a general term for the outputs in the method embodiments, and the input data is a general term for the inputs in the method embodiments.
The present invention also provides a readable storage medium having stored therein execution instructions which, when executed by at least one processor of a querying device, implement the querying method in the above embodiments.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the querying device may read the execution instructions from the readable storage medium, the execution instructions being executed by the at least one processor to cause the querying device to implement the querying methods provided by the various embodiments described above.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in hardware plus software functional modules.
The integrated modules, which are implemented in the form of software functional modules, may be stored in a computer readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods according to the embodiments of the invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
In the above embodiments of the network device or the terminal device, it should be understood that the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in a processor for execution.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A method of querying, comprising:
Acquiring query words;
if the query list contains target index words with the same coding values as the query words, determining index identifiers of the target index words and the number of candidate index words comprising the query words according to the query list, wherein the query list is used for indicating the corresponding relation among the coding values of the target index words, the index identifiers of the target index words and the number of candidate index words, and the character strings of each candidate index word comprise all characters of the query words;
According to the index identification of the target index words and the number of the candidate index words, taking the index identification of the target index words as a starting identification, obtaining a plurality of continuous index words, wherein the number of the plurality of index words is the same as the number of the candidate index words, a query database is pre-stored with a plurality of index words which are arranged in sequence, and the index identifications of the adjacent candidate index words are continuous;
Pushing all the candidate index words;
Before determining the index identifier of the target index word and the number of candidate index words including the query word according to the query list, the method further includes:
And determining that the query list contains target index words which are the same as the code values of the query words according to the query words, the base values and the check values corresponding to the code values of the index words, wherein the base values and the check values corresponding to the code values of the index words are stored in a dictionary content area of a dictionary file, and the dictionary file is stored in an index database.
2. The method of claim 1, wherein the index database further stores a query file, the query file comprising a query index area, a query content area; the query index area is used for storing the query list, the query index area is also used for indicating the storage position of the target index words in the query content area, and the query content area is used for storing all index words in the index database.
3. The method of claim 2, wherein the dictionary file further comprises: and the dictionary element area is used for storing the coding values of all index words, and the storage positions of the double-array coding base value and the double-array coding check value corresponding to the coding value of each index word in the dictionary content area.
4. The method of claim 3, wherein the query term includes i characters, and the determining that the query list includes the target index term having the same code value as the query term includes:
According to the Unicode code of the ith character of the query word and the base value corresponding to the coding value of the first character string, the coding value of the second character string is obtained from the dictionary element area, and the first character string is: a character string formed from the first character of the query word to the i-1 th character, wherein the second character string is: a character string formed from the first character to the i-th character;
If the check value corresponding to the code value of the second character string is the Unicode code of the i-1 th character, determining that the query list contains a target index word which is the same as the code value of the query word, wherein the code value of the second character string is the code value of the query word, and i is an integer greater than 1;
and when i is equal to 1, the Unicode code of the query word is the code value of the query word.
5. The method of claim 4, wherein the obtaining all candidate index words according to the index identification of the target index word and the number of candidate index words comprises:
Determining the storage position of the target index word in the query content area in the query index area according to the index identification of the target index word;
and acquiring all the candidate index words in the query content area according to the storage positions of the target index words in the query content area and the number of the candidate index words.
6. The method of claim 3, wherein prior to the obtaining the query term, further comprising:
according to the Unicode code of each character in each index word and the length of each index word, sequencing all index words to generate an index word list;
and establishing the query list, the dictionary file and the query file according to the index word list.
7. The method of claim 6, wherein generating the index word list comprises:
Traversing Unicode codes of each character in each index word in turn, and acquiring a first sequence of index words in the index word list to be generated according to the sequence of the Unicode codes of each character in each index word;
If the Unicode codes of at least one character in at least two index words are the same in sequence, adjusting the sequence of the at least two index words in the first ordering according to the lengths of the at least two index words to obtain an adjusted first ordering;
and generating an index word list according to the adjusted first order and the index identification of each index word.
8. The method according to claim 6 or 7, wherein said creating the query list, the dictionary file, and the query file from the index word list comprises:
According to the index word list, obtaining an index tree corresponding to the index word with the same initial character, wherein the index tree comprises a plurality of nodes, and each node comprises one character of the index word with the same initial character; wherein, the index identifier corresponding to each node is: index identifiers of index words formed by all characters from an index tree root node on a branch of an index tree where the node is located to the node, wherein the characters in the index tree root node are first characters of the index words, and the number corresponding to each node is the number of candidate index words including the index words in the index word list;
Performing double-array coding on index words corresponding to each node of the index tree to obtain a coding value of each index word, and a base value and a check value corresponding to the coding value of each index word;
Establishing the query list and the query file according to the coding value of the index word corresponding to each node in the index tree and the index identifier and the number corresponding to each node;
and establishing the dictionary file according to the coding value of the index word corresponding to each node in the index tree, and the base value and the check value corresponding to the coding value of each index word.
9. A query device, comprising: at least one processor and memory;
The memory stores computer-executable instructions;
The at least one processor executing computer-executable instructions stored in the memory to cause the querying device to perform the method of any of claims 1-8.
10. A computer readable storage medium having stored thereon computer executable instructions which, when executed by a processor, implement the method of any of claims 1-8.
CN201910297346.5A 2019-04-15 2019-04-15 Query method, device and storage medium Active CN111831876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910297346.5A CN111831876B (en) 2019-04-15 2019-04-15 Query method, device and storage medium

Publications (2)

Publication Number Publication Date
CN111831876A CN111831876A (en) 2020-10-27
CN111831876B true CN111831876B (en) 2024-07-23

Family

ID=72914521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910297346.5A Active CN111831876B (en) 2019-04-15 2019-04-15 Query method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111831876B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115982167A (en) * 2022-12-30 2023-04-18 企知道网络技术有限公司 Data storage method and device and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101398830A (en) * 2007-09-27 2009-04-01 阿里巴巴集团控股有限公司 Thesaurus fuzzy enquiry method and thesaurus fuzzy enquiry system
CN102768681A (en) * 2012-06-26 2012-11-07 北京奇虎科技有限公司 Recommending system and method used for search input

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101023911B1 (en) * 2008-11-12 2011-03-22 엔에이치엔(주) Method and System for Providing Recommendation Query
CN103092992B8 * 2013-02-17 2016-09-14 南京师范大学 Vector data preorder quadtree encoding and indexing method based on Key/Value type NoSQL database
CN107341165B (en) * 2016-04-29 2022-09-06 上海京东到家元信信息技术有限公司 Method and device for carrying out prompt display at search box
US10282369B2 (en) * 2017-03-08 2019-05-07 Centri Technology, Inc. Fast indexing and searching of encoded documents
CN108227954A * 2017-12-29 2018-06-29 北京奇虎科技有限公司 Method, apparatus and electronic device for providing search input association words

Also Published As

Publication number Publication date
CN111831876A (en) 2020-10-27

Similar Documents

Publication Publication Date Title
CN108255958B (en) Data query method, device and storage medium
CN108304444B (en) Information query method and device
US9195738B2 (en) Tokenization platform
US8171029B2 (en) Automatic generation of ontologies using word affinities
CN111339382B (en) Character string data retrieval method, device, computer equipment and storage medium
US8838551B2 (en) Multi-level database compression
JP3889762B2 (en) Data compression method, program, and apparatus
EP3072076B1 (en) A method of generating a reference index data structure and method for finding a position of a data pattern in a reference data structure
CN106599097B (en) Matching method and device for mass feature string set
US20190087466A1 (en) System and method for utilizing memory efficient data structures for emoji suggestions
CN102867049A (en) Chinese PINYIN quick word segmentation method based on word search tree
CN114817651B (en) Data storage method, data query method, device and equipment
CN113568940A (en) Data query method, device, equipment and storage medium
Haj Rachid et al. A practical and scalable tool to find overlaps between sequences
CN111831876B (en) Query method, device and storage medium
CN113297204B (en) Index generation method and device
US8392433B2 (en) Self-indexer and self indexing system
CN115543993A (en) Data processing method and device, electronic equipment and storage medium
US9509757B2 (en) Parallel sorting key generation
CN115809248B (en) Data query method and device and storage medium
CN108776705B (en) Text full-text accurate query method, device, equipment and readable medium
CN116521733A (en) Data query method and device
CN113407702B (en) Employee cooperation relationship intensity quantization method, system, computer and storage medium
US8682644B1 (en) Multi-language sorting index
CN110347925A (en) Information processing method and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant