WO2016155662A1 - Search processing method and apparatus - Google Patents

Search processing method and apparatus

Info

Publication number
WO2016155662A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
string
candidate
search
unit
Prior art date
Application number
PCT/CN2016/078309
Other languages
English (en)
French (fr)
Inventor
梁捷
李富科
Original Assignee
广州市动景计算机科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州市动景计算机科技有限公司
Publication of WO2016155662A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present application relates to the field of Internet technologies, and in particular, to a search processing method and apparatus.
  • the existing search system mainly performs related information retrieval on the Internet according to a search string (or a keyword) input by a user.
  • the string input by the user is sometimes incomplete or there is an individual character input error, which causes it to not match the candidate string stored in the search term library.
  • In such cases, fuzzy matching needs to be performed on the string input by the user to find a candidate string that is similar to the search string and has a higher search frequency, and to recommend it to the user for retrieval. For example, when the user enters a slightly misspelled form of the string "Chinese People's Liberation Army", the system performs fuzzy matching and then asks the user whether he or she wants to search for "Chinese People's Liberation Army".
  • The most commonly used search method based on fuzzy matching selects some of the candidate strings from the search term library and calculates, one by one, the minimum edit distance (Edit Distance) between each of them and the search string input by the user, thereby finding the candidate strings that have the shortest edit distance from the search string and a higher search frequency.
  • Let A and B be two strings. The following operations may be performed on A: delete a character from A; insert a character into A; replace one character in A with another character.
  • The minimum number of such operations required to edit string A into string B is called the minimum edit distance of A and B.
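  • The definition above can be made concrete with the standard dynamic-programming computation of the minimum edit distance. The following is a minimal sketch, not taken from the application; the function name `min_edit_distance` is chosen here for illustration.

```python
def min_edit_distance(a, b):
    """Minimum number of single-character deletions, insertions, and
    replacements needed to edit string a into string b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # editing a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca from a
                            curr[j - 1] + 1,      # insert cb into a
                            prev[j - 1] + cost))  # replace ca with cb, or keep it
        prev = curr
    return prev[-1]


print(min_edit_distance("lovf", "love"))
```

  • Each call costs time proportional to the product of the two string lengths, which is why running it against every string in a large lexicon is expensive.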
  • With this method, when the number of candidate strings is large, the minimum edit distance between the search string and each candidate string has to be calculated one by one, so the amount of calculation is large and the system response time becomes too long, which affects the user experience.
  • The embodiments of the present application provide a search processing method and apparatus to solve the problem in prior-art search processing methods that, when the number of candidate strings is large, the minimum edit distance between the search string and each candidate string has to be calculated one by one, and the resulting large amount of computation makes the system response time too long and affects the user experience.
  • A search processing method includes: generating, according to a received search string, a plurality of candidate strings having a predetermined edit distance from the search string; looking up each of the candidate strings in a dictionary tree of a candidate lexicon; and, if a candidate string is found, providing it to the user as a recommended search string.
  • A search processing apparatus includes: a generating unit, configured to generate, according to a received search string, a plurality of candidate strings having a predetermined edit distance from the search string; a searching unit, configured to look up each of the candidate strings in a dictionary tree of a candidate lexicon; and a recommending unit, configured to provide a candidate string to the user as a recommended search string if the candidate string is found.
  • A computing device includes: one or more processors; a memory; and one or more modules, the one or more modules being stored in the memory and configured to be executed by the one or more processors, wherein the one or more modules are configured to: generate, according to a received search string, a plurality of candidate strings having a predetermined edit distance from the search string; look up each of the candidate strings in a dictionary tree of a candidate lexicon; and, if a candidate string is found, provide it to the user as a recommended search string.
  • The search method and apparatus provided by the above technical solution first generate, according to the received search string, a plurality of candidate strings having a predetermined edit distance from the search string, then look up each candidate string in the dictionary tree of the candidate lexicon, and, if a candidate string is found, provide it to the user as a recommended search string.
  • Because the scheme generates a controllable number of candidate strings according to the preset edit distance, the amount of calculation is relatively constant and does not grow with the number of strings in the candidate lexicon. Moreover, the generated candidate strings do not need to have their edit distance calculated one by one against the strings in the candidate lexicon; instead they are filtered through the dictionary tree, which is faster to search, to obtain the recommended search string, which improves the speed of search processing.
  • FIG. 1 is a schematic flow chart of an embodiment of a search processing method provided by the present application.
  • FIG. 2 is a schematic structural diagram of a dictionary tree.
  • FIG. 3 is a schematic flow chart of another embodiment of a search processing method provided by the present application.
  • FIG. 4 is a schematic flow chart of another embodiment of a search processing method provided by the present application.
  • FIG. 5 is a schematic flowchart diagram of still another embodiment of a search processing method provided by the present application.
  • FIG. 6 is a schematic structural diagram of an embodiment of a search processing apparatus provided by the present application.
  • FIG. 7 is a schematic structural diagram of another embodiment provided by a search processing apparatus according to the present application.
  • FIG. 8 is a schematic structural diagram of an editing unit in an embodiment provided by a search processing apparatus according to the present application.
  • FIG. 9 is a schematic structural diagram of still another embodiment provided by a search processing apparatus according to the present application.
  • FIG. 10 is a structural block diagram of a computing device according to an embodiment of the present invention.
  • Referring to FIG. 1, a schematic flowchart of an embodiment of a search processing method provided by the present application is shown. The embodiment includes the following steps:
  • Step 101: Generate, according to the received search string, a plurality of candidate strings having a predetermined edit distance from the search string.
  • For a string A, a candidate string whose edit distance from A equals the predetermined edit distance can be generated by inserting, deleting, and/or replacing specified characters in A.
  • For example, suppose the predetermined edit distance is 1 and the search string input by the user is "lovf". The last character "f" can be replaced with any of the 26 lowercase letters other than "f", which generates a total of 25 candidate strings.
  • The candidate strings can be collected into a candidate string set, so that all possible similar strings within the predetermined edit distance are generated into the set.
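  • Candidate generation for an edit distance of 1 can be sketched as follows. This is an illustrative sketch only: the function names and the choice of the 26 lowercase letters as the default alphabet are assumptions made here, following the "lovf" example.

```python
import string


def candidates_at_distance_1(query, alphabet=string.ascii_lowercase):
    """Return every string that is exactly one insert, delete, or replace away."""
    # Every way of cutting the query into a left part and a right part.
    splits = [(query[:i], query[i:]) for i in range(len(query) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    replaces = {left + c + right[1:]
                for left, right in splits if right
                for c in alphabet if c != right[0]}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    return deletes | replaces | inserts


def replace_last_character(query, alphabet=string.ascii_lowercase):
    """The restricted edit used in the "lovf" example: replace only the last character."""
    return [query[:-1] + c for c in alphabet if c != query[-1]]


print(len(replace_last_character("lovf")))        # 25 candidates, as in the example
print("love" in candidates_at_distance_1("lovf"))
```

  • The size of the generated set depends only on the length of the search string and the size of the alphabet, not on the size of the lexicon.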
  • Step 102: Look up each candidate string in the dictionary tree of the candidate lexicon.
  • The dictionary tree, also known as a Trie tree or search tree, is a structure for storing strings.
  • Each string in the candidate lexicon can be represented as a path starting from the root node of the dictionary tree; connecting, in order, the characters represented by the nodes along the path yields the string.
  • For example, suppose the candidate lexicon is {love, lover, like, move, moon}. FIG. 2 shows the structure of the dictionary tree built from this lexicon. (In practical applications the candidate lexicon contains a large number of strings, so the structure of the dictionary tree is more complicated; this embodiment only gives a schematic introduction to the structure.)
  • the number on the left side of the node in the figure is the level of the node.
  • the dictionary tree has 6 levels.
  • the node at level 0 is the root node, and the root node is empty and does not represent any characters.
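  • The dictionary tree for the example lexicon can be sketched as below. The class and function names are chosen here for illustration, and the `is_end` flag is an assumption of this sketch (the text does not say how the end of a string is marked); it is needed to tell the stored string "love" apart from a mere prefix such as "lov".

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> child TrieNode
        self.is_end = False  # assumed marker: a lexicon string ends at this node


def build_trie(words):
    root = TrieNode()        # level 0: empty, represents no character
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def contains(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_end


def level_count(node):
    """Number of levels in the tree, counting the root as one level."""
    return 1 + max((level_count(c) for c in node.children.values()), default=0)


root = build_trie(["love", "lover", "like", "move", "moon"])
print(sorted(root.children))  # the level-1 characters
print(level_count(root))      # levels 0 through 5
```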
  • Step 103: If a candidate string is found, provide it to the user as a recommended search string.
  • A candidate string found in the dictionary tree may be provided to the user as a recommended search string, so that the user can search further according to it.
  • In the example of step 101, a total of 25 candidate strings are generated. Each of them can be looked up in the dictionary tree of FIG. 2. In the end only the candidate string "love" is found in the dictionary tree and the other 24 are not, so the string "love" can be provided to the user as the recommended search string.
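  • Steps 101 to 103 can be put together for the "lovf" example as in the following sketch. All names here are hypothetical, and candidate generation is limited to replacing the last character, exactly as in the example, rather than covering every possible edit.

```python
import string


class Node:
    def __init__(self):
        self.children = {}
        self.is_end = False  # assumed end-of-string marker


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_end = True
    return root


def in_trie(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_end


def recommend(query, trie):
    # Step 101: replace the last character with each other lowercase letter.
    candidates = [query[:-1] + c for c in string.ascii_lowercase if c != query[-1]]
    # Step 102: look up every candidate in the dictionary tree.
    found = [c for c in candidates if in_trie(trie, c)]
    # Step 103: the candidates that were found are the recommendations.
    return candidates, found


trie = build_trie(["love", "lover", "like", "move", "moon"])
candidates, found = recommend("lovf", trie)
print(len(candidates), found)
```

  • No edit distance is computed against the lexicon at any point; each of the 25 lookups touches at most as many nodes as the candidate has characters.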
  • In this embodiment, a plurality of candidate strings having a predetermined edit distance from the search string are generated according to the received search string, each candidate string is looked up in the dictionary tree of the candidate lexicon, and a candidate string that is found is provided to the user as a recommended search string.
  • Because a controllable number of candidate strings is generated according to the preset edit distance, the amount of calculation is relatively constant and does not grow with the number of strings in the candidate lexicon. Moreover, the generated candidate strings do not need to have their edit distance calculated one by one against the strings in the candidate lexicon; instead they are filtered through the dictionary tree, which is faster to search, to obtain the recommended search string, which improves the speed of search processing.
  • FIG. 3 is a schematic flowchart diagram of another embodiment of the search processing method provided by the present application.
  • the embodiment includes the following steps 301 to 304:
  • Step 301: Establish a dictionary tree according to the candidate lexicon, where each node of the dictionary tree stores an array of address pointers pointing to its child nodes, and the value of each address pointer in the array is the same as the code value of the character corresponding to the respective child node.
  • For example, suppose the two level-1 child nodes of the root node correspond to "l" and "m" respectively. The root node stores an array of address pointers pointing to these level-1 child nodes, and the values of the address pointers in the array are the same as the code values of the characters corresponding to the two level-1 child nodes. The same applies to the other child nodes.
  • The value of the address pointer of a child node is a relative value rather than an absolute one: it is an address offset that takes the address of the parent node as the base address.
  • Step 302 Generate, according to the received search string, a plurality of candidate character strings having a predetermined edit distance from the search string.
  • Specifically, the candidate character set may be predefined and the edit distance preset, so that after the search string is received, at least one of the editing operations 3021, 3022, and 3023 is performed on the search string according to the preset candidate character set and edit distance, generating candidate strings having the predetermined edit distance from the search string.
  • For example, suppose the predetermined edit distance is 1 and the predefined candidate character set is the 26 lowercase letters. If the search string input by the user is "lovf", replacing the last character "f" with any of the 26 lowercase letters other than "f" generates 25 candidate strings.
  • The number of characters edited when generating a candidate string is determined by the edit distance. For example, if the edit distance is 1, one character is inserted, replaced, and/or deleted; if the edit distance is 2, two characters are inserted, replaced, and/or deleted.
  • Step 303: Search the dictionary tree, in sequence, for the characters contained in the candidate string, using the code value of the character to be searched as the query index into the address pointers of the current node.
  • The conventional search method follows the pointers to the level-1 child nodes stored in the root node, visits the memory addresses they point to, and checks whether the character "l" is present; if it is, the method then checks whether the character "o" is among the level-2 child nodes, and the remaining characters are searched in sequence.
  • In this embodiment, the address pointer of a child node and the code value of the character corresponding to that child node are set to the same value. When searching for each character of a candidate string, it is therefore unnecessary to visit the address pointers contained in each level's node one by one to determine whether the content is the character to be searched; instead, the memory address represented by the code value of the character to be searched is accessed directly to confirm whether the character is present, so the search is faster.
  • The dictionary tree is stored in memory, so lookups in the dictionary tree are performed in memory.
  • Using the code value of the character to be searched as the query index into the address pointers of the current node means that each character in the candidate string is taken in turn as the character to be searched, and its code value is used as an index to match against the node in the dictionary tree; when the code value of the character is the same as the value of an address pointer in the array, the character is confirmed as found.
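  • The idea of indexing a node's child array directly by code value, rather than scanning its children, can be illustrated with the sketch below. It is a simplification under stated assumptions: Python list slots holding object references stand in for the relative address pointers described in the text, the code values 'a' -> 1 through 'z' -> 26 are hypothetical, and the `is_end` flag is added by this sketch.

```python
ALPHABET_SIZE = 26  # assumption: strings contain only the letters 'a'..'z'


def code(ch):
    """Hypothetical code value: 'a' -> 1, 'b' -> 2, ..., 'z' -> 26."""
    value = ord(ch) - ord("a") + 1
    if not 1 <= value <= ALPHABET_SIZE:
        raise ValueError(f"character {ch!r} is outside the assumed alphabet")
    return value


class Node:
    def __init__(self):
        # One slot per code value; slot 0 is unused so codes index directly.
        self.children = [None] * (ALPHABET_SIZE + 1)
        self.is_end = False  # assumed marker: a lexicon string ends here


def insert(root, word):
    node = root
    for ch in word:
        index = code(ch)
        if node.children[index] is None:
            node.children[index] = Node()
        node = node.children[index]
    node.is_end = True


def lookup(root, word):
    node = root
    for ch in word:
        # Direct index by code value; no scan over the sibling nodes.
        node = node.children[code(ch)]
        if node is None:
            return False
    return node.is_end


root = Node()
for word in ["love", "lover", "like", "move", "moon"]:
    insert(root, word)
print(lookup(root, "love"), lookup(root, "lovf"))
```

  • Each character costs one array access regardless of how many children the current node has, which is the property the step relies on.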
  • Step 304: If a candidate string is found, provide it to the user as a recommended search string.
  • FIG. 4 is a schematic flowchart diagram of another embodiment of a search processing method according to the present application. The embodiment includes the following steps:
  • Step 401: Obtain, according to the strings in the candidate lexicon, a complete character set corresponding to those strings.
  • The characters contained in all the strings in the candidate lexicon are collected and de-duplicated to form the complete character set, so that every character contained in any string of the candidate lexicon is in the complete character set.
  • Step 402: Encode each character in the complete character set separately, so that the code values of the characters are consecutive integers starting from a preset value, the preset value being an integer greater than or equal to 1.
  • The original code values of the characters in the complete character set are not necessarily consecutive integers. Take the ASCII code commonly used in computers as an example: ASCII codes correspond to the decimal values 0 to 127, representing 128 different characters, but the complete character set does not necessarily contain all 128 of them. In particular, some special characters such as "$" and "*" are uncommon in searches, so the original code values of the characters in the complete character set are generally not consecutive.
  • The characters in the complete character set are therefore re-encoded as consecutive integers starting from the preset value. For example, the code values can increase consecutively from 1.
  • Because the value of the address pointer of a child node stored in a node is the same as the code value of the character that the child node represents, a larger code value means a larger address pointer value, a longer path from the node to the address of its child node, and a longer indexing process.
  • If the characters in the complete character set are encoded consecutively starting from a small value, the code values, and hence the address pointer values of the child nodes, are kept as small as possible, which speeds up indexing to the child nodes and improves the search speed of the dictionary tree.
  • Consecutive encoding also keeps the memory address range represented by the code values (that is, the address range occupied by the complete character set in memory) contiguous, which prevents storage addresses from being scattered and saves storage space.
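  • Steps 401 and 402 can be sketched as follows. The function name is hypothetical, and assigning codes in sorted character order is an assumption of this sketch; the text only requires consecutive integers starting from a preset value of at least 1.

```python
def build_encoding(lexicon, start=1):
    """Map each distinct character of the lexicon to a consecutive integer code."""
    if start < 1:
        raise ValueError("the preset starting value must be at least 1")
    # Step 401: collect and de-duplicate the characters of all strings.
    charset = sorted({ch for word in lexicon for ch in word})
    # Step 402: assign consecutive codes starting from the preset value.
    return {ch: start + offset for offset, ch in enumerate(charset)}


lexicon = ["love", "lover", "like", "move", "moon"]
encoding = build_encoding(lexicon)
print(encoding)
print(sorted(ord(ch) for ch in encoding))  # the original ASCII values have gaps
```

  • For this lexicon the nine distinct characters receive the codes 1 to 9, whereas their ASCII values range from 101 to 118 with gaps, so a child array indexed by the new codes needs far fewer slots.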
  • Step 403: Establish a dictionary tree according to the candidate lexicon, where each node of the dictionary tree stores an array of address pointers pointing to its child nodes, and the value of each address pointer in the array is the same as the code value of the character corresponding to the respective child node.
  • Step 404: Generate, according to the received search string, a plurality of candidate strings having a predetermined edit distance from the search string.
  • Step 405: Search the dictionary tree, in sequence, for the characters contained in the candidate string, using the code value of the character to be searched as the query index into the address pointers of the current node.
  • Step 406: If a candidate string is found, provide it to the user as a recommended search string.
  • If at least two candidate strings are found, a candidate string with a higher weight may be selected and recommended to the user. The weight may be determined by the frequency with which the candidate string appears in the user's search history, or by the frequency with which it appears in preset candidate search data.
  • For example, suppose the search string input by the user is "aove" and the preset edit distance is 1; the candidate strings generated by editing then include "love" and "move".
  • A search of the dictionary tree finds that both "love" and "move" appear in it.
  • If "love" appears more frequently than "move" in the user's search history, "love" can be recommended to the user as the recommended search string, or provided to the user as the most preferred search string.
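  • Weight-based selection among several found candidates can be sketched as below. The function name is hypothetical, and the frequency counts are invented purely for illustration; they are not data from the application.

```python
def rank_by_weight(found, frequency):
    """Order the found candidate strings from highest to lowest weight."""
    # Candidates missing from the frequency table are given a weight of 0.
    return sorted(found, key=lambda s: frequency.get(s, 0), reverse=True)


# Hypothetical search-history counts, invented for illustration only.
history_counts = {"love": 12, "move": 3}
ranked = rank_by_weight(["move", "love"], history_counts)
print(ranked[0])  # prints love
```

  • Returning the whole ranked list rather than a single string leaves room for either behavior described above: recommending only the top candidate, or presenting it as the most preferred of several.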
  • FIG. 5 is a schematic flowchart diagram of still another embodiment of a search processing method according to the present application. The embodiment includes the following steps:
  • Step 501: Obtain, according to the strings in the candidate lexicon, a complete character set corresponding to those strings.
  • The characters contained in all the strings in the candidate lexicon are collected and de-duplicated to form the complete character set, so that every character contained in any string of the candidate lexicon is in the complete character set.
  • Step 502: Encode each character in the complete character set separately, so that the code values of the characters are consecutive integers starting from a preset value, the preset value being an integer greater than or equal to 1.
  • The characters in the complete character set are re-encoded as consecutive integers starting from the preset value. For example, the code values can increase consecutively from 1.
  • Step 503: Establish a dictionary tree according to the candidate lexicon, where each node of the dictionary tree stores an array of address pointers pointing to its child nodes, and the value of each address pointer in the array is the same as the code value of the character corresponding to the respective child node.
  • Step 504: Predefine a candidate character set and preset an edit distance.
  • Specifically, before the search string is received, the candidate character set and the edit distance may be configured in advance, so that after the search string is received, the candidate strings are determined according to the configured candidate character set and edit distance.
  • Step 505: Perform an editing operation on the search string according to the candidate character set and the edit distance, and generate candidate strings having the edit distance from the search string.
  • The editing operation includes at least one of the following: inserting at least one character of the candidate character set into the search string; replacing at least one character in the search string with a character of the candidate character set; and deleting at least one character from the search string.
  • Step 506: Search the dictionary tree, in sequence, for the characters contained in the candidate string, using the code value of the character to be searched as the query index into the address pointers of the current node.
  • Step 507: If a candidate string is found, provide it to the user as a recommended search string.
  • In one implementation, one of the editing operations is deleting at least one character from the search string, and when a target character in the search string needs to be deleted, the target character is replaced with a preset custom character.
  • In the prior art, a string is generally edited by "inserting", "deleting", and "replacing" characters.
  • Take the "delete" operation as an example. To delete the first "v" in the string "lovve", that is, to edit it into the target string "love", the existing implementation copies the "ve" contained in the original string "lovve" once in memory and then writes the copied "ve" over the position of "vve" in the original string. Implementing "delete" this way involves a memory copy, which wastes memory.
  • In this embodiment, the target character to be deleted is replaced with a preset custom character, so that the "delete" operation is implemented by a "replace" operation.
  • The preset custom character is different from any existing character. Because it is a custom character, no concrete character is given in this specification; the placeholder "custom" stands for it in the description of its usage below.
  • Taking the deletion of the character "v" in the string "lovve" above as an example, the character "v" to be deleted is replaced with the character "custom", and the original string is edited into the target string "lo custom ve".
  • During the search, the custom character is recognized by its identifier or code value. If the next character to be searched is the custom character, it is not searched; instead, the search continues with the character "v" that follows it.
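  • Implementing "delete" as "replace with a custom character" can be sketched as follows. The placeholder value `SKIP`, the helper names, and the `is_end` flag are all assumptions of this sketch; the specification deliberately does not name a concrete custom character.

```python
SKIP = "\x00"  # hypothetical custom character, assumed absent from all real strings


class Node:
    def __init__(self):
        self.children = {}
        self.is_end = False  # assumed end-of-string marker


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_end = True
    return root


def mark_deleted(chars, index):
    """'Delete' the character at index by overwriting it in place."""
    chars[index] = SKIP  # no characters are shifted or copied


def lookup_skipping(root, chars):
    node = root
    for ch in chars:
        if ch == SKIP:
            continue  # ignore the custom character and move on to the next one
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_end


trie = build_trie(["love", "lover", "like", "move", "moon"])
chars = list("lovve")
mark_deleted(chars, 2)  # overwrite the first "v"
print(lookup_skipping(trie, chars))
```

  • The buffer keeps its original length of five characters, so the edit touches a single position instead of moving the tail of the string.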
  • In this embodiment, a plurality of candidate strings having a predetermined edit distance from the search string are generated according to the received search string, each candidate string is looked up in the dictionary tree of the candidate lexicon, and a candidate string that is found is provided to the user as a recommended search string.
  • Because a controllable number of candidate strings is generated according to the preset edit distance, the amount of calculation is relatively constant and does not grow with the number of strings in the candidate lexicon. Moreover, the generated candidate strings do not need to have their edit distance calculated one by one against the strings in the candidate lexicon; instead they are filtered through the dictionary tree, which is faster to search, to obtain the recommended search string, which improves the speed of search processing.
  • the present application further provides an embodiment of a search processing apparatus.
  • In an embodiment of the search processing apparatus provided by the present application, the apparatus includes: a generating unit 601, a searching unit 602, and a recommending unit 603.
  • The generating unit 601 is configured to generate, according to the received search string, a plurality of candidate strings having a predetermined edit distance from the search string.
  • The searching unit 602 is configured to look up each candidate string in the dictionary tree of the candidate lexicon.
  • The recommending unit 603 is configured to provide a candidate string to the user as a recommended search string if the candidate string is found.
  • FIG. 7 is a schematic structural diagram of another embodiment of a search processing apparatus provided by the present application.
  • The apparatus further includes: an establishing unit 604, configured to establish a dictionary tree according to the candidate lexicon, where each node of the dictionary tree stores an array of address pointers pointing to its child nodes, and the value of each address pointer in the array is the same as the code value of the character corresponding to the respective child node.
  • the searching unit 602 is specifically configured to sequentially search for characters included in the candidate string in the dictionary tree, and use the coded value of the character to be searched as the query index of the address pointer of the current node.
  • the device further includes: an obtaining unit 605 and an encoding unit 606.
  • the obtaining unit 605 is configured to obtain a complete character set corresponding to the character string according to the character string in the candidate vocabulary;
  • The encoding unit 606 is configured to encode each character in the complete character set separately, so that the code values of the characters are consecutive integers starting from a preset value, the preset value being an integer greater than or equal to 1.
  • The generating unit 601 includes:
  • a predefined unit 6011, configured to predefine a candidate character set and preset an edit distance;
  • the editing unit 6012 is configured to perform an editing operation on the search string according to the candidate character set and the edit distance, and generate an alternate character string having the edit distance with the search string, as shown in FIG.
  • the editing unit includes at least one of the following units:
  • an insertion editing unit 60121, configured to insert at least one character into the search string, the at least one character being a character in the candidate character set;
  • a replacement editing unit 60122, configured to replace at least one character in the search string with a character in the candidate character set;
  • a deletion editing unit 60123, configured to delete at least one character from the search string.
  • the generating unit 601 is specifically configured to: when the target character in the search string needs to be deleted, replace the target character with a preset custom character;
  • the searching unit 602 is specifically configured to: when locating the custom character in the target character string, ignore the custom character, and continue to search for the next character adjacent to the custom character.
  • The recommending unit 603 includes: a weight recommending unit 6031, configured to, if at least two candidate strings are found, select the recommended search string from among them according to the weights of the candidate strings and provide it to the user.
  • The weight recommending unit 6031 is specifically configured to select the recommended search string from the candidate strings and provide it to the user according to the frequency with which each candidate string appears in the user's search history, or according to the frequency with which it appears in preset candidate search data.
  • FIG. 9 is a schematic structural diagram of still another embodiment of a search processing apparatus provided by the present application.
  • The apparatus includes: a generating unit 601, a searching unit 602, a recommending unit 603, an establishing unit 604, an obtaining unit 605, and an encoding unit 606.
  • the generating unit 601 is configured to generate, according to the received search string, a plurality of candidate character strings having a predetermined edit distance from the search string.
  • the searching unit 602 is configured to search the candidate string by using a dictionary tree of the alternative lexicon.
  • The recommending unit 603 is configured to provide a candidate string to the user as a recommended search string if the candidate string is found.
  • The establishing unit 604 is configured to establish a dictionary tree according to the candidate lexicon, where each node of the dictionary tree stores an array of address pointers pointing to its child nodes, and the value of each address pointer in the array is the same as the code value of the character corresponding to the respective child node.
  • the searching unit 602 is specifically configured to sequentially search for characters included in the candidate string in the dictionary tree, and use the coded value of the character to be searched as the query index of the address pointer of the current node.
  • the obtaining unit 605 is configured to obtain a complete character set corresponding to the character string according to the character string in the candidate vocabulary;
  • The encoding unit 606 is configured to encode each character in the complete character set separately, so that the code values of the characters are consecutive integers starting from a preset value, the preset value being an integer greater than or equal to 1.
  • The preset unit 607 is configured to predefine a candidate character set and preset an edit distance.
  • The generating unit 601 includes:
  • the editing unit 6012 is configured to perform an editing operation on the search string according to the candidate character set and the edit distance, and generate an alternate character string having the edit distance with the search string, as shown in FIG.
  • the editing unit includes at least one of the following units:
  • an insertion editing unit 60121, configured to insert at least one character into the search string, the at least one character being a character in the candidate character set;
  • a replacement editing unit 60122, configured to replace at least one character in the search string with a character in the candidate character set;
  • a deletion editing unit 60123, configured to delete at least one character from the search string.
  • the generating unit 601 is specifically configured to: when the target character in the search string needs to be deleted, replace the target character with a preset custom character;
  • the searching unit 602 is specifically configured to: when locating the custom character in the target character string, ignore the custom character, and continue to search for the next character adjacent to the custom character.
  • The recommending unit 603 includes: a weight recommending unit 6031, configured to, if at least two candidate strings are found, select the recommended search string from among them according to the weights of the candidate strings and provide it to the user.
  • The weight recommending unit 6031 is specifically configured to select the recommended search string from the candidate strings and provide it to the user according to the frequency with which each candidate string appears in the user's search history, or according to the frequency with which it appears in preset candidate search data.
  • An embodiment of a search processing apparatus provided by the present application is the same as the embodiment of the foregoing search processing method, and therefore is not specifically explained. For related details, refer to the embodiment of the foregoing search processing method. Corresponding part.
  • the embodiment of the search processing apparatus provided by the foregoing technical solution first generates, according to the received search string, a plurality of candidate character strings having a predetermined edit distance from the search string, and then respectively search the dictionary by using the dictionary tree of the alternative lexicon.
  • the candidate string is provided to the user as a recommended search string if the candidate string is found.
  • the number of controllable candidate strings is generated according to the preset edit distance, so the calculation amount of the algorithm is relatively constant, and does not increase with the number of strings of the alternative thesaurus; and the generated candidate strings do not need to be generated.
  • the editing distance is calculated one by one with the string in the alternative vocabulary, and the candidate string is further filtered by the dictionary tree with faster search speed to obtain the recommended search string, which improves the retrieval processing speed.
  • Those skilled in the art will appreciate that the technology in the embodiments of the present application can be implemented by means of software plus the necessary general-purpose hardware, including general-purpose integrated circuits, general-purpose CPUs, general-purpose memories, general-purpose components, and the like. It can of course also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like, but in many cases the former is the better implementation.
  • On this understanding, the technical solution in the embodiments of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application or in certain parts of them.
  • the computing device can be implemented as various types of computer devices, such as desktops, portable computers, tablets, smart phones, personal data assistants (PDAs), smart wearable devices, or other types of computer devices, but is not limited to any particular form.
  • the computer can include a processing module 100, a storage subsystem 200, an input device 300, a display 400, a network interface 500, and a bus 600.
  • the processing module 100 can be a multi-core processor or multiple processors.
  • processing module 100 can include a general purpose main processor and one or more special coprocessors, such as a graphics processing unit (GPU), a digital signal processor (DSP), and the like.
  • the processing module 100 can be implemented using custom circuitry, such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
  • the processing module 100 can be such a circuit that executes executable instructions stored on the circuit itself.
  • the processing module 100 can execute executable instructions stored on the storage subsystem 200.
  • Storage subsystem 200 can include various types of storage units, such as system memory, read only memory (ROM), and persistent storage.
  • the ROM can store static data or instructions required by the processing module 100 or other modules of the computer.
  • the persistent storage device can be a readable and writable storage device.
  • the persistent storage device may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off.
  • the persistent storage device employs a mass storage device (eg, magnetic or optical disk, flash memory) as the permanent storage device.
  • the persistent storage device can be a removable storage device (eg, a floppy disk, an optical drive).
  • the system memory can be a readable and writable storage device or a volatile read/write storage device, such as dynamic random access memory.
  • System memory can store instructions and data that some or all of the processors need at runtime.
  • storage subsystem 200 can include any combination of computer readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read only memory), and magnetic disks and/or optical disks can also be employed.
  • storage subsystem 200 can include removable storage devices that are readable and/or writable, such as a compact disc (CD), a read-only digital versatile disc (eg, a DVD-ROM, a dual layer DVD-ROM), read-only Blu-ray discs, ultra-density discs, flash cards (such as SD cards, mini SD cards, Micro-SD cards, etc.), magnetic floppy disks, and so on.
  • the computer readable storage medium does not include a carrier wave and an instantaneous electronic signal transmitted by wireless or wire.
  • the storage subsystem 200 can store one or more software programs that can be executed by the processing module 100 or resource files that need to be invoked.
  • the resource files can include some third-party libraries, including but not limited to audio libraries, video libraries, 2D graphics libraries, and 3D graphics libraries.
  • the user interface can be provided by one or more user input devices 300, display 400, and/or one or more other user output devices.
  • Input device 300 can include means for a user to input signals to a computer that can interpret such signals containing particular user requests or information.
  • a web address may be entered into the user interface through a keyboard, and the webpage content corresponding to the entered address is then displayed.
  • input device 300 can include some or all of a keyboard button, touch screen, mouse or other pointing device, scroll wheel, click wheel, dial, button, switch, keypad, microphone, and the like.
  • the display 400 can display computer-generated images, and can include various types of image devices, such as cathode ray tubes (CRTs), liquid crystal displays (LCDs), light emitting diodes (LEDs) including organic light emitting diodes (OLEDs), and projection systems, together with other supporting electronics (such as DACs, ADCs, and signal processors). In some embodiments, other user output devices, such as a signal light, a speaker, a tactile sensor, or a printer, may be provided in addition to or instead of the display 400.
  • the user interface can be provided through a graphical user interface.
  • Certain areas of the display 400 define some visual graphical elements as interactive objects or control objects that the user selects through the input device 300.
  • the user can operate the user input device 300 to move to a specified location on the screen and enter a web address, thereby causing the webpage content corresponding to that address to be displayed on the display 400.
  • a touch device that can recognize user gestures can be used as an input device; these gestures can, but need not, be associated with an array on the display 400.
  • Network interface 500 provides sound and/or data communication functionality to the computer.
  • network interface 500 can include a radio frequency transceiver to communicate sound and/or data (eg, using cellular telephone technology such as 3G, 4G or EDGE, or WIFI data network technology), a GPS receiver module, and/or other modules.
  • network interface 500 can provide an additional wireless network connection or an alternative wireless interface.
  • Network interface 500 may be a combination of hardware (eg, antennas, modems, codecs, and other analog and/or digital signal processing circuits) and software modules.
  • the bus 600 can include various systems, external devices, and chip buses that connect various components within the computer.
  • bus 600 connects the processing module 100 to storage subsystem 200, and may also connect input device 300 and display 400.
  • Bus 600 can also cause a computer to interface with the network via network interface 500.
  • the computer can be part of multiple networked computer devices. Any or all of the components of the computer can be used in concert in embodiments of the present invention.
  • Some embodiments include electronic components, such as a microprocessor, a memory that stores computer instructions and data in a computer readable storage medium. Many of the features described in the Detailed Description section can be implemented by the method steps of executing computer instructions stored on a computer readable storage medium. When these computer instructions are executed, the computer processing unit performs various functions of the instructions.
  • the program instructions or computer code may be embodied as machine code, for example code obtained by compiling a higher-level language with a computer, an electronic component, or a microprocessor equipped with an interpreter.
  • the computer is schematic.
  • the computer may have other functions not specifically described (eg, mobile call, GPS, power management, one or more cameras, various connection ports or accessories for connecting external devices, etc.).
  • specific functional modules of the computer are described herein; these descriptions are for convenience and do not imply a specific physical configuration of the functional components. Moreover, the functional modules need not correspond one-to-one with physical modules.
  • a module can be configured to perform various operations, for example by programming or by setting up appropriate control circuitry, and a module may also be reconfigured from its initial settings.
  • Embodiments of the invention may be implemented in a variety of devices, including electronic devices, through the use of a combination of hardware and software.
  • the embodiment further provides a non-volatile readable storage medium storing one or more modules (programs) which, when applied to a computing device, cause the computing device to execute instructions for the following steps:
  • generating, according to a received search string, a number of candidate strings that have a predetermined edit distance from the search string; looking up each candidate string in the dictionary tree (trie) of the candidate lexicon; and, if a candidate string is found, providing it to the user as a recommended search string.
  • optionally, the method further includes: building a trie from the candidate lexicon, where each node of the trie stores an array of address pointers to its child nodes, and the value of each address pointer in the array equals the code value of the character corresponding to that child node;
  • looking up each candidate string in the trie of the candidate lexicon includes: looking up the characters of the candidate string in the trie in order, using the code value of the character being sought as the query index into the current node's address pointers.
  • optionally, before the trie is built from the candidate lexicon, the method further includes: obtaining, from the strings in the candidate lexicon, the complete character set corresponding to those strings; and encoding each character in the complete character set separately, so that the code values are consecutive integers starting from a preset value, the preset value being an integer greater than or equal to 1.
  • optionally, generating the candidate strings includes: predefining a candidate character set and presetting an edit distance; and performing, according to the candidate character set and the edit distance, at least one of the following editing operations on the search string to generate candidate strings having the edit distance from the search string: inserting at least one character into the search string, the at least one character being a character in the candidate character set; replacing at least one character in the search string with a character in the candidate character set; deleting at least one character in the search string.
  • optionally, the method further includes predefining a candidate character set and presetting an edit distance, and generating the candidate strings includes performing at least one of the same insertion, replacement, and deletion operations on the search string according to that candidate character set and edit distance.
  • generating the candidate strings includes: when a target character in the search string needs to be deleted, replacing the target character with a preset custom character; and looking up the candidate strings includes: on reaching the custom character in the target string, ignoring it and continuing with the next character adjacent to it.
  • optionally, providing the recommended search string to the user includes: if at least two of the candidate strings are found, selecting the recommended search string from the candidate strings according to their weights and providing it to the user.
  • optionally, selecting the recommended search string according to the weights includes: selecting it according to how frequently each candidate string occurs in the user's search history, or according to how frequently it occurs in preset candidate search material.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

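A search processing method and apparatus. The method comprises: generating, according to a received search string, a number of candidate strings that have a predetermined edit distance from the search string (101); looking up each candidate string in a dictionary tree (trie) built from a candidate lexicon (102); and, if a candidate string is found, providing it to the user as a recommended search string (103). Because the method generates a controllable number of candidate strings according to the preset edit distance, the computational cost is relatively constant and does not grow with the number of strings in the candidate lexicon. The generated candidate strings do not need to have their edit distance computed one by one against the strings in the lexicon; instead, the trie, which is fast to search, further filters the candidates to obtain the recommended search string, which speeds up search processing.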

Description

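Search processing method and apparatus
Technical Field
The present application relates to the field of Internet technologies, and in particular to a search processing method and apparatus.
Background
Existing search systems mainly retrieve relevant information from the Internet according to a search string (also called a keyword) entered by the user. In practice, the string entered by the user is sometimes incomplete or contains an individual mistyped character, so that it cannot exactly match any candidate string stored in the search lexicon. In that case the system needs to perform fuzzy matching on the input, find a similar candidate string with a higher search frequency, and recommend it to the user. For example, when the user enters the string "中国人明解放军" (a misspelling in which 明 is typed in place of 民), the system, after fuzzy matching, asks the user whether the intended search was "中国人民解放军" (Chinese People's Liberation Army).
The most common existing fuzzy-matching search method picks a subset of candidate strings from the search lexicon and computes the minimum edit distance between each of them and the user's search string, in order to find the candidate string that has the shortest edit distance to the search string and a high search frequency. The edit distance between two strings is defined as follows. Let A and B be two strings, and allow the following operations on A: deleting one character from A; inserting one character into A; replacing one character of A with another character. The minimum number of such operations needed to edit string A into string B is called the minimum edit distance between A and B.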
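The definition of minimum edit distance above can be illustrated with a short sketch. This is the standard dynamic-programming (Levenshtein) formulation, not code taken from the application; the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character deletions, insertions and
    replacements needed to turn string a into string b."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca from a
                            curr[j - 1] + 1,      # insert cb into a
                            prev[j - 1] + cost))  # replace ca with cb (or keep)
        prev = curr
    return prev[-1]


print(edit_distance("中国人明解放军", "中国人民解放军"))
```

Computing this for every string in a large lexicon costs time proportional to the lexicon size, which is the drawback the background section goes on to describe.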
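However, when there are many candidate strings, this method has to compute the minimum edit distance against each of them in turn, so the computational cost is high, the system response time becomes too long, and the user experience suffers.
Summary
The embodiments of the present application provide a search processing method and apparatus to solve the problem that, in prior-art search processing methods, when there are many candidate strings the search string has to have its minimum edit distance computed against each candidate string in turn, so that the computational cost is high, the system response time is too long, and the user experience suffers.
To solve the above technical problem, the embodiments of the present application disclose the following technical solutions:
In one aspect, a search processing method is provided, the method comprising: generating, according to a received search string, a number of candidate strings that have a predetermined edit distance from the search string; looking up each candidate string in the dictionary tree (trie) of a candidate lexicon; and, if a candidate string is found, providing it to the user as a recommended search string.
In another aspect, a search processing apparatus is provided, the apparatus comprising: a generating unit, configured to generate, according to a received search string, a number of candidate strings that have a predetermined edit distance from the search string; a lookup unit, configured to look up each candidate string in the trie of a candidate lexicon; and a recommending unit, configured to provide a candidate string to the user as a recommended search string if it is found.
In another aspect, a computing device is provided, comprising: one or more processors; a memory; and one or more modules stored in the memory and configured to be executed by the one or more processors, wherein the one or more modules are configured to: generate, according to a received search string, a number of candidate strings that have a predetermined edit distance from the search string; look up each candidate string in the trie of a candidate lexicon; and, if a candidate string is found, provide it to the user as a recommended search string.
In another aspect, a computer-readable recording medium is provided on which a program for executing the method of claims 1-9 is recorded.
The search method and apparatus provided by the above technical solutions first generate, according to the received search string, a number of candidate strings that have a predetermined edit distance from the search string, then look up each candidate string in the trie of the candidate lexicon, and, if a candidate string is found, provide it to the user as a recommended search string. The solution generates a controllable number of candidate strings according to the preset edit distance, so the computational cost is relatively constant and does not grow with the number of strings in the candidate lexicon. The generated candidate strings do not need to have their edit distance computed one by one against the strings in the lexicon; instead, the trie, which is fast to search, further filters the candidates to obtain the recommended search string, which speeds up search processing.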
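Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of one embodiment of the search processing method of the present application;
FIG. 2 is a schematic structural diagram of a dictionary tree (trie);
FIG. 3 is a schematic flowchart of another embodiment of the search processing method of the present application;
FIG. 4 is a schematic flowchart of another embodiment of the search processing method of the present application;
FIG. 5 is a schematic flowchart of yet another embodiment of the search processing method of the present application;
FIG. 6 is a schematic structural diagram of one embodiment of the search processing apparatus of the present application;
FIG. 7 is a schematic structural diagram of another embodiment of the search processing apparatus of the present application;
FIG. 8 is a schematic structural diagram of the editing unit in an embodiment of the search processing apparatus of the present application;
FIG. 9 is a schematic structural diagram of yet another embodiment of the search processing apparatus of the present application;
FIG. 10 is a structural block diagram of a computing device according to an embodiment of the present invention.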
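Detailed Description
An embodiment of the search processing method of the present application is described first. FIG. 1 is a schematic flowchart of one embodiment of the method, which includes the following steps:
Step 101: generate, according to the received search string, a number of candidate strings that have a predetermined edit distance from the search string.
When the user enters a search string A, candidate strings whose edit distance from A equals the predetermined edit distance are generated by inserting, and/or deleting, and/or replacing specified characters in A.
For example, with a predetermined edit distance of 1 and a user search string of "lovf", the last character "f" can be replaced with any of the 26 lowercase letters other than "f", which yields 25 candidate strings. The candidate strings can be collected into a candidate set, so that every possible similar string within the predetermined edit distance is generated and placed in the set.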
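The "lovf" example can be reproduced with a few lines. This is a minimal sketch of the single replacement described in the example (last character only, lowercase alphabet); the function name `replace_last_char` is a hypothetical helper, not something named in the application.

```python
import string


def replace_last_char(query: str, alphabet: str = string.ascii_lowercase) -> list[str]:
    """Replace the last character of query with every other character of alphabet."""
    if not query:
        return []
    return [query[:-1] + ch for ch in alphabet if ch != query[-1]]


candidates = replace_last_char("lovf")
print(len(candidates))  # 26 letters minus the original "f"
print(candidates[:5])
```

The number of candidates depends only on the query length and the alphabet size, not on how many strings the lexicon holds.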
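Step 102: look up each candidate string in the dictionary tree of the candidate lexicon.
A dictionary tree, also called a trie or search tree, is a structure for storing strings. Once a trie has been built from the candidate lexicon, every string in the lexicon can be represented as a path in the trie that starts at the root node; the characters represented by the nodes along the path, concatenated in order, form the string.
The candidate lexicon is a set of strings, determined by scanning the search histories of a large number of users in advance and observing the search strings they enter into the search engine, that covers the vast majority of search strings entered by users.
Suppose the candidate lexicon is {love, lover, like, move, moon}. FIG. 2 is a schematic structural diagram of the trie built from this lexicon. (In practice the lexicon contains a great many strings and the trie is correspondingly complex; this embodiment only illustrates the structure.) The number to the left of each node in the figure is the node's level. The trie has 6 levels in total; the node at level 0 is the root node, which is empty and does not represent any character.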
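The trie for the example lexicon can be sketched with nested dictionaries. This is one common way to write a trie in Python and is only an illustration: the end-of-word marker `END` and the helper names are assumptions, and the pointer-array layout described later in the application is not modeled here.

```python
END = "$end"  # hypothetical end-of-word marker; cannot clash with single characters


def build_trie(words):
    """Build a nested-dict trie; each key is one character, each value a child node."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True  # a lexicon string ends at this node
    return root


def contains(root, word):
    """Return True only if word is a complete string stored in the trie."""
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return END in node


def levels(node):
    """Number of levels in the trie, counting the empty root as level 0."""
    children = [child for key, child in node.items() if key != END]
    return 1 + max((levels(child) for child in children), default=0)


trie = build_trie(["love", "lover", "like", "move", "moon"])
print(sorted(k for k in trie if k != END))  # level-1 children of the root
print(levels(trie))
```

The longest string, "lover", has five characters, which together with the empty root gives the six levels mentioned above.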
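Step 103: if a candidate string is found, provide it to the user as a recommended search string.
Each candidate string is looked up in the trie. If it is found, it can be provided to the user as a recommended search string, so that the user can go on to search with it.
In the example of step 101 above, 25 candidate strings were generated. They can be looked up one by one in the trie of FIG. 2. In the end only the candidate string "love" is found in the trie; the other 24 candidate strings are not found. The string "love" can therefore be provided to the user as the recommended search string.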
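Steps 101 to 103 can be put together for this example as follows. This is a sketch under the same simplifications as the example itself (only the last character is replaced, lowercase alphabet, nested-dict trie); `recommend` and the other names are illustrative and not taken from the application.

```python
import string

END = "$end"  # hypothetical end-of-word marker


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def in_trie(root, word):
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return END in node


def recommend(query, lexicon):
    """Step 101: generate candidates; step 102: look them up; step 103: return hits."""
    trie = build_trie(lexicon)
    candidates = [query[:-1] + ch
                  for ch in string.ascii_lowercase if ch != query[-1]]
    return [c for c in candidates if in_trie(trie, c)]


print(recommend("lovf", ["love", "lover", "like", "move", "moon"]))
```

In a real system the trie would be built once in advance rather than on every query; it is rebuilt inside `recommend` here only to keep the sketch self-contained.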
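This embodiment first generates, according to the received search string, a number of candidate strings that have a predetermined edit distance from the search string, then looks up each candidate string in the trie of the candidate lexicon, and, if a candidate string is found, provides it to the user as a recommended search string. The embodiment generates a controllable number of candidate strings according to the preset edit distance, so the computational cost is relatively constant and does not grow with the number of strings in the candidate lexicon. The generated candidate strings do not need to have their edit distance computed one by one against the strings in the lexicon; instead, the trie, which is fast to search, further filters the candidates to obtain the recommended search string, which speeds up search processing.
FIG. 3 is a schematic flowchart of another embodiment of the search processing method of the present application. This embodiment includes the following steps 301 to 304:
Step 301: build a trie from the candidate lexicon, where each node of the trie stores an array of address pointers to its child nodes, and the value of each address pointer in the array equals the code value of the character corresponding to that child node.
Taking the trie of FIG. 2 as an example again, the root node has two level-1 child nodes, "l" and "m", and the root node stores an array of address pointers to these two child nodes. The values of the address pointers in the array equal the code values of the characters corresponding to these two level-1 child nodes. The same holds for child nodes at every other level.
Note that the values of the child-node address pointers are relative rather than absolute: each is an address offset that takes the parent node's address as the base address.
Step 302: generate, according to the received search string, a number of candidate strings that have a predetermined edit distance from the search string.
In this step, a candidate character set can be predefined and an edit distance preset, and at least one of the following editing operations 3021, 3022 and 3023 is performed on the search string according to the candidate character set and the edit distance, to generate candidate strings that have the edit distance from the search string:
3021: inserting at least one character into the search string, the at least one character being a character in the candidate character set; and/or
3022: replacing at least one character in the search string with a character in the candidate character set; and/or
3023: deleting at least one character in the search string.
In another optional implementation of the present application, the candidate character set can be defined and the edit distance set in advance, so that once a search string is received, at least one of the editing operations 3021, 3022 and 3023 above is performed on it according to the preconfigured candidate character set and edit distance, generating candidate strings that have the predetermined edit distance from the search string.
For example, with a predetermined edit distance of 1, a predefined candidate character set of the 26 lowercase letters, and a user search string of lovf, replacing the last character "f" with any of the 26 lowercase letters other than "f" generates 25 candidate strings.
The edit distance determines which candidate strings the editing operations generate: for example, with an edit distance of 1, one character is inserted, replaced and/or deleted; with an edit distance of 2, two characters are inserted, replaced and/or deleted.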
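Operations 3021 to 3023 can be sketched as a generator of all strings reachable with one edit, applied repeatedly for larger distances. This follows a widely used spelling-correction idiom rather than any code in the application; `edits1` and `edits_within` are illustrative names. Note one assumption: applying k edit operations yields strings whose true edit distance is at most k, so `edits_within` returns everything within the given distance, not only strings at exactly that distance.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """All strings obtained from word by one insertion, replacement or deletion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    inserts = {left + ch + right for left, right in splits for ch in alphabet}
    replaces = {left + ch + right[1:]
                for left, right in splits if right
                for ch in alphabet if ch != right[0]}
    deletes = {left + right[1:] for left, right in splits if right}
    return inserts | replaces | deletes


def edits_within(word, distance, alphabet=string.ascii_lowercase):
    """All strings reachable from word with at most `distance` edit operations."""
    seen = {word}
    frontier = {word}
    for _ in range(distance):
        frontier = {e for w in frontier for e in edits1(w, alphabet)} - seen
        seen |= frontier
    return seen - {word}


one_edit = edits1("lovf")
print("love" in one_edit, "lov" in one_edit, "lovef" in one_edit)
```

The size of the generated set grows quickly with the distance and the alphabet size, which is why the distance and candidate character set are fixed in advance.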
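Step 303: look up the characters of the candidate string in the trie in order, using the code value of the character being sought as the query index into the current node's address pointers.
Suppose the candidate string to look up is "love". The conventional lookup method follows the pointers to the level-1 child nodes stored in the root node and inspects, one after another, the memory addresses they point to, to see whether the character "l" is present; if so, it then searches the level-2 child nodes for the character "o", and so on for the remaining characters.
In this embodiment, by contrast, when the trie is built, the address pointer of each child node stored in a node is set to the same value as the code value of the character corresponding to that child node. When looking up the characters of a candidate string, there is then no need to read in turn the contents at the memory addresses pointed to by the node's address pointers and test whether each is the character being sought; instead, the code value of the character being sought directly identifies the memory address to check, which confirms whether the character is present, so the lookup is faster. The trie is stored in memory, so a lookup in the trie is a lookup in memory.
This embodiment uses the code value of the character being sought as the query index into the current node's address pointers. That is, each character of the candidate string is taken in turn as the character being sought, and its code value is used as an index to match nodes in the trie; when the character's code value equals the value of an address pointer in the array, the character is deemed found.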
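One way to read the indexed lookup above is as a trie whose nodes hold a fixed-size child array indexed directly by character code, so that finding a child is a single array access instead of a scan. The sketch below is that interpretation only: Python has no raw address pointers or offsets, so list indices stand in for them, and the code table `CODE` (a=1 ... z=26) is an assumption made for the example.

```python
import string

# Hypothetical code table: consecutive integers starting from 1.
CODE = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}
SIZE = max(CODE.values()) + 1  # slot 0 is left unused


class Node:
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = [None] * SIZE  # children[code] is the child for that character
        self.terminal = False          # True if a lexicon string ends here


def build(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            idx = CODE[ch]
            if node.children[idx] is None:
                node.children[idx] = Node()
            node = node.children[idx]
        node.terminal = True
    return root


def lookup(root, word):
    """Follow one array slot per character; no scanning over sibling nodes."""
    node = root
    for ch in word:
        idx = CODE.get(ch)
        if idx is None:
            return False
        node = node.children[idx]
        if node is None:
            return False
    return node.terminal


root = build(["love", "lover", "like", "move", "moon"])
print(lookup(root, "love"), lookup(root, "lovf"))
```

Each lookup step costs one index operation regardless of how many children a node has, at the price of reserving one slot per possible code in every node.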
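Step 304: if a candidate string is found, provide it to the user as a recommended search string.
FIG. 4 is a schematic flowchart of another embodiment of the search processing method of the present application. This embodiment includes the following steps:
Step 401: obtain, from the strings in the candidate lexicon, the complete character set corresponding to those strings.
The characters contained in all strings of the candidate lexicon are collected and deduplicated to form a complete character set; every character contained in any string of the candidate lexicon is in this set.
Step 402: encode each character in the complete character set separately, so that the code values of the characters are consecutive integers starting from a preset value, the preset value being an integer greater than or equal to 1.
The original code values of the characters in the complete character set are not necessarily consecutive integers. Taking the ASCII code commonly used in computers as an example, ASCII codes correspond to the decimal values 0 to 127, for 128 distinct characters, but the complete character set does not necessarily contain all 128 of them. In particular, special characters such as "$" and "*" are uncommon in searches, so the original code values of the characters in the complete character set are generally not consecutive.
In this step, the characters in the complete character set are re-encoded as consecutive integers starting from a preset value; preferably, the code values increase consecutively from 1.
As explained in the description of step 303 in the embodiment above, the code value of the character represented by a child node, as stored in a trie node, equals the value of that child node's address pointer. A larger code value therefore means a larger pointer value, a longer path from the node to the address of its child node, and a longer time spent on this indexing. If the characters in the complete character set are instead encoded consecutively starting from a small value, the code values, and hence the child-node pointer values, are minimized, which speeds up indexing to child nodes and thus speeds up trie lookup. In addition, consecutive encoding keeps the range of memory addresses represented by the code values (that is, the address range in which the complete character set is stored in memory) contiguous, which prevents storage addresses from becoming fragmented and saves storage space.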
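Steps 401 and 402 can be sketched as follows for the example lexicon. The application does not say in what order the characters receive their codes; sorting them is an assumption made here so that the result is deterministic, and `build_codes` is an illustrative name.

```python
def build_codes(lexicon, start=1):
    """Collect the distinct characters of the lexicon (step 401) and assign
    consecutive integer codes beginning at `start` (step 402)."""
    if start < 1:
        raise ValueError("the preset start value must be an integer >= 1")
    charset = sorted(set("".join(lexicon)))  # deduplicated complete character set
    return {ch: start + offset for offset, ch in enumerate(charset)}


codes = build_codes(["love", "lover", "like", "move", "moon"])
print(codes)
print(max(codes.values()), "codes needed, versus", ord("v"), "for the ASCII value of 'v'")
```

With nine distinct characters, a per-node child array needs only ten slots (index 0 unused) instead of one sized for the full ASCII range.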
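Step 403: build a trie from the candidate lexicon, where each node of the trie stores an array of address pointers to its child nodes, and the value of each address pointer in the array equals the code value of the character corresponding to that child node.
Step 404: generate, according to the received search string, a number of candidate strings that have a predetermined edit distance from the search string.
Step 405: look up the characters of the candidate string in the trie in order, using the code value of the character being sought as the query index into the current node's address pointers.
Step 406: if a candidate string is found, provide it to the user as a recommended search string.
When recommending a search string to the user, a string with a relatively high weight can be chosen from the candidate strings that were found. The weight can be based on how frequently the candidate string occurs in the user's search history, or on how frequently it occurs in preset candidate search material.
For example, when the user enters the search string "aove" and the preset edit distance is 1, the generated candidate strings include "love" and "move". Lookup shows that both "love" and "move" are present in the trie. Statistics show that "love" occurs more frequently than "move" in the user's search history, so "love" can be recommended to the user as the recommended search string, or offered to the user as the most preferred search string.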
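The weight-based choice between "love" and "move" can be sketched as a ranking by frequency. The counts below are invented purely for illustration, and `rank_by_frequency` is a hypothetical helper; the application only states that the weight may come from search-history or reference-material frequency.

```python
def rank_by_frequency(found, frequency):
    """Order the candidates found in the trie from most to least frequent.
    Candidates missing from the frequency table count as 0."""
    return sorted(found, key=lambda s: frequency.get(s, 0), reverse=True)


# Hypothetical counts from a user's search history.
history_counts = {"love": 42, "move": 17}

ranked = rank_by_frequency(["move", "love"], history_counts)
print(ranked[0])  # the top recommendation
print(ranked)     # or present the whole list with the best candidate first
```

Returning the full ranked list supports both behaviors described above: recommending only the top string, or showing it as the most preferred of several.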
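FIG. 5 is a schematic flowchart of yet another embodiment of the search processing method of the present application. This embodiment includes the following steps:
Step 501: obtain, from the strings in the candidate lexicon, the complete character set corresponding to those strings.
The characters contained in all strings of the candidate lexicon are collected and deduplicated to form a complete character set; every character contained in any string of the candidate lexicon is in this set.
Step 502: encode each character in the complete character set separately, so that the code values of the characters are consecutive integers starting from a preset value, the preset value being an integer greater than or equal to 1.
Here the characters in the complete character set are re-encoded as consecutive integers starting from a preset value; preferably, the code values increase consecutively from 1.
Step 503: build a trie from the candidate lexicon, where each node of the trie stores an array of address pointers to its child nodes, and the value of each address pointer in the array equals the code value of the character corresponding to that child node.
Step 504: predefine a candidate character set and preset an edit distance.
In this embodiment, the candidate character set and the edit distance can be configured before any search string is received, so that once a search string is received, the candidate strings are determined from the configured candidate character set and edit distance.
Step 505: perform an editing operation on the search string according to the candidate character set and the edit distance, generating candidate strings that have the edit distance from the search string.
The editing operation includes at least one of the following:
inserting at least one character into the search string, the at least one character being a character in the candidate character set;
replacing at least one character in the search string with a character in the candidate character set;
deleting at least one character in the search string.
Step 506: look up the characters of the candidate string in the trie in order, using the code value of the character being sought as the query index into the current node's address pointers.
Step 507: if a candidate string is found, provide it to the user as a recommended search string.
The steps of this embodiment are essentially the same as the corresponding steps of the embodiments above; refer to the descriptions there, which are not repeated here.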
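In steps 404 and 505 of the embodiments above, or in other embodiments of the present application, when candidate strings whose edit distance from the user's search string equals the preset distance are generated by editing, one of the editing operations is deleting at least one character from the search string. When a target character in the search string needs to be deleted, the target character is replaced with a preset custom character.
As described in the background section, the prior art generally edits strings by "inserting", "deleting" and "replacing" characters. Take the "delete" operation as an example: to delete the first "v" in the string "lovve", that is, to edit it into the target string "love", the existing implementation copies the "ve" contained in the original string "lovve" once in memory and then overwrites the position of "vve" in the original string with the copied "ve". This way of implementing deletion therefore involves a memory copy, which wastes memory.
In view of this, the present application provides another concrete way of implementing the "delete" operation: the target character to be deleted is replaced with a preset custom character, so that "delete" is implemented through "replace". The preset custom character differs from every existing character. Because it is a custom character, its glyph is not given in this specification; it is denoted here by [custom] (自定义 in the original text), and only its use is described.
Taking the deletion of the character "v" from the string "lovve" as an example again, in a concrete implementation the [custom] character replaces the "v" to be deleted, editing the original string into the target string "lo[custom]ve".
Correspondingly, when the candidate string is looked up in the trie and the lookup reaches the custom character in the target string, the custom character is ignored and the lookup continues with the next character adjacent to it.
For example, when the string "lo[custom]ve" is looked up in the trie of FIG. 2, after the character "o" has been found among the level-2 child nodes, the identifier or code value of the [custom] character shows that the next character to look up is the custom character. The custom character is therefore not looked up, and the lookup continues with the character "v" that follows it.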
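The placeholder-based deletion and the matching lookup rule can be sketched as follows. The choice of "\x00" as the custom character is an assumption for illustration, as are the helper names. Python strings are immutable, so this sketch does not reproduce the memory-copy saving described above; it only demonstrates the lookup logic of skipping the placeholder.

```python
PLACEHOLDER = "\x00"  # hypothetical custom character, assumed absent from the lexicon
END = "$end"          # hypothetical end-of-word marker


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def delete_by_placeholder(word, index):
    """Express deletion of word[index] as a same-length replacement."""
    return word[:index] + PLACEHOLDER + word[index + 1:]


def contains_skipping(root, candidate):
    """Trie lookup that ignores the placeholder and moves on to the next character."""
    node = root
    for ch in candidate:
        if ch == PLACEHOLDER:
            continue  # do not descend; the deleted position consumes no trie level
        node = node.get(ch)
        if node is None:
            return False
    return END in node


trie = build_trie(["love", "lover", "like", "move", "moon"])
candidate = delete_by_placeholder("lovve", 2)  # corresponds to "lo[custom]ve"
print(len(candidate), contains_skipping(trie, candidate))
```

The candidate keeps the original length of five characters, yet it matches the four-character lexicon entry "love" because the placeholder position is skipped during lookup.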
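The search processing method embodiments above first generate, according to the received search string, a number of candidate strings that have a predetermined edit distance from the search string, then look up each candidate string in the trie of the candidate lexicon, and, if a candidate string is found, provide it to the user as a recommended search string. These embodiments generate a controllable number of candidate strings according to the preset edit distance, so the computational cost is relatively constant and does not grow with the number of strings in the candidate lexicon. The generated candidate strings do not need to have their edit distance computed one by one against the strings in the lexicon; instead, the trie, which is fast to search, further filters the candidates to obtain the recommended search string, which speeds up search processing.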
与上述搜索处理方法的实施例相对应,本申请还提供了一种搜索处理装置的实施例,参见图6,为本申请提供的一种搜索处理装置的一个实施例的结构示意图,所述装置包括:生成单元601、查找单元602、推荐单元603。
其中,所述生成单元601,用于根据接收的检索字符串,生成若干与所述检索字符串具有预定编辑距离的备选字符串。
所述查找单元602,用于分别利用备选词库的字典树查找所述备选字符串。
所述推荐单元603,用于如果查找到所述备选字符串,则作为推荐检索字符串提供给用户。
如图7所示为本申请提供的一种搜索处理装置的另一个实施例的结构示意图,所述装置还包括:建立单元604,用于根据所述备选词库建立字典树,所述字典树的节点存储有指向子节点的地址指针数组,所述数组中的地址指针的值分别与所述子节点对应的字符的编码值相同。
所述查找单元602,具体用于在所述字典树中依次查找所述备选字符串包含的字符,以待查找字符的编码值作为当前节点的地址指针的查询索引。
如图7所示,可选的,所述装置还包括:获得单元605、编码单元606。
所述获得单元605,用于根据所述备选词库中的字符串获得所述字符串对应的完备字符集合;
所述编码单元606,用于将所述完备字符集合中的各字符分别编码,以使所述字符的编码值为从预设值开始连续变化的整数,所述预设值为大于等于1的整数。
可选的,所述生成单元801包括:
预定义单元6011,用于预定义备选字符集合,以及预设编辑距离;
编辑单元6012,用于根据所述备选字符集合和所述编辑距离对所述检索字符串进行编辑操作,生成与所述检索字符串具有所述编辑距离的备选字符串,如图8所示,所述编辑单元至少包括一个如下单元:
插入编辑单元60121,用于在所述检索字符串中插入至少一个字符,所述至少一个字符为所述备选字符集合中的字符;
替换编辑单元60122,用于将所述检索字符串中的至少一个字符替换成所述备选字符集合中的字符;
删除编辑单元60123,用于将所述检索字符串中的至少一个字符删除。
可选的,所述生成单元601具体用于若需要删除所述检索字符串中的目标字符时,以预设的自定义字符替换所述目标字符;
所述查找单元602具体用于当查找至所述目标字符串中的自定义字符时,忽略所述自定义字符后,继续查找与所述自定义字符相邻的下一个字符。
可选的,所述推荐单元603包括:权重推荐单元6031,用于如果查找到至少两个所述备选字符串,则根据所述备选字符串的权重从所述备选字符串中选择所述推荐检索字符串提供给用户。
可选的,所述权重推荐单元6031,具体用于根据备选字符串在用户的检索历史记录中出现的频率,或根据备选字符串在预设的备选检索资料中出现的频率,从所述备选字 符串中选择所述推荐检索字符串提供给用户。
如图9所示为本申请提供的一种搜索处理装置的又一个实施例的结构示意图,所述装置包括:生成单元601、查找单元602、推荐单元603、建立单元604、获得单元605、编码单元606和预置单元607。
其中,所述生成单元601,用于根据接收的检索字符串,生成若干与所述检索字符串具有预定编辑距离的备选字符串。
所述查找单元602,用于分别利用备选词库的字典树查找所述备选字符串。
所述推荐单元603,用于如果查找到所述备选字符串,则作为推荐检索字符串提供给用户。
建立单元604,用于根据所述备选词库建立字典树,所述字典树的节点存储有指向子节点的地址指针数组,所述数组中的地址指针的值分别与所述子节点对应的字符的编码值相同。
所述查找单元602,具体用于在所述字典树中依次查找所述备选字符串包含的字符,以待查找字符的编码值作为当前节点的地址指针的查询索引。
所述获得单元605,用于根据所述备选词库中的字符串获得所述字符串对应的完备字符集合;
所述编码单元606,用于将所述完备字符集合中的各字符分别编码,以使所述字符的编码值为从预设值开始连续变化的整数,所述预设值为大于等于1的整数。
所述预置单元607,用于预定义备选字符集合,以及预设编辑距离
可选的,所述生成单元801包括:
编辑单元6012,用于根据所述备选字符集合和所述编辑距离对所述检索字符串进行编辑操作,生成与所述检索字符串具有所述编辑距离的备选字符串,如图8所示,所述编辑单元至少包括一个如下单元:
插入编辑单元60121,用于在所述检索字符串中插入至少一个字符,所述至少一个字符为所述备选字符集合中的字符;
替换编辑单元60122,用于将所述检索字符串中的至少一个字符替换成所述备选字符集合中的字符;
删除编辑单元60123,用于将所述检索字符串中的至少一个字符删除。
可选的,所述生成单元601具体用于若需要删除所述检索字符串中的目标字符时,以预设的自定义字符替换所述目标字符;
所述查找单元602具体用于当查找至所述目标字符串中的自定义字符时,忽略所述自定义字符后,继续查找与所述自定义字符相邻的下一个字符。
可选的,所述推荐单元603包括:权重推荐单元6031,用于如果查找到至少两个所述备选字符串,则根据所述备选字符串的权重从所述备选字符串中选择所述推荐检索字符串提供给用户。
可选的,所述权重推荐单元6031,具体用于根据备选字符串在用户的检索历史记录中出现的频率,或根据备选字符串在预设的备选检索资料中出现的频率,从所述备选字符串中选择所述推荐检索字符串提供给用户。
本申请提供的一种搜索处理装置的实施例,技术方案本质与上述一种搜索处理方法的实施例相同,因此未做具体解释描述,相关之处可参见上述一种搜索处理方法的实施例的对应部分。
上述技术方案提供的搜索处理装置的实施例,首先根据接收的检索字符串,生成若干与所述检索字符串具有预定编辑距离的备选字符串,再分别利用备选词库的字典树查找所述备选字符串,如果查找到所述备选字符串,则作为推荐检索字符串提供给用户。本实施例根据预设的编辑距离生成数量可控的备选字符串,因此算法计算量较为恒定,不会随备选词库的字符串数量的增加而增长;并且生成的备选字符串无需与备选词库中的字符串逐一计算编辑距离,而是利用搜索速度较快的字典树对备选字符串进行进一步筛选后获得推荐检索字符串,提高了检索处理速度。
本领域的技术人员可以清楚地了解到本申请实施例中的技术可借助软件加必需的通用硬件的方式来实现,通用硬件包括通用集成电路、通用CPU、通用存储器、通用元器件等,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请实施例中的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在存储介质中,如只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例或者实施例的某些部分所述的方法。
其中,图10示出了根据本发明实施方式提供的计算设备的结构框图。该计算设备可以实施为各种类型的计算机装置,例如台式机、便携式计算机、平板电脑、智能手机、个人数据助理(PDA)、智能穿戴设备,或者其他类型的计算机装置,但是不限于任何特定形式。计算机可以包括处理模块100,存储子系统200,输入装置300、显示器400、网络接口500,以及总线600。
处理模块100可以是一个多核的处理器,也可以包含多个处理器。在一些实施例中,处理模块100可以包含一个通用的主处理器以及一个或多个特殊的协处理器,例如图形处理器(GPU)、数字信号处理器(DSP)等等。在一些实施例中,处理器100可以使用定制的电路实现,例如特定用途集成电路(application specific integrated circuit,ASIC)或者现场可编程逻辑门阵列(field programmable gate arrays,FPGA)。在一些实施方式中,处理模块100可以是类似的电路执行存储在自身上的可执行指令。在另外一些实施方式中,处理模块100可以执行存储在存储子系统200上的可执行指令。
存储子系统200可以包括各种类型的存储单元,例如系统内存、只读存储器(ROM),和永久存储装置。其中,ROM可以存储处理模块100或者计算机的其他模块需要的静态数据或者指令。永久存储装置可以是可读写的存储装置。永久存储装置可以是即使计算机断电后也不会失去存储的指令和数据的非易失性存储设备。在一些实施方式中,永久性存储装置采用大容量存储装置(例如磁或光盘、闪存)作为永久存储装置。另外一些实施方式中,永久性存储装置可以是可移除的存储设备(例如软盘、光驱)。系统内存可以是可读写存储设备或者易失性可读写存储设备,例如动态随机访问内存。系统内存可以存储一些或者所有处理器在运行时需要的指令和数据。此外,存储子系统200可以包括任意计算机可读存储媒介的组合,包括各种类型的半导体存储芯片(DRAM,SRAM,SDRAM,闪存,可编程只读存储器),磁盘和/或光盘也可以采用。在一些实施方式中,存储子系统200可以包括可读和/或写的可移除的存储设备,例如激光唱片(CD)、只读数字多功能光盘(例如DVD-ROM,双层DVD-ROM)、只读蓝光光盘、超密度光盘、闪存卡(例如SD卡、min SD卡、Micro-SD卡等等)、磁性软盘等等。计算机可读存储媒介不包含载波和通过无线或有线传输的瞬间电子信号。在一些实施方式中,存储子系统200能够存储一个或多个能被处理模块100执行的软件程序或需要调用的资源文件,资源文件可以包含一些第三方库,包括但不限于音频库、视频库、2D图形库、3D图形库。
用户界面可以由一个或多个用户输入装置300、显示器400,和/或一个或多个其他用户输出设备提供。输入装置300可以包括用户向计算机输入信号的装置,计算机可以解释这些信号包含有特定的用户请求或信息。在一些实施方式中,可以通过键盘向用户界面输入网址,显示输入网址对应的网页内容。在一些实施方式中,输入装置300可以包含一些或所有的键盘按钮、触摸屏、鼠标或其他定点设备、滚轮、点击轮、转盘、按键、开关、小型键盘、麦克风等等。
显示器400可以显示由计算机生成的图像,可以包括各种类型的图像设备,例如阴极射线管(CRT)、液晶显示器(LCD)、发光二极管(LED)(包括有机发光二极管(OLED))、投射系统等等与其他支持电子装置(例如DAC、ADC、信号处理器等等)的集合。在一些 实施方式中,也可能额外提供其他用户输出设备,或者取代显示器400,例如信号灯、扬声器、触觉传感器、打印机等。
在一些实施方式中,用户界面可以通过图形用户界面提供。在显示器400中的某些区域定义一些可视的图形元素作为用户通过输入装置300选择的交互对象或者控制对象。例如,用户可以操作用户输入装置300移动屏幕上的指定位置输入网址,控制在显示器400上显示该网址对应的网页内容。在一些实施方式中,可以识别用户手势的触摸设备作为输入设备,这些手势可以但不必须与显示器300上的阵列相联系。
网络接口500为计算机提供声音和/或数据通讯功能。在一些实施方式中,网络接口500可以包括射频收发器来传递声音和/或数据(例如使用蜂窝式电话技术,例如3G、4G或EDGE、WIFI的数据网络技术)、GPS接受模块和/或其他模块。在一些实施方式中,网络接口500可以提供额外的无线网络连接或替代无线接口。网络接口500可以是硬件(例如天线、调制解调器、编解码器以及其他模拟和/或数字信号处理电路)和软件模块的结合。
总线600可以包括各种连接计算机内部各部件的系统、外部设备和芯片总线。例如总线600将处理装置100和存储子系统200连接,还可以连接输入装置300和显示器400。总线600也可以使得计算机通过网络接口500与网络连接。在这种情况下,计算机可以作为多个联网计算机设备的一部分。计算机的任意或所有部件都可以在本发明的实施方式中协调使用。
一些实施方式中包含电子元件,例如微处理器、在计算机可读存储媒介中存储有计算机指令和数据的存储器。在具体实施方式部分描述的许多特征都可以通过执行存储在计算机可读存储媒介上的计算机指令的方法步骤实现。当这些计算机指令被执行,计算机处理单元完成指令的各种功能。程序指令或计算机编码的实施方式可以是机器码,例如使用计算机、电子元件或待解析器的微处理器编译其他高级语言得到的代码。
需要理解的是,计算机是示意性的。计算机可以具有其他没有具体描述的功能(例如移动通话、GPS、电源管理,一个或多个摄像头、各种用于连接外部设备的连接端口或附件等等)。进一步,此处对计算机100涉及的特定功能模块进行了描述,这些功能模块的描述是为了便于描述,而且也不意味着对功能部件特定的物理配置。而且,这些功能模块不需要与物理模块一一对应。模块可以被配置成用来完成各种操作,例如通过编程或设置合适的控制电路,模块也可能会根据初始设置重新被配置。本发明的实施例可以在各种设备包括电子设备中,通过使用硬件和软件的结合来实现。
本实施例还提供了一种非易失性可读存储介质,该存储介质中存储有一个或多个模 块(programs),该一个或多个模块被应用在计算设备时,可以使得该计算设备执行如下步骤的指令(instructions):
根据接收的检索字符串,生成若干与所述检索字符串具有预定编辑距离的备选字符串;分别利用备选词库的字典树查找所述备选字符串;如果查找到所述备选字符串,则作为推荐检索字符串提供给用户。
可选的,还包括:根据所述备选词库建立字典树,所述字典树的节点存储有指向子节点的地址指针数组,所述数组中的地址指针的值分别与所述子节点对应的字符的编码值相同;所述分别利用备选词库的字典树查找所述备选字符串包括:在所述字典树中依次查找所述备选字符串包含的字符,其中,以待查找字符的编码值作为当前节点的地址指针的查询索引。
可选的,所述根据备选词库建立字典树之前,所述方法还包括:根据所述备选词库中的字符串获得所述字符串对应的完备字符集合;将所述完备字符集合中的各字符分别编码,以使所述字符的编码值为从预设值开始连续变化的整数,所述预设值为大于等于1的整数。
可选的,所述生成若干与所述检索字符串具有预定编辑距离的备选字符串包括:预定义备选字符集合,预设编辑距离;根据所述备选字符集合和所述编辑距离对所述检索字符串进行如下编辑操作中的至少一个,生成与所述检索字符串具有所述编辑距离的备选字符串:在所述检索字符串中插入至少一个字符,所述至少一个字符为所述备选字符集合中的字符;将所述检索字符串中的至少一个字符替换成所述备选字符集合中的字符;将所述检索字符串中的至少一个字符删除。
可选的,还包括:预定义备选字符集合,预设编辑距离;所述生成若干与所述检索字符串具有预定编辑距离的备选字符串包括:根据所述备选字符集合和所述编辑距离对所述检索字符串进行如下编辑操作中的至少一个,生成与所述检索字符串具有所述编辑距离的备选字符串:在所述检索字符串中插入至少一个字符,所述至少一个字符为所述备选字符集合中的字符;将所述检索字符串中的至少一个字符替换成所述备选字符集合中的字符;将所述检索字符串中的至少一个字符删除。
所述根据接收的检索字符串,生成若干与所述检索字符串具有预定编辑距离的备选字符串包括:若需要删除所述检索字符串中的目标字符时,以预设的自定义字符替换所述目标字符;所述分别利用备选词库的字典树查找所述备选字符串包括:当查找至所述目标字符串中的自定义字符时,忽略所述自定义字符后,继续查找与所述自定义字符相邻的下一个字符。
可选的,所述如果查找到所述备选字符串,则作为推荐检索字符串提供给用户包括:如果查找到至少两个所述备选字符串,则根据所述备选字符串的权重从所述备选字符串中选择所述推荐检索字符串提供给用户。
可选的,所述根据所述备选字符串的权重从所述备选字符串中选择所述推荐检索字符串提供给用户包括:根据备选字符串在用户的检索历史记录中出现的频率,或根据备选字符串在预设的备选检索资料中出现的频率,从所述备选字符串中选择所述推荐检索字符串提供给用户。
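The embodiments in this specification are described progressively; for parts that are the same or similar across embodiments, the embodiments can be read against one another, and each embodiment focuses on what distinguishes it from the others. In particular, the apparatus and system embodiments are described relatively briefly because they are essentially similar to the method embodiments; for related details, refer to the corresponding parts of the method embodiments.
The embodiments of the present application described above do not limit the scope of protection of the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within its scope of protection.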

Claims (18)

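  1. A search processing method, characterized in that the method comprises:
    generating, according to a received search string, a number of candidate strings that have a predetermined edit distance from the search string;
    looking up each candidate string in the dictionary tree of a candidate lexicon;
    if a candidate string is found, providing it to the user as a recommended search string.
  2. The method according to claim 1, characterized in that the method further comprises: building a dictionary tree from the candidate lexicon, where each node of the dictionary tree stores an array of address pointers to its child nodes, and the value of each address pointer in the array equals the code value of the character corresponding to that child node;
    looking up each candidate string in the dictionary tree of the candidate lexicon comprises: looking up the characters of the candidate string in the dictionary tree in order, using the code value of the character being sought as the query index into the current node's address pointers.
  3. The method according to claim 2, characterized in that, before the dictionary tree is built from the candidate lexicon, the method further comprises:
    obtaining, from the strings in the candidate lexicon, the complete character set corresponding to those strings;
    encoding each character in the complete character set separately, so that the code values of the characters are consecutive integers starting from a preset value, the preset value being an integer greater than or equal to 1.
  4. The method according to any one of claims 1 to 3, characterized in that generating a number of candidate strings that have a predetermined edit distance from the search string comprises:
    predefining a candidate character set and presetting an edit distance;
    performing, according to the candidate character set and the edit distance, at least one of the following editing operations on the search string to generate candidate strings that have the edit distance from the search string:
    inserting at least one character into the search string, the at least one character being a character in the candidate character set;
    replacing at least one character in the search string with a character in the candidate character set;
    deleting at least one character in the search string.
  5. The method according to any one of claims 1 to 3, characterized by further comprising:
    predefining a candidate character set and presetting an edit distance;
    generating a number of candidate strings that have a predetermined edit distance from the search string comprises:
    performing, according to the candidate character set and the edit distance, at least one of the following editing operations on the search string to generate candidate strings that have the edit distance from the search string:
    inserting at least one character into the search string, the at least one character being a character in the candidate character set;
    replacing at least one character in the search string with a character in the candidate character set;
    deleting at least one character in the search string.
  6. The method according to claim 4 or 5, characterized in that generating, according to the received search string, a number of candidate strings that have a predetermined edit distance from the search string comprises: when a target character in the search string needs to be deleted, replacing the target character with a preset custom character;
    looking up each candidate string in the dictionary tree of the candidate lexicon comprises: on reaching the custom character in the target string, ignoring the custom character and continuing with the next character adjacent to it.
  7. The method according to claim 6, characterized in that providing a found candidate string to the user as a recommended search string comprises:
    if at least two of the candidate strings are found, selecting the recommended search string from the candidate strings according to their weights and providing it to the user.
  8. The method according to claim 7, characterized in that selecting the recommended search string from the candidate strings according to their weights and providing it to the user comprises:
    selecting the recommended search string from the candidate strings, and providing it to the user, according to how frequently each candidate string occurs in the user's search history, or according to how frequently it occurs in preset candidate search material.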
  9. 一种搜索处理装置,其特征在于,所述装置包括:
    生成单元,用于根据接收的检索字符串,生成若干与所述检索字符串具有预定编辑距离的备选字符串;
    查找单元,用于分别利用备选词库的字典树查找所述备选字符串;
    推荐单元,用于如果查找到所述备选字符串,则作为推荐检索字符串提供给用户。
  10. 根据权利要求9所述的装置,其特征在于,所述装置还包括:
    建立单元,用于根据所述备选词库建立字典树,所述字典树的节点存储有指向子节点的地址指针数组,所述数组中的地址指针的值分别与所述子节点对应的字符的编码值相同;
    所述查找单元具体用于在所述字典树中依次查找所述备选字符串包含的字符,其中,以待查找字符的编码值作为当前节点的地址指针的查询索引。
  11. 根据权利要求10所述的装置,其特征在于,所述装置还包括:
    获得单元,用于根据所述备选词库中的字符串获得所述字符串对应的完备字符集合;
    编码单元,用于将所述完备字符集合中的各字符分别编码,以使所述字符的编码值为从预设值开始连续变化的整数,所述预设值为大于等于1的整数。
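  12. The apparatus according to any one of claims 9 to 11, wherein the generating unit comprises:
    a predefining unit, configured to predefine a candidate character set and preset an edit distance;
    an editing unit, configured to perform editing operations on the search string according to the candidate character set and the edit distance, to generate candidate strings having the edit distance from the search string, the editing unit comprising at least one of the following units:
    an insertion editing unit, configured to insert at least one character into the search string, the at least one character being a character in the candidate character set;
    a replacement editing unit, configured to replace at least one character in the search string with a character in the candidate character set;
    a deletion editing unit, configured to delete at least one character from the search string.
  13. The apparatus according to any one of claims 9 to 11, further comprising:
    a presetting unit, configured to predefine a candidate character set and preset an edit distance;
    wherein the generating unit comprises:
    an editing unit, configured to perform editing operations on the search string according to the candidate character set and the edit distance, to generate candidate strings having the edit distance from the search string, the editing unit comprising at least one of the following units:
    an insertion editing unit, configured to insert at least one character into the search string, the at least one character being a character in the candidate character set;
    a replacement editing unit, configured to replace at least one character in the search string with a character in the candidate character set;
    a deletion editing unit, configured to delete at least one character from the search string.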
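  14. The apparatus according to claim 12 or 13, wherein the generating unit is specifically configured to, if a target character in the search string needs to be deleted, replace the target character with a preset custom character;
    the lookup unit is specifically configured to, when the lookup reaches the custom character in the target string, ignore the custom character and then continue the lookup with the next character adjacent to the custom character.
  15. The apparatus according to claim 14, wherein the recommending unit comprises:
    a weight recommending unit, configured to, if at least two of the candidate strings are found, select the recommended search string from the candidate strings according to weights of the candidate strings and provide it to the user.
  16. The apparatus according to claim 15, wherein the weight recommending unit is specifically configured to select the recommended search string from the candidate strings and provide it to the user according to the frequency with which each candidate string appears in the user's search history, or according to the frequency with which each candidate string appears in preset candidate search material.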
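  17. A computing device, comprising:
    one or more processors;
    a memory; and
    one or more modules, the one or more modules being stored in the memory and configured to be executed by the one or more processors, wherein the one or more modules are configured to:
    generate, according to a received search string, a plurality of candidate strings having a predetermined edit distance from the search string;
    look up the candidate strings respectively using a dictionary tree of a candidate lexicon;
    provide the candidate string to the user as a recommended search string if the candidate string is found.
  18. A computer-readable recording medium on which a program for executing the method of claims 1-9 is recorded.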
PCT/CN2016/078309 2015-04-02 2016-04-01 Search processing method and apparatus WO2016155662A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510155618.XA CN106156103B (zh) 2015-04-02 2015-04-02 Search processing method and apparatus
CN201510155618.X 2015-04-02

Publications (1)

Publication Number Publication Date
WO2016155662A1 (zh) 2016-10-06

Family

ID=57004596

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/078309 WO2016155662A1 (zh) 2015-04-02 2016-04-01 Search processing method and apparatus

Country Status (2)

Country Link
CN (1) CN106156103B (zh)
WO (1) WO2016155662A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684438A (zh) * 2018-12-26 2019-04-26 成都科来软件有限公司 Method for retrieving data having a parent-child hierarchical structure

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
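CN107908653A (zh) * 2017-10-12 2018-04-13 阿里巴巴集团控股有限公司 Data processing method and apparatus
CN109992749A (zh) * 2017-12-29 2019-07-09 珠海金山办公软件有限公司 Text display method, apparatus, electronic device, and readable storage medium
CN108984701A (zh) * 2018-07-06 2018-12-11 郑州云海信息技术有限公司 Data management method and apparatus in a cloud data system
CN109359481B (zh) * 2018-10-10 2021-09-14 南京小安信息科技有限公司 Anti-collision search reduction method based on BK-tree
CN110119442A (zh) * 2019-05-17 2019-08-13 北京思维造物信息科技股份有限公司 Dynamic search method, apparatus, device, and medium
CN110674362B (zh) * 2019-08-22 2022-06-07 视联动力信息技术股份有限公司 Search recommendation method, apparatus, electronic device, and readable storage medium
CN111026281B (zh) * 2019-10-31 2023-09-12 重庆小雨点小额贷款有限公司 Client phrase recommendation method, client, and storage medium
CN112069286B (zh) * 2020-08-28 2024-01-02 喜大(上海)网络科技有限公司 Dictionary tree parameter updating method, apparatus, device, and storage medium
CN112988834B (zh) * 2021-02-07 2023-03-10 潍坊北大青鸟华光照排有限公司 Query method for dictionary phrases
CN113342848B (zh) * 2021-05-25 2024-04-02 中国平安人寿保险股份有限公司 Information search method, apparatus, terminal device, and computer-readable storage medium
CN113419742B (zh) * 2021-07-21 2022-05-24 北京华大九天科技股份有限公司 String encoding and lookup method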

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
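JP2005011078A (ja) * 2003-06-19 2005-01-13 Patolis Corp Similar word search device, method thereof, program thereof, recording medium recording the program, and information retrieval system
JP2006039871A (ja) * 2004-07-26 2006-02-09 Patolis Corp Synonym search device, method thereof, program thereof, recording medium recording the program, and information retrieval device
CN101916263A (zh) * 2010-07-27 2010-12-15 武汉大学 Fuzzy keyword query method and system based on weighted edit distance
JP2013029891A (ja) * 2011-07-26 2013-02-07 Fujitsu Ltd Extraction program, extraction method, and extraction device
WO2014136173A1 (ja) * 2013-03-04 2014-09-12 三菱電機株式会社 Search device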

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8073869B2 (en) * 2008-07-03 2011-12-06 The Regents Of The University Of California Method for efficiently supporting interactive, fuzzy search on structured data
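KR101089424B1 (ko) * 2008-10-01 2011-12-07 주식회사 케이티 String storage method, search method, deletion method, and string storage device using a trie structure
CN103514236B (zh) * 2012-06-30 2017-06-09 重庆新媒农信科技有限公司 Pinyin-based search condition error correction prompt processing method in retrieval applications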

Also Published As

Publication number Publication date
CN106156103A (zh) 2016-11-23
CN106156103B (zh) 2019-11-26

Similar Documents

Publication Publication Date Title
WO2016155662A1 (zh) Search processing method and apparatus
CN109564571B (zh) Query recommendation method and system using search context
AU2017264388B2 (en) Searching structured and unstructured data sets
CN107463693B (zh) Data processing method, apparatus, terminal, and computer-readable storage medium
US9400775B2 (en) Document data entry suggestions
US8751968B2 (en) Method and system for providing a user interface for accessing multimedia items on an electronic device
US20140075393A1 (en) Gesture-Based Search Queries
WO2017148323A1 (zh) Method and apparatus for ranking content documents
KR20160127810A (ko) Model-based approach for on-screen item selection and disambiguation
US9875245B2 (en) Content item recommendations based on content attribute sequence
WO2016018681A2 (en) Presenting dataset of spreadsheet in form based view
CN109976793B (zh) Application running method, apparatus, device, and medium
US20210279297A1 (en) Linking to a search result
US20180129716A1 (en) Multi-Level Data Pagination
KR20140084069A (ko) Drag-and-drop always-sum formulas
US20200342029A1 (en) Systems and methods for querying databases using interactive search paths
KR20160083759A (ko) Method and apparatus for providing annotations
CN110168536B (zh) Context-sensitive summary
CN108292324A (zh) Content authoring inline commands
CN107402953A (zh) Page jump method and apparatus
CN108140039B (zh) Streaming records from parallel batched database access
CN105550217A (zh) Scene music search method and scene music search apparatus
RU2679971C2 (ru) Accessing semantic content in a development system
KR20160056994A (ko) Emoticon recommendation method and user terminal for recommending emoticons
US20160140161A1 (en) File system with per-file selectable integrity

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16771417; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 16771417; Country of ref document: EP; Kind code of ref document: A1)