CN114036371A - Search term recommendation method, device, equipment and computer-readable storage medium - Google Patents


Info

Publication number
CN114036371A
Authority
CN
China
Prior art keywords
word
recommendation
chinese
identification information
recommended
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111264694.6A
Other languages
Chinese (zh)
Inventor
朱林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202111264694.6A
Publication of CN114036371A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a method, device and equipment for recommending search terms, and a computer-readable storage medium. The method comprises the following steps. A search term is obtained and segmented, so that a complex search-term combination is split into a plurality of segmentation elements. Each segmentation element has a preset type, and its identification information is queried in the identification mapping table for that type. This allows search-term error correction for complex search-term combinations and improves the accuracy of candidate recommended words during subsequent matching. Candidate recommended words are then matched in a recommendation table according to the identification information of the segmentation elements. They are sorted according to word frequency information to obtain a sorting result, and search terms are recommended according to that result, which improves the accuracy of search term recommendation.

Description

Search term recommendation method, device, equipment and computer-readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for recommending search terms.
Background
The entry point of a search engine is its search bar, and the quality of the recommended search terms directly affects whether the data returned to the user meets the user's needs. Search term drop-down recommendation means that whenever the search bar contains any input, a recommendation list below the search bar displays terms related to that input. Drop-down recommendation can also be understood as autocompletion of the search term. It reduces typing effort during search input and predicts and expands the user's intent. It is a keyword association service that lets users enter fewer characters, thereby improving search efficiency.
In the prior art, the back end of a search engine maintains a recommended word bank stored as a prefix tree. After the user enters a search term, the engine matches it in the prefix tree and returns the recommended words that match. It then sorts these words and displays them in the drop-down recommendation list.
However, recommending search terms through a prefix tree cannot accurately identify and recommend complex search-term combinations, which reduces the accuracy of search term recommendation.
Disclosure of Invention
The embodiments of the invention provide a search term recommendation method, apparatus, device and computer-readable storage medium.
The technical scheme of the embodiment of the invention is realized as follows:
In a first aspect, an embodiment of the present invention provides a search term recommendation method, including the following steps:
acquiring a search term;
segmenting the search term to obtain a plurality of segmentation elements;
querying the identification information of each segmentation element in the identification mapping table corresponding to the preset type to which that element belongs;
matching in a recommendation table according to the identification information of the segmentation elements to obtain candidate recommended words;
sorting the candidate recommended words according to word frequency information to obtain a sorting result, and recommending according to that result.
In a second aspect, an embodiment of the present invention provides a search term recommendation apparatus, including the following modules:
an acquisition module configured to acquire a search term;
a segmentation module configured to segment the search term to obtain a plurality of segmentation elements;
a query module configured to query the identification information of each segmentation element in the identification mapping table corresponding to the preset type to which that element belongs;
a matching module configured to match in a recommendation table according to the identification information of the segmentation elements to obtain candidate recommended words;
a recommendation module configured to sort the candidate recommended words according to word frequency information to obtain a sorting result, and to recommend according to that result.
In a third aspect, an embodiment of the present invention provides a search term recommendation device, where the device includes a memory for storing executable instructions, and a processor for implementing the search term recommendation method when executing the executable instructions stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the search term recommendation method.
The embodiments of the invention provide a search term recommendation method, apparatus, device and computer-readable storage medium. In the provided scheme, a search term is obtained and segmented into a plurality of segmentation elements. The identification information of each element is then queried in the identification mapping table for the preset type to which it belongs. Because a complex search-term combination is split into elements whose identification information is queried by type, errors in complex search-term combinations can be corrected. This improves the accuracy of candidate recommended words in subsequent matching. Candidate recommended words are then matched in the recommendation table according to the identification information of the segmentation elements. They are sorted according to word frequency information, and search terms are recommended according to the sorting result. Together, these steps improve the accuracy of search term recommendation.
Drawings
FIG. 1 is a flowchart illustrating optional steps of a search term recommendation method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating optional steps of another search term recommendation method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating optional steps of yet another search term recommendation method according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating optional steps of a sliding window matching principle according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating optional steps of a further search term recommendation method according to an embodiment of the present invention;
FIG. 6 is an optional system architecture diagram of a search term recommendation method according to an embodiment of the present invention;
FIG. 7 is a flowchart illustrating optional steps of a further search term recommendation method according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a search term recommendation apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a search term recommendation device according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It should be understood that some of the embodiments described herein are only for explaining the technical solutions of the present invention, and are not intended to limit the technical scope of the present invention.
To aid understanding of the search term recommendation method provided in the embodiments of the present invention, the related technologies are described first.
Search term autocompletion must respond quickly, updating the recommended term list immediately after the user enters each new character. The related technical solution uses a prefix tree.

A prefix tree, also called a trie or word lookup tree, is a tree-shaped data structure for storing a large number of strings. It exploits their common prefixes to speed up completion. Words are arranged in a tree of nodes, each stored along a path that starts at the root node. The depth of a node corresponds to the character position within the prefix.

Completions of a prefix are found along the path defined by that prefix. The core of prefix-tree-based autocompletion is a function that accepts the prefix entered by the user and returns the list of terms that begin with it. If the tree contains no path for the prefix, the recommended word bank contains no word beginning with that prefix.
However, a prefix-tree implementation cannot accurately identify or effectively recommend complex search-term combinations, such as mixed input of Chinese characters and pinyin. This reduces the accuracy of search term recommendation.
To address these shortcomings of the related art, the invention provides a search term recommendation method. It can be applied to drop-down search term recommendation in various search engines and can correct errors in complex search-term combinations. For example, it can be applied in e-commerce and other scenarios that must support Chinese, pinyin and English at the same time. As shown in FIG. 1, which is a flowchart illustrating the steps of a search term recommendation method according to an embodiment of the present invention, the method includes the following steps:
S101, obtaining a search term.
In the embodiment of the invention, the search term is the text a user enters in the search bar when searching. It may consist of Chinese characters, pinyin (including pinyin initials), English, or a combination of any two or more of these.
It should be noted that pinyin in the embodiment of the present invention refers to the syllables formed by combining the 23 initials, the 24 finals and the 16 whole-read syllables; for example, 410 pinyin syllables are provided.
S102, carrying out segmentation processing on the search terms to obtain a plurality of segmentation elements.
The search term can be understood as a query string that may contain characters of several different types. A person skilled in the art may choose any segmentation method that suits the actual requirements, provided it segments the search term effectively. For example, a general segmentation method first maintains the list of 410 pinyin syllables. It then compares the string against that list from front to back, handling certain special cases separately.
For a search term made of characters of a single type, a general segmentation method may be used. Examples are "今天天气" ("today's weather"), its pinyin "jintiantianqi", or the English "today's weather". Segmenting "今天天气" yields 4 segmentation elements: "今", "天", "天" and "气". A complex search-term combination can also be segmented. Consider the input "京dongj": the Chinese character 京 (pronounced jing) followed by Latin letters. Segmentation yields the elements "京", "dong" and "j".
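The prefix-tree autocompletion described above can be sketched in Python as follows. This is a minimal illustration of the related-art technique, not of the invention. The class and method names (`PrefixTree`, `insert`, `complete`) are hypothetical. The sketch returns completions in sorted order rather than ranking them by word frequency.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Child nodes keyed by the next character of a word.
    children: dict[str, TrieNode] = field(default_factory=dict)
    # True if a complete recommended word ends at this node.
    is_word: bool = False


class PrefixTree:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """Store a word along the path of its characters from the root."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix: str) -> list[str]:
        """Return every stored word that starts with `prefix`, sorted."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []  # no path for this prefix: nothing to recommend
            node = node.children[ch]
        results: list[str] = []
        stack = [(node, prefix)]  # depth-first walk of the prefix's subtree
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


tree = PrefixTree()
for w in ["apple", "applet", "apply", "banana"]:
    tree.insert(w)
print(tree.complete("appl"))  # ['apple', 'applet', 'apply']
```

Because the lookup only follows the characters the user typed, a mixed input such as a Chinese character followed by pinyin finds no path unless that exact mixed string was stored. This is the weakness the following embodiments address.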
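The front-to-back segmentation described above can be sketched as a greedy longest-match scan. The sketch rests on several assumptions:
- `PINYIN_SYLLABLES` is a hypothetical six-syllable subset standing in for the full list of 410 syllables.
- CJK characters are detected using only the basic CJK Unified Ideographs range.
- Any letter that does not start a known syllable is treated as a pinyin initial.
- The type labels "char", "pinyin" and "initial" are invented here.
The special-case handling mentioned above is not reproduced.

```python
from __future__ import annotations

# Hypothetical subset of the 410 pinyin syllables; a real system would load the full list.
PINYIN_SYLLABLES = {"jin", "jing", "dong", "tian", "qi", "rong"}
MAX_SYLLABLE_LEN = 6  # the longest syllables, e.g. "zhuang", have six letters


def is_cjk(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def segment(query: str) -> list[tuple[str, str]]:
    """Split a mixed query into (element, type) pairs, scanning front to back."""
    text = query.lower()
    elements: list[tuple[str, str]] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif is_cjk(ch):
            elements.append((ch, "char"))
            i += 1
        else:
            # Try the longest candidate syllable first, then shorter ones.
            for length in range(min(MAX_SYLLABLE_LEN, len(text) - i), 0, -1):
                piece = text[i:i + length]
                if piece in PINYIN_SYLLABLES:
                    elements.append((piece, "pinyin"))
                    i += length
                    break
            else:
                # No known syllable starts here: keep the letter as a pinyin initial.
                elements.append((ch, "initial"))
                i += 1
    return elements


print(segment("京dongj"))  # [('京', 'char'), ('dong', 'pinyin'), ('j', 'initial')]
```

Greedy longest match is only one possible choice. It can mis-split ambiguous strings such as "xian", which may be one syllable or "xi" plus "an"; a production segmenter would need to treat such strings as special cases.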
S103, respectively inquiring identification information of the multiple segmentation elements in an identification mapping table corresponding to a preset type according to the preset type to which the multiple segmentation elements belong.
Different segmentation elements belong to different types. Consider the segmentation elements "京" (jing), "dong" and "j": "京" is of the character type, "dong" is of the pinyin type, and "j" is of the pinyin-initial type. Correspondingly, the identification mapping tables for the preset types may include a character table and a pinyin table. The character table stores each character and its identification information, and the pinyin table stores each pinyin syllable and its identification information. Because a pinyin initial only needs to be matched against itself, pinyin initials have no identification mapping table. The identification information uniquely identifies a segmentation element; for example, it may be an identifier (ID).
Compared with recommending search terms through a prefix tree, the embodiment of the invention segments a complex search-term combination into a plurality of segmentation elements and queries their identification information by type. This allows error correction for complex search-term combinations, for example correcting a search term entered as a mix of Chinese characters and pinyin into the correct words. It thus improves the accuracy of the candidate recommended words obtained in subsequent matching.
S104, matching in a recommendation table according to the identification information of each of the plurality of segmentation elements to obtain candidate recommended words.
For a complex search-term combination, the identification information of the segmentation elements is of different types and must be matched in different recommendation tables. In the embodiment of the invention, the recommendation table includes a Chinese recommendation table and an English recommendation table. Matching is performed in the Chinese recommendation table and/or the English recommendation table according to the identification information of each segmentation element to obtain candidate recommended words. There may be one or more candidate recommended words; the embodiment of the invention does not limit their number.
S105, sorting the candidate recommended words according to word frequency information to obtain a sorting result, and recommending according to the sorting result.
S101 to S104 may yield many candidate recommended words, but the user interface can display only a limited number of them. The most frequently searched or most valuable candidates should therefore be displayed. In the embodiment of the invention, each candidate recommended word in the recommendation table has word frequency information. This information represents the weight of the candidate and can also be understood as a recommendation coefficient, a degree of importance, a historical search count, or the like. The candidates are sorted by this information into a sorting result ordered by importance. The result can be recommended to the user to autocomplete the search term, improving recommendation accuracy.
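One simple way to match the queried identification information against a Chinese recommendation table is a position-by-position prefix comparison, sketched below. This is an assumption, not the sliding-window matching of FIG. 4. It also ignores the English recommendation table. The table contents, including the word 京剧 and its IDs, are hypothetical; the IDs for 京东金融 are borrowed from the example entry at the end of this section.

```python
from __future__ import annotations

from typing import NamedTuple


class CharEntry(NamedTuple):
    """Per-character fields of one recommended word in the Chinese table."""
    charact_id: int
    phonetic_id: int
    initial_char: str


# Hypothetical table: recommended word -> entries for each of its characters.
RECOMMEND_TABLE = {
    "京东金融": [CharEntry(101, 23, "j"), CharEntry(22, 14, "d"),
                 CharEntry(5968, 22, "j"), CharEntry(5162, 265, "r")],
    "京东": [CharEntry(101, 23, "j"), CharEntry(22, 14, "d")],
    "京剧": [CharEntry(101, 23, "j"), CharEntry(999, 150, "j")],  # made-up IDs
}


def element_matches(kind: str, value: object, entry: CharEntry) -> bool:
    """Compare one query element with one character, using the field for its type."""
    if kind == "char":
        return entry.charact_id == value
    if kind == "pinyin":
        return entry.phonetic_id == value
    return entry.initial_char == value  # pinyin initial


def match_candidates(query_ids: list[tuple[str, object]],
                     table: dict[str, list[CharEntry]]) -> list[str]:
    """Return words whose leading characters match the query element by element."""
    candidates = []
    for word, entries in table.items():
        if len(query_ids) <= len(entries) and all(
            element_matches(kind, value, entry)
            for (kind, value), entry in zip(query_ids, entries)
        ):
            candidates.append(word)
    return candidates


print(match_candidates([("char", 101), ("pinyin", 14), ("initial", "j")], RECOMMEND_TABLE))
# ['京东金融']
```

Each query element is compared with only the field that matches its type. A character ID, a pinyin ID and a bare initial can therefore be mixed freely within one query, which a plain string prefix tree cannot do.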
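The sorting in S105 amounts to ordering candidates by weight and keeping as many as the drop-down list can show. This is a minimal sketch under two assumptions: the word frequency information is a single number per word, and ties are broken alphabetically, a hypothetical choice. The sample frequencies are invented.

```python
from __future__ import annotations


def rank_candidates(word_freq: dict[str, int], k: int = 10) -> list[str]:
    """Sort candidates by descending word frequency and keep the top k."""
    # Ties are broken alphabetically so the output is deterministic.
    ranked = sorted(word_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


print(rank_candidates({"京东金融": 120, "京东": 500, "京剧": 80}, k=2))
# ['京东', '京东金融']
```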
In summary, the scheme proceeds in four stages:
- segmenting the search term;
- querying each segmentation element's identification information in the identification mapping table for its preset type;
- matching candidate recommended words by that identification information;
- recommending according to the word-frequency sorting result.
In this way, errors in complex search-term combinations can be corrected and the accuracy of search term recommendation is improved.
In some embodiments, the preset types include character, pinyin and pinyin initial, and the identification mapping table includes a character table and a pinyin table. The identification information includes first, second and third identification information. S103 may then be implemented according to whether each segmentation element is a character, a pinyin syllable or a pinyin initial:
- for a segmentation element whose preset type is character, its first identification information is queried in the character table;
- for a segmentation element whose preset type is pinyin, its second identification information is queried in the pinyin table;
- for a segmentation element whose preset type is pinyin initial, its third identification information is the pinyin initial itself.
The search term can be understood as a query string. For a complex search-term combination, the segmentation elements may be of any one or more of the three types: character, pinyin and pinyin initial. The segmented query string can be understood as a data structure of the Intelligent Info type.
The character table stores each character and its identification information, and the pinyin table stores each pinyin syllable and its identification information. Taking Chinese characters as an example, the character table may store 6,000 commonly used Chinese characters and their IDs, i.e., a mapping from these characters to IDs. The pinyin table stores the 410 pinyin syllables and their IDs, i.e., a mapping from the syllables to pinyin IDs. Because the identification information is queried in the character table or the pinyin table according to each element's type, the identification information obtained is more accurate.
In some embodiments, the recommendation table used in S104 may be generated through S201 to S203, as shown in FIG. 2. FIG. 2 is a flowchart illustrating optional steps of another search term recommendation method according to an embodiment of the present invention.
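The type-dependent lookup described above can be sketched as below. The two dictionaries are hypothetical stand-ins for the character table and the pinyin table. Their few IDs are borrowed from the example recommended word at the end of this section. The element labels "char", "pinyin" and "initial" are also assumptions of this sketch.

```python
from __future__ import annotations

# Hypothetical character and pinyin tables (IDs borrowed from the 京东金融 example).
CHAR_TABLE = {"京": 101, "东": 22, "金": 5968, "融": 5162}
PINYIN_TABLE = {"jing": 23, "dong": 14, "jin": 22, "rong": 265}


def lookup_ids(elements: list[tuple[str, str]]) -> list[tuple[str, int | str | None]]:
    """Attach identification information to each (element, type) pair."""
    result: list[tuple[str, int | str | None]] = []
    for text, kind in elements:
        if kind == "char":
            ident = CHAR_TABLE.get(text)  # first identification information
        elif kind == "pinyin":
            ident = PINYIN_TABLE.get(text)  # second identification information
        else:
            ident = text  # third identification information: the initial itself
        result.append((kind, ident))  # None means "not found in the table"
    return result


print(lookup_ids([("京", "char"), ("dong", "pinyin"), ("j", "initial")]))
# [('char', 101), ('pinyin', 14), ('initial', 'j')]
```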
S201, acquiring a preset word bank and a user historical query word bank.
The preset word bank includes a plurality of recommended words and the word frequency information of each, which represents its weight. For example, the recommended words may be high-frequency words selected from a corpus. That corpus is generated by performing word segmentation and part-of-speech tagging on a preset plain-text corpus. The preset word bank can thus be understood as a predefined set of high-frequency words, determined by analyzing how many times or how often each word is searched. Any method is acceptable as long as it reflects how often each word in the preset word bank is searched.
The user historical query word bank represents the user's historical search data, i.e., the user's query log, and may include a plurality of search terms and the word frequency information of each. It reflects the search terms the user has entered and may be determined by analyzing the large volume of user search data collected by the search engine. The embodiment of the invention does not limit how it is obtained.
S202, updating the recommended words in the preset word bank and their word frequency information according to the user historical query word bank, to generate a recommended word bank.
The embodiment of the invention may analyze the user historical query word bank and select its high-frequency words. Recommended words are then added to or deleted from the preset word bank accordingly. Their word frequency information is likewise updated according to that of the search terms in the user historical query word bank. Repeating this optimization continuously yields the recommended word bank. The update frequency of the recommended word bank may be adjusted to the actual application; for example, it may be updated once per day.
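The update in S202 can be sketched as a merge of two frequency dictionaries. Because the text leaves the concrete rules open, the three rules here are hypothetical:
- history counts are added to the weights of existing words;
- history words whose count is at least `add_threshold` are added as new recommended words;
- words whose merged weight falls below `drop_below` are deleted.
The sample data is invented.

```python
from __future__ import annotations


def update_word_bank(
    preset: dict[str, int],
    history: dict[str, int],
    add_threshold: int = 100,
    drop_below: int = 1,
) -> dict[str, int]:
    """Merge a preset word bank with a user's query history (hypothetical rules)."""
    bank = dict(preset)  # copy so the preset word bank is left unchanged
    for word, count in history.items():
        if word in bank:
            bank[word] += count  # reinforce words the user actually searched
        elif count >= add_threshold:
            bank[word] = count  # add high-frequency history words
    # Delete words whose merged weight is too low to be worth recommending.
    return {word: freq for word, freq in bank.items() if freq >= drop_below}


preset = {"京东": 500, "京剧": 0}
history = {"京东": 20, "京东金融": 150, "京东到家": 30}
print(update_word_bank(preset, history))
# {'京东': 520, '京东金融': 150}
```

Running this on a schedule, for example daily, would correspond to the periodic update mentioned above.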
According to the embodiment of the invention, the recommended words in the preset word bank and the word frequency information of the recommended words are updated by combining the high-frequency words analyzed from the user historical query word bank, so that the accuracy of generating the recommended word bank is improved.
The recommended word bank takes both the preset word bank and the user historical query word bank into account. It is therefore a personalized word bank closely tied to the user's query habits, and recommending search terms from it improves recommendation accuracy.
S203, generating a recommendation table according to the recommended word bank and the identification mapping table.
The recommended word bank stores a plurality of recommended words, each consisting of one or more characters. The identification mapping table stores each character and its identification information, as well as each character's pinyin and that pinyin's identification information. Each character in each recommended word, and its pinyin, are mapped to identification information through the identification mapping table. Combining these with the pinyin initial of each character generates the recommendation table. The table therefore stores the recommended words and, for each character in each word, three items:
- the character's identification information;
- the identification information of its pinyin;
- its pinyin initial.
In some embodiments, the identification mapping table includes a character table and a pinyin table. The recommended word bank includes a Chinese recommended word bank and an English recommended word bank, and the recommendation table includes a Chinese recommendation table and an English recommendation table. S203 may then be implemented as follows. A Chinese recommendation table is generated from the Chinese recommended word bank, the character table and the pinyin table according to a first preset data structure. This structure includes:
- the first identification information of each character in the recommended word;
- the second identification information of the pinyin of each character in the recommended word;
- the pinyin initial of each character in the recommended word.
An English recommendation table is generated from the English recommended word bank according to a second preset data structure, which includes each letter in the recommended word.
The recommended word bank comprises a Chinese recommended word bank, which stores Chinese words, and an English recommended word bank, which stores English words. The identification mapping table comprises a character table and a pinyin table.
In a first example, the Chinese recommendation table is generated from the Chinese recommended word bank, the character table and the pinyin table according to the first preset data structure. For each character in each Chinese recommended word, the table stores the three items of the first preset data structure. Table 1 illustrates the first preset data structure and shows an optional data table format of the Chinese recommendation table provided by an embodiment of the present invention. The Field column lists the stored fields:
- word stores the recommended word;
- charact_id stores the ID of each character in the recommended word;
- phonetic_id stores the ID of the pinyin of each character;
- initial_char stores the pinyin initial of each character.
The Type column gives each field's data type: varchar denotes a variable-length string, smallint a short integer, and char a single character (here, a letter).
TABLE 1
Field Type
word varchar
charact_id smallint
phonetic_id smallint
initial_char char
In a second example, the English recommendation table is generated from the English recommended word bank according to the second preset data structure. The English recommendation table stores a plurality of English recommended words and each letter of each English recommended word, so the second preset data structure comprises each letter of the recommended word. For example, the second preset data structure is shown in Table 2, an optional data table format of the English recommendation table provided in the embodiment of the present invention. The fields and types in Table 2 have the same meaning as in Table 1 and are not described again. Because the English recommendation table only needs to store letters, each letter is stored in initial_char in Table 2.
TABLE 2
Field Type
initial_char char
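The English recommendation table can be pictured with a minimal Python sketch. It assumes each row holds an English recommended word and its letters in order. The names EnglishRecord and build_english_table, and the lower-casing of letters, are illustrative assumptions rather than part of the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnglishRecord:
    word: str                      # English recommended word
    initial_char: tuple[str, ...]  # each letter of the word, in order (Table 2)


def build_english_table(words):
    """Build hypothetical English recommendation rows from an English recommended word bank."""
    table = []
    for w in words:
        letters = tuple(w.lower())  # assumption: letter matching is case-insensitive
        if letters:                 # skip empty entries
            table.append(EnglishRecord(word=w, initial_char=letters))
    return table


table = build_english_table(["phone", "Pad"])
```

Each row keeps the original word for display and stores the letters separately, so later matching works on letters only.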
It should be noted that the terms "first" and "second" in the embodiments of the present invention serve only to distinguish names. This applies to the first identification information, the second identification information, the first preset data structure and the second preset data structure. These terms do not indicate an order, and they cannot be understood as indicating or implying relative importance or the number of the indicated technical features.
All words in the recommended word bank, that is, all words that may be recommended to the user, are stored in the word field of Table 1. Taking the Chinese recommended word "Jingdong Finance" (京东金融) as an example, word is "京东金融". One Chinese recommended word in the Chinese recommendation table is stored as follows.
word: 京东金融 (Jingdong Finance)
charact_id_01:101
charact_id_02:22
charact_id_03:5968
charact_id_04:5162
phonetic_id_01:23
phonetic_id_02:14
phonetic_id_03:22
phonetic_id_04:265
initial_char_01:j
initial_char_02:d
initial_char_03:j
initial_char_04:r
The Chinese recommendation table includes the identification information of each character in "京东金融":

- "京" (jing) corresponds to charact_id_01, with identification information 101.
- "东" (dong) corresponds to charact_id_02, with 22.
- "金" (jin) corresponds to charact_id_03, with 5968.
- "融" (rong) corresponds to charact_id_04, with 5162.

It also includes the identification information of the pinyin of each character:

- "jing" corresponds to phonetic_id_01, with 23.
- "dong" corresponds to phonetic_id_02, with 14.
- "jin" corresponds to phonetic_id_03, with 22.
- "rong" corresponds to phonetic_id_04, with 265.

Finally, it includes the pinyin initial of each character: initial_char_01 is "j", initial_char_02 is "d", initial_char_03 is "j", and initial_char_04 is "r".
It should be noted that the above identification information may be set by a person skilled in the art according to actual situations, and here, only "101, 22, 5968, 5162, 23, 14, 22, 265" is taken as an example for description, and does not represent specific contents of the identification information in the embodiment of the present invention.
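As a minimal sketch of how one row of Table 1 could be assembled, the Python code below builds the record for "京东金融" from three inputs: a word table, a phonetic table and a character-to-pinyin dictionary. The following are assumptions:

- the character-to-pinyin dictionary;
- the names ChineseRecord and build_chinese_record;
- the simplification of one reading per character.

The ids are only the illustrative values listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChineseRecord:
    word: str
    charact_id: tuple[int, ...]    # first identification information of each character
    phonetic_id: tuple[int, ...]   # second identification information of each character's pinyin
    initial_char: tuple[str, ...]  # pinyin initial of each character


def build_chinese_record(word, word_table, phonetic_table, char_pinyin):
    """Assemble one Table 1 row; char_pinyin (character -> pinyin) is a hypothetical input."""
    pinyins = [char_pinyin[ch] for ch in word]  # assumes one reading per character
    return ChineseRecord(
        word=word,
        charact_id=tuple(word_table[ch] for ch in word),
        phonetic_id=tuple(phonetic_table[py] for py in pinyins),
        initial_char=tuple(py[0] for py in pinyins),
    )


# Illustrative identification information copied from the example record above.
WORD_TABLE = {"京": 101, "东": 22, "金": 5968, "融": 5162}
PHONETIC_TABLE = {"jing": 23, "dong": 14, "jin": 22, "rong": 265}
CHAR_PINYIN = {"京": "jing", "东": "dong", "金": "jin", "融": "rong"}

record = build_chinese_record("京东金融", WORD_TABLE, PHONETIC_TABLE, CHAR_PINYIN)
```

Keeping the three per-character sequences aligned by position is what allows a search condition to mix characters, full pinyin and initials in one query.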
In some embodiments, S104 and S105 may also be implemented through S301 and S302. In that case, the search term recommendation method provided by the embodiment of the present invention includes S101, S102, S103, S301 and S302. Fig. 3, which builds on Fig. 1, is a flowchart of optional steps of another search term recommendation method according to an embodiment of the present invention.
S301, matching is carried out in the Chinese recommendation table according to the identification information of each of the plurality of segmentation elements, and Chinese candidate recommended words are obtained.
Here the candidate recommended words comprise the Chinese candidate recommended words, and the recommendation table comprises the Chinese recommendation table. The Chinese recommendation table stores the mapping between each recommended word and its identification information, together with the pinyin initial of each character in the recommended word. The identification information comprises the first identification information corresponding to each character in the recommended word and the second identification information corresponding to the pinyin of each character in the recommended word.
S302, if the number of the Chinese candidate recommended words is a positive integer, sorting the Chinese candidate recommended words to obtain a sorting result.
The embodiment of the present invention first segments the search word into a plurality of segmentation elements. It then queries the identification information of each element in the identification mapping table corresponding to its preset type. In this way it can correct search words formed by complex combinations. For example, a search word such as "Jingdongjr", typed as a mix of Chinese characters and pinyin, can be corrected into the intended word.
The Chinese recommendation table stores the mapping between each recommended word and its identification information, that is, the first identification information of each character and the second identification information of the pinyin of each character. It also stores the pinyin initial of each character. Matching can therefore be performed in the Chinese recommendation table according to the identification information of each segmentation element to obtain Chinese candidate recommended words.

If the number of Chinese candidate recommended words is not 0, the search word contains Chinese content, either Chinese characters or pinyin as in "Jingdong". The possibility that the user intends to type English is therefore not considered. Chinese-related content is queried, and the Chinese candidate recommended words are offered in the search bar. There is no need to search the English recommendation table, and the Chinese candidate recommended words are sorted directly according to their word frequency information to obtain the sorting result.
The search word recommendation method provided by the embodiment of the present invention can recommend not only for search words composed of a single, conventional type of character, but also for complex combinations. This improves the accuracy of search word recommendation. Judging from users' input habits, if the first character of the search word is Chinese, the user is most likely searching for Chinese content. Suppose matching in the Chinese recommendation table according to the identification information of the segmentation elements yields a positive number of Chinese candidate recommended words. This indicates that the user wants to search in Chinese, so the Chinese candidate recommended words are ranked directly to obtain the ranking result, which improves recommendation efficiency.
In some embodiments, S301 above may also be implemented as follows.

First, according to the identification information of each of the plurality of segmentation elements, matching is performed in the Chinese recommendation table with the identification information of the first segmentation element as a prefix. This yields the original Chinese candidate recommended words.

Then, keeping the positional order of the identification information, the identification information is slid backwards by one or more positions and matching is performed again in the Chinese recommendation table. This yields the supplementary Chinese candidate recommended words.

The Chinese candidate recommended words comprise both the original and the supplementary Chinese candidate recommended words.
Illustratively, the example that the search word is "jingdong" is taken as an example to explain, the search word is segmented to obtain a plurality of segmentation elements "jingdong", "dong" and "j", which respectively represent three types of characters, pinyin full pinyin and pinyin initials. Then, the word table and the spelling table are inquired to obtain the id numbers corresponding to the "Jing" and the "dong" as id1 and id2, and the inquiry is not needed because the "j" corresponds to the letter. In this example, the matching in the Chinese recommendation table can be implemented by the following pseudo code.
select word from intelligent_table where charact_id[1]=id1 and phonetic_id[2]=id2 and initial_char[3]="j"
The Chinese candidate recommended words which meet the conditions and take 'Jing' as a prefix can be inquired through the pseudo code.
However, recommended words in which "Jing" appears in a middle position are still not matched. For example, "Rongjing East Street" is not matched because "Jing" is its second character rather than its first. The embodiment of the present invention therefore continues matching after sliding the search condition backwards, which can be implemented by the following pseudo code.
[Pseudo code: provided as an image in the original publication.]
Here, N is the preset maximum word length, generally set to 8, so words longer than 8 characters are not recommended. len is the length of the search condition, that is, the number of segmentation elements. doc_set is the set of supplementary Chinese candidate recommended words matched for the search word.
As shown in fig. 4, fig. 4 is a flowchart illustrating an optional step of a sliding window matching principle according to an embodiment of the present invention. N in fig. 4 is a preset maximum word length, N is equal to 8 and the search word is "jing dongj" in fig. 4 is taken as an example for explanation, id1 in fig. 4 represents the identification information of a channel _ id _01, i.e., "jing", id2 represents the identification information of a phonetic _ id _02, i.e., "dong", and id3 represents the identification information of an initial _ char _03, i.e., "j". The explanation will be given by taking the identification information id1 id2id3 of a plurality of sliced elements in order as an example of the search condition.
When I = 1, the search condition is shifted right by one position. Keeping the order id1 id2 id3, the identification information of the segmentation elements is slid right by one position. Matching is then performed in the Chinese recommendation table to obtain supplementary Chinese candidate recommended words. This match is implemented by the following pseudo code.
select word from intelligent_table where charact_id[2]=id1 and phonetic_id[3]=id2 and initial_char[4]="j"
Other values of I are handled in the same way. For example, when I = 3, the search condition is shifted right by three positions. Keeping the order id1 id2 id3, the identification information is slid right by three positions. Matching is then performed in the Chinese recommendation table to obtain supplementary Chinese candidate recommended words, as implemented by the following pseudo code.
select word from intelligent_table where charact_id[4]=id1 and phonetic_id[5]=id2 and initial_char[6]="j"
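The sliding window of Fig. 4 can be sketched with Python's built-in sqlite3 module. The sketch assumes Table 1 is flattened into one column per field and character position (charact_id_01 to initial_char_08), following the example record layout above. Shift 0 reproduces the prefix query, and shifts 1 to N - len fill doc_set. The helper names, the flattened schema and the ids for the word "北京东街" are all hypothetical.

```python
import sqlite3

N = 8  # preset maximum word length
FIELDS = ("charact_id", "phonetic_id", "initial_char")


def create_table(conn):
    # Flattened Table 1: one column per field and per character position 01..N.
    cols = ", ".join(f"{f}_{p:02d}" for p in range(1, N + 1) for f in FIELDS)
    conn.execute(f"CREATE TABLE intelligent_table (word TEXT PRIMARY KEY, {cols})")


def insert_word(conn, word, charact_ids, phonetic_ids, initials):
    values = {"word": word}
    for p, (c, ph, ini) in enumerate(zip(charact_ids, phonetic_ids, initials), start=1):
        values[f"charact_id_{p:02d}"] = c
        values[f"phonetic_id_{p:02d}"] = ph
        values[f"initial_char_{p:02d}"] = ini
    names = ", ".join(values)
    marks = ", ".join("?" * len(values))
    conn.execute(f"INSERT INTO intelligent_table ({names}) VALUES ({marks})", list(values.values()))


def match_at(conn, conditions, shift):
    """The pseudo-code query with every position moved right by `shift` (0 = prefix match)."""
    where = " AND ".join(
        f"{field}_{k + 1 + shift:02d} = ?" for k, (field, _) in enumerate(conditions)
    )
    params = [value for _, value in conditions]
    return [w for (w,) in conn.execute(f"SELECT word FROM intelligent_table WHERE {where}", params)]


def sliding_match(conn, conditions):
    """Return (original, doc_set): prefix matches and supplementary matches for shifts 1..N-len."""
    original = match_at(conn, conditions, 0)
    doc_set = []
    for shift in range(1, N - len(conditions) + 1):
        for w in match_at(conn, conditions, shift):
            if w not in original and w not in doc_set:
                doc_set.append(w)
    return original, doc_set


conn = sqlite3.connect(":memory:")
create_table(conn)
insert_word(conn, "京东金融", (101, 22, 5968, 5162), (23, 14, 22, 265), ("j", "d", "j", "r"))
# Hypothetical ids for 北 (bei) and 街 (jie); 京 and 东 reuse the ids above.
insert_word(conn, "北京东街", (7, 101, 22, 900), (5, 23, 14, 30), ("b", "j", "d", "j"))
query = [("charact_id", 101), ("phonetic_id", 14), ("initial_char", "j")]  # 京 + dong + j
original, doc_set = sliding_match(conn, query)
```

With N = 8 and three conditions, shifts run from 1 to 5. The last condition therefore lands at most on position 8, so a match can never extend past the preset maximum word length. Field names in the generated SQL come only from the fixed FIELDS tuple, and values are passed as parameters.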
When recommending according to a search term, related-art approaches such as the prefix tree and the DFA can only return recommended words that have the search term as a prefix. This limits the accuracy and richness of the recommendations. The embodiment of the present invention instead adjusts the search condition dynamically with a sliding window, that is, it adjusts the positions of the identification information of the segmentation elements of the search word. After the identification information has been slid backwards by one or more positions, keeping its order, matching is performed in the Chinese recommendation table again. The supplementary Chinese candidate recommended words obtained in this way are richer, which improves both the recommendation efficiency and the recommendation accuracy of search words.
The embodiment of the present invention can therefore return two kinds of candidates: original Chinese candidate recommended words in which the search word is a prefix, and supplementary Chinese candidate recommended words in which the search word appears in the middle. This improves the richness and accuracy of the Chinese candidate recommended words.
In some embodiments, after S301 is executed, the embodiment of the present invention may further execute S303 to S305. In that case, the search term recommendation method provided by the embodiment of the present invention includes S101, S102, S103, S301, S303, S304 and S305. Fig. 5, which builds on Fig. 1 and Fig. 3, is a flowchart of optional steps of another search term recommendation method according to an embodiment of the present invention. It should be noted that S302 and S303-S305 are alternative branches. After S301, either S302 or S303-S305 is executed, depending on the number of Chinese candidate recommended words, and the embodiment of the present invention is not limited in this respect.
S303, if the number of Chinese candidate recommended words is zero, single-letter segmentation is performed on the plurality of segmentation elements to obtain a plurality of letters.
And S304, matching in the English recommendation table according to the plurality of letters to obtain English candidate recommended words.
The candidate recommended words further comprise English candidate recommended words, the recommendation table further comprises an English recommendation table, and the English recommendation table is used for storing each letter of the recommended words.
S305, sorting the English candidate recommended words to obtain a sorting result.
Suppose matching in the Chinese recommendation table according to the identification information of the segmentation elements yields zero Chinese candidate recommended words. The user then most likely intends to search in English, so the segmentation elements are split into single letters. The English recommendation table stores each letter of every recommended word, so matching the letters in this table yields English candidate recommended words. These are then ranked to obtain the ranking result.
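The branch between S302 and S303 to S305 can be sketched in Python as follows. The sketch rests on two assumptions:

- S304 matches English words whose leading letters equal the split letters, since the text does not fix the matching rule.
- Word frequencies are available as a dictionary.

All function names are hypothetical.

```python
def split_letters(elements):
    """S303: split the segmentation elements into single ASCII letters."""
    return [ch for el in elements for ch in el.lower() if ch.isascii() and ch.isalpha()]


def match_english(english_words, letters):
    """S304 (assumed prefix rule): English words whose leading letters equal `letters`."""
    n = len(letters)
    return [w for w in english_words if list(w.lower()[:n]) == letters]


def rank_by_frequency(words, freq):
    """S302/S305: sort by word frequency, largest first (unknown words count as 0)."""
    return sorted(words, key=lambda w: freq.get(w, 0), reverse=True)


def recommend_after_s301(chinese_candidates, elements, english_words, freq):
    """Pick the branch according to the number of Chinese candidates from S301."""
    if chinese_candidates:                                # positive number: S302
        return rank_by_frequency(chinese_candidates, freq)
    letters = split_letters(elements)                     # zero: S303
    return rank_by_frequency(match_english(english_words, letters), freq)  # S304, S305


freq = {"京东金融": 3, "北京东街": 7, "women": 5, "womenswear": 9}
print(recommend_after_s301([], ["wo", "men"], ["women", "womenswear", "woman"], freq))
# ['womenswear', 'women']
```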
Depending on the number of Chinese candidate recommended words, the embodiment of the present invention obtains either Chinese candidate recommended words by matching in the Chinese recommendation table or English candidate recommended words by matching in the English recommendation table. Compared with search word recommendation through a prefix tree, this improves the accuracy of the matching results.
It should be noted that some search words need special handling. For example, the search word "women" is first split by pinyin into "wo" and "men" and matched in the Chinese recommendation table. If the number of Chinese candidate recommended words is zero, the search word is then split letter by letter according to the format of the English recommendation table, into "w", "o", "m", "e" and "n".
In the following, an exemplary application of the embodiments of the present invention in a practical application scenario will be described.
The embodiment of the present invention provides a search word recommendation method with the following parts:

- A preset word bank is combined with a user historical query word bank to generate a recommended word bank, which is updated in real time.
- The recommended word bank is converted into a recommendation table that maps each word to identification information such as its characters, pinyin and pinyin initials.
- The search word input by the user is segmented, and the identification information of each segmentation element is queried in the word table or the phonetic table according to its preset type. This corrects search words typed as a mix of Chinese characters and pinyin, improving the accuracy of the candidate recommended words matched in the recommendation table.
- Sliding-window search word recommendation based on the recommendation table returns recommended words that have the search word as a prefix and also recommended words that contain it in a middle position. This improves the richness and accuracy of search word recommendation.
To facilitate understanding of the present solution, a search term recommendation system is described before the search term recommendation method is introduced in detail. Fig. 6 is an optional system architecture diagram of the search term recommendation method according to an embodiment of the present invention.
1. Recommended word bank updating module.
The recommended word bank updating module updates the recommended word bank based on the preset word bank and the user historical query word bank. It proceeds as follows:

- High-frequency words are first selected from a corpus as the preset word bank.
- The user query logs in the user historical query word bank are then analyzed. High-frequency words are selected and added to the preset word bank as recommended words, and the word frequency information of the recommended words in the preset word bank is updated.
- Continuously optimizing the preset word bank in this way yields the recommended word bank.

The update frequency of the recommended word bank can be adjusted to the practical situation of the application, for example once a day. The recommended word bank is personalized and closely tied to the user's query habits, so recommending search words through it improves recommendation accuracy.
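A minimal sketch of the updating step follows. It assumes the preset word bank is a dictionary from word to frequency and the query log is a list of query strings. The min_count threshold for a "high-frequency" query is a hypothetical parameter, because the text does not specify how high-frequency words are selected.

```python
from collections import Counter


def update_word_bank(preset_bank, query_log, min_count=3):
    """Return a recommended word bank built from a preset bank and user query history.

    preset_bank: dict mapping word -> word frequency (high-frequency corpus words).
    query_log:   iterable of query strings from the user historical query word bank.
    min_count:   hypothetical threshold for a query to count as high-frequency.
    """
    counts = Counter(q.strip() for q in query_log if q.strip())
    bank = dict(preset_bank)  # leave the preset bank untouched
    for word, n in counts.items():
        if word in bank:
            bank[word] += n        # update the frequency of an existing recommended word
        elif n >= min_count:
            bank[word] = n         # supplement a new high-frequency word
    return bank


preset = {"京东金融": 100, "京东": 50}
log = ["京东"] * 2 + ["京东超市"] * 3 + ["abc"]
bank = update_word_bank(preset, log)
```

Running this function on a schedule, for example daily, matches the adjustable update frequency mentioned above.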
The module also loads and parses all recommended word banks and constructs the related data structures. It generates the recommendation table from the recommended word bank, the word table and the phonetic table. The recommended word bank comprises a Chinese recommended word bank and an English recommended word bank, and the recommendation table comprises a Chinese recommendation table and an English recommendation table.
2. Preprocessing module.
The preprocessing module can be understood as a segmentation module, and the search word can be understood as a query string. The module preprocesses the query string input by the user: it segments the string according to the type of its content into three kinds of parts, namely Chinese characters, pinyin and pinyin initials. The preprocessed query string is converted into a data structure of the IntelligentInfo type, as shown in Table 1 above. charact_id stores the id of each character in the recommended word, phonetic_id stores the id of the pinyin of each character, and initial_char stores the pinyin initial of each character.
3. Recommended word matching module.
The module matches the query string input by the user against the recommendation table and adds the matched words to a candidate set, which is also expanded by means of a sliding window. The candidate recommended words in the candidate set are sorted in descending order of word frequency, and the sorting result is returned to the user. The word frequency information of the recommended words changes dynamically with users' queries, so the recommended words also need to be updated dynamically. The candidate set contains one or more of the following: original Chinese candidate recommended words, supplementary Chinese candidate recommended words and English candidate recommended words.
Based on the system architecture of Fig. 6, the embodiment of the present invention provides a search term recommendation method. The method sits in the intermediate link between query input and query submission, so the query string input by the user is analyzed and processed before it is submitted to the search system. After the user inputs a search word, the closest candidate recommended words are provided, which improves recommendation efficiency and search word accuracy. Fig. 7 is a flowchart of optional steps of another search term recommendation method according to an embodiment of the present invention.
In the following description, the query string is the search word, and the query condition is the identification information of the segmentation elements. A matched word is a Chinese or English candidate recommended word.
1. A Chinese recommended word bank and an English recommended word bank are generated from the preset word bank and the user historical query word bank. At run time, the search word recommendation system loads and parses the recommended word banks and generates the recommendation tables in combination with the word table and the phonetic table. This step is not shown in Fig. 7 because it may be completed before the search term recommendation stage. Here the recommended word bank comprises the Chinese and English recommended word banks, and the recommendation table comprises the Chinese and English recommendation tables.
2. The query string input by the user is preprocessed, that is, segmented according to the type of its content into three kinds of parts: Chinese characters, pinyin and pinyin initials.
3. The preprocessed query condition is matched in the Chinese recommendation table and then matched again after sliding displacement. All matched words are added to the candidate set.
4. The number of recommended words in the candidate set is checked. If it is not 0, go to step 6; otherwise, execute step 5.
5. If the query condition contains only letters, it is preprocessed according to the format of the English recommendation table, that is, split into single letters. The result is matched in the English recommendation table, and all matched words are added to the candidate set.
6. The candidate recommended words in the candidate set are sorted according to their word frequency information, and the sorting result is output.
The search word recommendation method provided by the embodiment of the present invention runs as follows.

- In the initialization part, the Chinese and English recommended word banks are loaded and parsed. Together with the word table and the phonetic table, the data are constructed according to the data table structure shown in Table 1 to generate the Chinese recommendation table and the English recommendation table.
- When a query string input by the user is received, it is preprocessed and segmented into Chinese characters, full pinyin and pinyin initials, which completes the correction of complex query strings.
- After preprocessing, matching is performed in the Chinese and English recommendation tables, and all matched words are added to the candidate set, improving its accuracy. When matching in the Chinese recommendation table, the query condition is adjusted based on the sliding window principle before matching continues, and all matched words are added to the candidate set, improving its richness and accuracy.
- The candidate recommended words are then sorted in descending order of word frequency and the sorting result is output. This result can be used to recommend search words to the user.
The search term recommendation method provided by the embodiment of the invention comprises an initialization stage and a search term recommendation stage which are introduced below respectively.
In the initialization stage, the embodiment of the present invention builds the recommendation tables. The Chinese and English recommended word banks are created first. The recommended word banks are then loaded and parsed, and the Chinese recommended word bank is combined with the word table and the phonetic table, so that the Chinese and English recommendation tables are built and initialization is completed. The recommended word bank is initially obtained from high-frequency words in the corpus. It is then supplemented with high-frequency words selected by analyzing the user query logs, so it is continuously optimized and the accuracy of the recommendation tables improves.
In the search term recommendation stage, the embodiment of the present invention divides the query string input by the user into Chinese characters, pinyin and pinyin initials. This corrects search terms typed as a mix of Chinese characters and pinyin and thus improves recommendation accuracy. The charact_id of each character and the phonetic_id of each pinyin syllable are queried to obtain the converted query content, that is, the identification information of each segmentation element in the query string. The converted query content is then matched in the recommendation table, and all matched words are added to the candidate set, improving its accuracy. When matching in the Chinese recommendation table, search word recommendation is based on the sliding window principle. The words recommended to the user therefore include those with the search word as a prefix and also those containing it in a middle position. This greatly expands the candidate set and improves the richness and accuracy of the candidate recommended words.
Compared with recommending search terms through a prefix tree, the search term recommendation method provided by the embodiment of the present invention enhances error correction for complex search term combinations. In addition, matching based on the sliding window principle expands the query string input by the user. The candidate recommended words in the candidate set are returned to the user after being sorted by word frequency, which helps the user quickly locate the desired content and improves both search efficiency and search term recommendation accuracy.
In order to implement the search term recommendation method according to the embodiment of the present invention, an embodiment of the present invention further provides a search term recommendation device, as shown in fig. 8, fig. 8 is a schematic structural diagram of the search term recommendation device according to the embodiment of the present invention, where the search term recommendation device 80 includes: an obtaining module 801, configured to obtain a search term; a segmentation module 802, configured to perform segmentation processing on the search term to obtain a plurality of segmentation elements; the query module 803 is configured to query, according to the preset types to which the multiple segmentation elements belong, the identification information of the multiple segmentation elements in the identification mapping table corresponding to the preset types, respectively; a matching module 804, configured to perform matching in a recommendation table according to the identification information of each of the multiple segmentation elements, so as to obtain a candidate recommended word; and the recommending module 805 is configured to rank the candidate recommended words according to the word frequency information to obtain a ranking result, so that recommendation is performed according to the ranking result.
In some embodiments, the matching module 804 is further configured to perform matching in a chinese recommendation table according to the identification information of each of the multiple segmentation elements, so as to obtain a candidate chinese recommended word; the candidate recommended words comprise the Chinese candidate recommended words, the recommended table comprises the Chinese recommended table, the Chinese recommended table is used for storing a mapping relation between the recommended words and identification information and pinyin first letters of pinyin of each word in the recommended words, and the identification information comprises first identification information corresponding to each word in the recommended words and second identification information corresponding to the pinyin of each word in the recommended words; the recommending module 805 is further configured to rank the chinese candidate recommended words if the number of the chinese candidate recommended words is a positive integer, so as to obtain the ranking result.
In some embodiments, the modules are further configured as follows.

- The segmentation module 802 performs single-letter segmentation on the plurality of segmentation elements to obtain a plurality of letters if the number of Chinese candidate recommended words is zero.
- The matching module 804 performs matching in the English recommendation table according to the plurality of letters to obtain English candidate recommended words. Here the candidate recommended words further include the English candidate recommended words, the recommendation table further includes the English recommendation table, and the English recommendation table stores each letter of the recommended words.
- The recommending module 805 ranks the English candidate recommended words to obtain the ranking result.
In some embodiments, the matching module 804 is further configured to: according to the identification information of each of the segmentation elements, perform matching in the Chinese recommendation table using the identification information of the first segmentation element as a prefix, to obtain original Chinese candidate recommended words; and, following the positional order of the identification information, slide the matching position backwards by one or more positions and perform matching in the Chinese recommendation table again, to obtain supplementary Chinese candidate recommended words. The Chinese candidate recommended words include the original Chinese candidate recommended words and the supplementary Chinese candidate recommended words.
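The sliding step admits more than one reading. The sketch below assumes it means aligning the query's identification sequence first at the start of each recommended word (original candidates) and then at later character offsets (supplementary candidates), so that words containing the query in the middle are also found. The per-character triple layout, the IDs, and the names `matches_at` and `sliding_match` are hypothetical.

```python
# Which field of a per-character triple each element type is compared against.
FIELD = {"char": 0, "pinyin": 1, "initial": 2}


def matches_at(query_ids, positions, offset):
    """True if the query identifications match the word's characters starting at `offset`."""
    window = positions[offset:offset + len(query_ids)]
    return len(window) == len(query_ids) and all(
        p[FIELD[kind]] == value for (kind, value), p in zip(query_ids, window)
    )


def sliding_match(query_ids, table):
    """Return (original, supplementary) Chinese candidates.

    original: the query matches from the first character (prefix match).
    supplementary: the query matches only after sliding the start position
    back by one or more characters.
    """
    original, supplementary = [], []
    for word, _freq, positions in table:
        if matches_at(query_ids, positions, 0):
            original.append(word)
        elif any(matches_at(query_ids, positions, k)
                 for k in range(1, len(positions) - len(query_ids) + 1)):
            supplementary.append(word)
    return original, supplementary


# Hypothetical rows: (word, frequency, per-character (first_id, second_id, initial)).
TABLE = [
    ("京东", 500, ((103, 203, "j"), (104, 204, "d"))),
    ("京东手机", 80, ((103, 203, "j"), (104, 204, "d"), (101, 201, "s"), (102, 202, "j"))),
    ("手机", 900, ((101, 201, "s"), (102, 202, "j"))),
]

# Query "shou" (pinyin) followed by the initial "j".
print(sliding_match([("pinyin", 201), ("initial", "j")], TABLE))  # (['手机'], ['京东手机'])
```

Keeping the two lists separate lets a ranker place prefix matches ahead of infix matches before ordering by frequency, should that be desired.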
In some embodiments, the preset types include a character, a pinyin, and a pinyin initial (the first letter of a pinyin); the identification mapping table includes a character table and a pinyin table; and the identification information includes first identification information, second identification information, and third identification information. The query module 803 is further configured to: if the segmentation elements include a segmentation element whose preset type is a character, query the first identification information of that segmentation element in the character table; if the segmentation elements include a segmentation element whose preset type is a pinyin, query the second identification information of that segmentation element in the pinyin table; and if the segmentation elements include a segmentation element whose preset type is a pinyin initial, use the pinyin initial itself as the third identification information of that segmentation element.
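The type-dependent lookup can be sketched as a simple dispatch, assuming dictionary-backed character and pinyin tables with arbitrary integer IDs. The table contents and the name `lookup_identification` are hypothetical; each result keeps its type so that later matching knows which identification field to compare.

```python
# Hypothetical identification mapping tables (IDs are arbitrary integers).
CHARACTER_TABLE = {"京": 103, "东": 104, "手": 101, "机": 102}
PINYIN_TABLE = {"jing": 203, "dong": 204, "shou": 201, "ji": 202}


def lookup_identification(elements):
    """Map (element, preset_type) pairs to (preset_type, identification) pairs.

    char    -> first identification information from the character table
    pinyin  -> second identification information from the pinyin table
    initial -> third identification information is the letter itself
    An element missing from its table yields None, which cannot match any entry.
    """
    result = []
    for element, preset_type in elements:
        if preset_type == "char":
            result.append(("char", CHARACTER_TABLE.get(element)))
        elif preset_type == "pinyin":
            result.append(("pinyin", PINYIN_TABLE.get(element)))
        elif preset_type == "initial":
            result.append(("initial", element))
        else:
            raise ValueError(f"unknown preset type: {preset_type!r}")
    return result


print(lookup_identification([("jing", "pinyin"), ("东", "char"), ("s", "initial")]))
# [('pinyin', 203), ('char', 104), ('initial', 's')]
```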
In some embodiments, the search term recommendation apparatus 80 further includes a generating module, configured to: obtain a preset word bank and a user historical query word bank; update, according to the user historical query word bank, the recommended words in the preset word bank and their word frequency information, to generate a recommended word bank; and generate the recommendation table according to the recommended word bank and the identification mapping table.
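How the historical queries update the preset word bank is not specified here. One plausible sketch, assuming frequencies are plain query counts: existing words have their counts increased, and a word seen only in the query history is added once it reaches a minimum count. The function name `build_recommended_word_bank` and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def build_recommended_word_bank(preset: dict[str, int],
                                history_queries: list[str],
                                min_count: int = 1) -> dict[str, int]:
    """Merge query-log counts into the preset word bank.

    Existing words get their frequency increased by their query count; a word
    seen only in the history is added once it reaches `min_count` queries.
    Blank queries are ignored and surrounding whitespace is stripped.
    """
    history = Counter(q.strip() for q in history_queries if q.strip())
    bank = Counter(preset)
    for word, count in history.items():
        if word in bank or count >= min_count:
            bank[word] += count
    return dict(bank)


preset = {"手机": 900, "京东": 500}
log = ["京东", "京东 ", "耳机", "耳机", "耳机", "  "]
print(build_recommended_word_bank(preset, log, min_count=2))
# {'手机': 900, '京东': 502, '耳机': 3}
```

A real system might also weight or time-decay the historical counts; the source does not say, so the sketch keeps a plain sum.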
In some embodiments, the identification mapping table includes a character table and a pinyin table, the recommended word bank includes a Chinese recommended word bank and an English recommended word bank, and the recommendation table includes a Chinese recommendation table and an English recommendation table. The generating module is further configured to generate the Chinese recommendation table from the Chinese recommended word bank, the character table, and the pinyin table according to a first preset data structure, where the first preset data structure includes: the first identification information corresponding to each character of the recommended word, the second identification information corresponding to the pinyin of each character of the recommended word, and the first letter of the pinyin of each character of the recommended word; and to generate the English recommendation table from the English recommended word bank according to a second preset data structure, where the second preset data structure includes each letter of the recommended word.
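A minimal sketch of generating both tables from the recommended word banks, assuming small in-memory character and pinyin tables and a hypothetical character-to-pinyin dictionary `CHAR_TO_PINYIN`. Polyphonic characters are ignored here; a real system would have to choose a reading per word. The names `chinese_record`, `english_record`, and `build_tables` are illustrative.

```python
# Hypothetical mapping tables and character-to-pinyin dictionary.
CHARACTER_TABLE = {"京": 103, "东": 104, "手": 101, "机": 102}
PINYIN_TABLE = {"jing": 203, "dong": 204, "shou": 201, "ji": 202}
CHAR_TO_PINYIN = {"京": "jing", "东": "dong", "手": "shou", "机": "ji"}


def chinese_record(word: str) -> tuple[tuple[int, int, str], ...]:
    """First preset data structure: per character (first ID, second ID, pinyin first letter)."""
    record = []
    for ch in word:
        pinyin = CHAR_TO_PINYIN[ch]
        record.append((CHARACTER_TABLE[ch], PINYIN_TABLE[pinyin], pinyin[0]))
    return tuple(record)


def english_record(word: str) -> tuple[str, ...]:
    """Second preset data structure: each letter of the word."""
    return tuple(word.lower())


def build_tables(chinese_bank: dict[str, int], english_bank: dict[str, int]):
    """Return (chinese_table, english_table) as {word: (frequency, record)} dicts."""
    chinese = {w: (f, chinese_record(w)) for w, f in chinese_bank.items()}
    english = {w: (f, english_record(w)) for w, f in english_bank.items()}
    return chinese, english


chinese, english = build_tables({"手机": 900}, {"ipad": 300})
print(chinese)  # {'手机': (900, ((101, 201, 's'), (102, 202, 'j')))}
print(english)  # {'ipad': (300, ('i', 'p', 'a', 'd'))}
```

Precomputing all three per-character fields at build time means a query element of any preset type can be compared directly against stored values, with no pinyin conversion at query time.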
It should be noted that, when the search term recommendation apparatus provided in the foregoing embodiment performs search term recommendation, the division into the program modules described above is merely an example. In practical applications, the processing may be allocated to different program modules as needed; that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the search term recommendation apparatus and the search term recommendation method provided in the above embodiments belong to the same concept; for their specific implementation processes and beneficial effects, refer to the method embodiments, which are not repeated here. For technical details not disclosed in the apparatus embodiments, refer to the description of the method embodiments of the present invention.
In this embodiment of the present invention, fig. 9 is a schematic structural diagram of a search term recommendation device according to an embodiment of the present invention. As shown in fig. 9, the search term recommendation device 90 may include a processor 901 and a memory 902, where the memory 902 stores a computer program executable on the processor 901. In some embodiments, the search term recommendation device 90 may further include a communication interface 903 and a bus 904 connecting the processor 901, the memory 902, and the communication interface 903.
In an embodiment of the present invention, the processor 901 may be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a central processing unit (CPU), a controller, a microcontroller, and a microprocessor. It will be appreciated that other electronic devices may also be used to implement the above processor functions, and the embodiments of the present invention impose no specific limitation on this.
In an embodiment of the present invention, a memory 902 may be connected to the processor 901, wherein the memory 902 is used for storing executable program codes and data, the program codes comprising computer operation instructions, and the memory 902 may comprise a high-speed RAM memory and may further comprise a non-volatile memory, for example, at least two disk memories.
In an embodiment of the present invention, a bus 904 is used to connect the communication interface 903, the processor 901, and the memory 902, and to enable communication among these components.
In this embodiment of the present invention, the processor 901 is configured to: acquire a search term; segment the search term to obtain a plurality of segmentation elements; query, according to the preset type to which each of the segmentation elements belongs, the identification information of each segmentation element in the identification mapping table corresponding to that preset type; perform matching in a recommendation table according to the identification information of each of the segmentation elements to obtain candidate recommended words; and rank the candidate recommended words according to word frequency information to obtain a ranking result, so that recommendation is performed according to the ranking result.
In practical applications, the memory 902 may be a volatile memory, such as a random-access memory (RAM); a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above types of memory, and it provides instructions and data to the processor 901.
In addition, each functional module in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.
Based on such an understanding, the technical solution of this embodiment in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present invention provides a computer-readable storage medium, on which a program is stored, which, when executed by a processor, implements a search term recommendation method as described in any of the above embodiments.
For example, the program instructions corresponding to the search term recommendation method in this embodiment may be stored in a storage medium such as an optical disc, a hard disk, or a USB flash drive; when these program instructions are read or executed by an electronic device, the search term recommendation method of any of the above embodiments is implemented.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (10)

1. A search term recommendation method, the method comprising:
acquiring a search term;
segmenting the search term to obtain a plurality of segmentation elements;
querying, according to the preset type to which each of the plurality of segmentation elements belongs, the identification information of each segmentation element in the identification mapping table corresponding to that preset type;
performing matching in a recommendation table according to the identification information of each of the plurality of segmentation elements to obtain candidate recommended words; and
ranking the candidate recommended words according to word frequency information to obtain a ranking result, so that recommendation is performed according to the ranking result.
2. The method according to claim 1, wherein the matching in the recommendation table according to the identification information of each of the plurality of segmentation elements to obtain candidate recommended words, and the ranking of the candidate recommended words according to the word frequency information to obtain a ranking result comprises:
performing matching in a Chinese recommendation table according to the identification information of each of the plurality of segmentation elements to obtain Chinese candidate recommended words;
wherein the candidate recommended words include the Chinese candidate recommended words, the recommendation table includes the Chinese recommendation table, the Chinese recommendation table stores a mapping between each recommended word and the identification information and pinyin first letter of each character of the recommended word, and the identification information includes first identification information corresponding to each character of the recommended word and second identification information corresponding to the pinyin of each character of the recommended word; and
if the number of the Chinese candidate recommended words is a positive integer, ranking the Chinese candidate recommended words to obtain the ranking result.
3. The method of claim 2, further comprising:
if the number of the Chinese candidate recommended words is zero, splitting the plurality of segmentation elements into single letters to obtain a plurality of letters;
performing matching in an English recommendation table according to the plurality of letters to obtain English candidate recommended words, wherein the candidate recommended words further include the English candidate recommended words, the recommendation table further includes the English recommendation table, and the English recommendation table stores each letter of each recommended word; and
ranking the English candidate recommended words to obtain the ranking result.
4. The method according to claim 2, wherein the performing matching in the Chinese recommendation table according to the identification information of each of the plurality of segmentation elements to obtain the Chinese candidate recommended words comprises:
according to the identification information of each of the plurality of segmentation elements, performing matching in the Chinese recommendation table using the identification information of the first segmentation element as a prefix, to obtain original Chinese candidate recommended words;
following the positional order of the identification information, sliding the matching position backwards by one or more positions and then performing matching in the Chinese recommendation table, to obtain supplementary Chinese candidate recommended words;
wherein the Chinese candidate recommended words include the original Chinese candidate recommended words and the supplementary Chinese candidate recommended words.
5. The method according to claim 1, wherein the preset types include a character, a pinyin, and a pinyin initial, the identification mapping table includes a character table and a pinyin table, the identification information includes first identification information, second identification information, and third identification information, and the querying, according to the preset type to which each of the plurality of segmentation elements belongs, the identification information of each segmentation element in the identification mapping table corresponding to that preset type comprises:
if the plurality of segmentation elements include a segmentation element whose preset type is a character, querying the first identification information of that segmentation element in the character table;
if the plurality of segmentation elements include a segmentation element whose preset type is a pinyin, querying the second identification information of that segmentation element in the pinyin table;
if the plurality of segmentation elements include a segmentation element whose preset type is a pinyin initial, using the pinyin initial itself as the third identification information of that segmentation element.
6. The method according to any one of claims 1-5, further comprising:
acquiring a preset word bank and a user historical query word bank;
updating, according to the user historical query word bank, the recommended words in the preset word bank and the word frequency information of the recommended words, to generate a recommended word bank; and
generating the recommendation table according to the recommended word bank and the identification mapping table.
7. The method according to claim 6, wherein the identification mapping table includes a character table and a pinyin table, the recommended word bank includes a Chinese recommended word bank and an English recommended word bank, the recommendation table includes a Chinese recommendation table and an English recommendation table, and the generating the recommendation table according to the recommended word bank and the identification mapping table comprises:
generating the Chinese recommendation table from the Chinese recommended word bank, the character table, and the pinyin table according to a first preset data structure, wherein the first preset data structure includes: first identification information corresponding to each character of a recommended word, second identification information corresponding to the pinyin of each character of the recommended word, and the first letter of the pinyin of each character of the recommended word;
generating the English recommendation table from the English recommended word bank according to a second preset data structure, wherein the second preset data structure includes: each letter of the recommended word.
8. A search term recommendation apparatus, the apparatus comprising:
an acquisition module, configured to acquire a search term;
a segmentation module, configured to segment the search term to obtain a plurality of segmentation elements;
a query module, configured to query, according to the preset type to which each of the plurality of segmentation elements belongs, the identification information of each segmentation element in the identification mapping table corresponding to that preset type;
a matching module, configured to perform matching in a recommendation table according to the identification information of each of the plurality of segmentation elements to obtain candidate recommended words; and
a recommending module, configured to rank the candidate recommended words according to word frequency information to obtain a ranking result, so that recommendation is performed according to the ranking result.
9. A search term recommendation device, characterized in that the device comprises a memory and a processor, the memory storing a computer program executable on the processor, and the processor implementing the steps of the method of any one of claims 1-7 when executing the program.
10. A computer-readable storage medium storing executable instructions that, when executed by a processor, implement the method of any one of claims 1-7.
CN202111264694.6A 2021-10-28 2021-10-28 Search term recommendation method, device, equipment and computer-readable storage medium Pending CN114036371A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111264694.6A CN114036371A (en) 2021-10-28 2021-10-28 Search term recommendation method, device, equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111264694.6A CN114036371A (en) 2021-10-28 2021-10-28 Search term recommendation method, device, equipment and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN114036371A (en) 2022-02-11

Family

ID=80142243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111264694.6A Pending CN114036371A (en) 2021-10-28 2021-10-28 Search term recommendation method, device, equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN114036371A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117708308A (en) * 2024-02-06 2024-03-15 四川蓉城蕾茗科技有限公司 RAG natural language intelligent knowledge base management method and system
CN117708308B (en) * 2024-02-06 2024-05-14 四川蓉城蕾茗科技有限公司 RAG natural language intelligent knowledge base management method and system

Similar Documents

Publication Publication Date Title
US9424351B2 (en) Hybrid-distribution model for search engine indexes
CN101978348B (en) Manage the archives about approximate string matching
TWI480746B (en) Enabling faster full-text searching using a structured data store
US8316292B1 (en) Identifying multiple versions of documents
US7756859B2 (en) Multi-segment string search
EP1826692A2 (en) Query correction using indexed content on a desktop indexer program.
CN110059163B (en) Method and device for generating template, electronic equipment and computer readable medium
CN106557777B (en) One kind being based on the improved Kmeans document clustering method of SimHash
CN105589894B (en) Document index establishing method and device and document retrieval method and device
US8090722B2 (en) Searching related documents
CN103733193A (en) Statistical spell checker
EP3926484B1 (en) Improved fuzzy search using field-level deletion neighborhoods
US7222129B2 (en) Database retrieval apparatus, retrieval method, storage medium, and program
CN105224624A (en) A kind of method and apparatus realizing down the quick merger of row chain
US20080270396A1 (en) Indexing versioned document sequences
JP2669601B2 (en) Information retrieval method and system
CN105404677A (en) Tree structure based retrieval method
CN114036371A (en) Search term recommendation method, device, equipment and computer-readable storage medium
CN108595437B (en) Text query error correction method and device, computer equipment and storage medium
CN114297143A (en) File searching method, file displaying device and mobile terminal
CN105426490A (en) Tree structure based indexing method
CN114003685B (en) Word segmentation position index construction method and device, and document retrieval method and device
EP1808781A2 (en) Evaluation of name prefix and suffix during a search
KR101694179B1 (en) Method and apparatus for indexing based on removing vowel
CN112182283A (en) Song searching method, device, network equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination