WO2021227059A1 - Search term recommendation method and system based on a multi-branch tree - Google Patents

Search term recommendation method and system based on a multi-branch tree

Info

Publication number
WO2021227059A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
hot
node
text
chain
Prior art date
Application number
PCT/CN2020/090647
Other languages
English (en)
French (fr)
Inventor
商良磊
Original Assignee
深圳市世强元件网络有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市世强元件网络有限公司
Priority to PCT/CN2020/090647
Priority to US17/467,268 (US11947608B2)
Publication of WO2021227059A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3322 Query formulation using system suggestions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/274 Converting codes to words; Guess-ahead of partial word inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/048 Fuzzy inferencing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition

Definitions

  • The invention relates to the field of search term recommendation for search engines, and more specifically to a search term recommendation method and system based on a multi-branch tree.
  • Right fuzzy matching, which the prior art uses to produce recommended words, is subject to a strict volume limit. For example, with more than 1,000,000 rows of data, heavy use of right fuzzy matching creates a disk-read bottleneck and saturates the server's I/O. Even upgrading to solid-state drives (SSDs) does not address the root cause: the expensive upgrade improves response speed by less than 20%, and disk read/write I/O is still saturated when the access volume is large.
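To make the prior-art baseline concrete, the sketch below assumes that "right fuzzy matching" means a SQL prefix query of the form `LIKE 'typed%'`, which is a common meaning of the term; the source does not name the database or the query. The table name `hot_words`, its single column, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database standing in for the prior-art hot word table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hot_words (word TEXT)")
conn.executemany(
    "INSERT INTO hot_words VALUES (?)",
    [("search engine",), ("search term",), ("seat",), ("research",)],
)

typed = "sea"
# Right fuzzy match: the wildcard sits on the right, so only prefixes match.
rows = conn.execute(
    "SELECT word FROM hot_words WHERE word LIKE ? ORDER BY word LIMIT 10",
    (typed + "%",),
).fetchall()
suggestions = [row[0] for row in rows]
print(suggestions)
```

Every keystroke issues one such query against the stored rows, which is the per-request disk work the passage above describes as the bottleneck.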
  • The technical problem to be solved by the present invention is to provide a search term recommendation method and system based on a multi-branch tree that addresses the above defects of the prior art.
  • The technical solution adopted by the present invention is a search term recommendation method based on a multi-branch tree, including:
  • A. Multi-branch tree word chain data generation process: split each hot word into multiple individual characters and generate a word chain following the front-to-back order of the characters in the hot word; each character is one node in the word chain, and nodes corresponding to the same character in different word chains are used as common nodes to generate the multi-branch tree word chain data;
  • B. Search term recommendation process: search the multi-branch tree word chain data according to the currently entered text, and use the word chains matching the entered text as recommended words.
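The generation process in step A can be sketched as follows. This is a minimal illustration, assuming the multi-branch tree behaves like a prefix tree in which word chains that begin with the same characters share nodes; the source's Figure 3 may share nodes more broadly. The names `Node`, `build_tree`, and `common_nodes`, and the English sample words, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One character of a word chain."""
    char: str
    children: dict = field(default_factory=dict)  # next character -> Node
    is_end: bool = False  # True if some hot word ends on this node
    chains: int = 0       # number of word chains passing through this node


def build_tree(hot_words):
    """Split each hot word into characters and merge chains that share a prefix."""
    root = Node(char="")
    for word in hot_words:
        node = root
        for ch in word:  # front-to-back order of the hot word
            if ch not in node.children:
                node.children[ch] = Node(char=ch)
            node = node.children[ch]
            node.chains += 1
        node.is_end = True  # mark the end node of this word chain
    return root


def common_nodes(node, prefix=""):
    """Yield the path of every node used by more than one word chain."""
    for ch, child in node.children.items():
        path = prefix + ch
        if child.chains > 1:
            yield path
        yield from common_nodes(child, path)


tree = build_tree(["search", "seat", "sea"])
shared = sorted(common_nodes(tree))
print(shared)
```

Storing the end mark on the node rather than in a separate list lets a short hot word ("sea") coexist with longer chains that pass through the same node.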
  • Searching the multi-branch tree word chain data according to the currently entered text includes: extracting the first character of the entered text and finding the node that matches it in the multi-branch tree word chain data; matching the remaining characters of the entered text, in order and one by one, against the branch nodes of that node; and, once all entered characters are matched, continuing to read the remaining nodes of the word chains that contain all the matched nodes.
  • Continuing to read the remaining nodes includes: reading on until an end node is encountered, where an end node is the node corresponding to the last character of a hot word in the generation process.
  • Using a word chain matching the entered text as a recommended word includes: among all word chains that contain all the nodes corresponding to the entered text, using the word chain with the fewest nodes as the recommended word.
  • Using the word chain with the fewest nodes as the recommended word includes: combining the characters of all nodes in that word chain, in front-to-back matching order, into the recommended word.
  • In the generation process, the multi-branch tree word chain data is divided into multiple multi-branch tree word chain sub-data according to the attribute information of the hot words;
  • searching the multi-branch tree word chain data according to the currently entered text then includes: selecting the corresponding sub-data according to the attribute information of the first character of the entered text, and searching the selected sub-data according to the currently entered text.
  • The search term recommendation method based on a multi-branch tree of the present invention further includes:
  • C. Multi-branch tree word chain data update process: split each updated hot word into multiple individual characters and generate an update word chain following the front-to-back order of the characters in the updated hot word; each character is one node in the update word chain; fuse each node of the update word chain with the identical node in the existing multi-branch tree word chain data, thereby updating the data.
  • In the update process, if the hot word node module includes multiple hot word node sub-modules, the update word chain is written to the sub-data of the corresponding attribute according to the attribute information of the updated hot word.
  • The characters are Chinese characters, and the attribute information is the ordering of the first letter of each character's Hanyu Pinyin.
  • The characters are one or more of Chinese characters, foreign-language words, and Arabic numerals.
  • The present invention also provides a search term recommendation system based on a multi-branch tree, including a search engine hot word recommendation module, a hot word query server, a hot word loading server, and multiple hot word data sources.
  • The search engine hot word recommendation module is installed on the search engine of the user terminal, and the hot word query server includes a user gateway proxy service module and a hot word node module;
  • the search engine hot word recommendation module is communicatively connected to the user gateway proxy service module, the user gateway proxy service module is communicatively connected to the hot word node module, and the hot word node module is communicatively connected to the hot word loading server;
  • the hot word loading server is communicatively connected to each hot word data source;
  • the hot word loading server receives the hot words sent by all hot word data sources and sends them to the hot word node module for storage; the hot word node module splits each hot word into multiple individual characters, generates a word chain following the front-to-back order of the characters in the hot word, with each character as one node in the word chain, and uses nodes corresponding to the same character in different word chains as common nodes to generate the multi-branch tree word chain data;
  • the search engine hot word recommendation module receives the text entered by the user and sends it to the user gateway proxy service module; the user gateway proxy service module passes the entered text to the hot word node module; the hot word node module searches the multi-branch tree word chain data according to the entered text and uses the word chains matching the entered text as recommended words;
  • the hot word node module sends the recommended words obtained by the query to the search engine of the user terminal for display.
  • In the system, the hot word node module searching the multi-branch tree word chain data according to the entered text includes:
  • extracting the first character of the entered text and finding the node that matches it in the multi-branch tree word chain data; matching the remaining characters of the entered text, in order and one by one, against the branch nodes of that node; and, once all entered characters are matched, continuing to read the remaining nodes of the word chains that contain all the matched nodes.
  • In the hot word node module, continuing to read the remaining nodes includes: reading on until an end node is encountered, where an end node is the node corresponding to the last character of a hot word in the generation process.
  • In the hot word node module, using a word chain matching the entered text as a recommended word includes: among all word chains that contain all the nodes corresponding to the entered text, using the word chain with the fewest nodes as the recommended word.
  • Using the word chain with the fewest nodes as the recommended word includes: combining the characters of all nodes in that word chain, in front-to-back matching order, into the recommended word.
  • The hot word node module includes multiple hot word node sub-modules; the hot word loading server divides the multi-branch tree word chain data into multiple multi-branch tree word chain sub-data according to the attribute information of the hot words, and each hot word node sub-module corresponds to one sub-data;
  • the user gateway proxy service module selects the corresponding hot word node sub-module according to the attribute information of the first character of the entered text, and that sub-module searches its sub-data according to the currently entered text.
  • The hot word loading server receives updated hot words sent by the hot word data sources, splits each updated hot word into multiple individual characters, and generates an update word chain following the front-to-back order of the characters in the updated hot word; each character is one node in the update word chain, and each node of the update word chain is fused with the identical node in the existing multi-branch tree word chain data, thereby updating the data.
  • If the hot word node module includes multiple hot word node sub-modules, the update word chain is written to the sub-data of the corresponding attribute according to the attribute information of the updated hot word.
  • The characters are Chinese characters, and the attribute information is the ordering of the first letter of each character's Hanyu Pinyin.
  • The characters are one or more of Chinese characters, foreign-language words, and Arabic numerals.
  • The search term recommendation method and system based on a multi-branch tree of the present invention has the following beneficial effects: by building a multi-branch tree, it reduces the 1000 ms taken in the prior art to under 1 ms and lowers machine cost; investment shifts from expensive high-performance servers and expensive database software to horizontal scaling on ordinary machines, and can be chosen according to the volume of data in use.
  • FIG. 1 is a schematic structural diagram of a search term recommendation system based on a multi-branch tree provided by Embodiment 1;
  • FIG. 2 is a schematic structural diagram of a search term recommendation system based on a multi-branch tree provided by Embodiment 1;
  • FIG. 3 is a schematic diagram of the structure of the polytree word chain data provided in Embodiment 1 and Embodiment 2.
  • In the search term recommendation system based on a multi-branch tree of this embodiment, the text includes but is not limited to Chinese characters, foreign-language words, and Arabic numerals.
  • The foreign-language words can be English, French, German, or Spanish words, among others; text in any language can be used with the system of this embodiment.
  • The search term recommendation system of this embodiment includes a search engine hot word recommendation module, a hot word query server, a hot word loading server, and multiple hot word data sources.
  • The search engine hot word recommendation module is installed on the search engine of a user terminal.
  • The hot word query server includes a user gateway proxy service module and a hot word node module.
  • The search engine hot word recommendation module is communicatively connected to the user gateway proxy service module, the user gateway proxy service module is communicatively connected to the hot word node module, and the hot word node module is communicatively connected to the hot word loading server; the hot word loading server is communicatively connected to each hot word data source.
  • The hot word loading server receives the hot words sent by all hot word data sources and sends them to the hot word node module for storage.
  • The hot word node module splits each hot word into multiple individual characters and generates a word chain following the front-to-back order of the characters in the hot word; each character is one node in the word chain, and nodes corresponding to the same character in different word chains are used as common nodes to generate the multi-branch tree word chain data.
  • Referring to Figure 3, the multi-branch tree word chain data contains the hot words "中华人民" (Chinese people), "中华民族" (Chinese nation), "人山人海" (a sea of people), "中华人民万岁" (long live the Chinese people), "万事如意" (good luck), and "国泰民安" (Guotai Min'an), among which:
  • the word chain of the hot word "中华人民" contains the 4 nodes "中", "华", "人", and "民", and the node for "民" is the end node of the word chain;
  • the word chain of the hot word "中华民族" contains the 4 nodes "中", "华", "民", and "族", and the node for "族" is the end node of the word chain;
  • the word chain of the hot word "人山人海" contains the 4 nodes "人", "山", "人", and "海", and the node for "海" is the end node of the word chain;
  • the word chain of the hot word "中华人民万岁" contains the 6 nodes "中", "华", "人", "民", "万", and "岁", and the node for "岁" is the end node of the word chain;
  • the word chain of the hot word "万事如意" contains the 4 nodes "万", "事", "如", and "意", and the node for "意" is the end node of the word chain;
  • the word chain of the hot word "国泰民安" contains the 4 nodes "国", "泰", "民", and "安", and the node for "安" is the end node of the word chain.
  • End nodes must be marked in the multi-branch tree word chain data.
  • The node "华" (Hua), the node "人" (Ren), the node "民" (Min), and the node "万" (Wan) are common nodes.
  • Common nodes must also be marked in the multi-branch tree word chain data, and the node "民" is both an end node and a common node.
  • The search engine hot word recommendation module receives the text entered by the user and sends it to the user gateway proxy service module.
  • The user gateway proxy service module passes the entered text to the hot word node module, which searches the multi-branch tree word chain data according to the entered text and uses the word chains matching the entered text as recommended words.
  • The hot word node module sends the recommended words obtained by the query to the search engine of the user terminal for display. For example, after the user enters the two characters "中华", traversing the multi-branch tree word chain data of this embodiment yields the matching word chains "中华民族" (Chinese nation), "中华人民万岁" (long live the Chinese people), and "中华人民" (Chinese people).
  • In this embodiment, the hot word node module searches the multi-branch tree word chain data according to the entered text as follows: extract the first character of the entered text and find the node that matches it in the multi-branch tree word chain data; match the remaining characters of the entered text, in order and one by one, against the branch nodes of that node; once all entered characters are matched, continue to read the remaining nodes of the word chains that contain all the matched nodes.
  • For example, the character "中" is matched first, and the branch nodes of the "中" node are then matched to obtain the node for "华".
  • Continuing to read the remaining nodes includes: reading on until an end node is encountered, where an end node is the node corresponding to the last character of a hot word in the generation process.
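The lookup described above (match the entered characters one by one, then read on to the end nodes) can be sketched as below. This is an illustration under stated assumptions, not the source's implementation: the tree is modelled as a nested-dict prefix tree, the function names are hypothetical, and the Chinese hot words are assumed reconstructions of the translated names listed for Figure 3.

```python
END = ""  # sentinel key marking an end node; never equal to a real character


def build_tree(hot_words):
    """Build a nested-dict prefix tree from the hot words."""
    root = {}
    for word in hot_words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def find_chains(tree, typed):
    """Return every hot word whose word chain starts with the typed text."""
    node = tree
    for ch in typed:                 # match the typed characters one by one
        node = node.get(ch)
        if node is None:             # no branch node matches: nothing to recommend
            return []
    chains = []
    stack = [(node, typed)]
    while stack:                     # read the remaining nodes of each chain
        current, text = stack.pop()
        for key, child in current.items():
            if key == END:
                chains.append(text)  # reached an end node: one complete hot word
            else:
                stack.append((child, text + key))
    return sorted(chains)


# Assumed reconstruction of the Figure 3 hot words.
HOT_WORDS = ["中华人民", "中华民族", "人山人海", "中华人民万岁", "万事如意", "国泰民安"]
tree = build_tree(HOT_WORDS)
matches = find_chains(tree, "中华")
print(matches)
```

Note that reading does not stop at the first end node: "中华人民" ends on "民", yet the chain for "中华人民万岁" continues past it, so both are collected.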
  • For example, the end node of the word chain "中华民族" (Chinese nation) is the node for "族", the end node of the word chain "中华人民万岁" (long live the Chinese people) is the node for "岁", and the end node of the word chain "中华人民" (Chinese people) is the node for "民".
  • Using a word chain matching the entered text as a recommended word includes: among all word chains that contain all the nodes corresponding to the entered text, using the word chain with the fewest nodes as the recommended word.
  • For example, the word chain "中", "华", "民", "族" corresponds to the recommended word "中华民族", and the word chain "国", "泰", "民", "安" corresponds to the recommended word "国泰民安" (Guotai Min'an).
  • Using the word chain with the fewest nodes as the recommended word includes: combining the characters of all nodes in that word chain, in front-to-back matching order, into the recommended word.
  • For example, three word chains contain the two nodes "中" and "华": "中华民族", "中华人民万岁", and "中华人民". Of these, "中华民族" and "中华人民" have the fewest nodes, so "中华民族" and "中华人民" are the recommended words.
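The fewest-nodes rule in the example above can be sketched as a small selection step. It assumes one node per character, so a chain's node count equals the length of its hot word, and it keeps every chain that ties for the minimum, as the example does. The function name `recommend` is hypothetical, and the Chinese strings are assumed reconstructions of the translated hot words.

```python
def recommend(candidates):
    """Keep the matching word chains with the fewest nodes; ties are all kept."""
    if not candidates:
        return []
    fewest = min(len(chain) for chain in candidates)  # one node per character
    return [chain for chain in candidates if len(chain) == fewest]


# The three word chains that contain the nodes for the entered text.
matching_chains = ["中华民族", "中华人民万岁", "中华人民"]
recommended = recommend(matching_chains)
print(recommended)
```

Preferring the shortest chains favours the completions that need the least additional typing; the six-node chain is dropped because two four-node chains exist.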
  • The hot word node module of this embodiment includes multiple hot word node sub-modules; the hot word loading server divides the multi-branch tree word chain data into multiple multi-branch tree word chain sub-data according to the attribute information of the hot words, and each hot word node sub-module corresponds to one sub-data.
  • The user gateway proxy service module selects the corresponding hot word node sub-module according to the attribute information of the first character of the entered text, and that sub-module searches its sub-data according to the currently entered text.
  • The number of hot word node sub-modules can be set as needed, and the number of hot words in the search thesaurus can be increased by adding hot word node sub-modules.
  • The hot word loading server and the user gateway proxy service module use the same attribute information; that is, they use the same text distribution algorithm.
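Why the loading side and the gateway side must share one distribution algorithm can be seen in the sketch below. Everything here is an assumption for illustration: the tiny `PINYIN_INITIAL` table stands in for a real pinyin lookup, the modulo rule is one possible distribution algorithm, each sub-module's storage is simplified to a set instead of a tree, and the Chinese strings are assumed reconstructions of the translated hot words.

```python
# Hypothetical attribute table: first pinyin letter of a few characters.
PINYIN_INITIAL = {"中": "z", "人": "r", "万": "w", "国": "g"}


def attribute_of(text):
    """Attribute of a text: the pinyin initial of its first character."""
    if not text:
        raise ValueError("text must not be empty")
    first = text[0]
    return PINYIN_INITIAL.get(first, first.lower()[:1])


def shard_index(text, shard_count):
    """Distribution rule shared by the loading side and the gateway side."""
    return ord(attribute_of(text)) % shard_count


class HotWordCluster:
    """Sub-modules modelled as sets of hot words (simplified storage)."""

    def __init__(self, shard_count):
        self.shards = [set() for _ in range(shard_count)]

    def load(self, hot_word):
        # Loading side: place the hot word in the sub-module chosen by its attribute.
        self.shards[shard_index(hot_word, len(self.shards))].add(hot_word)

    def query(self, typed):
        # Gateway side: the same rule picks the only sub-module that can match.
        shard = self.shards[shard_index(typed, len(self.shards))]
        return sorted(word for word in shard if word.startswith(typed))


cluster = HotWordCluster(shard_count=3)
for word in ["中华人民", "中华民族", "人山人海", "万事如意", "国泰民安"]:
    cluster.load(word)
result = cluster.query("国")
print(result)
```

Because both `load` and `query` call `shard_index`, a query only ever touches one sub-module; if the two sides used different rules, a stored hot word could be looked up in a sub-module that never received it.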
  • The hot word loading server of this embodiment receives updated hot words sent by the hot word data sources, splits each updated hot word into multiple individual characters, and generates an update word chain following the front-to-back order of the characters in the updated hot word; each character is one node in the update word chain, and each node of the update word chain is fused with the identical node in the existing multi-branch tree word chain data, thereby updating the data.
  • If the hot word node module includes multiple hot word node sub-modules, the update word chain is written to the sub-data of the corresponding attribute according to the attribute information of the updated hot word.
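The node fusion performed by the update process can be sketched as an insert that reuses any node that already exists and creates only the missing ones. This assumes a nested-dict prefix tree; the function name `merge_hot_word`, the returned count, and the sample words (assumed reconstructions of the translated hot words) are for illustration only.

```python
END = ""  # sentinel key marking an end node


def merge_hot_word(tree, word):
    """Fuse one update word chain into the tree; return how many nodes were created."""
    created = 0
    node = tree
    for ch in word:
        if ch not in node:   # no identical node yet: create it
            node[ch] = {}
            created += 1
        node = node[ch]      # identical node exists: reuse (fuse with) it
    node[END] = True         # mark the end node of the updated hot word
    return created


tree = {}
first = merge_hot_word(tree, "中华人民")       # empty tree: every node is new
second = merge_hot_word(tree, "中华人民万岁")  # shares the first four nodes
third = merge_hot_word(tree, "中华人民")       # already present: nothing to add
print(first, second, third)
```

An update therefore costs work proportional to the length of the new hot word, not to the size of the existing data.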
  • The characters are Chinese characters, and the attribute information is the ordering of the first letter of each character's Hanyu Pinyin. Because ordering rules differ between languages, the attribute information of the characters, and of the updated hot words, can be determined according to the ordering rules of each language.
  • By building a multi-branch tree, this embodiment reduces the 1000 ms taken in the prior art to under 1 ms and lowers machine cost: investment shifts from expensive high-performance servers and expensive database software to horizontal scaling on ordinary machines, and can be chosen according to the volume of data in use.
  • In the search term recommendation method based on a multi-branch tree of this embodiment, the text includes but is not limited to Chinese characters, foreign-language words, and Arabic numerals; the foreign-language words can be English, French, German, or Spanish words, among others, and text in any language can be used with the method of this embodiment.
  • The search term recommendation method based on a multi-branch tree of this embodiment includes the following steps:
  • A. Multi-branch tree word chain data generation process: split each hot word into multiple individual characters and generate a word chain following the front-to-back order of the characters in the hot word; each character is one node in the word chain, and nodes corresponding to the same character in different word chains are used as common nodes to generate the multi-branch tree word chain data.
  • Referring to Figure 3, the multi-branch tree word chain data is the same as in Embodiment 1. It contains the hot words "中华人民" (Chinese people), "中华民族" (Chinese nation), "人山人海" (a sea of people), "中华人民万岁" (long live the Chinese people), "万事如意" (good luck), and "国泰民安" (Guotai Min'an); each hot word forms a word chain with one node per character, and the node for its last character is the end node of that chain.
  • End nodes must be marked in the multi-branch tree word chain data.
  • The node "华" (Hua), the node "人" (Ren), the node "民" (Min), and the node "万" (Wan) are common nodes.
  • Common nodes must also be marked in the multi-branch tree word chain data, and the node "民" is both an end node and a common node.
  • B. Search term recommendation process: search the multi-branch tree word chain data according to the currently entered text, and use the word chains matching the entered text as recommended words.
  • Searching the multi-branch tree word chain data according to the currently entered text includes: extracting the first character of the entered text and finding the node that matches it in the multi-branch tree word chain data; matching the remaining characters of the entered text, in order and one by one, against the branch nodes of that node; and, once all entered characters are matched, continuing to read the remaining nodes of the word chains that contain all the matched nodes.
  • Continuing to read the remaining nodes includes: reading on until an end node is encountered, where an end node is the node corresponding to the last character of a hot word in the generation process.
  • Using a word chain matching the entered text as a recommended word includes: among all word chains that contain all the nodes corresponding to the entered text, using the word chain with the fewest nodes as the recommended word.
  • Using the word chain with the fewest nodes as the recommended word includes: combining the characters of all nodes in that word chain, in front-to-back matching order, into the recommended word.
  • In the generation process of the method of this embodiment, the multi-branch tree word chain data is divided into multiple multi-branch tree word chain sub-data according to the attribute information of the hot words.
  • Searching the multi-branch tree word chain data according to the currently entered text then includes: selecting the corresponding sub-data according to the attribute information of the first character of the entered text, and searching the selected sub-data according to the currently entered text.
  • The search term recommendation method based on a multi-branch tree of this embodiment further includes:
  • C. Multi-branch tree word chain data update process: split each updated hot word into multiple individual characters and generate an update word chain following the front-to-back order of the characters in the updated hot word; each character is one node in the update word chain; fuse each node of the update word chain with the identical node in the existing multi-branch tree word chain data, thereby updating the data.
  • In the update process of this embodiment, if the hot word node module includes multiple hot word node sub-modules, the update word chain is written to the sub-data of the corresponding attribute according to the attribute information of the updated hot word.
  • The characters are Chinese characters, and the attribute information is the ordering of the first letter of each character's Hanyu Pinyin.
  • By building a multi-branch tree, this embodiment reduces the 1000 ms taken in the prior art to under 1 ms and lowers machine cost: investment shifts from expensive high-performance servers and expensive database software to horizontal scaling on ordinary machines, and can be chosen according to the volume of data in use.
  • The steps of the method or algorithm described in the embodiments disclosed in this document can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two.
  • The software module can be stored in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Automation & Control Theory (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

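A search term recommendation method and system based on a multi-branch tree. The method includes: A. a multi-branch tree word chain data generation process: splitting each hot word into multiple individual characters, generating a word chain following the front-to-back order of the characters in the hot word, with each character as one node in the word chain, and using nodes corresponding to the same character in different word chains as common nodes to generate multi-branch tree word chain data; B. a search term recommendation process: searching the multi-branch tree word chain data according to the currently entered text, and using the word chains matching the entered text as recommended words. By building a multi-branch tree, the method reduces the 1000 ms taken in the prior art to under 1 ms and lowers machine cost: investment shifts from expensive high-performance servers and expensive database software to horizontal scaling on ordinary machines, and can be chosen according to the volume of data in use.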

Description

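Search term recommendation method and system based on a multi-branch tree
Technical Field
The present invention relates to the field of search term recommendation for search engines, and more specifically to a search term recommendation method and system based on a multi-branch tree.
Background
When using a search engine, people enter the terms they want to search for, and the search engine displays several recommended words based on the entered text for the user to choose from, which reduces the amount of typing. The prior art uses a right fuzzy matching algorithm to provide recommended words: after the user enters text in the search engine, that text is used to scan the disk, and recommended hot words are returned, for example the ten highest-ranked ones. This right fuzzy matching algorithm has the following defects: right fuzzy matching is subject to a strict volume limit; for example, with more than 1,000,000 rows of data, heavy use of right fuzzy matching creates a disk-read bottleneck and saturates the server's I/O. Even upgrading to solid-state drives (SSDs) does not address the root cause: the expensive upgrade improves response speed by less than 20%, and disk read/write I/O is still saturated when the access volume is large.
Technical Problem
The technical problem to be solved by the present invention is to provide a search term recommendation method and system based on a multi-branch tree that addresses the above defects of the prior art.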
技术解决方案
本发明解决其技术问题所采用的技术方案是:构造一种基于多叉树的搜索词推荐方法,包括:
A、多叉树词链数据生成过程:将每个热词拆分为多个单独文字,按照文字在热词中从前到后的排序生成一个词链,每个文字为所述词链中的一个节点,将不同词链中相同文字对应的节点作为公用节点,生成所述多叉树词链数据;
B、搜索词推荐过程:根据当前已输入文字查找所述多叉树词链数据,将与所述已输入文字匹配的词链作为推荐词。
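Further, in the search term recommendation method based on a multi-branch tree according to the present invention, looking up the multi-branch tree word chain data according to the text entered so far includes:
b1. Extracting the first character of the entered text and finding the node in the multi-branch tree word chain data that matches the first character;
b2. Matching the remaining characters of the entered text, in order and one by one, against the branch nodes of the node matched by the first character;
b3. After all of the entered text has been matched, continuing to read the remaining nodes of the word chains on which all nodes corresponding to the entered text lie.
Further, in the search term recommendation method based on a multi-branch tree according to the present invention, continuing to read the remaining nodes of the word chains on which all nodes corresponding to the entered text lie includes:
Continuing to read the remaining nodes of those word chains until an end node is encountered, the end node being the node that corresponds to the final character of each hot word in the multi-branch tree word chain data generation process.
Further, in the search term recommendation method based on a multi-branch tree according to the present invention, taking the word chains that match the entered text as recommended words includes:
Among all word chains on which all nodes corresponding to the entered text lie, taking the word chain with the fewest nodes as the recommended word.
Further, in the search term recommendation method based on a multi-branch tree according to the present invention, taking the word chain with the fewest nodes as the recommended word includes:
Combining the characters of all nodes in the word chain with the fewest nodes, in front-to-back matching order, into the recommended word.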
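Further, in the search term recommendation method based on a multi-branch tree according to the present invention, in the multi-branch tree word chain data generation process, the multi-branch tree word chain data is divided into multiple pieces of multi-branch tree word chain sub-data according to attribute information of the hot words;
Looking up the multi-branch tree word chain data according to the text entered so far includes: selecting the corresponding multi-branch tree word chain sub-data according to the attribute information of the first character of the entered text, and looking up the selected sub-data according to the text entered so far.
Further, the search term recommendation method based on a multi-branch tree according to the present invention also includes:
C. A multi-branch tree word chain data update process: splitting each update hot word into individual characters and generating an update word chain following the front-to-back order of the characters in the update hot word, where each character is one node of the update word chain; merging each node of the update word chain with the identical node in the existing multi-branch tree word chain data, thereby updating the multi-branch tree word chain data.
Further, in the search term recommendation method based on a multi-branch tree according to the present invention, in the multi-branch tree word chain data update process:
If the hot word node module includes multiple hot word node sub-modules, the corresponding update word chain is written to the multi-branch tree word chain sub-data of the matching attribute according to the attribute information of the update hot word.
Further, in the search term recommendation method based on a multi-branch tree according to the present invention, the characters are Chinese characters, and the attribute information is the alphabetical order of the first letter of the character's Hanyu Pinyin.
Further, in the search term recommendation method based on a multi-branch tree according to the present invention, the characters are one or more of Chinese characters, foreign-language words, and Arabic numerals.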
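The present invention also provides a search term recommendation system based on a multi-branch tree, including a search engine hot word recommendation module, a hot word query server, a hot word loading server, and multiple hot word data sources. The search engine hot word recommendation module is installed on the search engine of a user terminal, and the hot word query server includes a user gateway proxy service module and a hot word node module;
The search engine hot word recommendation module is communicatively connected to the user gateway proxy service module, the user gateway proxy service module is communicatively connected to the hot word node module, and the hot word node module is communicatively connected to the hot word loading server; the hot word loading server is communicatively connected to each hot word data source;
The hot word loading server receives the hot words sent by all of the hot word data sources and sends them to the hot word node module for storage. The hot word node module splits each hot word into individual characters and generates a word chain following the front-to-back order of the characters in the hot word, where each character is one node of the word chain and nodes corresponding to the same character in different word chains serve as common nodes, thereby generating the multi-branch tree word chain data;
The search engine hot word recommendation module receives the text entered by the user and sends the entered text to the user gateway proxy service module; the user gateway proxy service module forwards the entered text to the hot word node module; the hot word node module looks up the multi-branch tree word chain data according to the entered text and takes the word chains that match the entered text as recommended words;
The hot word node module sends the recommended words obtained by the query to the search engine of the user terminal for display.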
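Further, in the search term recommendation system based on a multi-branch tree according to the present invention, the hot word node module looking up the multi-branch tree word chain data according to the entered text includes:
Extracting the first character of the entered text and finding the node in the multi-branch tree word chain data that matches the first character; matching the remaining characters of the entered text, in order and one by one, against the branch nodes of the node matched by the first character; and, after all of the entered text has been matched, continuing to read the remaining nodes of the word chains on which all nodes corresponding to the entered text lie.
Further, in the search term recommendation system based on a multi-branch tree according to the present invention, continuing to read the remaining nodes of those word chains in the hot word node module includes:
Continuing to read the remaining nodes of the word chains on which all nodes corresponding to the entered text lie until an end node is encountered, the end node being the node that corresponds to the final character of each hot word in the multi-branch tree word chain data generation process.
Further, in the search term recommendation system based on a multi-branch tree according to the present invention, taking the word chains that match the entered text as recommended words in the hot word node module includes:
Among all word chains on which all nodes corresponding to the entered text lie, taking the word chain with the fewest nodes as the recommended word.
Further, in the search term recommendation system based on a multi-branch tree according to the present invention, taking the word chain with the fewest nodes as the recommended word in the hot word node module includes:
Combining the characters of all nodes in the word chain with the fewest nodes, in front-to-back matching order, into the recommended word.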
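Further, in the search term recommendation system based on a multi-branch tree according to the present invention, the hot word node module includes multiple hot word node sub-modules; the hot word loading server divides the multi-branch tree word chain data into multiple pieces of multi-branch tree word chain sub-data according to attribute information of the hot words, and each hot word node sub-module corresponds to one piece of sub-data;
The user gateway proxy service module selects the corresponding hot word node sub-module according to the attribute information of the first character of the entered text, and that sub-module looks up the selected multi-branch tree word chain sub-data according to the text entered so far.
Further, in the search term recommendation system based on a multi-branch tree according to the present invention, the hot word loading server receives update hot words sent by the multiple hot word data sources, splits each update hot word into individual characters, and generates an update word chain following the front-to-back order of the characters in the update hot word, where each character is one node of the update word chain; each node of the update word chain is merged with the identical node in the existing multi-branch tree word chain data, thereby updating the multi-branch tree word chain data.
Further, in the search term recommendation system based on a multi-branch tree according to the present invention, if the hot word node module includes multiple hot word node sub-modules, the corresponding update word chain is written to the multi-branch tree word chain sub-data of the matching attribute according to the attribute information of the update hot word.
Further, in the search term recommendation system based on a multi-branch tree according to the present invention, the characters are Chinese characters, and the attribute information is the alphabetical order of the first letter of the character's Hanyu Pinyin.
Further, in the search term recommendation system based on a multi-branch tree according to the present invention, the characters are one or more of Chinese characters, foreign-language words, and Arabic numerals.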
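Beneficial Effects
Implementing the search term recommendation method and system based on a multi-branch tree of the present invention has the following beneficial effects: by building a multi-branch tree, the present invention reduces the 1000 ms lookup time of the prior art to under 1 ms and lowers machine cost. Spending shifts from expensive high-performance servers and expensive database software to horizontal scaling of ordinary machines, and can be sized according to the volume of data in use.
Brief Description of the Drawings
The present invention is further described below with reference to the drawings and embodiments. In the drawings:
Fig. 1 is a schematic structural diagram of a search term recommendation system based on a multi-branch tree provided in Embodiment 1;
Fig. 2 is a schematic structural diagram of a search term recommendation system based on a multi-branch tree provided in Embodiment 1;
Fig. 3 is a schematic structural diagram of the multi-branch tree word chain data provided in Embodiment 1 and Embodiment 2.
Best Mode for Carrying Out the Invention
For a clearer understanding of the technical features, objectives, and effects of the present invention, specific embodiments are now described in detail with reference to the drawings.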
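Embodiment 1
Referring to Fig. 1, Fig. 2, and Fig. 3, the characters handled by the search term recommendation system based on a multi-branch tree of this embodiment include, but are not limited to, Chinese characters, foreign-language words, and Arabic numerals. The foreign-language words may be English, French, German, Spanish words, and so on; text in any language can be used with the system of this embodiment.
The search term recommendation system of this embodiment includes a search engine hot word recommendation module, a hot word query server, a hot word loading server, and multiple hot word data sources. The search engine hot word recommendation module is installed on the search engine of the user terminal, and the hot word query server includes a user gateway proxy service module and a hot word node module. The search engine hot word recommendation module is communicatively connected to the user gateway proxy service module, the user gateway proxy service module is communicatively connected to the hot word node module, and the hot word node module is communicatively connected to the hot word loading server; the hot word loading server is communicatively connected to each hot word data source.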
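The hot word loading server receives the hot words sent by all hot word data sources and sends them to the hot word node module for storage. The hot word node module splits each hot word into individual characters and generates a word chain following the front-to-back order of the characters in the hot word; each character is one node of the word chain, and nodes corresponding to the same character in different word chains serve as common nodes, thereby generating the multi-branch tree word chain data. For example, the multi-branch tree word chain data in Fig. 3 includes “中华人民”, “中华民族”, “人山人海”, “中华人民万岁”, “万事如意”, and “国泰民安”, where:
The word chain of the hot word “中华人民” contains the four nodes “中”, “华”, “人”, “民”, and the node for “民” is the end node of this word chain;
The word chain of the hot word “中华民族” contains the four nodes “中”, “华”, “民”, “族”, and the node for “族” is the end node of this word chain;
The word chain of the hot word “人山人海” contains the four nodes “人”, “山”, “人”, “海”, and the node for “海” is the end node of this word chain;
The word chain of the hot word “中华人民万岁” contains the six nodes “中”, “华”, “人”, “民”, “万”, “岁”, and the node for “岁” is the end node of this word chain;
The word chain of the hot word “万事如意” contains the four nodes “万”, “事”, “如”, “意”, and the node for “意” is the end node of this word chain;
The word chain of the hot word “国泰民安” contains the four nodes “国”, “泰”, “民”, “安”, and the node for “安” is the end node of this word chain. End nodes must be marked in the multi-branch tree word chain data.
In addition, the nodes “华”, “人”, “民”, and “万” are common nodes. Common nodes must also be marked in the multi-branch tree word chain data, and the node “民” is both an end node and a common node.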
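The generation step above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it reads the structure as a prefix tree, which is an assumption, so nodes are shared only along common leading characters and not every common node listed for Fig. 3 is reproduced. The names `Node`, `build_tree`, `is_end`, and `chain_count` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    char: str
    children: dict = field(default_factory=dict)  # next character -> Node
    is_end: bool = False   # True if some hot word ends at this node
    chain_count: int = 0   # number of word chains passing through this node


def build_tree(hot_words):
    """Split each hot word into characters and merge equal prefixes."""
    root = Node("")  # artificial root that holds the first characters
    for word in hot_words:
        node = root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = Node(ch)
            node = node.children[ch]
            node.chain_count += 1
        node.is_end = True  # mark the end node of this word chain
    return root


HOT_WORDS = ["中华人民", "中华民族", "人山人海", "中华人民万岁", "万事如意", "国泰民安"]
tree = build_tree(HOT_WORDS)

hua = tree.children["中"].children["华"]
min_node = hua.children["人"].children["民"]
```

In this sketch a node counts as common when `chain_count` is greater than 1, and `min_node` is both an end node (for “中华人民”) and a common node (it also lies on “中华人民万岁”).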
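The search engine hot word recommendation module receives the text entered by the user and sends it to the user gateway proxy service module, which forwards it to the hot word node module. The hot word node module looks up the multi-branch tree word chain data according to the entered text and takes the word chains that match it as recommended words, then sends the recommended words obtained by the query to the search engine of the user terminal for display. For example, after the user enters the two characters “中华”, traversing the multi-branch tree word chain data of the example above yields the word chains that match “中华”: “中华民族”, “中华人民万岁”, and “中华人民”.
Further, in the system of this embodiment, the hot word node module looking up the multi-branch tree word chain data according to the entered text includes: extracting the first character of the entered text and finding the node in the multi-branch tree word chain data that matches it; matching the remaining characters of the entered text, in order and one by one, against the branch nodes of the node matched by the first character; and, after all of the entered text has been matched, continuing to read the remaining nodes of the word chains on which all nodes corresponding to the entered text lie. For example, after the user enters “中华”, the first character “中” is extracted and the node for “中” is found in the multi-branch tree word chain data of the example above; “华” is then matched against the branches of the “中” node, which yields the node for “华”. Once the two nodes “中” and “华” are determined, the remaining nodes of the word chains on which they lie are read, which identifies those word chains. In this embodiment there are three of them: “中华民族”, “中华人民万岁”, and “中华人民”.
Further, in the system of this embodiment, continuing to read the remaining nodes in the hot word node module includes: continuing to read the remaining nodes of the word chains on which all nodes corresponding to the entered text lie until an end node is encountered, the end node being the node that corresponds to the final character of each hot word in the multi-branch tree word chain data generation process. Of the three word chains in this embodiment, the end node of “中华民族” is the node for “族”, the end node of “中华人民万岁” is the node for “岁”, and the end node of “中华人民” is the node for “民”.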
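The lookup just described (match the entered characters one by one, then read on to each end node) can be sketched as below. It is a sketch under assumptions: the tree is held as nested dictionaries, `END` is a hypothetical marker key for end nodes, and `build_tree` and `match_chains` are illustrative names that do not appear in the source.

```python
END = "$end"  # hypothetical marker key; real keys are single characters


def build_tree(hot_words):
    """Build nested dicts: one level per character, END marks an end node."""
    root = {}
    for word in hot_words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def match_chains(root, typed):
    """Return every word chain that starts with the entered text."""
    # Steps b1 and b2: walk down one node per entered character.
    node = root
    for ch in typed:
        if ch not in node:
            return []  # no word chain matches the entered text
        node = node[ch]
    # Step b3: read the remaining nodes, emitting a chain at each end node.
    results = []
    stack = [(node, typed)]
    while stack:
        current, text = stack.pop()
        if current.get(END):
            results.append(text)
        for ch, child in current.items():
            if ch != END:
                stack.append((child, text + ch))
    return results


tree = build_tree(["中华人民", "中华民族", "人山人海", "中华人民万岁", "万事如意", "国泰民安"])
matches = match_chains(tree, "中华")
```

Reading does not stop at the first end node on a branch: “中华人民” ends at “民”, yet the traversal continues to “岁” so that “中华人民万岁” is found as well. The cost of the first phase depends on the length of the entered text, not on the number of stored hot words.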
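In the system of this embodiment, taking the word chains that match the entered text as recommended words in the hot word node module includes: among all word chains on which all nodes corresponding to the entered text lie, taking the word chain with the fewest nodes as the recommended word. For example, the recommended word for the word chain “中”, “华”, “民”, “族” is “中华民族”, and the recommended word for the word chain “国”, “泰”, “民”, “安” is “国泰民安”.
Further, in the system of this embodiment, taking the word chain with the fewest nodes as the recommended word includes: combining the characters of all nodes in that word chain, in front-to-back matching order, into the recommended word. For example, the two nodes “中” and “华” in this embodiment lie on three word chains, “中华民族”, “中华人民万岁”, and “中华人民”. Of these, “中华民族” and “中华人民” have the fewest nodes, so “中华民族” and “中华人民” are taken as the recommended words.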
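The selection rule in this example can be written as a small filter. The sketch assumes one character per node, so the node count of a chain equals the length of its string; `shortest_chains` is a hypothetical helper name.

```python
def shortest_chains(chains):
    """Keep the word chains with the fewest nodes (ties are all kept)."""
    if not chains:
        return []
    fewest = min(len(chain) for chain in chains)
    return [chain for chain in chains if len(chain) == fewest]


candidates = ["中华民族", "中华人民万岁", "中华人民"]
recommended = shortest_chains(candidates)
print(recommended)  # ['中华民族', '中华人民']
```

Keeping all ties matches the example, where both four-node chains are recommended and the six-node chain is dropped.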
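Referring to Fig. 2, the hot word node module in the system of this embodiment includes multiple hot word node sub-modules. The hot word loading server divides the multi-branch tree word chain data into multiple pieces of multi-branch tree word chain sub-data according to attribute information of the hot words, and each hot word node sub-module corresponds to one piece of sub-data. The user gateway proxy service module selects the corresponding hot word node sub-module according to the attribute information of the first character of the entered text, and that sub-module looks up its multi-branch tree word chain sub-data according to the text entered so far. The number of hot word node sub-modules can be set as needed; the number of hot words in the search lexicon is increased by adding sub-modules. In this embodiment the hot word loading server and the user gateway proxy service module use the same attribute information, that is, the same character distribution algorithm.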
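A minimal sketch of such a shared distribution function follows. Everything specific in it is an assumption made for illustration: the attribute is taken to be the first pinyin letter, `PINYIN_INITIAL` is a hypothetical lookup table covering only the example characters (a real system would need a full pinyin dictionary), and the alphabet is cut into roughly equal ranges.

```python
# Hypothetical lookup table: first pinyin letter of the example characters only.
PINYIN_INITIAL = {"中": "z", "人": "r", "万": "w", "国": "g"}


def shard_for(text, shard_count):
    """Pick a sub-module index from the attribute of the first character."""
    if not text:
        raise ValueError("text must not be empty")
    first = text[0]
    initial = PINYIN_INITIAL.get(first, first.lower())
    # Letters a-z map onto alphabetical ranges; anything else goes to shard 0.
    offset = ord(initial) - ord("a") if "a" <= initial <= "z" else 0
    return offset * shard_count // 26


SHARD_COUNT = 4
shards = [[] for _ in range(SHARD_COUNT)]

# Loader side: distribute the hot words across the sub-modules.
for word in ["中华人民", "中华民族", "人山人海", "万事如意", "国泰民安"]:
    shards[shard_for(word, SHARD_COUNT)].append(word)

# Gateway side: route the entered text with the same function.
target = shard_for("中华", SHARD_COUNT)
```

Because the loader and the gateway call the same function, a query beginning with “中” always reaches the sub-module that stores the hot words beginning with “中”.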
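Optionally, in the system of this embodiment, the hot word loading server receives update hot words sent by the multiple hot word data sources, splits each update hot word into individual characters, and generates an update word chain following the front-to-back order of the characters in the update hot word, where each character is one node of the update word chain. Each node of the update word chain is merged with the identical node in the existing multi-branch tree word chain data, thereby updating the multi-branch tree word chain data.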
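The merge of an update word chain into existing data can be sketched as follows, assuming a nested-dictionary tree. `merge_word`, the `"$end"` marker, and the update word “中华文化” are hypothetical and chosen only to show node reuse.

```python
def merge_word(root, word, end_key="$end"):
    """Merge one update word chain; return how many nodes were newly created."""
    created = 0
    node = root
    for ch in word:
        if ch not in node:
            node[ch] = {}      # no identical node yet: create it
            created += 1
        node = node[ch]        # identical node exists: reuse it
    node[end_key] = True       # mark the end node of the update chain
    return created


tree = {}
for hot_word in ["中华人民", "中华民族"]:
    merge_word(tree, hot_word)

# The update chain shares the existing nodes for "中" and "华".
created = merge_word(tree, "中华文化")
```

Only the two nodes for “文” and “化” are created here; the existing nodes for “中” and “华” are reused, so the tree does not have to be rebuilt for an update.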
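In the system of this embodiment, if the hot word node module includes multiple hot word node sub-modules, the corresponding update word chain is written to the multi-branch tree word chain sub-data of the matching attribute according to the attribute information of the update hot word. Optionally, in the system of this embodiment the characters are Chinese characters, and the attribute information is the alphabetical order of the first letter of the character's Hanyu Pinyin. It will be understood that characters follow different ordering rules in different languages; the attribute information of a character, and of an update hot word, can be determined from the ordering rules of the language in question.
By building a multi-branch tree, this embodiment reduces the 1000 ms lookup time of the prior art to under 1 ms and lowers machine cost: spending shifts from expensive high-performance servers and expensive database software to horizontal scaling of ordinary machines, and can be sized according to the volume of data in use.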
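Embodiment 2
The characters handled by the search term recommendation method based on a multi-branch tree of this embodiment include, but are not limited to, Chinese characters, foreign-language words, and Arabic numerals. The foreign-language words may be English, French, German, Spanish words, and so on; text in any language can be used with the method of this embodiment. Specifically, the method of this embodiment includes the following steps: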
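A. A multi-branch tree word chain data generation process: each hot word is split into individual characters and a word chain is generated following the front-to-back order of the characters in the hot word; each character is one node of the word chain, and nodes corresponding to the same character in different word chains serve as common nodes, thereby generating the multi-branch tree word chain data. For example, the multi-branch tree word chain data in Fig. 3 includes “中华人民”, “中华民族”, “人山人海”, “中华人民万岁”, “万事如意”, and “国泰民安”, where:
The word chain of the hot word “中华人民” contains the four nodes “中”, “华”, “人”, “民”, and the node for “民” is the end node of this word chain;
The word chain of the hot word “中华民族” contains the four nodes “中”, “华”, “民”, “族”, and the node for “族” is the end node of this word chain;
The word chain of the hot word “人山人海” contains the four nodes “人”, “山”, “人”, “海”, and the node for “海” is the end node of this word chain;
The word chain of the hot word “中华人民万岁” contains the six nodes “中”, “华”, “人”, “民”, “万”, “岁”, and the node for “岁” is the end node of this word chain;
The word chain of the hot word “万事如意” contains the four nodes “万”, “事”, “如”, “意”, and the node for “意” is the end node of this word chain;
The word chain of the hot word “国泰民安” contains the four nodes “国”, “泰”, “民”, “安”, and the node for “安” is the end node of this word chain. End nodes must be marked in the multi-branch tree word chain data.
In addition, the nodes “华”, “人”, “民”, and “万” are common nodes. Common nodes must also be marked in the multi-branch tree word chain data, and the node “民” is both an end node and a common node.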
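B. A search term recommendation process: the multi-branch tree word chain data is looked up according to the text entered so far, and the word chains that match the entered text are taken as recommended words.
In the method of this embodiment, looking up the multi-branch tree word chain data according to the text entered so far includes:
b1. Extracting the first character of the entered text and finding the node in the multi-branch tree word chain data that matches the first character;
b2. Matching the remaining characters of the entered text, in order and one by one, against the branch nodes of the node matched by the first character;
b3. After all of the entered text has been matched, continuing to read the remaining nodes of the word chains on which all nodes corresponding to the entered text lie.
Further, in the method of this embodiment, continuing to read the remaining nodes includes: continuing to read the remaining nodes of the word chains on which all nodes corresponding to the entered text lie until an end node is encountered, the end node being the node that corresponds to the final character of each hot word in the multi-branch tree word chain data generation process.
Further, in the method of this embodiment, taking the word chains that match the entered text as recommended words includes: among all word chains on which all nodes corresponding to the entered text lie, taking the word chain with the fewest nodes as the recommended word.
Further, in the method of this embodiment, taking the word chain with the fewest nodes as the recommended word includes: combining the characters of all nodes in that word chain, in front-to-back matching order, into the recommended word.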
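Optionally, in the multi-branch tree word chain data generation process of the method of this embodiment, the multi-branch tree word chain data is divided into multiple pieces of multi-branch tree word chain sub-data according to attribute information of the hot words. In that case, looking up the multi-branch tree word chain data according to the text entered so far includes: selecting the corresponding sub-data according to the attribute information of the first character of the entered text, and looking up the selected sub-data according to the text entered so far.
Optionally, the method of this embodiment also includes:
C. A multi-branch tree word chain data update process: each update hot word is split into individual characters and an update word chain is generated following the front-to-back order of the characters in the update hot word, where each character is one node of the update word chain; each node of the update word chain is merged with the identical node in the existing multi-branch tree word chain data, thereby updating the multi-branch tree word chain data.
In the multi-branch tree word chain data update process of the method of this embodiment, if the hot word node module includes multiple hot word node sub-modules, the corresponding update word chain is written to the multi-branch tree word chain sub-data of the matching attribute according to the attribute information of the update hot word. Optionally, in the method of this embodiment the characters are Chinese characters, and the attribute information is the alphabetical order of the first letter of the character's Hanyu Pinyin.
By building a multi-branch tree, this embodiment reduces the 1000 ms lookup time of the prior art to under 1 ms and lowers machine cost: spending shifts from expensive high-performance servers and expensive database software to horizontal scaling of ordinary machines, and can be sized according to the volume of data in use.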
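The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the parts the embodiments share can be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief; see the description of the method for the relevant details.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their function. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The above embodiments only illustrate the technical concept and features of the present invention. Their purpose is to enable those familiar with the art to understand and implement the present invention; they do not limit its scope of protection. All equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the coverage of those claims.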

Claims (20)

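  1. A search term recommendation method based on a multi-branch tree, characterized by comprising:
    A. a multi-branch tree word chain data generation process: splitting each hot word into individual characters and generating a word chain following the front-to-back order of the characters in the hot word, wherein each character is one node of the word chain and nodes corresponding to the same character in different word chains serve as common nodes, thereby generating the multi-branch tree word chain data;
    B. a search term recommendation process: looking up the multi-branch tree word chain data according to the text entered so far, and taking the word chains that match the entered text as recommended words.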
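  2. The search term recommendation method based on a multi-branch tree according to claim 1, characterized in that looking up the multi-branch tree word chain data according to the text entered so far comprises:
    b1. extracting the first character of the entered text and finding the node in the multi-branch tree word chain data that matches the first character;
    b2. matching the remaining characters of the entered text, in order and one by one, against the branch nodes of the node matched by the first character;
    b3. after all of the entered text has been matched, continuing to read the remaining nodes of the word chains on which all nodes corresponding to the entered text lie.
  3. The search term recommendation method based on a multi-branch tree according to claim 2, characterized in that continuing to read the remaining nodes of the word chains on which all nodes corresponding to the entered text lie comprises:
    continuing to read the remaining nodes of those word chains until an end node is encountered, the end node being the node that corresponds to the final character of each hot word in the multi-branch tree word chain data generation process.
  4. The search term recommendation method based on a multi-branch tree according to claim 3, characterized in that taking the word chains that match the entered text as recommended words comprises:
    among all word chains on which all nodes corresponding to the entered text lie, taking the word chain with the fewest nodes as the recommended word.
  5. The search term recommendation method based on a multi-branch tree according to claim 4, characterized in that taking the word chain with the fewest nodes as the recommended word comprises:
    combining the characters of all nodes in the word chain with the fewest nodes, in front-to-back matching order, into the recommended word.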
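  6. The search term recommendation method based on a multi-branch tree according to any one of claims 1 to 5, characterized in that, in the multi-branch tree word chain data generation process, the multi-branch tree word chain data is divided into multiple pieces of multi-branch tree word chain sub-data according to attribute information of the hot words;
    looking up the multi-branch tree word chain data according to the text entered so far comprises: selecting the corresponding multi-branch tree word chain sub-data according to the attribute information of the first character of the entered text, and looking up the selected sub-data according to the text entered so far.
  7. The search term recommendation method based on a multi-branch tree according to claim 6, characterized by further comprising:
    C. a multi-branch tree word chain data update process: splitting each update hot word into individual characters and generating an update word chain following the front-to-back order of the characters in the update hot word, wherein each character is one node of the update word chain; merging each node of the update word chain with the identical node in the existing multi-branch tree word chain data, thereby updating the multi-branch tree word chain data.
  8. The search term recommendation method based on a multi-branch tree according to claim 7, characterized in that, in the multi-branch tree word chain data update process:
    if the hot word node module includes multiple hot word node sub-modules, the corresponding update word chain is written to the multi-branch tree word chain sub-data of the matching attribute according to the attribute information of the update hot word.
  9. The search term recommendation method based on a multi-branch tree according to claim 8, characterized in that the characters are Chinese characters, and the attribute information is the alphabetical order of the first letter of the character's Hanyu Pinyin.
  10. The search term recommendation method based on a multi-branch tree according to claim 1, characterized in that the characters are one or more of Chinese characters, foreign-language words, and Arabic numerals.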
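  11. A search term recommendation system based on a multi-branch tree, characterized by comprising a search engine hot word recommendation module, a hot word query server, a hot word loading server, and multiple hot word data sources, wherein the search engine hot word recommendation module is installed on the search engine of a user terminal, and the hot word query server includes a user gateway proxy service module and a hot word node module;
    the search engine hot word recommendation module is communicatively connected to the user gateway proxy service module, the user gateway proxy service module is communicatively connected to the hot word node module, and the hot word node module is communicatively connected to the hot word loading server; the hot word loading server is communicatively connected to each hot word data source;
    the hot word loading server receives the hot words sent by all of the hot word data sources and sends them to the hot word node module for storage; the hot word node module splits each hot word into individual characters and generates a word chain following the front-to-back order of the characters in the hot word, wherein each character is one node of the word chain and nodes corresponding to the same character in different word chains serve as common nodes, thereby generating multi-branch tree word chain data;
    the search engine hot word recommendation module is configured to receive the text entered by the user and send the entered text to the user gateway proxy service module; the user gateway proxy service module forwards the entered text to the hot word node module; the hot word node module looks up the multi-branch tree word chain data according to the entered text and takes the word chains that match the entered text as recommended words;
    the hot word node module sends the recommended words obtained by the query to the search engine of the user terminal for display.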
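  12. The search term recommendation system based on a multi-branch tree according to claim 11, characterized in that the hot word node module looking up the multi-branch tree word chain data according to the entered text comprises:
    extracting the first character of the entered text and finding the node in the multi-branch tree word chain data that matches the first character; matching the remaining characters of the entered text, in order and one by one, against the branch nodes of the node matched by the first character; and, after all of the entered text has been matched, continuing to read the remaining nodes of the word chains on which all nodes corresponding to the entered text lie.
  13. The search term recommendation system based on a multi-branch tree according to claim 12, characterized in that continuing to read the remaining nodes of those word chains in the hot word node module comprises:
    continuing to read the remaining nodes of the word chains on which all nodes corresponding to the entered text lie until an end node is encountered, the end node being the node that corresponds to the final character of each hot word in the multi-branch tree word chain data generation process.
  14. The search term recommendation system based on a multi-branch tree according to claim 13, characterized in that taking the word chains that match the entered text as recommended words in the hot word node module comprises:
    among all word chains on which all nodes corresponding to the entered text lie, taking the word chain with the fewest nodes as the recommended word.
  15. The search term recommendation system based on a multi-branch tree according to claim 14, characterized in that taking the word chain with the fewest nodes as the recommended word in the hot word node module comprises:
    combining the characters of all nodes in the word chain with the fewest nodes, in front-to-back matching order, into the recommended word.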
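  16. The search term recommendation system based on a multi-branch tree according to any one of claims 11 to 15, characterized in that the hot word node module includes multiple hot word node sub-modules, the hot word loading server divides the multi-branch tree word chain data into multiple pieces of multi-branch tree word chain sub-data according to attribute information of the hot words, and each hot word node sub-module corresponds to one piece of sub-data;
    the user gateway proxy service module selects the corresponding hot word node sub-module according to the attribute information of the first character of the entered text, and that sub-module looks up the selected multi-branch tree word chain sub-data according to the text entered so far.
  17. The search term recommendation system based on a multi-branch tree according to claim 16, characterized in that the hot word loading server receives update hot words sent by the multiple hot word data sources, splits each update hot word into individual characters, and generates an update word chain following the front-to-back order of the characters in the update hot word, wherein each character is one node of the update word chain; each node of the update word chain is merged with the identical node in the existing multi-branch tree word chain data, thereby updating the multi-branch tree word chain data.
  18. The search term recommendation system based on a multi-branch tree according to claim 17, characterized in that, if the hot word node module includes multiple hot word node sub-modules, the corresponding update word chain is written to the multi-branch tree word chain sub-data of the matching attribute according to the attribute information of the update hot word.
  19. The search term recommendation system based on a multi-branch tree according to claim 18, characterized in that the characters are Chinese characters, and the attribute information is the alphabetical order of the first letter of the character's Hanyu Pinyin.
  20. The search term recommendation system based on a multi-branch tree according to claim 11, characterized in that the characters are one or more of Chinese characters, foreign-language words, and Arabic numerals.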
PCT/CN2020/090647 2020-05-15 2020-05-15 Search term recommendation method and system based on multi-branch tree WO2021227059A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/090647 WO2021227059A1 (zh) 2020-05-15 2020-05-15 Search term recommendation method and system based on multi-branch tree
US17/467,268 US11947608B2 (en) 2020-05-15 2021-09-05 Search term recommendation method and system based on multi-branch tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/090647 WO2021227059A1 (zh) 2020-05-15 2020-05-15 Search term recommendation method and system based on multi-branch tree

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/467,268 Continuation US11947608B2 (en) 2020-05-15 2021-09-05 Search term recommendation method and system based on multi-branch tree

Publications (1)

Publication Number Publication Date
WO2021227059A1 true WO2021227059A1 (zh) 2021-11-18

Family

ID=78525933

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/090647 WO2021227059A1 (zh) 2020-05-15 2020-05-15 Search term recommendation method and system based on multi-branch tree

Country Status (2)

Country Link
US (1) US11947608B2 (zh)
WO (1) WO2021227059A1 (zh)

Also Published As

Publication number Publication date
US20210397667A1 (en) 2021-12-23
US11947608B2 (en) 2024-04-02


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20935473; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20935473; Country of ref document: EP; Kind code of ref document: A1)