CN103914569B - Input prompt method and device, and method and device for creating a dictionary tree model - Google Patents

Input prompt method and device, and method and device for creating a dictionary tree model Download PDF

Info

Publication number
CN103914569B
CN103914569B CN201410169141.6A
Authority
CN
China
Prior art keywords
word
main word
split
input
dictionary tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410169141.6A
Other languages
Chinese (zh)
Other versions
CN103914569A (en)
Inventor
柳阳
谢朴锐
任志杰
郭楚钦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201410169141.6A priority Critical patent/CN103914569B/en
Publication of CN103914569A publication Critical patent/CN103914569A/en
Application granted granted Critical
Publication of CN103914569B publication Critical patent/CN103914569B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374Thesaurus

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an input prompt method, which includes: obtaining an input word; splitting the input word to obtain N split words, where N is a positive integer; querying a preset dictionary tree model according to each of the N split words to obtain N prompt result sets; and generating a final prompt result from the N prompt result sets. The input prompt method of the embodiments of the present invention performs split extension on the input word to obtain its prefix split word, middle split words, and so on, and queries the preset dictionary tree model with the prefix split word and the middle split words, so as to obtain prompt results that match each of them. This improves the accuracy of the prompt; moreover, by incorporating context processing of the input word and reducing its splitting granularity to the minimum, the chance of a match during retrieval is increased and user experience is improved. The invention also discloses an input prompt device and a method and device for creating a dictionary tree model.

Description

Input prompt method and device, and method and device for creating a dictionary tree model
Technical field
The present invention relates to the field of search technology, and more particularly to an input prompt method and device, and to a method and device for creating a dictionary tree model.
Background Art
At present, in domestic search fields such as web page search and vertical search, if the dictionary used for search-term input prompting is on the order of hundreds of thousands of entries, the implementation almost always relies on a Trie tree (an in-memory search structure): indexing is done by building the tree, and retrieval is done by traversing it. How exactly to build and traverse the tree, however, is up to each designer, who formulates a different method according to the specifics of the data.
At present, domestic Chinese-based Auto Complete algorithms essentially all insert each Chinese character of a word, in order, into the nodes of a Trie tree from root to leaf, then convert the characters to pinyin and build the tree again; searching is simply a traversal from the root node down to a leaf node. This method can solve most search-term prompting problems.
However, if the search term entered by the user is not a prefix of any word in the dictionary, this Trie-based method fails; in other words, it may prompt an entry that does not exist. For example, as shown in Fig. 1, the search term "the Heavenly Empire's" recalls "Di Renjie of the Heavenly Empire", while the correct, existing entry is "Di Renjie's Heavenly Empire". Such imprecise behavior reduces the accuracy of search-term prompting and degrades user experience.
Summary of the Invention
The present invention aims to solve at least one of the above technical deficiencies, at least to some extent.
To this end, a first object of the present invention is to propose an input prompt method. The method performs split extension on the input word and queries a preset dictionary tree model with the split words to obtain prompt results, which improves the accuracy of the prompt; moreover, by incorporating context processing of the input word and reducing its splitting granularity to the minimum, the chance of a match during retrieval is increased and user experience is improved.
A second object of the present invention is to propose a method for creating a dictionary tree model.
A third object of the present invention is to propose an input prompt device.
A fourth object of the present invention is to propose a device for creating a dictionary tree model.
To achieve the above objects, an input prompt method according to embodiments of the first aspect of the present invention includes: obtaining an input word; splitting the input word to obtain N split words, where N is a positive integer; querying a preset dictionary tree model according to each of the N split words to obtain N prompt result sets; and generating a final prompt result from the N prompt result sets.
The input prompt method of the embodiments of the present invention splits the obtained input word into N split words, queries the preset dictionary tree model according to each of the N split words to obtain N prompt result sets, and then merges and deduplicates the N prompt result sets to generate the final prompt result. By performing split extension on the input word, its prefix split word, middle split words, and so on can be obtained; querying the preset dictionary tree model with the prefix split word and the middle split words yields prompt results matching each of them, which improves the accuracy of automatic prompting. Moreover, by incorporating context processing of the input word and reducing its splitting granularity to the minimum, the chance of a match during retrieval is increased and user experience is improved.
To achieve the above objects, a method for creating a dictionary tree model according to embodiments of the second aspect of the present invention includes: obtaining multiple sample words; sorting the multiple sample words by access popularity, and taking the sorted sample words as multiple main words; generating the related terms corresponding to each main word according to the multiple main words; and creating the dictionary tree model according to the multiple main words and the related terms corresponding to each main word.
The method for creating a dictionary tree model according to the embodiments of the present invention extends the sorted sample words to generate several related terms, which solves the problem in the Auto Complete field that middle-matched prompt results cannot be obtained; and because related terms and main words share part of the tree nodes, memory usage is reduced.
To achieve the above objects, an input prompt device according to embodiments of the third aspect of the present invention includes: an obtaining module for obtaining an input word; a splitting module for splitting the input word to obtain N split words, where N is a positive integer; a query module for querying a preset dictionary tree model according to each of the N split words to obtain N prompt result sets; and a generating module for generating a final prompt result from the N prompt result sets.
In the input prompt device of the embodiments of the present invention, the splitting module splits the obtained input word into N split words, the query module queries the preset dictionary tree model according to each of the N split words to obtain N prompt result sets, and the generating module merges and deduplicates the N prompt result sets to generate the final prompt result. By performing split extension on the input word to obtain its prefix split word, middle split words, and so on, and querying the preset dictionary tree model with them, prompt results matching the prefix split word and the middle split words are obtained, which improves the accuracy of automatic prompting. Moreover, by incorporating context processing of the input word and reducing its splitting granularity to the minimum, the chance of a match during retrieval is increased and user experience is improved.
To achieve the above objects, a device for creating a dictionary tree model according to embodiments of the fourth aspect of the present invention includes: an obtaining module for obtaining multiple sample words; a sorting module for sorting the multiple sample words in descending order of access popularity and taking the sorted sample words as multiple main words; a generating module for generating the related terms corresponding to each main word according to the multiple main words; and a creating module for creating the dictionary tree model according to the multiple main words and the related terms corresponding to each main word.
The device for creating a dictionary tree model according to the embodiments of the present invention extends the sorted sample words to generate several related terms, which solves the problem in the Auto Complete field that middle-matched prompt results cannot be obtained; and because related terms and main words share part of the tree nodes, memory usage is reduced.
Additional aspects and advantages of the present invention will be set forth in part in the following description, will in part become apparent from the description, or will be learned through practice of the invention.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is the schematic diagram of input reminding method in the prior art;
Fig. 2 is a flow chart of an input prompt method according to an embodiment of the invention;
Fig. 3 is a schematic diagram of a dictionary tree model according to an embodiment of the invention;
Fig. 4 is a flow chart of a method for creating a dictionary tree model according to an embodiment of the invention;
Fig. 5 is a structural schematic diagram of an input prompt device according to an embodiment of the invention;
Fig. 6 is a structural schematic diagram of a device for creating a dictionary tree model according to an embodiment of the invention.
Detailed Description of the Embodiments
Embodiments of the present invention are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present invention, and are not to be construed as limiting the invention.
The input prompt method and device and the method and device for creating a dictionary tree model according to embodiments of the present invention are described below with reference to the accompanying drawings.
At present, there are many ways to implement Auto Complete; for example, Markov models and some machine-learning methods can do so, but their drawback is that they require a large amount of training, and their performance does not always meet requirements. The scheme generally recognized as better in the industry is the Trie tree, and Trie-tree techniques are varied; how to achieve comprehensive, accurate recall that is also compatible with error correction, using as little space as possible, remains a problem to be solved.
To this end, the present invention proposes an input prompt method including: obtaining an input word; splitting the input word to obtain N split words, where N is a positive integer; querying a preset dictionary tree model according to each of the N split words to obtain N prompt result sets; and generating a final prompt result from the N prompt result sets.
Fig. 2 is a flow chart of an input prompt method according to an embodiment of the invention.
As shown in Fig. 2, the input prompt method may include:
S201: obtain the input word.
For example, the input word filled into the input box when the user searches through a browser or a search application, such as "pc faster", can be obtained.
S202: split the input word to obtain N split words, where N is a positive integer.
Specifically, in one embodiment of the present invention, the non-essential words in the input word may first be removed according to a non-essential-word dictionary, and the input word with the non-essential words removed may then be split to obtain the N split words. In embodiments of the present invention, the non-essential-word dictionary is language-dependent: different language types have different non-essential-word dictionaries, and the language types may include, for example, Chinese, English, French and Thai.
It should be understood that non-essential words are words that matter little to the topic of the input word, and can be understood as words irrelevant to the input word. That is, the input word filled in by the user is likely to contain non-essential words: for example, "download" in the input word "download pc faster" is a non-essential word, as is "how" in the input word "how to soak sea cucumber". Thus, the non-essential words can first be matched against the non-essential-word dictionary and removed, yielding the input word "pc faster". Split extension can then be performed on "pc faster", for example splitting it into the two split words "pc faster" and "faster". In this way, the prefix split word and the middle split words of the input word are obtained.
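The removal-and-split procedure above can be sketched as follows; the stop-word set and the function name here are illustrative assumptions, not the patent's actual non-essential-word dictionary:

```python
# Hypothetical non-essential-word dictionary, keyed by language type.
NON_ESSENTIAL = {"en": {"download", "how", "the"}}

def split_input_word(input_word, lang="en"):
    """Remove non-essential words, then produce the prefix/middle split words.

    Each suffix of the remaining token list is one split word, e.g.
    "pc faster" -> ["pc faster", "faster"].
    """
    stop = NON_ESSENTIAL.get(lang, set())
    tokens = [t for t in input_word.split() if t not in stop]
    return [" ".join(tokens[i:]) for i in range(len(tokens))]
```

A usage example: `split_input_word("download pc faster")` removes the non-essential word and yields `["pc faster", "faster"]`, the prefix and middle split words described above.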
In another embodiment of the present invention, the input word may first be split according to its context to obtain a first split-word result set. The first split-word result set may then be split further according to the minimum linguistic unit corresponding to the language type, to obtain a second split-word result set, which contains the N split words.
For example, taking the input word "pc faster": its context shows that it is English and contains the two words "pc" and "faster", so the input word may first be split into the split-word result set ("pc", "faster"). This result set may then be split by minimum linguistic unit into (("p", "c"), ("f", "a", "s", "t", "e", "r")). In this way, the input word is split down to its minimum units, increasing the chance of a match during retrieval.
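A minimal sketch of this minimum-linguistic-unit splitting, under the assumption that for English the minimum unit is a single character:

```python
def split_to_minimum_units(split_words):
    """Split each word of each split word into minimum linguistic units.

    For English the minimum unit is assumed to be one character, so
    "pc faster" becomes (("p", "c"), ("f", "a", "s", "t", "e", "r")).
    """
    return [tuple(tuple(word) for word in sw.split()) for sw in split_words]
```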
S203: query the preset dictionary tree model according to each of the N split words to obtain N prompt result sets.
Specifically, the preset dictionary tree model may be traversed once for each of the N split words, yielding N prompt result sets. It should be noted that, in one embodiment of the present invention, the preset dictionary tree model may be traversed concurrently by multiple threads. In one embodiment of the present invention, the preset dictionary tree model may be created in advance; for the concrete creation, refer to the detailed description of the subsequent embodiments.
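The per-split-word traversal in S203 can be sketched with a toy trie; the class layout and the thread-pool usage are illustrative assumptions, since the patent does not specify concrete data structures:

```python
from concurrent.futures import ThreadPoolExecutor

class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.entry = None    # full entry stored at a terminal node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.entry = word

    def query(self, prefix):
        """Return all entries below the node reached by `prefix`."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results, stack = [], [node]
        while stack:
            n = stack.pop()
            if n.entry is not None:
                results.append(n.entry)
            stack.extend(n.children.values())
        return results

def query_all(trie, split_words):
    # One traversal per split word, run concurrently (multiple threads).
    with ThreadPoolExecutor() as pool:
        return list(pool.map(trie.query, split_words))
```

For instance, after inserting "pc faster" and "faster", `query_all(trie, ["pc", "fast"])` yields one prompt result set per split word.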
S204: generate the final prompt result from the N prompt result sets.
Specifically, the N prompt result sets may first be merged, and the merged result sets may then be deduplicated to generate the final prompt result.
It should be noted that, in embodiments of the present invention, the final prompt result generally contains 10 items (because Auto Complete generally displays only 10): the first 7 may be prefix-matched prompt results, the 8th and 9th may be middle-matched prompt results, and the 10th may be a similarity-matched prompt result (including typo matches). In this way, not only is spelling auto-prompting accomplished, but the user's potential retrieval intent is captured and errors are corrected, so that the usefulness of Auto Complete is fully realized.
The input prompt method of the embodiments of the present invention splits the obtained input word into N split words, queries the preset dictionary tree model according to each of the N split words to obtain N prompt result sets, and then merges and deduplicates the N prompt result sets to generate the final prompt result. By performing split extension on the input word, its prefix split word, middle split words, and so on can be obtained; querying the preset dictionary tree model with the prefix split word and the middle split words yields prompt results matching each of them, which improves the accuracy of automatic prompting. Moreover, by incorporating context processing of the input word and reducing its splitting granularity to the minimum, the chance of a match during retrieval is increased and user experience is improved.
It should be noted that, in one embodiment of the present invention, the preset dictionary tree model may be created by the following steps:
S101': obtain multiple sample words.
For example, the multiple input words filled into the input box when users search through a browser or a search application over a period of time may be collected and deduplicated, and the deduplicated input words may then be taken as the multiple sample words.
S102': sort the multiple sample words by access popularity, and take the sorted sample words as multiple main words.
Specifically, the sample words may be sorted in descending order of access popularity to obtain a descending-ordered vocabulary. It should be understood that sample words with high weight are inserted into the dictionary tree first and the sample word with the lowest weight is inserted last, so a pre-order traversal of the dictionary tree yields results already in descending order of weight. It should be noted that, in one embodiment of the present invention, the sample words may also be sorted in descending order by other weights, for example by the ordering of characters in the language type.
It should be understood that, in embodiments of the present invention, a main word may be a deduplicated sample word.
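As a minimal sketch of the sorting step above, assuming access popularity is available as a simple count per word (the counts and names here are hypothetical):

```python
def order_sample_words(sample_words, access_counts):
    """Deduplicate the sample words and sort them by access popularity,
    descending, so higher-weight words are inserted into the trie first."""
    unique = set(sample_words)
    return sorted(unique, key=lambda w: access_counts.get(w, 0), reverse=True)
```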
S103': generate the related terms corresponding to each main word according to the multiple main words.
Specifically, each main word W may be processed, for example by removing the non-essential words in W according to the non-essential-word dictionary, and the main word W with the non-essential words removed may then be extended to generate the several related terms (W1, W2, W3, ...) corresponding to each main word. For example, performing extension on the main word "baidu pc faster" may yield the related terms ("baidu pc faster", "pc faster", "faster"). It will be appreciated that, in embodiments of the present invention, a related term may be a word related to a sample word that is obtained by extending the sample word.
S104': create the preset dictionary tree model according to the multiple main words and the related terms corresponding to each main word.
Specifically, a main-word dictionary tree may first be created from the main words, and a related-term dictionary tree may then be created from the related terms corresponding to each main word. Finally, the main-word dictionary tree and the related-term dictionary tree may be merged to create the preset dictionary tree model.
It should be noted that the preset dictionary tree model may be built from the main words and their related terms in the conventional way, except that a related term is not treated as an ordinary entry when inserted into the model: instead, several indexes are established at the root node of its corresponding main word, pointing to the related term's position within the main word. This overcomes the weakness of the dictionary tree algorithm that identical suffix words cannot share storage, and greatly increases memory utilization.
For example, as shown in Fig. 3, taking the sample word "baidu pc faster", the related terms extended from it are ("baidu pc faster", "pc faster", "faster"). These three terms correspond to three traversal paths in the preset dictionary tree model, but the leaf node of all three paths is (baidu pc faster). Thus, whether the obtained search text is "ba", "pc" or "fas", "baidu pc faster" can be recalled.
It should be understood from Fig. 3 that, because related terms and their main word share part of the tree nodes, memory can be saved. For example, for a dictionary containing N sample words, if the average character length of a sample word is L and each sample word generates n related terms, the maximum storage required in total is (L+n)*N*K, where K is the storage of a single node; a leaf node has the same data structure as the other nodes, only its child pointers are null.
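The shared-node scheme can be sketched as follows: each related term (here, each suffix of the main word) is inserted as a path whose terminal node stores only an index back to the single stored main-word string, so all paths recall the same entry without duplicating it. This is an illustrative reconstruction, not the patent's exact node layout:

```python
class Node:
    def __init__(self):
        self.children = {}    # char -> Node
        self.main_words = []  # indexes of main words recalled at this node

class SuffixSharingTrie:
    """Every related term of a main word is a path whose terminal node
    references the same stored main-word string (one copy, many paths)."""
    def __init__(self):
        self.root = Node()
        self.entries = []  # one stored copy per main word

    def add_main_word(self, main_word):
        idx = len(self.entries)
        self.entries.append(main_word)
        tokens = main_word.split()
        for i in range(len(tokens)):          # related terms = suffixes
            node = self.root
            for ch in " ".join(tokens[i:]):
                node = node.children.setdefault(ch, Node())
            node.main_words.append(idx)       # index, not a copied entry

    def recall(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        hits, stack = set(), [node]
        while stack:
            n = stack.pop()
            hits.update(n.main_words)
            stack.extend(n.children.values())
        return [self.entries[i] for i in sorted(hits)]
```

After `add_main_word("baidu pc faster")`, the prefixes "ba", "pc" and "fas" all recall the same main word, matching the Fig. 3 example, while the string "baidu pc faster" is stored only once.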
It should be noted that, in one embodiment of the present invention, the sample words also need to be extended according to their context and the minimum linguistic unit corresponding to the language type, to generate more related terms.
Taking the Thai language type as an example: in Thai there are no spaces between words; spaces appear only between sentences and between Thai and non-Thai text, and many Thai words are combinations of other words. This makes Thai harder to interpret, and users easily make mistakes when typing. For example, the Thai word for "pen" is composed of the Thai words for "mouth" and "crow"; to accommodate users who mistype, the input word can be split down to its minimum units. So, for the Thai phrase "I bought a pen", the extended related terms should include the entry corresponding to "I bought a mouth crow".
Thus, by extending the sorted sample words to generate several related terms, the problem in the Auto Complete field that middle-matched prompt results cannot be obtained is solved; and because related terms and main words share part of the tree nodes, memory usage is reduced.
The dictionary tree model plays a vital role in obtaining prompt results from the input word during search: once the dictionary tree model has been created, the prompt result sets for the input word can be obtained by querying it, and the final prompt result can be generated from those sets.
Therefore, in order to implement the above embodiments, the present invention also provides a method for creating a dictionary tree model, including: obtaining multiple sample words; sorting the multiple sample words by access popularity, and taking the sorted sample words as multiple main words; generating the related terms corresponding to each main word according to the multiple main words; and creating the dictionary tree model according to the multiple main words and the related terms corresponding to each main word.
Fig. 4 is a flow chart of a method for creating a dictionary tree model according to an embodiment of the invention.
As shown in Fig. 4, the method for creating a dictionary tree model may include:
S401: obtain multiple sample words.
For example, the multiple input words filled into the input box when users search through a browser or a search application over a period of time may be collected and deduplicated, and the deduplicated input words may then be taken as the multiple sample words.
S402: sort the multiple sample words by access popularity, and take the sorted sample words as multiple main words.
Specifically, the sample words may be sorted in descending order of access popularity to obtain a descending-ordered vocabulary. It should be understood that sample words with high weight are inserted into the dictionary tree first and the sample word with the lowest weight is inserted last, so a pre-order traversal of the dictionary tree yields results already in descending order of weight. It should be noted that, in one embodiment of the present invention, the sample words may also be sorted in descending order by other weights, for example by the ordering of characters in the language type.
It should be understood that, in embodiments of the present invention, a main word may be a deduplicated sample word.
S403: generate the related terms corresponding to each main word according to the multiple main words.
Specifically, each main word W may be processed, for example by removing the non-essential words in W according to the non-essential-word dictionary, and the main word W with the non-essential words removed may then be extended to generate the several related terms (W1, W2, W3, ...) corresponding to each main word. For example, performing extension on the main word "baidu pc faster" may yield the related terms ("baidu pc faster", "pc faster", "faster"). It will be appreciated that, in embodiments of the present invention, a related term may be a word related to a sample word that is obtained by extending the sample word.
S404: create the dictionary tree model according to the multiple main words and the related terms corresponding to each main word.
Specifically, a main-word dictionary tree may first be created from the main words, and a related-term dictionary tree may then be created from the related terms corresponding to each main word. Finally, the main-word dictionary tree and the related-term dictionary tree may be merged to create the dictionary tree model.
It should be noted that the dictionary tree model may be built from the main words and their related terms in the conventional way, except that a related term is not treated as an ordinary entry when inserted into the model: instead, several indexes are established at the root node of its corresponding main word, pointing to the related term's position within the main word. This overcomes the weakness of the dictionary tree algorithm that identical suffix words cannot share storage, and greatly increases memory utilization.
For example, as shown in Fig. 3, taking the sample word "baidu pc faster", the related terms extended from it are ("baidu pc faster", "pc faster", "faster"). These three terms correspond to three traversal paths in the dictionary tree model, but the leaf node of all three paths is (baidu pc faster). Thus, whether the obtained search text is "ba", "pc" or "fas", "baidu pc faster" can be recalled.
It should be understood from Fig. 3 that, because related terms and their main word share part of the tree nodes, memory can be saved. For example, for a dictionary containing N sample words, if the average character length of a sample word is L and each sample word generates n related terms, the maximum storage required in total is (L+n)*N*K, where K is the storage of a single node; a leaf node has the same data structure as the other nodes, only its child pointers are null.
It should be noted that, in one embodiment of the present invention, the sample words also need to be extended according to their context and the minimum linguistic unit corresponding to the language type, to generate more related terms.
Taking the Thai language type as an example: in Thai there are no spaces between words; spaces appear only between sentences and between Thai and non-Thai text, and many Thai words are combinations of other words. This makes Thai harder to interpret, and users easily make mistakes when typing. For example, the Thai word for "pen" is composed of the Thai words for "mouth" and "crow"; to accommodate users who mistype, the input word can be split down to its minimum units. So, for the Thai phrase "I bought a pen", the extended related terms should include the entry corresponding to "I bought a mouth crow".
In the method for creating a dictionary tree model according to the embodiment of the present invention, the sorted sample words are expanded to generate several related terms, which solves the problem that, in the Auto Complete field, prompt results matched from the middle of a word cannot be obtained; and since the related terms and the main word share part of the tree nodes, memory space is reduced.
In order to implement the above embodiments, the present invention further provides an input prompt device, including: an acquisition module for obtaining an input word; a splitting module for splitting the input word to obtain N split words, where N is a positive integer; a query module for querying a preset dictionary tree model with each of the N split words to obtain N prompt result sets; and a generation module for generating a final prompt result according to the N prompt result sets.
Fig. 5 is a structural schematic diagram of an input prompt device according to an embodiment of the present invention.
As shown in Fig. 5, the input prompt device may include: an acquisition module 110, a splitting module 120, a query module 130 and a generation module 140.
Specifically, the acquisition module 110 is used to obtain an input word. For example, the acquisition module 110 may obtain the input word that a user fills in an input box when searching through a browser or a search application, such as "pc faster".
The splitting module 120 is used to split the input word to obtain N split words, where N is a positive integer. The query module 130 is used to query a preset dictionary tree model with each of the N split words to obtain N prompt result sets.
More specifically, the query module 130 may traverse the preset dictionary tree model with each of the N split words to obtain the N prompt result sets. It should be noted that, in one embodiment of the present invention, the preset dictionary tree model may be traversed concurrently by multiple threads. In one embodiment of the present invention, the preset dictionary tree model may be created in advance; for a specific implementation, reference may be made to the subsequent embodiments.
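The concurrent traversal described here might be sketched as follows, assuming a plain nested-dict trie in which '$' marks a stored word; this is an illustration of the idea, not the patent's claimed node structure:

```python
from concurrent.futures import ThreadPoolExecutor

def insert(trie: dict, word: str) -> None:
    """Insert a word into a nested-dict trie; '$' holds the full word."""
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})
    node['$'] = word

def query_trie(trie: dict, split_word: str) -> set:
    """Collect every stored word reachable below the node that
    `split_word` leads to, i.e. one prompt result set."""
    node = trie
    for ch in split_word:
        if ch not in node:
            return set()
        node = node[ch]
    results, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for key, child in cur.items():
            if key == '$':
                results.add(child)
            else:
                stack.append(child)
    return results

def query_all(trie: dict, split_words: list) -> list:
    """Query the trie with each split word concurrently, one thread per
    split word, returning the N prompt result sets in order."""
    with ThreadPoolExecutor(max_workers=max(1, len(split_words))) as pool:
        return list(pool.map(lambda w: query_trie(trie, w), split_words))
```

Reads on the trie are safe to run in parallel here because the structure is built first and only traversed afterwards.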
The generation module 140 is used to generate the final prompt result according to the N prompt result sets. Specifically, in one embodiment of the present invention, the generation module 140 may be used to merge the N prompt result sets and de-duplicate the merged result to generate the final prompt result.
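The merge-and-de-duplicate step could look like this minimal sketch; keeping the first occurrence's position is a natural choice, though the text does not mandate it:

```python
def merge_results(result_sets: list) -> list:
    """Merge N prompt-result lists and de-duplicate, keeping each
    entry at the position of its first occurrence."""
    seen, merged = set(), []
    for results in result_sets:
        for item in results:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```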
It should be noted that, in the embodiments of the present invention, the number of final prompt results is generally 10 (because Auto Complete generally displays only 10 entries). The first 7 entries may be prefix-matched prompt results, the 8th and 9th may be middle-matched prompt results, and the 10th may be a similarity-matched prompt result (covering typo matching). In this way, not only is automatic spelling prompting completed, but the user's potential retrieval intent can be recognized and errors corrected, so that the value of Auto Complete is fully realized.
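As an illustration only, the 7/2/1 composition described above (an example split, not a fixed requirement of the method) might be assembled like this:

```python
def compose_final_prompts(prefix_hits: list, middle_hits: list,
                          fuzzy_hits: list) -> list:
    """Assemble the 10-entry suggestion list: up to 7 prefix matches,
    then up to 2 middle matches, then 1 fuzzy (typo-tolerant) match."""
    final = prefix_hits[:7] + middle_hits[:2] + fuzzy_hits[:1]
    # Drop duplicates across the three buckets, keep order, cap at 10.
    seen, out = set(), []
    for s in final:
        if s not in seen:
            seen.add(s)
            out.append(s)
    return out[:10]
```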
Optionally, in one embodiment of the present invention, as shown in Fig. 5, the splitting module 120 may include a removal unit 121 and a first splitting unit 122. Specifically, the removal unit 121 is used to remove non-essential words from the input word according to a non-essential-word dictionary, where the non-essential-word dictionary is related to the language type, i.e., different language types have different non-essential-word dictionaries. The first splitting unit 122 is used to split the input word from which the non-essential words have been removed, to obtain the N split words.
It should be appreciated that non-essential words are words that are not important for expressing the topic of the input word, and may be understood as words irrelevant to the input word. That is, the input word filled in by the user is likely to contain non-essential words; for example, "download" in the input word "download pc faster" is a non-essential word, as is "how" in the input word "how to soak sea cucumber". The removal unit 121 may first match words of the input against the non-essential-word dictionary and remove the matched words, thereby obtaining the input word "pc faster" with the non-essential words removed. The first splitting unit 122 may then split and expand "pc faster", for example into the 2 split words "pc faster" and "faster". In this way, the prefix split word and the middle split words of the input word can be obtained.
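A sketch of the removal unit and first splitting unit under these assumptions (the contents of the non-essential-word dictionary below are invented for illustration):

```python
# Hypothetical per-language non-essential-word dictionary.
NON_ESSENTIAL = {"download", "how", "to"}

def strip_non_essential(input_word: str) -> str:
    """Removal unit: drop non-essential words from the input."""
    kept = [w for w in input_word.split() if w not in NON_ESSENTIAL]
    return " ".join(kept)

def suffix_splits(phrase: str) -> list:
    """First splitting unit: 'pc faster' -> ['pc faster', 'faster'],
    i.e. the prefix split word plus every middle split word obtained
    by dropping leading tokens."""
    tokens = phrase.split()
    return [" ".join(tokens[i:]) for i in range(len(tokens))]
```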
Optionally, in one embodiment of the present invention, as shown in Fig. 5, the splitting module 120 may include a second splitting unit 123 and a third splitting unit 124. Specifically, the second splitting unit 123 is used to split the input word according to the context of the input word to obtain a first split-word result set. The third splitting unit 124 is used to split the first split-word result set according to the minimum linguistic unit corresponding to the language type to obtain a second split-word result set, where the second split-word result set includes the N split words.
For example, taking the input word "pc faster" as an example, the second splitting unit 123 can determine from the context that the input word is English and contains the two words "pc" and "faster", and can thus first split the input word into the split-word result set ("pc", "faster"). The third splitting unit 124 can then split this result set by minimum linguistic unit to obtain (("p", "c"), ("f", "a", "s", "t", "e", "r")). In this way, the input word can be broken down into its minimum units, increasing the chances of a match during retrieval.
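The two-stage split performed by the second and third splitting units can be sketched for English, where the minimum linguistic unit is a single character; other language types would need their own segmenters, which are outside this sketch:

```python
def context_split(input_word: str) -> list:
    """Stage 1 (second splitting unit): split on whitespace, the word
    boundary for space-delimited languages such as English."""
    return input_word.split()

def minimum_unit_split(words: list) -> list:
    """Stage 2 (third splitting unit): break each word into its
    minimum linguistic units; for English, single characters."""
    return [list(word) for word in words]
```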
Optionally, in one embodiment of the present invention, the input prompt device may further include a creation module 150. The creation module 150 may be used to: first obtain multiple sample words; then sort the multiple sample words according to access heat, and take the sorted sample words as multiple main words; then generate the related terms corresponding to each main word; and finally create the preset dictionary tree model according to the multiple main words and the related terms corresponding to each main word.
In one embodiment of the present invention, the creation module 150 may further be used to: first create a main-word dictionary tree according to each main word; then create a related-term dictionary tree according to the related terms corresponding to each main word; and finally combine the main-word dictionary tree and the related-term dictionary tree to create the preset dictionary tree model. For the specific implementation of creating the preset dictionary tree model, reference may be made to the detailed description of the above method.
In the input prompt device according to the embodiment of the present invention, the splitting module splits the obtained input word to obtain N split words; the query module queries the preset dictionary tree model with each of the N split words to obtain N prompt result sets; and the generation module merges and de-duplicates the N prompt result sets to generate the final prompt result. By splitting and expanding the input word to obtain its prefix split word, middle split words and the like, and querying the preset dictionary tree model with these, prompt results matching the prefix split word and prompt results matching the middle split words can both be obtained, which improves the accuracy of automatic prompting. Moreover, the context of the input word is taken into account, and the input word is split down to the minimum granularity, which increases the chances of a match during retrieval and improves the user experience.
In order to implement the above embodiments, the present invention further provides a device for creating a dictionary tree model, including: an acquisition module for obtaining multiple sample words; a sorting module for sorting the multiple sample words in descending order according to access heat, and taking the sorted sample words as multiple main words; a generation module for generating the related terms corresponding to each main word; and a creation module for creating the dictionary tree model according to the multiple main words and the related terms corresponding to each main word.
Fig. 6 is a structural schematic diagram of a device for creating a dictionary tree model according to an embodiment of the present invention.
As shown in Fig. 6, the device for creating a dictionary tree model may include: an acquisition module 210, a sorting module 220, a generation module 230 and a creation module 240.
Specifically, the acquisition module 210 is used to obtain multiple sample words. For example, the acquisition module 210 may obtain the multiple input words filled in the input box by users searching through a browser or a search application within a regular period, de-duplicate these input words, and then take the de-duplicated input words as the multiple sample words.
The sorting module 220 is used to sort the multiple sample words in descending order according to access heat, and to take the sorted sample words as the multiple main words. More specifically, the sorting module 220 may sort the sample words in descending order of access heat to obtain a descending-ordered vocabulary. It should be appreciated that the sample word with the highest weight is inserted into the dictionary tree first and the sample word with the lowest weight last, so that a preorder traversal of the dictionary tree yields results already sorted in descending order of weight. It should be noted that, in one embodiment of the present invention, the sample words may also be sorted in descending order by other weights; for example, the sample words may be sorted according to the character order of the language type.
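The insertion-order property claimed here, that inserting hotter words first makes a preorder traversal yield weight-descending results, can be illustrated with a nested-dict trie; this relies on Python dicts preserving insertion order and is an illustrative stand-in for the patent's tree structure:

```python
def order_by_heat(samples: dict) -> list:
    """Sorting module: sort sample words in descending order of access
    heat. `samples` maps each sample word to its access count."""
    return sorted(samples, key=samples.get, reverse=True)

def insert(trie: dict, word: str) -> None:
    """Insert a word into a nested-dict trie; '$' holds the full word."""
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})
    node['$'] = word

def preorder(trie: dict) -> list:
    """Preorder traversal; because dict children keep insertion order,
    words inserted hotter-first come out hotter-first among siblings."""
    out = []
    for key, child in trie.items():
        if key == '$':
            out.append(child)
        else:
            out.extend(preorder(child))
    return out
```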
It should be appreciated that, in the embodiments of the present invention, the main words may be the de-duplicated sample words.
The generation module 230 is used to generate the related terms corresponding to each main word. More specifically, the generation module 230 may process each main word W according to the non-essential-word dictionary, for example removing the non-essential words in the main word W, and then expand the main word W with the non-essential words removed, so as to generate the several related terms (W1, W2, W3, ...) corresponding to each main word. For example, expanding the main word "baidu pc faster" yields the related terms ("baidu pc faster", "pc faster", "faster"). It will be appreciated that, in the embodiments of the present invention, a related term may be a word related to a sample word, obtained by expanding that sample word.
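A minimal sketch of this expansion; the non-essential-word set below is a hypothetical example, not from the source:

```python
NON_ESSENTIAL = {"download", "how"}   # hypothetical non-essential-word dictionary

def related_terms(main_word: str) -> list:
    """Expand a main word into related terms: drop non-essential words,
    then take every suffix of the remaining token sequence, e.g.
    'baidu pc faster' -> ['baidu pc faster', 'pc faster', 'faster']."""
    tokens = [t for t in main_word.split() if t not in NON_ESSENTIAL]
    return [" ".join(tokens[i:]) for i in range(len(tokens))]
```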
The creation module 240 is used to create the dictionary tree model according to the multiple main words and the related terms corresponding to each main word.
Further, in one embodiment of the present invention, as shown in Fig. 6, the creation module 240 may include a first creating unit 241, a second creating unit 242 and a third creating unit 243. Specifically, the first creating unit 241 is used to create a main-word dictionary tree according to each main word. The second creating unit 242 is used to create a related-term dictionary tree according to the related terms corresponding to each main word. The third creating unit 243 is used to combine the main-word dictionary tree and the related-term dictionary tree to create the dictionary tree model.
It should be noted that the creation module 240 may build the dictionary tree model from the main words and the corresponding related terms in the conventional way, except that a related term is not inserted into the dictionary tree model as an ordinary entry; instead, several indexes are established at the root node of the corresponding main word, pointing to the positions of the related term within the main word. This overcomes the shortcoming of dictionary tree algorithms that words with identical suffixes cannot share storage space, and considerably increases memory utilization.
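This index-based insertion might be sketched as follows; the `leaf` reference stands in for the patent's index, and all names here are illustrative rather than the claimed implementation:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None   # set only on a main word's own leaf
        self.leaf = None   # the main-word leaf this path resolves to

def insert_main(root: TrieNode, main_word: str) -> TrieNode:
    """Insert a main word as an ordinary entry; its end node is the
    leaf that all of its related terms will share."""
    node = root
    for ch in main_word:
        node = node.children.setdefault(ch, TrieNode())
    node.word = main_word
    node.leaf = node
    return node

def insert_related(root: TrieNode, related: str, main_leaf: TrieNode) -> None:
    """Insert a related term as an index only: its end node keeps a
    reference to the main word's leaf instead of a second copy of the
    word, so identical suffixes share storage."""
    node = root
    for ch in related:
        node = node.children.setdefault(ch, TrieNode())
    node.leaf = main_leaf

def recall(root: TrieNode, prefix: str):
    """Walk the prefix, then descend to any reachable leaf index and
    return the main word it points at (None if no match)."""
    node = root
    for ch in prefix:
        if ch not in node.children:
            return None
        node = node.children[ch]
    stack = [node]
    while stack:
        cur = stack.pop()
        if cur.leaf is not None:
            return cur.leaf.word
        stack.extend(cur.children.values())
    return None
```

With "baidu pc faster" inserted as the main word and "pc faster" / "faster" as indexed related terms, the prefixes "ba", "pc" and "fas" all recall the same main word, matching the Fig. 3 example.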
For example, as shown in Fig. 3, taking the sample word "baidu pc faster" as an example, the related terms expanded from it are ("baidu pc faster", "pc faster", "faster"). These three terms correspond to three traversal paths in the dictionary tree model, but all three paths lead to the same leaf node ("baidu pc faster"). In this way, whether the input obtained is "ba", "pc" or "fas", "baidu pc faster" can be recalled.
It should be appreciated from Fig. 3 that, since the related terms and the main word share part of the tree nodes, the required memory space can be reduced. For example, for a dictionary containing N sample words, if each sample word has an average character length of L and generates n related terms, the maximum storage space required in total is (L+n)*N*K, where K is the storage space of a single node. A leaf node has the same data structure as the other nodes; only its child pointer is null.
It should be noted that, in one embodiment of the present invention, the generation module 230 also needs to expand the sample words according to their context and the minimum linguistic unit corresponding to the language type, so as to generate more related terms.
For example, taking the language type Thai as an example: Thai has no spaces between words; spaces appear only between sentences, and between Thai and non-Thai text. Moreover, many Thai words are compounds of other words, which makes Thai harder to understand and makes input errors easy for users to produce. For example, the Thai word for "pen" is composed of the Thai words for "mouth" and "crow". To accommodate users who mistype, the input word can be split down to its minimum units. Thus, for a Thai phrase meaning "I bought a pen", the related terms expanded from it should include the entry meaning "I bought a mouth crow".
In the device for creating a dictionary tree model according to the embodiment of the present invention, the sorted sample words are expanded to generate several related terms, which solves the problem that, in the Auto Complete field, prompt results matched from the middle of a word cannot be obtained; and since the related terms and the main words share part of the tree nodes, memory space is reduced.
It should be noted that, in search-related technical fields, the main problems to be solved are retrieval, de-duplication and sorting. In the present invention, de-duplication and sorting, as well as some word processing and optimization measures, are all performed offline, which greatly improves overall performance and meets the requirements of quick response and large data volumes.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code including one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, device or apparatus (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, device or apparatus). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, device or apparatus. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or, if necessary, processing it in another suitable way, and then stored in a computer memory.
It should be appreciated that portions of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies known in the art may be used: a discrete logic circuit with logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gate circuits, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
Those skilled in the art will appreciate that all or part of the steps carried by the above method embodiments may be completed by instructing relevant hardware through a program, the program may be stored in a computer-readable storage medium, and the program, when executed, includes one of or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. When implemented in the form of a software functional module and sold or used as an independent product, the integrated module may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, such as two or three, unless otherwise specifically defined.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples" and the like means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may, without mutual contradiction, combine different embodiments or examples described in this specification and features of different embodiments or examples.
Although the embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and shall not be understood as limiting the present invention, and those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.

Claims (6)

1. An input prompting method, characterized by including:
obtaining an input word;
splitting the input word to obtain N split words, where N is a positive integer and the N split words are minimum linguistic units;
querying a preset dictionary tree model with each of the N split words to obtain N prompt result sets; and generating a final prompt result according to the N prompt result sets;
wherein splitting the input word to obtain the N split words includes:
splitting the input word according to the context of the input word to obtain a first split-word result set; and
splitting the first split-word result set according to the minimum linguistic unit corresponding to the language type to obtain a second split-word result set, wherein the second split-word result set includes the N split words;
wherein the preset dictionary tree model is created by the following steps:
obtaining multiple sample words;
sorting the multiple sample words according to access heat, and taking the sorted sample words as multiple main words;
generating the related terms corresponding to each main word according to the multiple main words; and
creating the preset dictionary tree model according to the multiple main words and the related terms corresponding to each main word, wherein, when a related term is inserted into the preset dictionary tree model, an index is established at the root node of the main word, the index pointing to the position of the related term in the corresponding main word;
wherein the non-essential words in each main word are removed according to a non-essential-word dictionary, and the main word with the non-essential words removed is expanded, so as to generate the related terms corresponding to each main word;
wherein creating the preset dictionary tree model according to the multiple main words and the related terms corresponding to each main word includes:
creating a main-word dictionary tree according to each main word;
creating a related-term dictionary tree according to the related terms corresponding to each main word; and
combining the main-word dictionary tree and the related-term dictionary tree to create the preset dictionary tree model.
2. The method according to claim 1, characterized in that splitting the input word to obtain the N split words includes:
removing the non-essential words in the input word according to a non-essential-word dictionary;
splitting the input word with the non-essential words removed to obtain the N split words, wherein the non-essential-word dictionary is related to the language type.
3. The method according to claim 1, characterized in that generating the final prompt result according to the N prompt result sets includes:
merging the N prompt result sets, and de-duplicating the merged N prompt result sets to generate the final prompt result.
4. An input prompt device, characterized by including:
an acquisition module for obtaining an input word;
a splitting module for splitting the input word to obtain N split words, where N is a positive integer and the N split words are minimum linguistic units;
a query module for querying a preset dictionary tree model with each of the N split words to obtain N prompt result sets; and
a generation module for generating a final prompt result according to the N prompt result sets;
wherein the splitting module includes:
a second splitting unit for splitting the input word according to the context of the input word to obtain a first split-word result set; and
a third splitting unit for splitting the first split-word result set according to the minimum linguistic unit corresponding to the language type to obtain a second split-word result set, wherein the second split-word result set includes the N split words;
a creation module, the creation module being used to:
obtain multiple sample words;
sort the multiple sample words according to access heat, and take the sorted sample words as multiple main words;
generate the related terms corresponding to each main word according to the multiple main words; and
create the preset dictionary tree model according to the multiple main words and the related terms corresponding to each main word, wherein, when a related term is inserted into the preset dictionary tree model, an index is established at the root node of the main word, the index pointing to the position of the related term in the corresponding main word;
wherein the creation module is used to: remove the non-essential words in each main word according to a non-essential-word dictionary, and expand the main word with the non-essential words removed, so as to generate the related terms corresponding to each main word;
wherein the creation module is further used to:
create a main-word dictionary tree according to each main word;
create a related-term dictionary tree according to the related terms corresponding to each main word; and
combine the main-word dictionary tree and the related-term dictionary tree to create the preset dictionary tree model.
5. The device according to claim 4, characterized in that the splitting module includes:
a removal unit for removing the non-essential words in the input word according to a non-essential-word dictionary, wherein the non-essential-word dictionary is related to the language type; and
a first splitting unit for splitting the input word with the non-essential words removed to obtain the N split words.
6. The device according to claim 4, characterized in that the generation module is used to: merge the N prompt result sets, and de-duplicate the merged N prompt result sets to generate the final prompt result.
CN201410169141.6A 2014-04-24 2014-04-24 Input creation method, the device of reminding method, device and dictionary tree-model Active CN103914569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410169141.6A CN103914569B (en) 2014-04-24 2014-04-24 Input creation method, the device of reminding method, device and dictionary tree-model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410169141.6A CN103914569B (en) 2014-04-24 2014-04-24 Input creation method, the device of reminding method, device and dictionary tree-model

Publications (2)

Publication Number Publication Date
CN103914569A CN103914569A (en) 2014-07-09
CN103914569B true CN103914569B (en) 2018-09-07

Family

ID=51040249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410169141.6A Active CN103914569B (en) 2014-04-24 2014-04-24 Input creation method, the device of reminding method, device and dictionary tree-model

Country Status (1)

Country Link
CN (1) CN103914569B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108241695B (en) * 2016-12-26 2021-11-02 北京国双科技有限公司 Information processing method and device
CN107967259A (en) * 2017-11-27 2018-04-27 传神语联网网络科技股份有限公司 The method and device of Thai syllable splitting
CN108304384B (en) * 2018-01-29 2021-08-27 上海名轩软件科技有限公司 Word splitting method and device
CN109933217B (en) 2019-03-12 2020-05-01 北京字节跳动网络技术有限公司 Method and device for pushing sentences
CN111400584A (en) * 2020-03-16 2020-07-10 南方科技大学 Association word recommendation method and device, computer equipment and storage medium
CN113625884A (en) * 2020-05-07 2021-11-09 顺丰科技有限公司 Input word recommendation method and device, server and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440312A (en) * 2013-08-27 2013-12-11 深圳市华傲数据技术有限公司 System and terminal for inquiring zip code for mailing address
CN103631929A (en) * 2013-12-09 2014-03-12 江苏金智教育信息技术有限公司 Intelligent prompt method, module and system for search

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8209358B2 (en) * 2007-05-09 2012-06-26 Illinois Institute Of Technology Hierarchical structured abstract data organization system
CN102084363B (en) * 2008-07-03 2014-11-12 加利福尼亚大学董事会 A method for efficiently supporting interactive, fuzzy search on structured data

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440312A (en) * 2013-08-27 2013-12-11 深圳市华傲数据技术有限公司 System and terminal for inquiring zip code for mailing address
CN103631929A (en) * 2013-12-09 2014-03-12 江苏金智教育信息技术有限公司 Intelligent prompt method, module and system for search

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Decrypting the Idea: Principles of Chinese Word Segmentation Algorithms in SEO Search"; Shanghai Yingcai HR; Baidu Tieba: http://tieba.baidu.com/p/1556295187; 2012-04-27; pp. 1-2 *

Also Published As

Publication number Publication date
CN103914569A (en) 2014-07-09

Similar Documents

Publication Publication Date Title
CN103914569B (en) Input prompting method and device, and creation method and device for a dictionary tree model
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
CN102479191B (en) Method and device for providing multi-granularity word segmentation result
CN110442777B (en) BERT-based pseudo-correlation feedback model information retrieval method and system
US8463593B2 (en) Natural language hypernym weighting for word sense disambiguation
CN109739973A (en) Text snippet generation method, device, electronic equipment and storage medium
CN111104488B (en) Method, device and storage medium for integrating retrieval and similarity analysis
CN102915299A (en) Word segmentation method and device
CN102768681A (en) Recommending system and method used for search input
CN107844493B (en) File association method and system
CN108875065B (en) Indonesia news webpage recommendation method based on content
CN111159359A (en) Document retrieval method, document retrieval device and computer-readable storage medium
JP6722615B2 (en) Query clustering device, method, and program
EP3679488A1 (en) System and method for recommendation of terms, including recommendation of search terms in a search system
CN110134970B (en) Header error correction method and apparatus
CN114880447A (en) Information retrieval method, device, equipment and storage medium
WO2016015267A1 (en) Rank aggregation based on markov model
JP5718405B2 (en) Utterance selection apparatus, method and program, dialogue apparatus and method
CN109933216B (en) Word association prompting method, device and equipment for intelligent input and computer storage medium
CN111859950A (en) Method for automatically generating lecture notes
US8229970B2 (en) Efficient storage and retrieval of posting lists
CN113190692B (en) Self-adaptive retrieval method, system and device for knowledge graph
Collarana et al. A question answering system on regulatory documents
CN110704613B (en) Vocabulary database construction and query method, database system, equipment and medium
JP5869948B2 (en) Passage dividing method, apparatus, and program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant