CN103914569B - Input prompt method and device, and creation method and device of a dictionary tree model - Google Patents
- Publication number: CN103914569B
- Application number: CN201410169141.6A
- Authority: CN (China)
- Legal status: Active (an assumption by Google, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
Abstract
The invention discloses an input prompt method. The method includes: obtaining an input word; splitting the input word to obtain N split words, where N is a positive integer; querying a preset dictionary tree model according to each of the N split words to obtain N prompt result sets; and generating a final prompt result according to the N prompt result sets. By performing split expansion on the input word, the input prompt method of the embodiments of the present invention obtains the prefix split words, middle split words, and so on of the input word, and queries the preset dictionary tree model with them, so that prompt results matching the prefix split words and the middle split words are obtained respectively, which improves the accuracy of the prompt. Moreover, by incorporating the context of the input word and reducing the splitting granularity of the input word to the minimum, the chance of a retrieval match is increased and the user experience is improved. The invention also discloses an input prompt device, and a creation method and device of a dictionary tree model.
Description
Technical field
The present invention relates to the field of search technology, and more particularly to an input prompt method and device, and to a creation method and device of a dictionary tree model.
Background technology
At present, in domestic search fields such as web-page search and vertical search, once the search-term prompt dictionary reaches the order of hundreds of thousands of entries, almost all implementations rely on Trie trees (a memory-based retrieval structure): indexing is done by building the tree, and retrieval is done by traversing it. Exactly how the tree is built and traversed, however, varies; different designers formulate different methods according to the specifics of their data.
At present, domestic auto-complete algorithms for Chinese essentially all insert each Chinese character of a word, in order, into the nodes of a Trie tree from root to leaf, then convert the characters to Pinyin and build the tree again; searching is then a traversal from the root node down to a leaf node. This method can solve most search-term prompt problems.
However, if the search term entered by the user is not a prefix of any word in the dictionary, this Trie-based method fails; in other words, it prompts entries that do not exist. For example, as shown in Fig. 1, the search term "exceedingly high empire it" recalls "Di Ren of exceedingly high empire is outstanding", whereas the correct, existing entry is "the exceedingly high empire of Di Ren outstanding person". This is a sloppy approach that reduces the accuracy of the search-term prompt and gives a poor user experience.
Summary of the invention
The present invention aims to solve at least one of the above technical deficiencies, at least to a certain extent.
To this end, a first object of the present invention is to propose an input prompt method. The method performs split expansion on the input word and queries a preset dictionary tree model according to the split words to obtain prompt results, improving the accuracy of the prompt; moreover, by incorporating the context of the input word and reducing the splitting granularity of the input word to the minimum, it increases the chance of a retrieval match and improves the user experience.
A second object of the present invention is to propose a creation method of a dictionary tree model.
A third object of the present invention is to propose an input prompt device.
A fourth object of the present invention is to propose a creating device of a dictionary tree model.
To achieve the above objects, an input prompt method according to embodiments of the first aspect of the present invention includes: obtaining an input word; splitting the input word to obtain N split words, where N is a positive integer; querying a preset dictionary tree model according to each of the N split words to obtain N prompt result sets; and generating a final prompt result according to the N prompt result sets.
The input prompt method of the embodiments of the present invention splits the obtained input word into N split words and queries the preset dictionary tree model according to each of them to obtain N prompt result sets, which are then merged and de-duplicated to generate the final prompt result. By performing split expansion on the input word, prefix split words, middle split words, and so on are obtained; querying the preset dictionary tree model with these yields prompt results matching the prefix split words and the middle split words, improving the accuracy of the automatic prompt. Moreover, by incorporating the context of the input word and reducing the splitting granularity of the input word to the minimum, the chance of a retrieval match is increased and the user experience is improved.
To achieve the above objects, a creation method of a dictionary tree model according to embodiments of the second aspect of the present invention includes: obtaining multiple sample words; sorting the multiple sample words according to access popularity, and taking the sorted sample words as multiple body words; generating, for each body word, its corresponding related words; and creating the dictionary tree model according to the multiple body words and the related words corresponding to each body word.
In the creation method of the dictionary tree model of the embodiments of the present invention, the sorted sample words are expanded to generate several related words, which solves the problem in the auto-complete field that prompt results matched from the middle of an entry cannot be obtained; and because the related words share part of the tree nodes with the body words, memory consumption is reduced.
To achieve the above objects, an input prompt device according to embodiments of the third aspect of the present invention includes: an obtaining module for obtaining an input word; a splitting module for splitting the input word to obtain N split words, where N is a positive integer; a query module for querying a preset dictionary tree model according to each of the N split words to obtain N prompt result sets; and a generation module for generating a final prompt result according to the N prompt result sets.
In the input prompt device of the embodiments of the present invention, the splitting module splits the obtained input word into N split words, the query module queries the preset dictionary tree model according to each of them to obtain N prompt result sets, and the generation module merges and de-duplicates the N prompt result sets to generate the final prompt result. By performing split expansion on the input word, prefix split words, middle split words, and so on are obtained; querying the preset dictionary tree model with these yields prompt results matching the prefix split words and the middle split words, improving the accuracy of the automatic prompt. Moreover, by incorporating the context of the input word and reducing the splitting granularity of the input word to the minimum, the chance of a retrieval match is increased and the user experience is improved.
To achieve the above objects, a creating device of a dictionary tree model according to embodiments of the fourth aspect of the present invention includes: an obtaining module for obtaining multiple sample words; a sorting module for sorting the multiple sample words in descending order according to access popularity and taking the sorted sample words as multiple body words; a generation module for generating, for each body word, its corresponding related words; and a creation module for creating the dictionary tree model according to the multiple body words and the related words corresponding to each body word.
In the creating device of the dictionary tree model of the embodiments of the present invention, the sorted sample words are expanded to generate several related words, which solves the problem in the auto-complete field that prompt results matched from the middle of an entry cannot be obtained; and because the related words share part of the tree nodes with the body words, memory consumption is reduced.
Additional aspects and advantages of the present invention will be set forth in part in the following description, will in part become apparent from it, or will be learned through practice of the invention.
Description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic diagram of an input prompt method in the prior art;
Fig. 2 is a flow chart of an input prompt method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a dictionary tree model according to an embodiment of the present invention;
Fig. 4 is a flow chart of a creation method of a dictionary tree model according to an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of an input prompt device according to an embodiment of the present invention;
Fig. 6 is a structural schematic diagram of a creating device of a dictionary tree model according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.
The input prompt method and device and the creation method and device of the dictionary tree model according to the embodiments of the present invention are described below with reference to the accompanying drawings.
At present, there are many ways of implementing auto-complete; Markov models and some machine-learning methods can realize it, for example, but their drawback is that they require a great deal of training, and their performance does not necessarily meet requirements. The scheme generally regarded as better in the industry is the Trie tree, yet Trie-tree techniques are specialized and varied, and how to achieve comprehensive, accurate, and error-tolerant recall with as little memory as possible remains a problem to be solved.
To this end, the present invention proposes an input prompt method including: obtaining an input word; splitting the input word to obtain N split words, where N is a positive integer; querying a preset dictionary tree model according to each of the N split words to obtain N prompt result sets; and generating a final prompt result according to the N prompt result sets.
Fig. 2 is a flow chart of an input prompt method according to an embodiment of the present invention.
As shown in Fig. 2, the input prompt method may include:
S201: an input word is obtained.
For example, the input word filled in the input box when a user searches through a browser or a search application, such as "pc faster", can be obtained.
S202: the input word is split to obtain N split words, where N is a positive integer.
Specifically, in one embodiment of the present invention, the non-essential words in the input word may first be removed according to a non-essential-word dictionary, after which the remaining input word is split to obtain the N split words. In the embodiments of the present invention, the non-essential-word dictionary depends on the language type: different language types, such as Chinese, English, French, or Thai, have different non-essential-word dictionaries.
It should be appreciated that non-essential words are words that are not very important for expressing the topic of the input word, and can be understood as words irrelevant to it. That is, the input word filled in by the user is likely to contain non-essential words; for example, "download" in the input word "download pc faster" is a non-essential word, as is "how" in the input word "how to soak sea cucumber". Therefore, the non-essential words can first be matched against the non-essential-word dictionary and removed, yielding the input word "pc faster". Split expansion can then be performed on "pc faster", for example splitting it into the two split words "pc faster" and "faster". In this way, the prefix split word and the middle split word of the input word are obtained.
In another embodiment of the present invention, the input word may first be split according to its context to obtain a first split-word result set. The first split-word result set may then be further split according to the minimal linguistic unit corresponding to the language type to obtain a second split-word result set, where the second split-word result set includes the N split words.
For example, taking the input word "pc faster": from the context of the input word it can be known that it is English and contains the two words "pc" and "faster", so the input word is first split to obtain the split-word result set ("pc", "faster"). This split-word result set can then be split according to the minimal linguistic unit to obtain (("p", "c"), ("f", "a", "s", "t", "e", "r")). In this way the input word is split down to its minimal units, which increases the chance of a retrieval match.
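The two-level split just described might be sketched as follows; the tokenization rules are assumptions that hold for English input only (first level: space-separated words; second level: single characters).

```python
def split_by_context(query: str) -> list[str]:
    # First-level split: for English, word boundaries are spaces.
    return query.split()

def split_to_minimal_units(tokens: list[str]) -> list[tuple[str, ...]]:
    # Second-level split: down to the language's minimal unit
    # (single characters for English).
    return [tuple(token) for token in tokens]

first_set = split_by_context("pc faster")
print(split_to_minimal_units(first_set))
# [('p', 'c'), ('f', 'a', 's', 't', 'e', 'r')]
```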
S203: the preset dictionary tree model is queried according to each of the N split words to obtain N prompt result sets.
Specifically, the preset dictionary tree model can be traversed with each of the N split words to obtain the N prompt result sets. It should be noted that, in one embodiment of the present invention, the preset dictionary tree model may be traversed concurrently by multiple threads. In one embodiment of the present invention, the preset dictionary tree model is created in advance; the specific manner of creation may refer to the detailed description of the subsequent embodiments.
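A minimal sketch of this query step, using a toy dictionary tree and a thread pool for the concurrent traversal mentioned above. The node layout and the sample entries are illustrative assumptions, not the patent's exact structure.

```python
from concurrent.futures import ThreadPoolExecutor

class Trie:
    """Toy dictionary tree: nested dicts, with "$" marking a leaf."""
    def __init__(self):
        self.root = {}

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = word  # leaf marker storing the full entry

    def suggest(self, split_word: str) -> set:
        """Collect every entry reachable below the split word's path."""
        node = self.root
        for ch in split_word:
            node = node.get(ch)
            if node is None:
                return set()
        results, stack = set(), [node]
        while stack:
            current = stack.pop()
            for key, child in current.items():
                if key == "$":
                    results.add(child)
                else:
                    stack.append(child)
        return results

trie = Trie()
for entry in ["pc faster", "pc manager", "faster rcnn"]:
    trie.insert(entry)

split_words = ["pc faster", "faster"]
with ThreadPoolExecutor() as pool:  # one traversal per thread
    result_sets = list(pool.map(trie.suggest, split_words))
print(result_sets)  # [{'pc faster'}, {'faster rcnn'}]
```

Each split word yields its own prompt result set, which is exactly the shape that step S204 merges.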
S204: a final prompt result is generated according to the N prompt result sets.
Specifically, the N prompt result sets may first be merged, and the merged set may then be de-duplicated to generate the final prompt result.
It should be noted that, in the embodiments of the present invention, the final prompt result generally has 10 items (because auto-complete generally displays only 10): the first 7 items may be prompt results matched by prefix, the 8th and 9th may be prompt results obtained by middle matching, and the 10th may be a prompt result obtained by similarity matching (including wrong-character matching). In this way, not only is spelling auto-prompt accomplished, but the user's potential retrieval intent can be recognized and errors corrected, so that the usefulness of auto-complete is distilled.
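The merge, de-duplication, and 7/2/1 quota described above can be sketched as follows; the function name and the example entries are hypothetical.

```python
def final_suggestions(prefix_hits, middle_hits, fuzzy_hits, limit=10):
    """Merge the three categories of prompt results with the 7/2/1
    quota described above, de-duplicating across categories."""
    quotas = [(prefix_hits, 7), (middle_hits, 2), (fuzzy_hits, 1)]
    final, seen = [], set()
    for hits, quota in quotas:
        taken = 0
        for hit in hits:
            if hit not in seen and taken < quota:
                final.append(hit)
                seen.add(hit)
                taken += 1
    return final[:limit]

print(final_suggestions(
    ["pc faster", "pc manager"],       # prefix matches
    ["baidu pc faster", "pc faster"],  # middle matches ("pc faster" repeats)
    ["pc fastor"],                     # similarity (wrong-character) match
))
# ['pc faster', 'pc manager', 'baidu pc faster', 'pc fastor']
```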
The input prompt method of the embodiments of the present invention splits the obtained input word into N split words and queries the preset dictionary tree model according to each of them to obtain N prompt result sets, which are then merged and de-duplicated to generate the final prompt result. By performing split expansion on the input word, prefix split words, middle split words, and so on are obtained; querying the preset dictionary tree model with these yields prompt results matching the prefix split words and the middle split words, improving the accuracy of the automatic prompt. Moreover, by incorporating the context of the input word and reducing the splitting granularity of the input word to the minimum, the chance of a retrieval match is increased and the user experience is improved.
It should be noted that, in one embodiment of the present invention, the preset dictionary tree model may be created through the following steps:
S101': multiple sample words are obtained.
For example, the multiple input words filled in the input box when users search through a browser or a search application over a period of time can be obtained and de-duplicated, and the de-duplicated input words taken as the multiple sample words.
S102': the multiple sample words are sorted according to access popularity, and the sorted sample words are taken as multiple body words.
Specifically, the sample words can be sorted in descending order of access popularity to obtain a descending ordered vocabulary. It should be appreciated that the sample words with high weight are inserted into the dictionary tree first and the sample word with the lowest weight is inserted last, so a pre-order traversal of this dictionary tree yields results already in descending order of weight. It should be noted that, in one embodiment of the present invention, the sample words may also be sorted in descending order by other weights, for example by the ordering of characters in the language type.
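A minimal sketch of this sorting step, using a hypothetical query log as the source of access popularity; the real input would be the de-duplicated sample words with their recorded access counts.

```python
from collections import Counter

# Hypothetical query log standing in for access-popularity statistics.
logged_queries = ["pc faster", "baidu", "pc faster", "pen", "baidu", "baidu"]
heat = Counter(logged_queries)  # access popularity per de-duplicated word

# Descending sort by popularity: the highest-weight words are inserted
# first, so a pre-order traversal of the finished tree is already ranked.
body_words = sorted(heat, key=heat.get, reverse=True)
print(body_words)  # ['baidu', 'pc faster', 'pen']
```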
It should be appreciated that, in the embodiments of the present invention, the body words may be the de-duplicated sample words.
S103': the related words corresponding to each body word are generated from the multiple body words.
Specifically, each body word W can be processed, for example by removing the non-essential words in W according to the non-essential-word dictionary, and the body word W with non-essential words removed is then expanded to generate the several related words (W1, W2, W3, ...) corresponding to each body word. For example, expanding the body word "baidu pc faster" yields the related words ("baidu pc faster", "pc faster", "faster"). It can be understood that, in the embodiments of the present invention, a related word may be a word relevant to a sample word that is obtained by expanding that sample word.
S104': the preset dictionary tree model is created according to the multiple body words and the related words corresponding to each body word.
Specifically, a body-word dictionary tree may first be created from the body words, and a related-word dictionary tree may then be created from the related words corresponding to each body word. Finally, the body-word dictionary tree and the related-word dictionary tree may be merged to create the preset dictionary tree model.
It should be noted that the preset dictionary tree model can be built from the body words and their corresponding related words in the conventional manner, except that a related word is not treated as an ordinary entry when inserted into the preset dictionary tree model; instead, several indexes are established at the root node of its corresponding body word, pointing to the position of the related word within the body word. This overcomes the shortcoming of dictionary-tree algorithms that words with identical suffixes cannot share storage, and considerably increases the utilization of memory.
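The index idea might be sketched as follows, under stated assumptions: related words are reached through a side index instead of re-inserted branches, and for clarity the index here stores suffix strings, whereas the patent's scheme stores pointers into the body word's existing node path, which is what actually saves memory.

```python
class SharedTrie:
    """Sketch: a body word gets one full root-to-leaf path; its related
    words are recorded as index entries pointing back into that path
    rather than being inserted as ordinary entries."""

    def __init__(self):
        self.root = {}
        self.related_index = []  # (related_term, body_word) pairs

    def insert(self, body: str, related_offsets) -> None:
        node = self.root
        for ch in body:
            node = node.setdefault(ch, {})
        node["$"] = body  # leaf node stores the full entry
        for offset in related_offsets:
            # Index the related word by its position inside the body word.
            self.related_index.append((body[offset:], body))

    def suggest(self, query: str) -> set:
        hits = set()
        node = self.root
        for ch in query:  # ordinary prefix walk over body words
            node = node.get(ch)
            if node is None:
                break
        if node is not None:
            stack = [node]
            while stack:
                current = stack.pop()
                for key, child in current.items():
                    if key == "$":
                        hits.add(child)
                    else:
                        stack.append(child)
        for term, body in self.related_index:  # middle matches
            if term.startswith(query):
                hits.add(body)  # a middle match recalls the full body word
        return hits

trie = SharedTrie()
trie.insert("baidu pc faster", related_offsets=[6, 9])  # "pc faster", "faster"
print(trie.suggest("fas"))  # {'baidu pc faster'}
print(trie.suggest("ba"))   # {'baidu pc faster'}
```

Whether the query is "ba", "pc", or "fas", the same leaf entry "baidu pc faster" is recalled, mirroring the Fig. 3 example below.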
For example, as shown in Fig. 3, taking the sample word "baidu pc faster", the related words expanded from it are ("baidu pc faster", "pc faster", "faster"). These three words correspond to three traversal paths in the preset dictionary tree model, but the leaf node corresponding to all three paths is (baidu pc faster). In this way, regardless of whether the obtained query is "ba", "pc", or "fas", "baidu pc faster" can be recalled.
It should be appreciated from Fig. 3 that, since the related words and the body words share part of the tree nodes, the required memory can be saved. For example, for a dictionary containing N sample words, if each sample word has an average character length of L and each sample word generates n related words, the maximum storage required in total is (L + n) * N * K, where K is the storage of a single node; a leaf node has the same data structure as the other nodes, except that its successor pointers are empty.
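Plugging illustrative numbers into the (L + n) * N * K bound gives a feel for the scale; the figures below are assumptions, not values from the patent.

```python
def max_trie_space(N: int, L: int, n: int, K: int) -> int:
    """Upper bound on node storage from the text: N sample words of
    average length L, each generating n related words, K bytes/node."""
    return (L + n) * N * K

# Illustrative: 100,000 entries of average length 10, 3 related words
# each, 64-byte nodes.
print(max_trie_space(100_000, 10, 3, 64))  # 83200000 bytes, roughly 80 MB
```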
It should be noted that, in one embodiment of the present invention, the sample words also need to be expanded according to their context and the minimal linguistic unit corresponding to the language type, to generate more related words.
For example, taking the Thai language type: in Thai there is no space between words; spaces appear only between sentences and between Thai and non-Thai text, and many Thai words are merges of other words. This makes Thai difficult to parse, and users are also prone to errors when typing it. For example, the Thai word for "pen" is composed of the Thai words for "mouth" and "crow"; to accommodate users who mistype, the input word can be split down to its minimal units. So if there is a Thai phrase meaning "I bought a pen", the related words expanded from it should include the entry meaning "I bought a mouth crow".
Thus, by expanding the sorted sample words to generate several related words, the problem in the auto-complete field that prompt results matched from the middle of an entry cannot be obtained is solved; and because the related words share part of the tree nodes with the body words, memory consumption is reduced.
The dictionary tree model plays a crucial role in searching for prompt results according to the input word: once the dictionary tree model has been created, the prompt result sets of an input word can be obtained by querying it, and the final prompt result can be generated from those sets.
Therefore, in order to realize the above embodiments, the present invention also provides a creation method of a dictionary tree model, including: obtaining multiple sample words; sorting the multiple sample words according to access popularity and taking the sorted sample words as multiple body words; generating the related words corresponding to each body word from the multiple body words; and creating the dictionary tree model according to the multiple body words and the related words corresponding to each body word.
Fig. 4 is a flow chart of a creation method of a dictionary tree model according to an embodiment of the present invention.
As shown in Fig. 4, the creation method of the dictionary tree model may include:
S401: multiple sample words are obtained.
For example, the multiple input words filled in the input box when users search through a browser or a search application over a period of time can be obtained and de-duplicated, and the de-duplicated input words taken as the multiple sample words.
S402: the multiple sample words are sorted according to access popularity, and the sorted sample words are taken as multiple body words.
Specifically, the sample words can be sorted in descending order of access popularity to obtain a descending ordered vocabulary. It should be appreciated that the sample words with high weight are inserted into the dictionary tree first and the sample word with the lowest weight is inserted last, so a pre-order traversal of this dictionary tree yields results already in descending order of weight. It should be noted that, in one embodiment of the present invention, the sample words may also be sorted in descending order by other weights, for example by the ordering of characters in the language type.
It should be appreciated that, in the embodiments of the present invention, the body words may be the de-duplicated sample words.
S403: the related words corresponding to each body word are generated from the multiple body words.
Specifically, each body word W can be processed, for example by removing the non-essential words in W according to the non-essential-word dictionary, and the body word W with non-essential words removed is then expanded to generate the several related words (W1, W2, W3, ...) corresponding to each body word. For example, expanding the body word "baidu pc faster" yields the related words ("baidu pc faster", "pc faster", "faster"). It can be understood that, in the embodiments of the present invention, a related word may be a word relevant to a sample word that is obtained by expanding that sample word.
S404: the dictionary tree model is created according to the multiple body words and the related words corresponding to each body word.
Specifically, a body-word dictionary tree may first be created from the body words, and a related-word dictionary tree may then be created from the related words corresponding to each body word. Finally, the body-word dictionary tree and the related-word dictionary tree may be merged to create the dictionary tree model.
It should be noted that the dictionary tree model can be built from the body words and their corresponding related words in the conventional manner, except that a related word is not treated as an ordinary entry when inserted into the dictionary tree model; instead, several indexes are established at the root node of its corresponding body word, pointing to the position of the related word within the body word. This overcomes the shortcoming of dictionary-tree algorithms that words with identical suffixes cannot share storage, and considerably increases the utilization of memory.
For example, as shown in Fig. 3, taking the sample word "baidu pc faster", the related words expanded from it are ("baidu pc faster", "pc faster", "faster"). These three words correspond to three traversal paths in the dictionary tree model, but the leaf node corresponding to all three paths is (baidu pc faster). In this way, regardless of whether the obtained query is "ba", "pc", or "fas", "baidu pc faster" can be recalled.
It should be appreciated from Fig. 3 that, since the related words and the body words share part of the tree nodes, the required memory can be saved. For example, for a dictionary containing N sample words, if each sample word has an average character length of L and each sample word generates n related words, the maximum storage required in total is (L + n) * N * K, where K is the storage of a single node; a leaf node has the same data structure as the other nodes, except that its successor pointers are empty.
It should be noted that, in one embodiment of the present invention, the sample words also need to be expanded according to their context and the minimal linguistic unit corresponding to the language type, to generate more related words.
For example, taking the Thai language type: in Thai there is no space between words; spaces appear only between sentences and between Thai and non-Thai text, and many Thai words are merges of other words. This makes Thai difficult to parse, and users are also prone to errors when typing it. For example, the Thai word for "pen" is composed of the Thai words for "mouth" and "crow"; to accommodate users who mistype, the input word can be split down to its minimal units. So if there is a Thai phrase meaning "I bought a pen", the related words expanded from it should include the entry meaning "I bought a mouth crow".
In the creation method of the dictionary tree model of the embodiments of the present invention, the sorted sample words are expanded to generate several related words, which solves the problem in the auto-complete field that prompt results matched from the middle of an entry cannot be obtained; and because the related words share part of the tree nodes with the body words, memory consumption is reduced.
In order to realize the above embodiments, the present invention also provides an input prompt device, including: an obtaining module for obtaining an input word; a splitting module for splitting the input word to obtain N split words, where N is a positive integer; a query module for querying a preset dictionary tree model according to each of the N split words to obtain N prompt result sets; and a generation module for generating a final prompt result according to the N prompt result sets.
Fig. 5 is a structural schematic diagram of an input prompt device according to an embodiment of the present invention.
As shown in Fig. 5, the input prompt device may include: an acquisition module 110, a splitting module 120, a query module 130 and a generation module 140.
Specifically, the acquisition module 110 is configured to obtain an input word. For example, the acquisition module 110 may obtain the input word, such as "pc faster", that a user fills in an input box when searching through a browser or a search application.
The splitting module 120 is configured to split the input word to obtain N split words, where N is a positive integer. The query module 130 is configured to query the preset dictionary tree model according to the N split words respectively to obtain N prompt result sets.
More specifically, the query module 130 may traverse the preset dictionary tree model with each of the N split words to obtain the N prompt result sets. It should be noted that, in one embodiment of the present invention, the preset dictionary tree model may be traversed concurrently by multiple threads. In one embodiment of the present invention, the preset dictionary tree model may be created in advance; the specific implementation may refer to the subsequent embodiments.
The generation module 140 is configured to generate the final prompt result according to the N prompt result sets. Specifically, in one embodiment of the present invention, the generation module 140 may be configured to: merge the N prompt result sets, and deduplicate the merged N prompt result sets to generate the final prompt result.
It should be noted that in an embodiment of the present invention, it is final that the item number of result is prompted to be generally 10 (because of Auto
Complete generally only selects 10 displays), first 7 articles can be prompt that forerunner matches as a result, the 8th, 9 article can be centre
With obtain prompt as a result, the 10th article can be Similarity matching obtain prompt result (comprising wrongly written character match).In this way, can not only be complete
At spelling automatic prompt, and it can know that the potential retrieval purpose of user and have error correction so that Auto Complete
Use meaning distilled.
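A minimal sketch of this merging step, assuming the three match categories arrive as separate ordered lists; the function names and the backfill behaviour are illustrative assumptions, while the 7/2/1 slot allocation mirrors the description above:

```python
def take(candidates, seen, k):
    """Take up to k not-yet-seen items, preserving order and marking
    the taken items as seen (this performs the deduplication)."""
    out = []
    for item in candidates:
        if len(out) == k:
            break
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

def merge_prompts(prefix_hits, middle_hits, fuzzy_hits, limit=10):
    """Merge and deduplicate the result sets into at most `limit`
    display slots: up to 7 prefix matches, 2 middle matches and
    1 similarity (typo) match, backfilling any unused slots."""
    seen = set()
    final = take(prefix_hits, seen, 7)
    final += take(middle_hits, seen, 2)
    final += take(fuzzy_hits, seen, 1)
    final += take(prefix_hits + middle_hits + fuzzy_hits,
                  seen, limit - len(final))
    return final
```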
Optionally, in one embodiment of the present invention, as shown in Fig. 5, the splitting module 120 may include a removal unit 121 and a first splitting unit 122. Specifically, the removal unit 121 is configured to remove non-essential words from the input word according to a non-essential-word dictionary, where the non-essential-word dictionary is related to the language type, i.e. different language types have different non-essential-word dictionaries. The first splitting unit 122 is configured to split the input word, after the non-essential words are removed, to obtain the N split words.
It should be appreciated that non-essential words are words that are not important for expressing the topic of the input word, and may be understood as words irrelevant to the input word. That is, the input word filled in by a user is likely to contain non-essential words; for example, "download" in the input word "download pc faster" is a non-essential word, as is "how" in the input word "how to soak sea cucumber". Therefore, the removal unit 121 may first match non-essential words against the non-essential-word dictionary and remove the matched words, thereby obtaining the input word "pc faster" with the non-essential words removed. The first splitting unit 122 may then perform splitting expansion on "pc faster", for example splitting it into the 2 split words "pc faster" and "faster". In this way, the prefix split word and the middle split words of the input word are obtained.
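The two steps above can be sketched as follows; the English stop list stands in for the language-specific non-essential-word dictionary and is a hypothetical example, not the dictionary from the text:

```python
# Hypothetical stand-in for the language-specific non-essential-word
# dictionary described above.
NON_ESSENTIAL = {"en": {"download", "how", "to"}}

def remove_non_essential(input_word: str, lang: str = "en") -> str:
    """Drop non-essential words, e.g. 'download pc faster' -> 'pc faster'."""
    stops = NON_ESSENTIAL.get(lang, set())
    return " ".join(w for w in input_word.split() if w not in stops)

def suffix_expand(phrase: str) -> list:
    """Expand a phrase into its word-boundary suffixes, yielding the
    prefix split word and the middle split words:
    'pc faster' -> ['pc faster', 'faster']."""
    words = phrase.split()
    return [" ".join(words[i:]) for i in range(len(words))]
```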
Optionally, in one embodiment of the present invention, as shown in Fig. 5, the splitting module 120 may include a second splitting unit 123 and a third splitting unit 124. Specifically, the second splitting unit 123 is configured to split the input word according to the context of the input word to obtain a first split-word result set. The third splitting unit 124 is configured to split the first split-word result set according to the minimum linguistic unit corresponding to the language type to obtain a second split-word result set, where the second split-word result set includes the N split words.
For example, taking the input word "pc faster", the second splitting unit 123 can learn from the context of the input word that it is English and contains the two words "pc" and "faster", and can thus first split the input word into the split-word result set ("pc", "faster"). The third splitting unit 124 may then split this result set ("pc", "faster") according to the minimum linguistic unit to obtain (("p", "c"), ("f", "a", "s", "t", "e", "r")). In this way, the input word can be split into minimum units, increasing the possibilities for retrieval matching.
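The two-stage split above can be sketched as follows for a space-delimited language such as English; how boundaries and minimum units are found for other language types is an assumption left out of this sketch:

```python
def split_by_context(input_word: str) -> list:
    """Stage 1: split on word boundaries inferred from context
    (whitespace for English): 'pc faster' -> ['pc', 'faster']."""
    return input_word.split()

def split_to_minimum_units(words: list) -> list:
    """Stage 2: split each word into the minimum linguistic units of
    the language type (single characters for English)."""
    return [tuple(word) for word in words]
```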
Optionally, in one embodiment of the present invention, the input prompt device may further include a creation module 150. The creation module 150 may be configured to: first obtain multiple sample words; then sort the multiple sample words according to access popularity, and use the sorted sample words as multiple main body words; then generate, according to the multiple main body words, the related terms corresponding to each main body word; and finally create the preset dictionary tree model according to the multiple main body words and the related terms corresponding to each main body word.
In one embodiment of the present invention, the creation module 150 may further be configured to: first create a main-body-word dictionary tree according to each main body word; then create a related-term dictionary tree according to the related terms corresponding to each main body word; and finally synthesize the main-body-word dictionary tree and the related-term dictionary tree to create the preset dictionary tree model. The specific implementation of creating the preset dictionary tree model may refer to the detailed description of the above method.
With the input prompt device of the embodiment of the present invention, the splitting module splits the obtained input word to obtain N split words, the query module queries the preset dictionary tree model according to the N split words respectively to obtain N prompt result sets, and the generation module merges and deduplicates the N prompt result sets to generate the final prompt result. By performing splitting expansion on the input word to obtain its prefix split word, middle split words and the like, and querying the preset dictionary tree model with these split words, prompt results matching the prefix split word and the middle split words are obtained, which improves the accuracy of automatic prompting; moreover, by incorporating context processing of the input word and reducing its splitting granularity to the minimum, the possibilities for retrieval matching are increased and user experience is improved.
In order to implement the above embodiments, the present invention further provides a device for creating a dictionary tree model, including: an acquisition module configured to obtain multiple sample words; a sorting module configured to sort the multiple sample words in descending order according to access popularity and use the sorted sample words as multiple main body words; a generation module configured to generate, according to the multiple main body words, the related terms corresponding to each main body word; and a creation module configured to create the dictionary tree model according to the multiple main body words and the related terms corresponding to each main body word.
Fig. 6 is a structural schematic diagram of a device for creating a dictionary tree model according to an embodiment of the present invention.
As shown in Fig. 6, the device for creating a dictionary tree model may include: an acquisition module 210, a sorting module 220, a generation module 230 and a creation module 240.
Specifically, the acquisition module 210 is configured to obtain multiple sample words. For example, the acquisition module 210 may obtain the multiple input words that users fill in the input box when searching through a browser or a search application within a given period, deduplicate these input words, and then use the deduplicated input words as the multiple sample words.
The sorting module 220 is configured to sort the multiple sample words in descending order according to access popularity, and use the sorted sample words as multiple main body words. More specifically, the sorting module 220 may sort the sample words in descending order of access popularity to obtain an ordered vocabulary. It should be appreciated that the sample words with the highest weight are inserted into the dictionary tree first and the sample word with the lowest weight is inserted last, so that a preorder traversal of the dictionary tree yields results already in descending order of weight. It should be noted that, in one embodiment of the present invention, the sample words may also be sorted in descending order according to other weights; for example, the sample words may be sorted according to the character ordering of the language type.
It should be appreciated that, in embodiments of the present invention, the main body words may be the deduplicated sample words.
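This ordering step can be sketched as follows; the query log below is a hypothetical input, with repetition count standing in for access popularity:

```python
from collections import Counter

def rank_sample_words(query_log):
    """Deduplicate logged queries and order them by descending access
    popularity, so that hot words are inserted into the dictionary
    tree first and a preorder traversal is already weight-sorted."""
    return [word for word, _ in Counter(query_log).most_common()]
```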
The generation module 230 is configured to generate, according to the multiple main body words, the related terms corresponding to each main body word. More specifically, the generation module 230 may process each main body word W according to the non-essential-word dictionary, e.g. removing the non-essential words in W, and then expand the main body word W with the non-essential words removed, so as to generate the several related terms (W1, W2, W3, ...) corresponding to each main body word. For example, performing expansion on the main body word "baidu pc faster" yields the related terms ("baidu pc faster", "pc faster", "faster"). It is to be understood that, in embodiments of the present invention, a related term may be a word related to a sample word, obtained by expanding that sample word.
The creation module 240 is configured to create the dictionary tree model according to the multiple main body words and the related terms corresponding to each main body word.
Further, in one embodiment of the present invention, as shown in Fig. 6, the creation module 240 may include a first creating unit 241, a second creating unit 242 and a third creating unit 243. Specifically, the first creating unit 241 is configured to create a main-body-word dictionary tree according to each main body word. The second creating unit 242 is configured to create a related-term dictionary tree according to the related terms corresponding to each main body word. The third creating unit 243 is configured to synthesize the main-body-word dictionary tree and the related-term dictionary tree to create the dictionary tree model.
It should be noted that the creation module 240 may build the dictionary tree model from the main body words and their corresponding related terms in a conventional manner, except that, unlike the conventional method, a related term is not inserted into the dictionary tree model as an ordinary entry; instead, several indexes are established at the root node corresponding to its main body word, pointing to the positions of the related term within the main body word. In this way, the shortcoming of the conventional dictionary tree algorithm that storage space cannot be shared among words with identical suffixes is overcome, considerably increasing memory utilization.
For example, as shown in Fig. 3, taking the sample word "baidu pc faster", the expanded related terms are ("baidu pc faster", "pc faster", "faster"). These three words correspond to three traversal paths in the dictionary tree model, but the leaf node of all three paths is (baidu pc faster). Thus, whether the obtained query is "ba", "pc" or "fas", "baidu pc faster" can be recalled.
It should be appreciated from Fig. 3 that, since the related terms and the main body word share part of the tree nodes, the required memory space can be saved. For example, for a dictionary containing N sample words, if the average character length of a sample word is L and each sample word generates n related terms, then the maximum total storage required is (L+n)*N*K, where K is the storage of a single node; a leaf node has the same data structure as the other nodes, except that its child pointers are empty.
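A simplified sketch of the Fig. 3 behaviour: instead of storing index pointers into the main body word as the text describes, the suffix paths here simply share a reference to the full completion string. This reproduces the recall behaviour ("ba", "pc" or "fas" all recall "baidu pc faster") but not the full node-sharing optimization, so it is an illustrative assumption rather than the patented structure:

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # char -> TrieNode
        self.completion = None  # full main-body word recalled at this node

def insert_with_related_terms(root: TrieNode, main_word: str) -> None:
    """Insert the main-body word and each of its word-boundary suffixes
    ('pc faster', 'faster'); every terminal node shares one completion."""
    starts = [0] + [i + 1 for i, ch in enumerate(main_word) if ch == " "]
    for start in starts:
        node = root
        for ch in main_word[start:]:
            node = node.children.setdefault(ch, TrieNode())
        node.completion = main_word

def lookup(root: TrieNode, prefix: str) -> list:
    """Walk the prefix, then collect every completion in the subtree."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    results, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n.completion is not None:
            results.add(n.completion)
        stack.extend(n.children.values())
    return sorted(results)
```

With "baidu pc faster" inserted, a lookup on "ba", "pc" or "fas" each returns the full sample word, matching the recall described for Fig. 3.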
It should be noted that, in one embodiment of the present invention, the generation module 230 also needs to expand the sample words according to the context of the sample words and the minimum linguistic unit corresponding to the language type, so as to generate more related terms.
Taking the language type of Thai as an example: in Thai there are no spaces between words; spaces appear only between sentences, and between Thai and non-Thai text. Moreover, many Thai words are compounds of other words, which makes Thai text difficult to parse and also makes users prone to input errors. For example, the Thai word for "pen" is composed of the Thai words for "mouth" and "crow". To accommodate users who make such input errors, the input word can be split down to its smallest units. Thus, for a Thai phrase meaning "I bought a pen", the expanded related terms should include the entry meaning "I bought a mouth crow".
With the device for creating a dictionary tree model according to the embodiment of the present invention, the sorted sample words are expanded to generate several related terms, which solves the problem in the Auto Complete field that prompt results matched from the middle of a word cannot be obtained; and since the related terms and the main body word share part of the tree nodes, memory usage is reduced.
It should be noted that, in search-related technical fields, the main problems to be solved are retrieval, deduplication and ranking. In the present invention, deduplication and ranking are therefore both performed offline (for example through word processing and optimization measures), which can greatly increase overall performance so as to meet the requirements of quick response and large data volumes.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process; and the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in the reverse order, depending upon the functionality involved, as would be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in the flowcharts, or otherwise described herein, may be considered, for example, an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Furthermore, the computer-readable medium could even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it as necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented by hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, implementation may be by any one, or a combination, of the following technologies known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
Those of ordinary skill in the art will appreciate that all or part of the steps carried by the above method embodiments may be completed by instructing relevant hardware through a program; the program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, or each unit may exist physically alone, or two or more units may be integrated in one module. The above integrated module may be implemented either in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, such as two, three, etc., unless specifically defined otherwise.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", or the like means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. Furthermore, where no contradiction arises, those skilled in the art may combine and group the different embodiments or examples described in this specification and the features of those different embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.
Claims (6)
1. An input prompt method, characterized by comprising:
obtaining an input word;
splitting the input word to obtain N split words, wherein N is a positive integer and the N split words are minimum linguistic units;
querying a preset dictionary tree model according to the N split words respectively to obtain N prompt result sets; and generating a final prompt result according to the N prompt result sets;
wherein splitting the input word to obtain the N split words comprises:
splitting the input word according to the context of the input word to obtain a first split-word result set; and
splitting the first split-word result set according to the minimum linguistic unit corresponding to the language type to obtain a second split-word result set, wherein the second split-word result set comprises the N split words;
wherein the preset dictionary tree model is created by the following steps:
obtaining multiple sample words;
sorting the multiple sample words according to access popularity, and using the sorted multiple sample words as multiple main body words;
generating, according to the multiple main body words, the related terms corresponding to each main body word; and
creating the preset dictionary tree model according to the multiple main body words and the related terms corresponding to each main body word, wherein, when a related term is inserted into the preset dictionary tree model, indexes are established at the root node of the main body word, the indexes pointing to the positions of the related term in the corresponding main body word;
wherein the non-essential words in each main body word are removed according to a non-essential-word dictionary, and the main body word with the non-essential words removed is expanded, so as to generate the related terms corresponding to each main body word;
wherein creating the preset dictionary tree model according to the multiple main body words and the related terms corresponding to each main body word comprises:
creating a main-body-word dictionary tree according to each main body word;
creating a related-term dictionary tree according to the related terms corresponding to each main body word; and
synthesizing the main-body-word dictionary tree and the related-term dictionary tree to create the preset dictionary tree model.
2. The method according to claim 1, wherein splitting the input word to obtain the N split words comprises:
removing the non-essential words from the input word according to a non-essential-word dictionary; and
splitting the input word with the non-essential words removed to obtain the N split words, wherein the non-essential-word dictionary is related to the language type.
3. The method according to claim 1, wherein generating the final prompt result according to the N prompt result sets comprises:
merging the N prompt result sets, and deduplicating the merged N prompt result sets to generate the final prompt result.
4. An input prompt device, characterized by comprising:
an acquisition module configured to obtain an input word;
a splitting module configured to split the input word to obtain N split words, wherein N is a positive integer and the N split words are minimum linguistic units;
a query module configured to query a preset dictionary tree model according to the N split words respectively to obtain N prompt result sets; and
a generation module configured to generate a final prompt result according to the N prompt result sets;
wherein the splitting module comprises:
a second splitting unit configured to split the input word according to the context of the input word to obtain a first split-word result set; and
a third splitting unit configured to split the first split-word result set according to the minimum linguistic unit corresponding to the language type to obtain a second split-word result set, wherein the second split-word result set comprises the N split words;
and a creation module configured to:
obtain multiple sample words;
sort the multiple sample words according to access popularity, and use the sorted multiple sample words as multiple main body words;
generate, according to the multiple main body words, the related terms corresponding to each main body word; and
create the preset dictionary tree model according to the multiple main body words and the related terms corresponding to each main body word, wherein, when a related term is inserted into the preset dictionary tree model, indexes are established at the root node of the main body word, the indexes pointing to the positions of the related term in the corresponding main body word;
wherein the creation module is configured to remove the non-essential words in each main body word according to a non-essential-word dictionary, and to expand the main body word with the non-essential words removed, so as to generate the related terms corresponding to each main body word;
wherein the creation module is further configured to:
create a main-body-word dictionary tree according to each main body word;
create a related-term dictionary tree according to the related terms corresponding to each main body word; and
synthesize the main-body-word dictionary tree and the related-term dictionary tree to create the preset dictionary tree model.
5. The device according to claim 4, wherein the splitting module comprises:
a removal unit configured to remove the non-essential words in the input word according to a non-essential-word dictionary, wherein the non-essential-word dictionary is related to the language type; and
a first splitting unit configured to split the input word with the non-essential words removed to obtain the N split words.
6. The device according to claim 4, wherein the generation module is configured to: merge the N prompt result sets, and deduplicate the merged N prompt result sets to generate the final prompt result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410169141.6A CN103914569B (en) | 2014-04-24 | 2014-04-24 | Input creation method, the device of reminding method, device and dictionary tree-model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103914569A CN103914569A (en) | 2014-07-09 |
CN103914569B true CN103914569B (en) | 2018-09-07 |
Family
ID=51040249
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410169141.6A Active CN103914569B (en) | 2014-04-24 | 2014-04-24 | Input creation method, the device of reminding method, device and dictionary tree-model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103914569B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108241695B (en) * | 2016-12-26 | 2021-11-02 | 北京国双科技有限公司 | Information processing method and device |
CN107967259A (en) * | 2017-11-27 | 2018-04-27 | 传神语联网网络科技股份有限公司 | The method and device of Thai syllable splitting |
CN108304384B (en) * | 2018-01-29 | 2021-08-27 | 上海名轩软件科技有限公司 | Word splitting method and device |
CN109933217B (en) | 2019-03-12 | 2020-05-01 | 北京字节跳动网络技术有限公司 | Method and device for pushing sentences |
CN111400584A (en) * | 2020-03-16 | 2020-07-10 | 南方科技大学 | Association word recommendation method and device, computer equipment and storage medium |
CN113625884A (en) * | 2020-05-07 | 2021-11-09 | 顺丰科技有限公司 | Input word recommendation method and device, server and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103440312A (en) * | 2013-08-27 | 2013-12-11 | 深圳市华傲数据技术有限公司 | System and terminal for inquiring zip code for mailing address |
CN103631929A (en) * | 2013-12-09 | 2014-03-12 | 江苏金智教育信息技术有限公司 | Intelligent prompt method, module and system for search |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8209358B2 (en) * | 2007-05-09 | 2012-06-26 | Illinois Institute Of Technology | Hierarchical structured abstract data organization system |
CN102084363B (en) * | 2008-07-03 | 2014-11-12 | 加利福尼亚大学董事会 | A method for efficiently supporting interactive, fuzzy search on structured data |
2014-04-24: application CN201410169141.6A filed; patent CN103914569B granted (status: Active)
Non-Patent Citations (1)
Title |
---|
"思路解密:SEO搜索中文分词算法原理";上海英才网HR;《百度贴吧:http://tieba.baidu.com/p/1556295187》;20120427;第1-2页 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||