CN102298581A - Method and device for processing input method word stock - Google Patents


Info

Publication number
CN102298581A
CN102298581A CN2010102060028A CN201010206002A CN102298581A CN 102298581 A CN102298581 A CN 102298581A CN 2010102060028 A CN2010102060028 A CN 2010102060028A CN 201010206002 A CN201010206002 A CN 201010206002A CN 102298581 A CN102298581 A CN 102298581A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010102060028A
Other languages
Chinese (zh)
Other versions
CN102298581B (en)
Inventor
刘致远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tencent Computer Systems Co Ltd
Priority to CN201010206002.8A
Publication of CN102298581A
Application granted
Publication of CN102298581B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and device for processing an input method word stock. The method comprises the following steps: acquiring search vocabulary information input by a first user terminal group; generating a special word stock according to the search vocabulary information; and transmitting the special word stock to a second user terminal group. The invention solves the problem that search vocabulary cannot be included directly because of the large quantity of words, provides an input method with a special word stock related to the search vocabulary, enriches the word stock sources of the input method, addresses the size-to-quality ratio of the word stock, and increases the user's input efficiency.

Description

A method and device for processing an input method word stock
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for processing an input method word stock.
Background
With the continuous development of computers and the spread of networks, the computer input method has become an essential software tool. An input method is the encoding scheme used to enter symbols into a computer or other device (such as a mobile phone). Anyone who wants to communicate with others in the computing world must master at least one input method.
At present, the Internet has become an indispensable tool for people's life, work, and study, and is having a profound effect on every sector of society. Internet market segments such as e-commerce, online advertising, online games, and search engines have all grown by more than 20% year on year, and none of this would be possible without input method technology. According to data released by the Ministry of Industry and Information Technology, by the end of the first quarter of 2010 the number of Internet users in China had reached 404 million, and the number of users of Chinese social networking sites had reached 191 million. This many network users are engaged in human-computer interaction almost constantly, and input method technology is the prerequisite for that interaction. For this reason, the development of input method technology has a constant influence on the Internet industry.
The word stock plays an important role in an input method. Input methods depend on a word stock to different degrees, according to their design. Some older input methods do not rely on a word stock at all and focus on entering single characters, which greatly weakens their ability to compose words. For example, suppose a user wants to type the name "Huang Jiguang" (or the film title "Titanic") with a pinyin input method. Without a word stock, even if the user types the full spelling "huangjiguang" (or "taitannikehao"), the input method cannot give the expected result directly, and the user must still choose a character for each syllable. If the input method has a word stock that contains these words, then even with the abbreviated spelling "hjg" (or "ttnkh") its word-composition algorithm can pick the words with the higher usage probability from the word stock and give the correct result.
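As a rough illustration of why a word stock makes abbreviated input possible, the sketch below indexes entries by the initials of their pinyin syllables and returns matches ordered by usage frequency. The `Entry` type, the `build_initial_index` and `lookup` helpers, and the frequency values are all hypothetical and are not taken from the patent; real input methods use far more elaborate word-composition algorithms.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entry:
    word: str                    # the word the input method should output
    syllables: Tuple[str, ...]   # its pinyin syllables
    frequency: int               # illustrative usage count (made up)


def build_initial_index(entries: List[Entry]) -> Dict[str, List[Entry]]:
    """Group entries by the first letters of their syllables, most frequent first."""
    index: Dict[str, List[Entry]] = defaultdict(list)
    for entry in entries:
        initials = "".join(s[0] for s in entry.syllables)
        index[initials].append(entry)
    for bucket in index.values():
        bucket.sort(key=lambda e: e.frequency, reverse=True)
    return index


def lookup(index: Dict[str, List[Entry]], typed: str) -> List[str]:
    """Return the words whose syllable initials match the typed abbreviation."""
    return [e.word for e in index.get(typed, [])]


entries = [
    Entry("Huang Jiguang", ("huang", "ji", "guang"), 120),
    Entry("Titanic", ("tai", "tan", "ni", "ke", "hao"), 300),
]
index = build_initial_index(entries)
print(lookup(index, "hjg"))
```

Without the two entries in the word stock, the abbreviation "hjg" has nothing to match, which is the situation the paragraph above describes.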
Although different input methods use the word stock in different ways (their word-composition algorithms differ), in general the richer the word stock of an input method, the higher the user's input efficiency tends to be.
Word stocks currently come from two main sources. One is to collect the text that the user has entered, segment it, and merge the result into the word stock; this kind of word stock is commonly called a "user word stock". The other is to build a special word stock from existing data. The data may be a classic book, such as "Three Hundred Tang Poems", or a collection of well-known film titles; any useful data can be made into a word stock. There is, however, a significant problem: the ratio between the size of a word stock and its quality. The word stock of a typical input method holds between 10,000 and 100,000 entries. The quality of the word stock directly affects the user's input efficiency. In theory, a larger word stock makes input more convenient, but if it includes too many rarely used words, input efficiency falls instead, because the rare words interfere with the ranking of common words during input. Ranking words by how often the user actually uses them is therefore key to improving input efficiency.
Search engine technology has become very popular in recent years, and the search engine is now the platform on which network users query Internet information resources by entering search vocabulary. Search vocabulary is the characters or words that a user enters when using a search engine. It summarizes, as far as possible, the information the user is looking for, and is therefore a concentrated reflection of the words network users commonly use. The high-frequency words in the search vocabulary tend to be common words of various kinds, popular words, or newly coined popular words.
For an input method word stock, a special word stock of search vocabulary can therefore deal well with the problem of the size-to-quality ratio. No such special word stock is currently in use, however, because the words users enter into a search engine are not known in advance, and including every search word in the usual way is clearly impractical: the quantity of words is too large for them to be included directly.
Summary of the invention
The invention provides a method and device for processing an input method word stock, which provide an input method with a special word stock of search vocabulary.
To achieve the above object, the invention provides a method for processing an input method word stock, comprising:
acquiring search vocabulary information input by a first user terminal group;
generating a special word stock according to the search vocabulary information;
transmitting the special word stock to a second user terminal group, so that the special word stock is added for the second user terminal group.
Acquiring the search vocabulary information that user terminals input in a search engine specifically comprises:
acquiring, from a search engine database, the search vocabulary information input by the first user terminal group, wherein the search engine database stores the search vocabulary information that the first user terminal group input in the search engine.
Generating the special word stock according to the search vocabulary information specifically comprises:
determining candidate words according to the search vocabulary information;
obtaining a candidate word set according to the candidate words;
generating the special word stock according to the candidate word set.
Obtaining the candidate word set according to the candidate words specifically comprises:
deduplicating the candidate words and counting the number of times each candidate word is repeated;
determining a weight for each candidate word according to its repetition count, and building a set of candidate words and corresponding weights from the correspondence between candidate words and weights;
sorting the candidate words in the set by their corresponding weights to obtain the candidate word set.
Generating the special word stock according to the candidate word set comprises:
selecting a preset number of candidate words from the candidate word set according to a first preset strategy;
generating the special word stock from the selected candidate words.
Generating the special word stock according to the candidate word set specifically comprises:
adjusting the candidate word set according to a second preset strategy, and generating the special word stock when the adjusted candidate words reach a preset number.
The invention also provides a device for processing an input method word stock, comprising:
an acquisition module, configured to acquire search vocabulary information input by a first user terminal group;
a generation module, configured to generate a special word stock according to the search vocabulary information acquired by the acquisition module;
a sending module, configured to transmit the special word stock generated by the generation module to a second user terminal group, so that the special word stock is added for the second user terminal group.
The acquisition module is specifically configured to acquire, from a search engine database, the search vocabulary information input by the first user terminal group, wherein the search engine database stores the search vocabulary information that the first user terminal group input in the search engine.
The generation module comprises:
a determining submodule, configured to determine candidate words according to the search vocabulary information;
an obtaining submodule, configured to obtain a candidate word set according to the candidate words determined by the determining submodule;
a generating submodule, configured to generate the special word stock according to the candidate word set obtained by the obtaining submodule.
The obtaining submodule is specifically configured to deduplicate the candidate words and count the number of times each candidate word is repeated;
determine a weight for each candidate word according to its repetition count, and build a set of candidate words and corresponding weights from the correspondence between candidate words and weights;
sort the candidate words in the set by their corresponding weights to obtain the candidate word set.
The generating submodule is specifically configured to select a preset number of candidate words from the candidate word set according to a first preset strategy;
and generate the special word stock from the selected candidate words.
The generating submodule is also configured to adjust the candidate word set according to a second preset strategy, and to generate the special word stock when the adjusted candidate words reach a preset number.
Compared with the prior art, the invention has at least the following advantages:
By acquiring and processing the search vocabulary information of user terminals in a search engine database, a special word stock is generated and added to the input method word stock. This solves the problem that search vocabulary cannot be included directly because the quantity of words is too large, provides the input method with a special word stock of search vocabulary, enriches the sources of the input method word stock, addresses the size-to-quality ratio of the word stock, and improves the user's input efficiency.
Description of drawings
To explain the technical solutions of the invention or of the prior art more clearly, the drawings needed for the description are briefly introduced below. The drawings described below evidently show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of adding an input method word stock in the prior art;
Fig. 2 is a schematic flowchart of a method for processing an input method word stock according to the invention;
Fig. 3 is a schematic flowchart of another method for processing an input method word stock according to the invention;
Fig. 4 is a schematic diagram of a device for processing an input method word stock according to the invention.
Detailed description
The technical solutions of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of the invention.
In the invention, the search vocabulary information in a search engine database is acquired and processed by word segmentation, screening, word changing, selection, and similar means, and a special word stock is generated and added to the input method word stock. This solves the problem that search vocabulary cannot be included directly because the quantity of words is too large, provides the input method with a special word stock of search vocabulary, enriches the sources of the input method word stock, and improves the user's input efficiency.
Based on the above idea, the invention proposes a method for processing an input method word stock. As shown in Fig. 2, the method comprises the following steps:
Step 201: acquire the search vocabulary information input by a first user terminal group.
Specifically, acquiring the search vocabulary information input by the first user terminal group comprises: acquiring, from a search engine database, the search vocabulary information input by the first user terminal group, wherein the search engine database stores the search vocabulary information that the first user terminal group input in the search engine.
Note that the special word stock of search vocabulary proposed by the invention must be based on a search engine. The search vocabulary information is the search log, recorded by the search engine, of all user terminals that use the engine; the content of the search log is the keywords input by the user terminals.
Step 202: generate a special word stock according to the search vocabulary information.
In the invention, generating the special word stock according to the search vocabulary information specifically comprises: determining candidate words according to the search vocabulary information; obtaining a candidate word set according to the candidate words; and generating the special word stock according to the candidate word set.
Specifically, obtaining the candidate word set according to the candidate words comprises: deduplicating the candidate words and counting the number of times each candidate word is repeated; determining a weight for each candidate word according to its repetition count, and building a set of candidate words and corresponding weights from the correspondence between candidate words and weights; and sorting the candidate words in the set by their corresponding weights to obtain the candidate word set.
Generating the special word stock according to the candidate word set comprises: selecting a preset number of candidate words from the candidate word set according to a first preset strategy; and generating the special word stock from the selected candidate words.
Generating the special word stock according to the candidate word set specifically comprises: adjusting the candidate word set according to a second preset strategy, and generating the special word stock when the adjusted candidate words reach a preset number.
Step 203: transmit the special word stock to a second user terminal group, so that the special word stock is added for the second user terminal group.
Note that in the invention the first user terminal group consists of the user terminals that input search vocabulary information in the search engine, and the second user terminal group consists of the user terminals that need to add the special word stock; the two groups may overlap.
As can be seen, with the technical solution provided by the invention, the search vocabulary information of user terminals in the search engine database is acquired and processed, and a special word stock is generated and added to the input method word stock. This solves the problem that search vocabulary cannot be included directly because the quantity of words is too large, provides the input method with a special word stock of search vocabulary, enriches the sources of the input method word stock, addresses the size-to-quality ratio of the word stock, and improves the user's input efficiency.
To set out the technical solution provided by the invention more clearly, the invention is described in detail below. As shown in Fig. 3, another method for processing an input method word stock proposed by the invention comprises the following steps:
Step 301: the search engine stores the search vocabulary information that user terminals input in the search engine in the search engine database.
When a user terminal inputs search vocabulary information in the search engine, the search engine stores that information in the search engine database, from which the search vocabulary information input by user terminals can then be acquired.
Note that the search vocabulary information that a user terminal inputs in the search engine is not limited to single characters; it may be a word or a sentence, and it may include English and other languages.
Step 302: acquire the search vocabulary information in the search engine database.
Note that in practice the words in the acquired search vocabulary information may be unordered, or may be sorted by word frequency or by recording time.
Because the quantity of search vocabulary in a search engine is enormous, the search vocabulary from a period of time (for example, one day) may be acquired. Acquisition may be periodic: the search vocabulary information in the search engine database is acquired once every preset period. For example, the search vocabulary information is acquired from the search engine database at one-day intervals, and the search vocabulary information in the database may contain tens of thousands of words, hundreds of thousands, or even more.
An acquired search log entry (search vocabulary information) is generally a phrase, for example "sharp brother's photo" in the engine database. The information for this entry comprises the vocabulary itself, the number of times it was searched, the times at which it was searched, and so on.
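The acquisition of one period's worth of search log entries can be sketched as follows. This is a minimal sketch under stated assumptions: the `SearchLogRecord` type, its field names, the `records_in_window` helper, and the dates in the example are hypothetical, and the list stands in for the search engine database, whose actual schema the text does not describe.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class SearchLogRecord:
    query: str             # the search vocabulary as entered
    search_count: int      # how many times it was searched
    searched_at: datetime  # when the search was recorded


def records_in_window(
    records: Iterable[SearchLogRecord],
    window_end: datetime,
    window: timedelta = timedelta(days=1),
) -> List[SearchLogRecord]:
    """Keep the records whose time falls in [window_end - window, window_end)."""
    window_start = window_end - window
    return [r for r in records if window_start <= r.searched_at < window_end]


# Illustrative data only; the dates are made up.
log = [
    SearchLogRecord("sharp brother photo", 5531, datetime(2010, 6, 1, 9, 30)),
    SearchLogRecord("sharp brother photo", 12, datetime(2010, 5, 30, 18, 0)),
]
recent = records_in_window(log, window_end=datetime(2010, 6, 2))
print([r.query for r in recent])
```

Running the function once per preset period, with `window` equal to that period, corresponds to the periodic acquisition described above.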
Step 303: determine the candidate words corresponding to the search vocabulary information by word segmentation.
Specifically, segmentation proceeds as follows. First, each search entry is processed on its own, and segmentation is applied to the single entry. For example, in practice a search log entry (search vocabulary information) is generally a phrase such as "sharp brother's photo" in the engine database. Segmenting this phrase gives: sharp / brother / photo.
Secondly, compound words can be extracted directly by forming two-word and three-word combinations of the segmented words. For the same entry "sharp brother's photo", the two-word extraction gives: sharp brother, brother's photo; the three-word extraction gives: sharp brother's photo.
Therefore, if two-word and three-word combinations are used to extract compound words, the result is:
search word: sharp brother's photo => candidate words: {sharp brother, brother's photo, sharp brother's photo}. The candidate words do of course include the full phrase "sharp brother's photo".
Note that in practice the two methods can also be used together.
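The two extraction methods described above (single segmented words, plus two-word and three-word combinations of adjacent segments) can be sketched as one function. The word segmenter itself is outside the scope of this sketch: the token list is assumed to be its output, and `extract_candidates` and its parameters are hypothetical names.

```python
from typing import List


def extract_candidates(tokens: List[str], max_n: int = 3, joiner: str = " ") -> List[str]:
    """Return the single tokens plus every run of 2..max_n adjacent tokens."""
    candidates: List[str] = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            candidates.append(joiner.join(tokens[start:start + n]))
    return candidates


tokens = ["sharp", "brother", "photo"]  # assumed output of a word segmenter
print(extract_candidates(tokens))
```

For Chinese text the `joiner` would be the empty string, so that adjacent segments are concatenated directly into a compound word.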
Step 304: deduplicate the candidate words and count the number of times each candidate word is repeated.
Note that in this step only one copy of each identical (repeated) candidate word is kept, but the number of times the candidate word is repeated must be counted.
For example, for the candidate words {sharp brother, brother's photo, sharp brother's photo}, if "sharp brother" was searched 10001 times by users within one day, it is repeated 10001 times. One copy of "sharp brother" is kept, together with its search count (that is, its repetition count) of 10001.
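Deduplication with counting can be sketched with a counter keyed by candidate word. The sketch assumes each candidate arrives paired with the search count of the log entry it was extracted from; that pairing, the `count_candidates` name, and the split of the 10001 total into 4470 and 5531 are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_candidates(occurrences: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Keep one copy of each candidate word and sum its repetition count."""
    counts: Counter = Counter()
    for candidate, times in occurrences:
        counts[candidate] += times
    return dict(counts)


# (candidate word, search count of the entry it came from); values are illustrative.
occurrences = [
    ("sharp brother", 4470),
    ("sharp brother", 5531),
    ("sharp brother photo", 5531),
]
print(count_candidates(occurrences))
```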
Step 305: determine a weight for each candidate word according to its repetition count, and build a set of candidate words and corresponding weights from the correspondence between candidate words and weights.
For example, the candidate words obtained from "sharp brother's photo" are {sharp brother, brother's photo, sharp brother's photo, sharp brother's photo}. The word frequency of "sharp brother's photo" differs from that of "sharp brother" (or of "brother's photo"): suppose "sharp brother's photo" occurs 5531 times while "sharp brother" is repeated 10001 times. The weight is determined by a calculation that combines the two word frequencies. The weight reflects the overall repetition count of a candidate word.
Step 306: sort the candidate words in the set by their corresponding weights to obtain the candidate word set.
Specifically, the sort may be in descending or ascending order of weight, that is, by word frequency. Because step 303 performs word segmentation, the candidate words must be sorted again by weight at this point.
For example, the candidate words obtained from "sharp brother's photo" are {sharp brother, brother's photo, sharp brother's photo, sharp brother's photo}. Several candidate words are obtained and their order is arbitrary, so they must be sorted again by weight, which yields the sorted set of candidate words and their corresponding weights.
Note that the candidate word set is obtained from each day's search vocabulary information in the search engine data as follows:
All of the day's search vocabulary information is segmented to obtain candidate words. Let the day's search vocabulary information be the set X. The algorithm is then:
For each word x in X: segment x to obtain candidate words, and add them to a temporary set Y.
This yields the set Y of all candidate words.
Deduplicate Y while counting, so that the repetition count of each repeated candidate word is calculated.
Taking the repetition count as the basis, apply some algorithm f(x) to obtain the weight y of each entry, which gives a new set Z.
Z is the set of {candidate word, weight} pairs; sort the candidate words in Z by weight.
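The X, Y, Z procedure above can be sketched end to end as follows. The text leaves the weighting function f unspecified, so the default used here (the weight equals the repetition count) is only a placeholder assumption, and `str.split` stands in for a real word segmenter. The function name `build_candidate_set` is hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def build_candidate_set(
    queries: Iterable[str],                   # the set X
    segment: Callable[[str], List[str]],      # stand-in for a word segmenter
    f: Callable[[int], float] = float,        # placeholder weight function
) -> List[Tuple[str, float]]:
    y: List[str] = []                         # temporary collection Y, repeats kept
    for x in queries:
        y.extend(segment(x))
    counts = Counter(y)                       # deduplicate Y while counting repeats
    z = [(word, f(count)) for word, count in counts.items()]
    z.sort(key=lambda pair: pair[1], reverse=True)  # Z sorted by weight, descending
    return z


queries = ["sharp brother photo", "sharp brother", "brother photo"]
for word, weight in build_candidate_set(queries, segment=str.split):
    print(word, weight)
```

Any other monotonic function of the count, or one that also takes the frequency of the containing phrase into account, could be passed as `f` without changing the rest of the pipeline.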
Step 307: select a preset number of candidate words from the candidate word set according to a first preset strategy.
Specifically, the set sorted by weight is the candidate word set, that is, the set of candidate words and their corresponding weights. A preset number of candidate words (for example, all of the candidate words in the set, or the higher-weighted part of the set) can then be selected from the candidate word set according to a first preset strategy, which can be chosen to suit actual conditions. The first preset strategy may be maximum weight, that is, selecting the 1000 (the preset number) candidate words with the largest weights; the number selected can be set arbitrarily in the invention. The first preset strategy may also be a predetermined weight, that is, selecting the candidate words whose weight reaches a certain value.
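The two example strategies named above, largest weights first and a weight threshold, can be sketched as two small functions over (candidate word, weight) pairs. The function names and the sample weights are hypothetical.

```python
import heapq
from typing import List, Tuple

Weighted = Tuple[str, float]


def select_top_n(weighted: List[Weighted], n: int) -> List[Weighted]:
    """Maximum-weight strategy: the n candidate words with the largest weights."""
    return heapq.nlargest(n, weighted, key=lambda pair: pair[1])


def select_by_threshold(weighted: List[Weighted], min_weight: float) -> List[Weighted]:
    """Predetermined-weight strategy: candidate words whose weight reaches min_weight."""
    return [pair for pair in weighted if pair[1] >= min_weight]


weighted = [("a", 3.0), ("b", 1.0), ("c", 2.0)]  # illustrative weights
print(select_top_n(weighted, 2))
print(select_by_threshold(weighted, 2.0))
```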
Step 308: adjust the candidate word set according to a second preset strategy.
Note that a candidate word is not necessarily a word that can be used normally: it may be a long sentence, it may contain wrongly written characters, or it may be a word restricted by local law. The selected candidate words therefore need to be reviewed.
Adjusting the candidate word set according to the second preset strategy specifically means screening and word changing for the preset number of candidate words selected in step 307.
Screening: remove text that cannot form a word, and common words in everyday use (which the input method already has).
Word changing: correct words that contain serious or obvious wrongly written characters, except for Internet words that have taken on a new meaning.
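Screening and word changing can be sketched as a single pass over the selected candidate words. Everything in this sketch is an assumption made for illustration: the length limit as a proxy for "cannot form a word", the lookup tables for existing words, restricted words, corrections, and new-meaning exceptions, and the English sample data. In practice the review may well be manual.

```python
from typing import Dict, Iterable, List, Set


def adjust_candidates(
    candidates: Iterable[str],
    existing_words: Set[str],       # words the input method already has
    restricted_words: Set[str],     # words that must not be included
    corrections: Dict[str, str],    # wrongly written form -> corrected form
    new_meaning_words: Set[str],    # exceptions that are never corrected
    max_length: int = 8,            # crude proxy for "too long to be a word"
) -> List[str]:
    adjusted: List[str] = []
    seen: Set[str] = set()
    for word in candidates:
        # Word changing: correct known errors unless the word is an exception.
        if word not in new_meaning_words:
            word = corrections.get(word, word)
        # Screening: drop over-long text, existing words, and restricted words.
        if len(word) > max_length or word in existing_words or word in restricted_words:
            continue
        if word not in seen:        # a correction may produce a duplicate
            seen.add(word)
            adjusted.append(word)
    return adjusted


result = adjust_candidates(
    ["teh news", "hello", "lol", "a very long sentence that is not a word"],
    existing_words={"hello"},
    restricted_words=set(),
    corrections={"teh news": "the news", "lol": "laugh"},
    new_meaning_words={"lol"},
    max_length=20,
)
print(result)
```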
Step 309: when the adjusted candidate words reach the preset number, generate the special word stock.
Note that the preset number can be adjusted as needed: it may be 100 candidate words or 1000 candidate words, and is determined by the word stock to be built.
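One possible reading of "generate when the adjusted candidate words reach the preset number" is to walk down the weight-ordered list, keep the words that pass review, and stop as soon as the preset number is reached. The sketch below follows that reading, which is an interpretation rather than something the text spells out; `generate_word_stock` and the `is_acceptable` review callback are hypothetical names.

```python
from typing import Callable, Iterable, List, Optional


def generate_word_stock(
    ranked_candidates: Iterable[str],        # candidate words, highest weight first
    is_acceptable: Callable[[str], bool],    # stand-in for the review in step 308
    preset_number: int,
) -> Optional[List[str]]:
    """Return the word stock once preset_number words pass review, else None."""
    if preset_number <= 0:
        raise ValueError("preset_number must be positive")
    word_stock: List[str] = []
    for word in ranked_candidates:
        if is_acceptable(word):
            word_stock.append(word)
            if len(word_stock) == preset_number:
                return word_stock
    return None  # not enough acceptable candidate words yet


ranked = ["alpha", "bad", "beta", "gamma"]
print(generate_word_stock(ranked, lambda w: w != "bad", preset_number=2))
```

Returning `None` when too few words pass review leaves the caller free to wait for the next acquisition period before building the word stock.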
Step 310: send the special word stock to the input method word stock of the user terminal, so that the special word stock is added for the user terminal.
Specifically, the generated special word stock is sent to the user terminal either as a patch or in an installation client, which completes the addition of the special word stock for the user terminal.
Note that adding the special word stock to the input method word stock by patch proceeds as follows: an update prompt is first sent to the user terminal, and once the user terminal accepts the update, the special word stock is added to the input method word stock of the user terminal as a patch.
Adding the special word stock to the input method word stock by installation client proceeds as follows: the special word stock is first added to the installation client; the user terminal then downloads and installs this client, so that the user terminal adds an input method that includes the special word stock.
As can be seen, with the technical solution provided by the invention, the search vocabulary information that user terminals input in the search engine is acquired and processed, the high-frequency words in it are extracted, the useful words are assembled after review into a special input method word stock based on the search engine, and this word stock is then added to the input method word stock. This solves the problem that search vocabulary cannot be included directly because the quantity of words is too large, provides the input method with a special word stock of search vocabulary, enriches the sources of the input method word stock, and improves the user's input efficiency.
Based on identical technical conceive, the present invention also provides a kind for the treatment of apparatus of input method dictionary.This device can pass through computer program, and perhaps computer program and necessary hardware realize.
As shown in Figure 4, this device comprises: acquisition module 410, generation module 420, sending module 430, wherein:
Acquisition module 410 is used to obtain the search lexical information of first user terminal group input.
Generation module 420 links to each other with described acquisition module 410, is used for generating proprietary dictionary according to the described search lexical information that described acquisition module 410 obtains.
Sending module 430 links to each other with described generation module 420, is used for the described proprietary dictionary that described generation module 420 generates is sent to the second user terminal group, adds proprietary dictionary for the second user terminal group.
In the said apparatus, described generation module 420 comprises: determine submodule 421, be used for determining speech to be selected according to described search lexical information; Obtain submodule 422, link to each other, be used for obtaining set of words to be selected according to the speech described to be selected that described definite submodule 421 is determined with definite submodule 421; Generate submodule 423, and obtain submodule 422 and link to each other, be used for obtaining the set of words described to be selected that submodule 422 obtains and generating described proprietary dictionary according to described.
The obtaining submodule 422 is specifically configured to: deduplicate the candidate words and count the number of times each candidate word is repeated; determine the weight of each candidate word according to its repetition count, and build a set of candidate words and their weights according to the correspondence between candidate words and weights; and sort the candidate words in the set by their weights to obtain the candidate word set.
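The deduplicate, count, weight, and sort steps performed by submodule 422 can be sketched in Python as follows. The text does not say how a weight is derived from a repetition count, so this sketch assumes the weight is the word's relative frequency; the function name `build_candidate_set` and the alphabetical tie-break are likewise illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_candidate_set(candidates: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (word, weight) pairs sorted by descending weight."""
    # Counter deduplicates the words and counts repetitions in one pass.
    counts = Counter(candidates)
    total = sum(counts.values())
    # Assumption: weight = repetition count / total number of occurrences.
    weighted = {word: count / total for word, count in counts.items()}
    # Sort by weight (highest first); break ties alphabetically so the
    # result is deterministic.
    return sorted(weighted.items(), key=lambda item: (-item[1], item[0]))
```

For example, calling `build_candidate_set(["weather", "map", "weather", "news", "weather", "map"])` places "weather" first with a weight of 0.5, followed by "map" and then "news".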
The generating submodule 423 is specifically configured to select a preset number of candidate words from the candidate word set according to a first preset strategy, and to generate the proprietary dictionary from the selected candidate words.
The generating submodule 423 is further configured to adjust the candidate word set according to a second preset strategy and, when the number of adjusted candidate words reaches the preset number, to generate the proprietary dictionary.
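The two behaviours of submodule 423 can be sketched as follows. Neither preset strategy is specified in the text, so both are assumptions here: the first strategy is taken to be "highest weight first", and the second is taken to be a filter applied through a hypothetical `keep` predicate (for example, a review step). The function names are illustrative only.

```python
from typing import Callable, List, Optional, Tuple

CandidateSet = List[Tuple[str, float]]  # (word, weight), sorted by weight


def select_candidates(candidate_set: CandidateSet, preset_number: int) -> List[str]:
    # Assumed first preset strategy: take the highest-weighted words.
    return [word for word, _ in candidate_set[:preset_number]]


def adjust_and_generate(candidate_set: CandidateSet,
                        keep: Callable[[str], bool],
                        preset_number: int) -> Optional[List[str]]:
    # Assumed second preset strategy: drop words rejected by `keep`.
    adjusted = [(word, weight) for word, weight in candidate_set if keep(word)]
    # Generate the dictionary only once enough words remain.
    if len(adjusted) < preset_number:
        return None
    return select_candidates(adjusted, preset_number)
```

Returning `None` when too few words survive the adjustment is one reading of "generate the dictionary when the adjusted words reach the preset number"; a real system might instead wait for more search data.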
As can be seen, by obtaining and processing the search vocabulary information of user terminals stored in the search engine database, a proprietary dictionary is generated and added to the input method dictionary. This solves the problem that search vocabulary cannot be included directly because there are too many words, provides the input method with a proprietary dictionary of search vocabulary, enriches the sources of the input method dictionary, addresses the trade-off between dictionary size and quality, and improves the user's input efficiency.
From the above description of the embodiments, those skilled in the art can clearly understand that the invention can be implemented by software together with a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better embodiment. Based on this understanding, the essence of the technical solution of the invention, or the part of it that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the invention.
Those skilled in the art will appreciate that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required to implement the invention.
Those skilled in the art will appreciate that the modules of the device in an embodiment can be distributed in the device as described in the embodiment, or can be changed accordingly and placed in one or more devices different from that of the embodiment. The modules of the above embodiments can be merged into one module, or further split into multiple submodules.
The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
What is disclosed above is only several specific embodiments of the invention. The invention is not limited to them, however, and any variation that a person skilled in the art can conceive of shall fall within the scope of protection of the invention.

Claims (12)

1. A method for processing an input method dictionary, characterized by comprising:
obtaining search vocabulary information input by a first user terminal group;
generating a proprietary dictionary according to the search vocabulary information; and
sending the proprietary dictionary to a second user terminal group, so that the second user terminal group adds the proprietary dictionary.
2. the method for claim 1 is characterized in that, the described search lexical information that obtains the input of the first user terminal group specifically comprises:
From search engine database, obtain the search lexical information of first user terminal group input, wherein, stored the search lexical information that the described first user terminal group is imported in the described search engine database in search engine.
3. the method for claim 1 is characterized in that, generates proprietary dictionary according to described search lexical information, specifically comprises:
Determine speech to be selected according to described search lexical information;
Obtain set of words to be selected according to described speech to be selected;
Generate described proprietary dictionary according to described set of words to be selected.
4. The method of claim 3, characterized in that obtaining the candidate word set according to the candidate words specifically comprises:
deduplicating the candidate words and counting the number of times each candidate word is repeated;
determining the weight of each candidate word according to its repetition count, and building a set of candidate words and their weights according to the correspondence between candidate words and weights; and
sorting the candidate words in the set by their weights to obtain the candidate word set.
5. The method of claim 3, characterized in that generating the proprietary dictionary according to the candidate word set comprises:
selecting a preset number of candidate words from the candidate word set according to a first preset strategy; and
generating the proprietary dictionary from the selected candidate words.
6. The method of claim 3 or 5, characterized in that generating the proprietary dictionary according to the candidate word set specifically comprises:
adjusting the candidate word set according to a second preset strategy and, when the number of adjusted candidate words reaches the preset number, generating the proprietary dictionary.
7. A device for processing an input method dictionary, characterized by comprising:
an acquisition module, configured to obtain search vocabulary information input by a first user terminal group;
a generation module, configured to generate a proprietary dictionary according to the search vocabulary information obtained by the acquisition module; and
a sending module, configured to send the proprietary dictionary generated by the generation module to a second user terminal group, so that the second user terminal group adds the proprietary dictionary.
8. The device of claim 7, characterized in that
the acquisition module is specifically configured to obtain the search vocabulary information input by the first user terminal group from a search engine database, wherein the search engine database stores the search vocabulary information that the first user terminal group input in a search engine.
9. The device of claim 7, characterized in that the generation module comprises:
a determining submodule, configured to determine candidate words according to the search vocabulary information;
an obtaining submodule, configured to obtain a candidate word set according to the candidate words determined by the determining submodule; and
a generating submodule, configured to generate the proprietary dictionary according to the candidate word set obtained by the obtaining submodule.
10. The device of claim 9, characterized in that
the obtaining submodule is specifically configured to: deduplicate the candidate words and count the number of times each candidate word is repeated;
determine the weight of each candidate word according to its repetition count, and build a set of candidate words and their weights according to the correspondence between candidate words and weights; and
sort the candidate words in the set by their weights to obtain the candidate word set.
11. The device of claim 9, characterized in that
the generating submodule is specifically configured to select a preset number of candidate words from the candidate word set according to a first preset strategy, and
to generate the proprietary dictionary from the selected candidate words.
12. The device of claim 9 or 11, characterized in that
the generating submodule is further configured to adjust the candidate word set according to a second preset strategy and, when the number of adjusted candidate words reaches the preset number, to generate the proprietary dictionary.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010206002.8A CN102298581B (en) 2010-06-23 2010-06-23 A kind of disposal route of input method dictionary and device

Publications (2)

Publication Number Publication Date
CN102298581A true CN102298581A (en) 2011-12-28
CN102298581B CN102298581B (en) 2015-11-25

Family

ID=45359003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010206002.8A Active CN102298581B (en) 2010-06-23 2010-06-23 Method and device for processing an input method dictionary

Country Status (1)

Country Link
CN (1) CN102298581B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1912872A (en) * 2006-07-25 2007-02-14 北京搜狗科技发展有限公司 Method and system for abstracting new word
CN1924858A (en) * 2006-08-09 2007-03-07 北京搜狗科技发展有限公司 Method and device for fetching new words and input method system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699659A (en) * 2013-12-26 2014-04-02 乐视网信息技术(北京)股份有限公司 Method and system for managing word library of video resources
CN107085472A (en) * 2017-05-10 2017-08-22 陈普军 Chinese character coordinate input method with dictionary function and poem
CN107247798A (en) * 2017-06-27 2017-10-13 北京京东尚科信息技术有限公司 The method and apparatus for building search dictionary
CN107247798B (en) * 2017-06-27 2021-05-25 北京京东尚科信息技术有限公司 Method and device for constructing search word bank

Also Published As

Publication number Publication date
CN102298581B (en) 2015-11-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant