CN100483417C - Method for catching limit word information, optimizing output and input method system - Google Patents

Publication number
CN100483417C
Authority
CN
China
Prior art keywords
word
information
target word
candidate item
feature value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2007100996440A
Other languages
Chinese (zh)
Other versions
CN101055588A (en
Inventor
吕杰勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CNB2007100996440A
Publication of CN101055588A
Priority to PCT/CN2008/071064 (WO2008145055A1)
Application granted
Publication of CN100483417C
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/018 Input/output arrangements for oriental characters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/274 Converting codes to words; Guess-ahead of partial word inputs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for acquiring restrictive word information, comprising the steps of: acquiring a target word; acquiring the feature information corresponding to the target word; judging whether the feature information, or a numerical result computed from it, meets a preset condition and, if so, determining the target word to be a restrictive word and recording the related restriction information, which is used to limit the ranking of the word when it is output on its own. By presetting an input method dictionary that includes restrictive word information, embodiments of the invention judge, when the user types, whether an output candidate meets the preset condition for applying the restriction information and, based on the result, decide whether a candidate carrying restrictive word information is displayed and output. The user thus obtains more effective output without additional operations, the character output process of the input method system is greatly optimized, and the intelligence of the input method system is improved.

Description

Method for obtaining restrictive word information, method for optimizing output, and input method system
Technical field
The present invention relates to the field of computer character input data processing, and in particular to a method and a device for obtaining restrictive word information, a method for updating an input method dictionary, a method for optimizing output, and an input method system.
Background
With the spread and development of computer and Internet technology, users from different professional fields, with different interests and usage habits, demand ever more intelligence from input method systems.
In the prior art, techniques have appeared that build an input method dictionary by counting and screening large, heterogeneous Internet corpora. The resulting Internet dictionary can contain many new words that could not be obtained from earlier closed corpora (such as the Modern Chinese Dictionary, news, and newspapers), and can therefore greatly improve input efficiency. However, precisely because Internet corpora are so complex, some of the words obtained from them by word-frequency screening are defective linguistically or with respect to input habits.
For example, for the pinyin code string "liangjiang" entered by a user, the candidates normally available include "two rivers", "good general", and so on. With an Internet dictionary the candidates may also include "amount will", because this string occurs quite frequently in Internet web pages. It generally appears, however, at the junction of several words in a sentence (expressing a linking relation), for example in "passenger amount will exceed". Adding "amount will" to the input method dictionary does make the input method more intelligent (a better intelligent word composition effect), but because "amount will" seldom occurs as a word on its own, it may also trouble the user: it increases the number of candidates the user has to choose from and lowers input efficiency.
Therefore, a technical problem urgently needing solution by those skilled in the art is how to identify words that are defective linguistically or in terms of usage habits, and how to restrict them during input.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and a device for obtaining restrictive word information, which can pick out, from a large vocabulary, words that are defective linguistically or in terms of usage habits, thereby improving the user's input experience.
Another object of the present invention is to provide a method for updating an input method dictionary, a method for optimizing output, and an input method system, which can restrict certain words under certain circumstances during actual input, and thus make the input method more intelligent without requiring additional user operations.
To solve the above technical problem, the invention discloses a method for obtaining restrictive word information, which may specifically comprise:
Obtaining a target word; obtaining feature information corresponding to the target word, the feature information being expressed numerically; judging whether the feature information, or the result of a calculation using the feature information, meets a preset condition; if so, determining that the target word is a restrictive word and recording the related restriction information. A restrictive word is a word that is defective linguistically or in terms of usage habits, and the restriction information is used to limit the ranking of the target word when it is output on its own.
Preferably, the feature information is: the feature value, in a preset corpus, of the first character of the target word as a word-initial character, and the feature value, in the preset corpus, of the last character of the target word as a word-final character. The preset condition is: whether at least one of these feature values falls within a preset range.
Alternatively, preferably, the feature information is: the feature values, in the preset corpus, of the linguistic collocation relations among the single-character and/or multi-character words contained in the target word. The preset condition is: whether at least one of these feature values falls within a preset range.
Alternatively, preferably, the feature information is: the feature value of the target word being input on its own by users of the input method. The preset condition is: whether this feature value falls within a preset range.
Alternatively, preferably, the feature information comprises: the feature value, in the preset corpus, of the first character of the target word as a word-initial character; the feature value, in the preset corpus, of the last character of the target word as a word-final character; and the general word frequency of the target word. The preset condition is: whether the ratio of at least one of these feature values to the general word frequency of the target word falls within a preset range.
Alternatively, preferably, the feature information comprises: the feature values, in the preset corpus, of the linguistic collocation relations among the single-character and/or multi-character words contained in the target word; and the general word frequency of the target word. The preset condition is: whether the ratio of at least one of these feature values to the general word frequency of the target word falls within a preset range.
Alternatively, preferably, the feature information is: the feature value of the target word being input on its own by users of the input method; and the general word frequency of the target word. The preset condition is: whether the ratio of this feature value to the general word frequency of the target word falls within a preset range.
Alternatively, preferably, the feature information is: the user ranking position of the target word among the candidate words for the same input code; and the original ranking position of the target word. The user ranking position is related to the feature value of the target word being input on its own by users of the input method, and the original ranking position is related to the general word frequency of the target word. The preset condition is: whether the difference between the user ranking position and the original ranking position falls within a preset range.
Preferably, the restriction information comprises: a weight limiting the standalone output of the restrictive word under each preset scenario. Further, the restriction information may comprise: the linguistic collocation parameter of the restrictive word in the preset corpus, which is used to limit the ranking of the word in intelligent word composition output.
Preferably, the method may further comprise: generating a dictionary or word list that contains the restrictive words and their related restriction information; or generating a dictionary that contains the restrictive words and their related restriction information together with general words.
According to another embodiment of the invention, a method for obtaining restrictive word information is disclosed, which may specifically comprise:
Obtaining a target word; obtaining the linguistic collocation parameter of the target word in a preset corpus, the parameter being expressed numerically; judging whether the linguistic collocation parameter meets a preset condition; if so, recording restriction information for the target word, the restriction information including the corresponding linguistic collocation parameter. The restriction information restricts words that are defective linguistically or in terms of usage habits, and is used to limit the ranking of the target word in intelligent word composition output.
Preferably, the linguistic collocation parameter is a general parameter; alternatively, it comprises sub-parameters for each preset scenario.
According to another embodiment of the invention, a method for updating a dictionary is also disclosed, comprising:
Obtaining a target word; obtaining feature information corresponding to the target word, the feature information being expressed numerically; judging whether the feature information, or the result of a calculation using the feature information, meets a preset condition; if so, determining that the target word is a restrictive word and recording the related restriction information, where a restrictive word is a word that is defective linguistically or in terms of usage habits, and the restriction information is used to limit the ranking of the target word when it is output on its own and/or in intelligent word composition output; and adding the restrictive word and its related restriction information to the existing dictionary of the input method.
Preferably, the adding may be: judging whether the restrictive word already exists in the existing dictionary of the input method and, if so, recording only its related restriction information in that dictionary. Alternatively, the adding may be: recording the restrictive word and its related restriction information directly in the existing dictionary and, if the entry is a duplicate, overwriting the original entry. Alternatively, the adding may be: storing the restrictive words and their related restriction information as a restriction word list, which is used together with the existing dictionary of the input method to rank candidates.
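The three ways of adding restriction information can be sketched as follows. This is only an illustration: the dictionary layout (a Python dict mapping each word to an entry with `freq` and `restriction` fields) and all function names are assumptions, not part of the patent text.

```python
def add_if_present(dictionary, word, restriction):
    """Mode 1: record restriction information only if the word already exists."""
    if word in dictionary:
        dictionary[word]["restriction"] = restriction
        return True
    return False


def add_or_overwrite(dictionary, word, freq, restriction):
    """Mode 2: write the entry directly; a duplicate entry is overwritten."""
    dictionary[word] = {"freq": freq, "restriction": restriction}


def build_restriction_list(restricted_words):
    """Mode 3: keep restrictive words in a separate list used with the dictionary."""
    return dict(restricted_words)


existing = {"amount will": {"freq": 12000}}
add_if_present(existing, "amount will", 1)         # existing entry gains a flag
add_or_overwrite(existing, "last one", 8000, 1)    # new entry written directly
restriction_list = build_restriction_list([("amount will", 1)])
```

Mode 3 leaves the existing dictionary untouched, which may be convenient when the restriction list is updated separately from the dictionary.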
Further, the restrictive word has restriction information for each preset scenario.
According to another embodiment of the invention, a device for obtaining restrictive word information is also disclosed, which may specifically comprise:
A target word acquiring unit, used to obtain a target word;
A feature acquiring unit, used to obtain feature information corresponding to the target word, the feature information being expressed numerically;
A restriction information acquiring unit, used to judge whether the feature information, or the result of a calculation using the feature information, meets a preset condition and, if so, to determine that the target word is a restrictive word and record the related restriction information. A restrictive word is a word that is defective linguistically or in terms of usage habits, and the restriction information is used to limit the ranking of the target word when it is output on its own and/or in intelligent word composition output.
According to another embodiment of the invention, a method for optimizing output is also disclosed, comprising:
Receiving the user's input information and converting it; obtaining output candidates; judging whether an output candidate meets the preset condition for applying restriction information, the restriction information restricting words that are defective linguistically or in terms of usage habits; if so, extracting the restriction information corresponding to that output candidate and ranking the candidates according to it. The ranking is achieved either by directly setting the display position or order, or by modifying the word frequency.
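A minimal sketch of the two ranking approaches named above. It assumes that restriction information is a weight between 0 and 1 that scales the word frequency, and that "setting the position directly" means moving restricted candidates to the end of the list; both encodings, and the names `rank_candidates` and `applies`, are illustrative assumptions.

```python
def rank_candidates(candidates, restrictions, applies, mode="frequency"):
    """Order candidate words, demoting restricted ones.

    candidates: list of (word, frequency) pairs.
    restrictions: dict word -> weight in [0, 1] (assumed encoding).
    applies: function word -> bool, True when the restriction should be used.
    mode: "frequency" rescales the frequency; "position" moves words to the end.
    """
    def restricted(word):
        return word in restrictions and applies(word)

    if mode == "frequency":
        def score(item):
            word, freq = item
            return freq * restrictions[word] if restricted(word) else freq
        return [w for w, _ in sorted(candidates, key=score, reverse=True)]

    # "position" mode: rank by raw frequency, then place restricted words last.
    ordered = [w for w, _ in sorted(candidates, key=lambda c: c[1], reverse=True)]
    kept = [w for w in ordered if not restricted(w)]
    demoted = [w for w in ordered if restricted(w)]
    return kept + demoted


cands = [("amount will", 900), ("two rivers", 500), ("good general", 300)]
print(rank_candidates(cands, {"amount will": 0.1}, applies=lambda w: True))
```

With a weight of 0 in "frequency" mode the restricted word sinks to the bottom; an implementation could also drop it from the list entirely, which the description treats as the extreme case.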
Preferably, the preset condition for applying restriction information is: whether the output candidate is a word output on its own. Alternatively, the preset condition is: whether the output candidate is an intelligent word composition case.
Preferably, the restriction information may be obtained by the following steps: obtaining a target word; obtaining feature information corresponding to the target word, the feature information being expressed numerically; judging whether the feature information, or the result of a calculation using the feature information, meets a preset condition; if so, recording the related restriction information for the target word.
Further, when it must be judged whether the output candidate is a word output on its own, the following steps may be used: judging whether the output candidate comprises only one element and is longer than one output character, an element being a word or character stored in the preset dictionary; if so, determining that the output candidate is a word output on its own.
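The standalone-output test can be written in a few lines. The sketch assumes a candidate is represented as the list of dictionary elements it was assembled from; that representation and the function name are hypothetical, and single letters stand in for single output characters.

```python
def is_standalone_word(elements, dictionary):
    """Return True if a candidate is a word output on its own.

    elements: the dictionary elements that make up one output candidate.
    dictionary: the set of words and characters stored in the preset dictionary.
    """
    if len(elements) != 1:
        return False  # several elements: an intelligent word composition case
    word = elements[0]
    return word in dictionary and len(word) > 1


stored = {"AB", "A", "CD"}
print(is_standalone_word(["AB"], stored))        # one element, two characters
print(is_standalone_word(["A"], stored))         # one element, one character
print(is_standalone_word(["AB", "CD"], stored))  # two elements
```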
According to another embodiment of the invention, an input method system is also disclosed, comprising an input interface unit and a display unit. The input method system may further comprise:
A dictionary, which contains restriction information;
A candidate acquiring unit, used to obtain output candidates according to the user's input information;
A judging unit, used to judge whether an output candidate meets the preset condition for applying restriction information, the restriction information restricting words that are defective linguistically or in terms of usage habits;
A candidate ranking unit, used, when the preset condition is met, to extract the restriction information corresponding to the output candidate and rank the candidates according to it. The ranking is achieved either by directly setting the display position or order, or by modifying the word frequency.
Preferably, the preset condition for applying restriction information is: whether the output candidate is a word output on its own. Alternatively, the preset condition is: whether the output candidate is an intelligent word composition case.
Preferably, the judging unit may further comprise: a subunit for judging whether the output candidate comprises only one element, an element being a word or character stored in the preset dictionary; a subunit for judging whether the output candidate is longer than one output character; and a subunit for determining, when the output candidate meets both conditions, that it is a word output on its own.
Preferably, the input interface unit, display unit, and dictionary of the input method system are located in the same computing device. Alternatively, the input interface unit and display unit are located in a first computing device and the dictionary in a second computing device; according to the information entered by the user, the input method system obtains the corresponding information from the second computing device and displays the corresponding words on the first computing device.
Compared with the prior art, embodiments of the invention have the following advantages. By presetting an input method dictionary that contains restrictive word information, an embodiment judges, when the user types, whether an output candidate meets the preset condition for applying restriction information, and then, depending on the result, controls whether a candidate carrying restrictive word information is displayed and output. The user can thus obtain more effective output without additional operations (for example, in practice the restrictive word "amount will" is not shown among the candidates when it would be output on its own, but still takes part in word composition in other cases). This greatly optimizes the character output process of the input method system and improves its intelligence.
Description of drawings
Fig. 1 is a flow chart of the steps of embodiment 1 of a method for obtaining restrictive word information according to the invention;
Fig. 2 is a flow chart of the steps of embodiment 2 of a method for obtaining restrictive word information according to the invention;
Fig. 3 is a flow chart of the steps of an embodiment of a method for updating an input method dictionary according to the invention;
Fig. 4 is a structural block diagram of an embodiment of a device for obtaining restrictive word information according to the invention;
Fig. 5 is a flow chart of the steps of an embodiment of a method for optimizing output according to the invention;
Fig. 6 is a schematic diagram of the word lattice of a pinyin network segmentation method;
Fig. 7 is a structural block diagram of an embodiment of an input method system.
Detailed description of the embodiments
To make the above objects, features, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, embodiment 1 of a method for obtaining restrictive word information is shown, which may specifically comprise:
Step 101: obtain a target word;
The target word may be obtained from the Internet, that is, directly by counting and screening an Internet corpus (for example, a collection of web pages or a collection of search keywords), or from an existing dictionary. The invention places no limit on this, as long as a set of target words can be obtained; the size of the set can be chosen by those skilled in the art according to actual needs.
Preferably, the obtained target word set may also go through an optimization step in which some attributes of the target words are used to remove part of the vocabulary and narrow the range further. For example, words whose Internet word frequency or dictionary word frequency is less than or equal to a preset threshold are removed from the set, as are words that can be determined not to be restrictive words (for example, common words in a dictionary). This optimization step can of course also be carried out entirely while the target word set is being obtained.
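The optional pruning step might look like the sketch below. The function name, the threshold value, and the sample frequencies are made up for illustration.

```python
def prune_target_words(word_freq, min_freq, known_unrestricted):
    """Return the target words worth examining further.

    word_freq: dict mapping word -> Internet or dictionary word frequency.
    min_freq: words with frequency <= min_freq are removed.
    known_unrestricted: words already known not to be restrictive words.
    """
    return {
        word
        for word, freq in word_freq.items()
        if freq > min_freq and word not in known_unrestricted
    }


targets = prune_target_words(
    {"amount will": 12000, "rare string": 3, "weather": 50000},
    min_freq=100,
    known_unrestricted={"weather"},
)
print(sorted(targets))
```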
Step 102: obtain the feature information corresponding to the target word;
Step 103: judge whether the feature information, or the result calculated from it, meets a preset condition; if so, determine that the target word is a restrictive word and record the related restriction information, which is used to limit the ranking of the word when it is output on its own. For example, restrictive words such as "amount will" and "last one" do not appear among the candidates when output on their own, but are not restricted when output together with other words through intelligent word composition. As a concrete example: when "liangjiang" is entered, the first candidate by word frequency would be "amount will", but because it carries a restriction mark it is removed from the candidates; when "lvkeliangjiangchaoguo" is entered, "passenger amount will exceed" is output, and "amount will" does not need to be restricted in this case.
The restrictive words and restriction information obtained in this embodiment can be stored directly in a separate dictionary (or word list), for example by generating a dictionary (or word list) containing the restrictive words and their related restriction information. They can also be combined with general words into one input method dictionary, for example by generating a dictionary that contains the restrictive words, their related restriction information, and general words. They can also be added directly to the existing dictionary of the input method.
The restriction information may be a flag (for example, the restrictive word is marked 0 or 1 in the dictionary) or a numerical value (for example, a two-place decimal between 0 and 1) used to adjust the ranking of candidates; not displaying the word at all is simply an extreme case. It is feasible for the resulting restrictive words and their restriction information to be changed manually by the user as needed, or updated and modified by a server.
Depending on the feature information obtained, the corresponding judging condition in this embodiment can differ; several examples of steps 102 and 103 follow. The preset corpus may be any corpus; the feature value may be obtained statistically, or directly from experience or existing knowledge; and it may be any of various numerical values, such as a probability or a frequency. The judging conditions are only examples; those skilled in the art can set more complex conditions as needed, and the invention does not limit this.
Example 1
The feature information is: the feature value, in the preset corpus, of the first character of the target word as a word-initial character, and the feature value, in the preset corpus, of the last character of the target word as a word-final character;
The preset condition used for judging is: whether at least one of these feature values falls within a preset range.
For example, the character "amount" in "amount will" seldom appears at the start of a word; if its frequency as a word-initial character is less than or equal to a preset threshold, "amount will" can be judged to be a restrictive word.
Of course, for a target word made up of three or more characters, it is also possible to judge the feature value of a character at a given position in the word against the same position in words of the preset corpus.
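Example 1 can be illustrated with simple position counts. The sketch assumes the feature value is a raw count of how often a character starts or ends a word in the preset corpus, and that the preset range is "less than or equal to a threshold"; the helper names and the toy corpus (letters standing in for characters) are hypothetical.

```python
from collections import Counter


def position_counts(corpus_words):
    """Count how often each character starts and ends a word in the corpus."""
    head, tail = Counter(), Counter()
    for word in corpus_words:
        if word:
            head[word[0]] += 1
            tail[word[-1]] += 1
    return head, tail


def is_restrictive_by_position(word, head, tail, threshold):
    """True if the first or the last character is rare in that position."""
    return head[word[0]] <= threshold or tail[word[-1]] <= threshold


head, tail = position_counts(["ab", "ac", "ba", "ca", "da"])
print(is_restrictive_by_position("xa", head, tail, threshold=0))
print(is_restrictive_by_position("ab", head, tail, threshold=0))
```

A `Counter` returns 0 for characters it has never seen, so an unseen word-initial character counts as rare without special handling.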
Example 2
The feature information is: the feature values, in the preset corpus, of the linguistic collocation relations among the single-character and/or multi-character words contained in the target word;
The preset condition used for judging is: whether at least one of these feature values falls within a preset range.
The linguistic collocation relations can include several kinds of pairing, such as word-to-word collocation parameters, word-to-part-of-speech collocation parameters, and part-of-speech-to-part-of-speech collocation parameters. Those skilled in the art can select or combine these pairings according to actual needs.
For example, in the word "being to play" a verb immediately follows "is"; such a collocation is linguistically rare, so its collocation feature value is less than or equal to the preset threshold, and "being to play" can be judged to be a restrictive word.
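As one possible reading of Example 2, the sketch below uses word-to-word collocation only and takes the feature value to be the count of adjacent pairs in a segmented corpus. Part-of-speech pairings would need a tagger and are left out; all names and the sample data are assumptions.

```python
from collections import Counter


def bigram_counts(segmented_sentences):
    """Count adjacent word pairs in a corpus of already segmented sentences."""
    counts = Counter()
    for sentence in segmented_sentences:
        counts.update(zip(sentence, sentence[1:]))
    return counts


def is_restrictive_by_collocation(components, counts, threshold):
    """True if any adjacent pair of components is rare in the corpus."""
    pairs = zip(components, components[1:])
    return any(counts[pair] <= threshold for pair in pairs)


counts = bigram_counts([["is", "good"], ["is", "good"], ["play", "ball"]])
print(is_restrictive_by_collocation(["is", "play"], counts, threshold=0))
print(is_restrictive_by_collocation(["is", "good"], counts, threshold=1))
```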
Example 3
The feature information is: the feature value of the target word being input on its own by users of the input method;
The preset condition used for judging is: whether this feature value falls within a preset range.
The standalone-input value may be a statistic for a single user or a statistical average over a group of users. It can be taken directly from the records of user dictionaries, or obtained by monitoring users' input behavior.
For example, users seldom enter the word "being to play" on its own, so when the counted feature value is less than or equal to a preset threshold, "being to play" can be judged to be a restrictive word.
In the examples that follow, the general word frequency is introduced into the judging condition to further improve the accuracy of restrictive word detection. The general word frequency may be the Internet word frequency or the dictionary word frequency. Points shared with the earlier examples are not repeated below; see the descriptions above.
Example 4
The feature information comprises: the feature value, in the preset corpus, of the first character of the target word as a word-initial character; the feature value, in the preset corpus, of the last character of the target word as a word-final character; and the general word frequency of the target word;
The preset condition used for judging is: whether the ratio of at least one of these feature values to the general word frequency of the target word falls within a preset range.
Example 5
The feature information comprises: the feature values, in the preset corpus, of the linguistic collocation relations among the single-character and/or multi-character words contained in the target word; and the general word frequency of the target word;
The preset condition used for judging is: whether the ratio of at least one of these feature values to the general word frequency of the target word falls within a preset range.
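Examples 4 and 5 share the same ratio test, sketched here. The bounds `low` and `high` stand for the preset range and, like the function name and sample numbers, are illustrative assumptions.

```python
def ratio_in_range(feature_values, general_freq, low, high):
    """True if any feature value divided by the general word frequency
    falls inside the closed interval [low, high]."""
    if general_freq <= 0:
        return False  # no usable general word frequency
    return any(low <= value / general_freq <= high for value in feature_values)


# A word-initial count of 5 against a general word frequency of 1000.
print(ratio_in_range([5, 400], general_freq=1000, low=0.0, high=0.01))
```

Dividing by the general word frequency keeps a frequent word from being cleared merely because its raw feature count looks large in absolute terms.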
Example 6
The feature information is: the user ranking position of the target word among the candidate words for the same input code; and the original ranking position of the target word. The user ranking position is related to the feature value of the target word being input on its own by users of the input method, and the original ranking position is related to the general word frequency of the target word;
The preset condition used for judging is: whether the difference between the user ranking position and the original ranking position falls within a preset range.
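One way to picture Example 6: rank the candidates for the same input code once by user standalone-input frequency and once by general word frequency, then compare positions. The sketch assumes the preset range is "dropped by at least `min_drop` places"; the names and numbers are invented for illustration.

```python
def rank_of(word, freq_table):
    """Zero-based position of word when candidates are sorted by frequency."""
    ordered = sorted(freq_table, key=freq_table.get, reverse=True)
    return ordered.index(word)


def is_restrictive_by_rank_shift(word, user_freq, general_freq, min_drop):
    """True if the word ranks at least min_drop places lower for users
    than its general word frequency would suggest."""
    drop = rank_of(word, user_freq) - rank_of(word, general_freq)
    return drop >= min_drop


general = {"amount will": 900, "two rivers": 500, "good general": 300}
user = {"amount will": 2, "two rivers": 80, "good general": 40}
print(is_restrictive_by_rank_shift("amount will", user, general, min_drop=2))
```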
Example 7
Described characteristic information is: this target word eigenwert that the user imports separately in input method is used; And the general word frequency of this target word;
The described prerequisite that is used to judge is: whether the ratio of this eigenwert and the general word frequency of this target word belongs to presetting range.
One specific implementation of Example 7 is as follows:
A. Count the general word frequency f_web of each word;
B. Count the frequency f_user with which each word is entered alone in the input records of the user population;
C. Compute alpha = f_user/f_web, and regard words whose alpha is far below the normal level as restrictive words;
D. Alternatively, compute alpha = f_user/f_web, and regard a word as a restrictive word only when its alpha is far below the normal level and its f_user is also very low.
Here, alpha is the computed result, f_web is the general word frequency of the word, and f_user is the feature word frequency of the word.
Specifically, the alpha value can be computed for every target word, and the words sorted by alpha in ascending order. A word whose alpha ranks near the front of this list (for example, in the first 5%) and whose own word frequency is relatively high (for example, greater than 10000) is regarded as a restrictive word.
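A minimal sketch of this alpha-based screening, assuming the frequencies are held in plain dictionaries. The function name and parameters are hypothetical, and the "own word frequency" test is assumed here to apply to f_web; the text does not say which frequency is meant.

```python
def find_restrictive_words(f_web, f_user, top_fraction=0.05, min_web_freq=10000):
    """Return words whose alpha = f_user / f_web ranks in the lowest top_fraction
    and whose general frequency exceeds min_web_freq (assumed to be f_web)."""
    alphas = {}
    for word, web in f_web.items():
        if web <= 0:
            continue  # alpha is undefined without a general frequency
        alphas[word] = f_user.get(word, 0) / web
    ranked = sorted(alphas, key=alphas.get)  # ascending alpha
    cutoff = max(1, int(len(ranked) * top_fraction))
    return [w for w in ranked[:cutoff] if f_web[w] > min_web_freq]
```

The 5% and 10000 defaults are the example values given above; in practice they would be tuned to the corpus.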
It should be noted that the judgment conditions in the above examples can also be used in combination. In short, those skilled in the art can set various judgment methods as needed, which cannot all be enumerated here.
Preferably, the restriction information may comprise: the weight restricting output of the restrictive word alone under each preset scene. That is, a restrictive word may carry restriction information for different application scenarios, not merely one general piece of restriction information. For example, when the user types, the user's application scenario is determined from the program in which the input method is currently being used, and the restriction value of the word under that preset scene (for example, a workplace language environment) is retrieved.
Further, the restriction information may also comprise: the linguistic collocation parameters of the restrictive word in a preset corpus, which are used to restrict the ranking of the word when it is output through intelligent word composition. That is, some restrictive words need to be restricted both when output alone and when output through intelligent word composition. For example, the word "last one" needs to be restricted when output alone, so that it appears among the candidate items as rarely as possible; and when "last one" and "inside" are combined by intelligent word composition, the combination is likewise restricted according to the collocation relation, so that it appears among the candidate items as rarely as possible.
The restriction information may comprise all linguistic collocation parameters of the word in the preset corpus (for example, part-of-speech collocation parameters), or only the collocation parameters that are needed. For example, a threshold for restricting output is set, and a collocation parameter is saved only if it is less than or equal to this threshold.
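The threshold rule for saving only the needed collocation parameters can be sketched as below. The function name, the dictionary layout and the sample values are assumptions made for illustration.

```python
def select_collocation_params(params, threshold):
    """Keep only the collocation parameters at or below the restriction threshold."""
    return {key: value for key, value in params.items() if value <= threshold}


# Hypothetical parameters of one word, keyed by the part of speech it collocates with.
example = select_collocation_params({"locative": 0.001, "verb": 0.3}, threshold=0.01)
```

Storing only the low-valued parameters keeps the restriction information small, since high-valued collocations need no restriction.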
It should be noted that the preset corpus information may be internet corpus information and/or user input record corpus information. The internet corpus information can be obtained by using a web spider to crawl a massive number of web pages from the internet. The user input record corpus may comprise direct information and indirect information; for example, the record of characters entered by users can serve as direct information, while distribution statistics of the characters entered by users can serve as indirect information. Of course, the preset corpus information may also be set by those skilled in the art as needed or based on experience, and the present invention does not limit this.
With reference to Fig. 2, a second method embodiment for obtaining restrictive word information is shown, which may comprise:
Step 201: obtain a target word;
Step 202: obtain the linguistic collocation parameters of the target word in a preset corpus;
Step 203: judge whether the linguistic collocation parameters meet a preset condition; if so, record restriction information for the target word. The restriction information comprises the corresponding linguistic collocation parameters and is used to restrict the ranking of the word when it is output through intelligent word composition.
For example, the collocation parameter value of "last one" with locative nouns is very low; if a candidate item is a collocation of "last one" with a locative noun, "last one" is removed from the candidate items during intelligent word composition.
As another example, the collocation parameter of "say" with verbs is below a predetermined threshold; if a candidate item is a collocation of "say" with a verb, "say" is removed from the intelligent word composition sequence.
Preferably, the linguistic collocation parameter may be a general parameter, or it may comprise sub-parameters for each preset scene. The linguistic collocation parameters may include word-to-word collocation parameters, word-to-part-of-speech collocation parameters, part-of-speech-to-part-of-speech collocation parameters and so on. The value used to express a linguistic collocation parameter may be an adjacent co-occurrence frequency, a co-occurrence probability, a connection strength value or the like; these values may be obtained by statistics over any preset corpus, or directly from existing experience or knowledge.
It should be noted that, through the above screening step, qualifying restrictive words can be removed from the intelligent word composition sequence, which reduces the search space during intelligent word composition and thus improves its efficiency.
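A sketch of how recorded collocation parameters might prune word pairs during intelligent word composition. The table layout, the part-of-speech lookup, the threshold and all names are hypothetical; the English glosses stand in for the original words.

```python
def allowed_in_composition(word, next_word, restrictions, pos_of, threshold=0.01):
    """Return False when word followed by next_word is a restricted collocation."""
    next_pos = pos_of.get(next_word)
    value = restrictions.get(word, {}).get(next_pos)
    # No recorded parameter means the pair is unrestricted.
    return value is None or value > threshold


def prune_compositions(pairs, restrictions, pos_of, threshold=0.01):
    """Drop restricted pairs before the composition search considers them."""
    return [
        pair for pair in pairs
        if allowed_in_composition(pair[0], pair[1], restrictions, pos_of, threshold)
    ]
```

Pruning before the search is what shrinks the search space: a removed pair never becomes an edge in the composition sequence.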
With reference to Fig. 3, a method embodiment for updating an input method dictionary is shown, which may specifically comprise:
Step 301: obtain a target word;
Step 302: obtain the feature information corresponding to the target word;
Step 303: judge whether the feature information, or a result computed from it, meets a preset condition; if so, determine that the target word is a restrictive word and record the related restriction information, which is used to restrict the ranking of the word when it is output alone and/or when it is output through intelligent word composition;
Step 304: add the restrictive word and its related restriction information to the existing dictionary of the input method.
This embodiment can be applied as follows: after the server obtains restrictive word information, it promptly updates the existing dictionary of the input method with it. The updated restrictions may include the restriction information obtained by the two preceding embodiments, that is, information for restricting the ranking of the word when output alone and information for restricting its ranking when output through intelligent word composition; the two may exist separately or together. The restriction information comprises: the weight restricting output of the restrictive word alone under each preset scene.
Of course, the server may also add the restriction information to the dictionary first and then deliver the new dictionary as an update. The specific update transmission method is not described in detail here.
The addition in step 304 can be performed in various ways, for example:
The addition is: judging whether the restrictive word already exists in the existing dictionary of the input method; if it does, only its related restriction information is recorded in the existing dictionary;
Or, the addition is: directly recording the restrictive word and its related restriction information in the existing dictionary of the input method, overwriting the original entry if the entry is duplicated;
Or, the addition is: storing the restrictive word and its related restriction information as a restriction vocabulary, which works together with the existing dictionary of the input method to complete candidate item ranking.
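The three ways of adding can be sketched with dictionaries standing in for the input method dictionary. The entry layout (`freq`, `restriction`) and every function name are assumptions; what happens in the first way when the word is absent is not specified above, so the sketch simply reports it.

```python
def add_restriction_only(lexicon, word, info):
    """Way 1: if the word is already an entry, attach only its restriction information."""
    if word not in lexicon:
        return False  # assumption: unknown words are left for another strategy
    lexicon[word]["restriction"] = info
    return True


def add_overwrite(lexicon, word, freq, info):
    """Way 2: record the word directly, overwriting any duplicate entry."""
    lexicon[word] = {"freq": freq, "restriction": info}


def lookup_with_table(lexicon, restriction_table, word):
    """Way 3: keep a separate restriction vocabulary and merge it at lookup time."""
    entry = dict(lexicon.get(word, {}))
    if word in restriction_table:
        entry["restriction"] = restriction_table[word]
    return entry
```

The third way leaves the existing dictionary untouched, which may make it easier to ship restriction updates separately.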
With reference to Fig. 4, a device embodiment for obtaining restrictive word information is shown, which may specifically comprise:
A target word acquiring unit 401, used to obtain a target word;
A feature acquiring unit 402, used to obtain the feature information corresponding to the target word;
A restriction information acquiring unit 403, used to judge whether the feature information, or a result computed from it, meets a preset condition; if so, to determine that the target word is a restrictive word and record the related restriction information, which is used to restrict the ranking of the word when it is output alone and/or when it is output through intelligent word composition.
With reference to Fig. 5, a method embodiment for optimizing output is shown, which may specifically comprise:
Step 501: receive the user's input information and convert it;
The input information may comprise a coded string, and may also comprise handwriting input and voice input, because these input modes also need a dictionary to rank candidate items. That is, the present invention can be applied to input method platforms with various input modes, including keyboard symbols, handwriting, voice input and so on. Because the information conversion in these input modes is known technology, it is not described in detail here.
For example, when the user types, the input method system segments the coded string entered by the user. Taking the segmentation of a Pinyin coded string as an example, segmenting one string usually yields several segmentation schemes; for instance, the Pinyin string "fangan" can be segmented as "fang'an" or as "fan'gan". Of course, any segmentation method in the prior art may be used, and the present invention does not limit this.
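A sketch of enumerating the segmentation schemes for the "fangan" example. The syllable set is a toy subset chosen for this illustration, not a full Pinyin syllable table, and the function name is hypothetical.

```python
def segmentations(code, syllables):
    """Enumerate every way to split code into syllables from the given set."""
    if not code:
        return [[]]
    results = []
    for end in range(1, len(code) + 1):
        head = code[:end]
        if head in syllables:
            # Extend each segmentation of the remainder with this leading syllable.
            for rest in segmentations(code[end:], syllables):
                results.append([head] + rest)
    return results


SYLLABLES = {"fang", "fan", "an", "gan"}  # toy subset, assumption
schemes = ["'".join(parts) for parts in segmentations("fangan", SYLLABLES)]
```

With this toy set, `schemes` holds the two segmentations named in the text, "fan'gan" and "fang'an".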
Step 502: obtain output candidate items;
Taking a Pinyin lattice segmentation method as an example, the process of obtaining output candidate items from the segmented coded string is equivalent to automatically converting an input stream of continuous Pinyin into the corresponding stream of characters. Specifically, a given continuous Pinyin stream A can be segmented by some Pinyin stream segmentation algorithm into a Pinyin sequence A_1 A_2 ... A_m, where the group of homophonic characters and words corresponding to each syllable A_i can be represented by a column of nodes W_i1 W_i2 ... . The candidate homophones for the Pinyin sequence A_1 A_2 ... A_m can therefore be represented by m columns of nodes. Clearly, the candidate homophones corresponding to one Pinyin sequence form a candidate homophone matrix. Connecting adjacent nodes with directed edges forms a word lattice. The word lattice constitutes the state space of the Chinese character input problem, and the Pinyin-to-character conversion problem becomes the problem of searching for the optimal path in the word lattice.
For example, the Pinyin stream "zheshiyizhipiaoliangdemao" is entered and segmented into the Pinyin sequence "zhe'shi'yi'zhi'piaoliang'de'mao"; the word lattice corresponding to this Pinyin sequence is shown in Fig. 6.
Then, the language rule base of the system is queried for rule matching, and all nodes in adjacent columns that match a language rule are recursively bundled into language element nodes, forming an element lattice. This element lattice constitutes the new state space of the Pinyin-to-character conversion. Using the Viterbi dynamic programming algorithm, the probability values of the system Bigram statistics base and the Bigram learning base are combined by weighting to compute the probability of every candidate word sequence in the element lattice, and the candidate with the maximum probability is selected and output as the conversion result.
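The weighted Bigram combination and Viterbi search can be sketched as follows. This is a simplified illustration, not the system described above: each lattice column holds candidates for one syllable only (no multi-syllable elements and no rule bundling), and the interpolation weight, the probability floor, the `<s>` start symbol and all names are assumptions.

```python
import math


def combined_prob(prev, word, system_bigram, learned_bigram, weight=0.7, floor=1e-6):
    """Weighted combination of the system and learned bigram probabilities."""
    p_sys = system_bigram.get((prev, word), floor)
    p_usr = learned_bigram.get((prev, word), floor)
    return weight * p_sys + (1 - weight) * p_usr


def viterbi(lattice, system_bigram, learned_bigram, weight=0.7):
    """lattice: list of columns, each a list of candidate words for one syllable."""
    # best[word] = (log-probability of the best path ending in word, that path)
    best = {"<s>": (0.0, [])}
    for column in lattice:
        new_best = {}
        for word in column:
            score, path = max(
                (
                    prev_score
                    + math.log(combined_prob(prev, word, system_bigram, learned_bigram, weight)),
                    prev_path,
                )
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[word] = (score, path + [word])
        best = new_best
    return max(best.values())[1]
```

Only the best path into each node is kept per column, so the cost grows with the product of adjacent column sizes rather than with the number of complete paths.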
Of course, those skilled in the art may use any feasible method to obtain the output candidate items, and the present invention does not limit this.
Step 503: judge whether the preset condition for applying restriction information is met;
Step 504: if so, extract the restriction information corresponding to the output candidate item, and rank the candidate items according to the restriction information.
Ranking the candidate items according to the restriction information can be implemented by directly setting the display position or order, or by modifying the word frequency (including but not limited to increasing or decreasing its weight). In the most extreme case, the candidate item is removed and not displayed at all.
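A sketch of ranking by modified word frequency, with removal as the extreme case. The weight convention (1.0 means unrestricted, a weight at or below `drop_below` removes the candidate) and all names are assumptions for illustration.

```python
def rank_candidates(candidates, freq, restriction_weight, drop_below=0.0):
    """Sort candidates by word frequency scaled by their restriction weight."""
    scored = []
    for cand in candidates:
        weight = restriction_weight.get(cand, 1.0)  # unrestricted by default
        if weight <= drop_below:
            continue  # extreme case: the candidate is not displayed at all
        scored.append((freq.get(cand, 0) * weight, cand))
    # sorted() is stable, so equally scored candidates keep their input order.
    scored.sort(key=lambda item: -item[0])
    return [cand for _, cand in scored]
```

A restricted word with a small weight is pushed toward the end of the list rather than hidden, which matches the down-weighting option described above.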
When a word has restriction information that restricts its output alone, the preset condition for applying the restriction information is: whether the output candidate item is a word output alone. The restriction information can then be obtained by the following steps: obtain a target word; obtain the feature information corresponding to the target word; judge whether the feature information, or a result computed from it, meets a preset condition; if so, record the related restriction information for the target word.
When a word has restriction information that restricts its output through word composition, the preset condition for applying the restriction information is: whether the output candidate item belongs to the intelligent word composition case. The restriction information can then be obtained by the following steps: obtain a target word; obtain the linguistic collocation parameters of the target word in a preset corpus; judge whether the linguistic collocation parameters meet a preset condition; if so, record the restriction information of the target word, which comprises the corresponding linguistic collocation parameters and is used to restrict the ranking of the word when it is output through intelligent word composition.
Preferably, whether an output candidate item is a word output alone can be judged by the following steps:
First, all possible output candidate items are obtained for the coded string entered by the user. Then, it is judged whether an output candidate item contains only one element and is longer than 1 output character, where an element is a word or phrase stored in the preset dictionary. If so, the output candidate item is determined to be a word output alone. Whether a candidate item contains exactly one element can be determined by querying the dictionary through ID mapping, or by counting the element IDs it contains.
The "1 output character" may be a character of a different byte length or other length in different input method systems; for example, for Chinese, Japanese or Korean input methods, 1 output character is a character occupying 2 bytes. The length can be judged by reading a length parameter preset in the dictionary, which may be stored under the word ID in the attributes of the corresponding entry, or by directly obtaining the length of the output candidate item. Other methods in the prior art are also feasible, and the present invention does not limit this.
For example, when the user enters the coded string "liangjiangzong", the possible candidate items obtained after Pinyin lattice segmentation of this string are: "two rivers total", "amount general", "two rivers", "good general" and so on. Suppose each candidate item can be expressed as <entry 1, attribute 1>, <entry 2, attribute 2>, ...; or as <ID of entry 1, attribute 1>, <ID of entry 2, attribute 2>, ....
For instance, the candidate item "two rivers total" can be expressed as: <two rivers, p1>, <total, p2>;
The candidate item "amount general" can be expressed as: <amount general, q1>;
The candidate item <amount general, q1> contains only one element and is longer than 1 output character, so it is further judged whether its attribute q1 contains a restriction information flag. Because it carries a restriction information flag (for example, tag is non-zero), this candidate item is not output. The attribute q1 may also contain a length parameter.
That is, the candidate items finally output are: "two rivers total", "two rivers", "good general".
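The check in this example can be sketched as below. Candidates are modelled as lists of (entry, attributes) pairs; the attribute keys `length` and `tag` and the function names are assumptions, and English glosses with an explicit character length stand in for the original words.

```python
def is_output_alone(candidate):
    """A candidate is a word output alone if it has one element longer than 1 character."""
    if len(candidate) != 1:
        return False
    entry, attrs = candidate[0]
    # Prefer the length parameter stored with the entry; fall back to the string length.
    return attrs.get("length", len(entry)) > 1


def filter_candidates(candidates):
    """Drop candidates that are output alone and carry a non-zero restriction tag."""
    kept = []
    for cand in candidates:
        if is_output_alone(cand) and cand[0][1].get("tag", 0) != 0:
            continue
        kept.append(cand)
    return kept


candidates = [
    [("two rivers", {"length": 2}), ("total", {"length": 1})],
    [("amount general", {"length": 2, "tag": 1})],  # restricted entry
    [("two rivers", {"length": 2})],
    [("good general", {"length": 2})],
]
shown = [" ".join(entry for entry, _ in cand) for cand in filter_candidates(candidates)]
```

A candidate built from two elements is never treated as output alone here, so a restriction tag on one of its elements would not remove it in this sketch.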
In general, if a candidate item is not output alone, it belongs to output through word composition, so the above process can also be used to identify the intelligent word composition case.
Of course, when the user has entered only two syllables, the output can be judged directly to be output alone without the above process, because two syllables generally do not involve intelligent word composition. That is, any method in the prior art may be used to judge whether a candidate item is output alone. For example, for a coded string entered by the user that does not need to be segmented, the resulting output candidate item is judged to be a word output alone; or, an output candidate item that corresponds to a single entry in the dictionary for the coded string entered by the user is determined to be a word output alone.
With reference to Fig. 7, an input method system embodiment is shown, which may specifically comprise:
An input interface unit 701 and a display unit 702, and
A dictionary 703, which comprises restriction information. The restriction information may be any of the kinds described above, and it may exist in various forms; for example, it may be held in the dictionary as a vocabulary, or implemented by marking the corresponding entries in the dictionary.
A candidate item acquiring unit 704, used to obtain output candidate items according to the user's input information;
A judging unit 705, used to judge whether an output candidate item meets the preset condition for applying restriction information;
A candidate item ranking unit 706, used, when the preset condition is met, to extract the restriction information corresponding to the output candidate item and rank the candidate items according to the restriction information.
The dictionary 703 may comprise entry information and restrictive word information; that is, restrictive word information may be recorded in the existing dictionary for the words that meet the preset condition. In another preferred arrangement, the dictionary 703 comprises a basic dictionary and a restriction vocabulary, the latter being a vocabulary that records restrictive word information. In this case, the words that meet the preset condition and their corresponding restriction information can be stored separately as a restriction vocabulary, and the restriction vocabulary together with the basic dictionary forms the input method dictionary of this embodiment. Of course, those skilled in the art may also use other methods in the prior art to preset the input method dictionary, and the present invention does not limit this.
Preferably, when a word has restriction information that restricts its output alone, the preset condition for applying the restriction information is: whether the output candidate item is a word output alone. The judging unit may then further comprise: a subunit used to judge whether an output candidate item contains only one element, where an element is a word or phrase stored in the preset dictionary; a subunit used to judge whether the length of the output candidate item is greater than 1 output character; and a subunit used to determine that the output candidate item is a word output alone when it meets both of the above judgment conditions.
When a word has restriction information that restricts its output through word composition, the preset condition for applying the restriction information is: whether the output candidate item belongs to the intelligent word composition case. The preceding judgment method can also be used here: if the judgment conditions are not met, the candidate item belongs to the intelligent word composition case.
The above input method system may be an ordinary input method system, in which, for example, the input interface unit, the display unit and the dictionary are located in the same computing device. It may also be an input method in a network system, in which, for example, the input interface unit and the display unit are located in a first computing device and the dictionary is located in a second computing device; according to the information entered by the user, the input method system obtains the corresponding information from the second computing device and displays the corresponding words on the first computing device.
Because the foregoing embodiments are all based on the same inventive concept, each description focuses on what differs; for the similarities, refer to the corresponding parts of this specification.
The method and device for obtaining restrictive word information, the method for updating a dictionary, the method for optimizing output and the input method system provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principle and embodiments of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, this description should not be construed as limiting the present invention.

Claims (25)

1. A method for obtaining restrictive word information, characterized by comprising:
Obtaining a target word;
Obtaining feature information corresponding to the target word, the feature information being expressed numerically;
Judging whether the feature information, or a result computed from the feature information, meets a preset condition; if so, determining that the target word is a restrictive word and recording the related restriction information, wherein a restrictive word is a word that is defective in linguistics or in usage habit, and the restriction information is used to restrict the ranking of the target word when it is output alone.
2. The method of claim 1, characterized in that:
The feature information is: the feature value, in a preset corpus, of the character at the start of the target word being used as a word-initial character, and the feature value, in the preset corpus, of the character at the end of the target word being used as a word-final character;
The preset condition is: whether at least one of the above feature values falls within a preset range.
3. The method of claim 1, characterized in that:
The feature information is: the feature values, in a preset corpus, of the linguistic collocation relations of the single-character words and/or multi-character words contained in the target word;
The preset condition is: whether at least one of the above feature values falls within a preset range.
4. The method of claim 1, characterized in that:
The feature information is: the feature value of the target word being entered alone by users when using the input method;
The preset condition is: whether this feature value falls within a preset range.
5. The method of claim 1, characterized in that:
The feature information comprises: the feature value, in a preset corpus, of the character at the start of the target word being used as a word-initial character; the feature value, in the preset corpus, of the character at the end of the target word being used as a word-final character; and the general word frequency of the target word;
The preset condition is: whether the ratio of at least one of the above feature values to the general word frequency of the target word falls within a preset range.
6. The method of claim 1, characterized in that:
The feature information comprises: the feature values, in a preset corpus, of the linguistic collocation relations of the single-character words and/or multi-character words contained in the target word; and the general word frequency of the target word;
The preset condition is: whether the ratio of at least one of the above feature values to the general word frequency of the target word falls within a preset range.
7. The method of claim 1, characterized in that:
The feature information is: the feature value of the target word being entered alone by users when using the input method; and the general word frequency of the target word;
The preset condition is: whether the ratio of this feature value to the general word frequency of the target word falls within a preset range.
8. The method of claim 1, characterized in that:
The feature information is: the user ranking position of the target word among the candidate words for the same input code; and the original ranking position of the target word; wherein the user ranking position is related to the feature value of the target word being entered alone by users when using the input method, and the original ranking position is related to the general word frequency of the target word;
The preset condition is: whether the difference between the user ranking position and the original ranking position falls within a preset range.
9. The method of any one of claims 1-8, characterized in that the restriction information comprises: the weight restricting output of the restrictive word alone under each preset scene.
10. The method of any one of claims 1-8, characterized in that:
The restriction information comprises: the linguistic collocation parameters of the restrictive word in a preset corpus, which are used to restrict the ranking of the word when it is output through intelligent word composition.
11. The method of any one of claims 1-8, characterized by further comprising:
Generating a dictionary or vocabulary that comprises the restrictive words and their related restriction information;
Or, generating a dictionary that comprises the restrictive words and their related restriction information, together with general words.
12. A method for obtaining restrictive word information, characterized by comprising:
Obtaining a target word;
Obtaining the linguistic collocation parameters of the target word in a preset corpus, the linguistic collocation parameters being expressed numerically;
Judging whether the linguistic collocation parameters meet a preset condition; if so, recording the restriction information of the target word, wherein the restriction information comprises the corresponding linguistic collocation parameters, restricts words that are defective in linguistics or in usage habit, and is used to restrict the ranking of the target word when it is output through intelligent word composition.
13. The method of claim 12, characterized in that:
The linguistic collocation parameter is a general parameter;
Or, the linguistic collocation parameter comprises sub-parameters for each preset scene.
14. A method for updating a dictionary, characterized by comprising:
Obtaining a target word;
Obtaining feature information corresponding to the target word, the feature information being expressed numerically;
Judging whether the feature information, or a result computed from the feature information, meets a preset condition; if so, determining that the target word is a restrictive word and recording the related restriction information, wherein a restrictive word is a word that is defective in linguistics or in usage habit, and the restriction information is used to restrict the ranking of the target word when it is output alone and/or when it is output through intelligent word composition;
Adding the restrictive word and its related restriction information to the existing dictionary of the input method.
15. The method of claim 14, characterized in that:
The addition is: judging whether the restrictive word exists in the existing dictionary of the input method; if it does, recording only its related restriction information in the existing dictionary of the input method;
Or, the addition is: directly recording the restrictive word and its related restriction information in the existing dictionary of the input method, overwriting the original entry if the entry is duplicated;
Or, the addition is: storing the restrictive word and its related restriction information as a restriction vocabulary, which works together with the existing dictionary of the input method to complete candidate item ranking.
16. The method of claim 14, characterized in that the restrictive word has restriction information under each preset scene.
17. A device for obtaining restrictive word information, characterized by comprising:
A target word acquiring unit, used to obtain a target word;
A feature acquiring unit, used to obtain feature information corresponding to the target word, the feature information being expressed numerically;
A restriction information acquiring unit, used to judge whether the feature information, or a result computed from the feature information, meets a preset condition; if so, to determine that the target word is a restrictive word and record the related restriction information, wherein a restrictive word is a word that is defective in linguistics or in usage habit, and the restriction information is used to restrict the ranking of the target word when it is output alone and/or when it is output through intelligent word composition.
18, a kind of method of optimizing output is characterized in that, comprising:
Receive user's input information, and described input information is changed;
Obtain output candidate item;
Judge whether an output candidate item meets the prerequisite of application limitations information; Described restricted information is that the speech with defective on linguistics or the use habit is limited;
If then extract the corresponding restricted information of this output candidate item, and each candidate item sorted according to described restricted information; Wherein, described ordering is to realize by the mode that direct setting represents position or order, or realizes by the mode of revising word frequency.
19. The method of claim 18, characterized in that:
The precondition for applying the restriction information is whether the output candidate is a word output on its own;
Or, the precondition for applying the restriction information is whether the output candidate belongs to the intelligent word-formation case.
20. The method of claim 18, characterized in that the restriction information is obtained by the following steps:
Obtaining a target word;
Obtaining feature information corresponding to the target word, the feature information being expressed as a numerical value;
Determining whether the feature information, or a result computed from the feature information, meets a preset condition and, if so, recording the related restriction information for the target word.
21. The method of claim 19, characterized in that, when it must be determined whether the output candidate is a word output on its own, the determination is made by the following steps:
Determining whether an output candidate contains only one element and is longer than one output character, an element being a character or word stored in the preset dictionary;
If so, determining that the output candidate is a word output on its own.
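The two-part test of claim 21 is small enough to write out directly. The sketch below assumes that a candidate is represented as the list of dictionary entries it was assembled from; that representation and the function name are illustrative choices, not taken from the patent.

```python
from typing import Sequence, Set


def is_standalone_word(elements: Sequence[str], dictionary: Set[str]) -> bool:
    """True if the candidate is a single dictionary entry longer than one character."""
    # Condition 1: the candidate consists of exactly one element.
    if len(elements) != 1:
        return False
    element = elements[0]
    # An element must be an entry stored in the preset dictionary.
    if element not in dictionary:
        return False
    # Condition 2: the output is longer than one character.
    return len(element) > 1


preset_dictionary = {"今天", "天气", "天"}
single_word = is_standalone_word(["今天"], preset_dictionary)
single_char = is_standalone_word(["天"], preset_dictionary)
composed = is_standalone_word(["今天", "天气"], preset_dictionary)
```

A single-character entry and a candidate assembled from several entries both fail the test, so only multi-character single entries are treated as words output on their own.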
22. An input method system comprising an input interface unit and a display unit, characterized in that the input method system further comprises:
A dictionary, the dictionary containing restriction information;
A candidate acquiring unit, configured to obtain output candidates according to the user's input information;
A judging unit, configured to determine whether an output candidate meets the precondition for applying restriction information, the restriction information being a restriction placed on words that are defective in terms of linguistics or usage habits;
A candidate ranking unit, configured to extract, when the precondition is met, the restriction information corresponding to the output candidate and to rank the candidates according to the restriction information, wherein the ranking is achieved either by directly setting the display position or order, or by modifying the word frequency.
23. The system of claim 22, characterized in that:
The precondition for applying the restriction information is whether the output candidate is a word output on its own;
Or, the precondition for applying the restriction information is whether the output candidate belongs to the intelligent word-formation case.
24. The input method system of claim 22, characterized in that the judging unit further comprises:
A subunit configured to determine whether the output candidate contains only one element, an element being a character or word stored in the preset dictionary; and
A subunit configured to determine whether the output candidate is longer than one output character; and
A subunit configured to determine, when the output candidate meets both of the above conditions, that it is a word output on its own.
25. The input method system of claim 23, characterized in that the input interface unit, the display unit, and the dictionary of the input method system are located in the same computing device;
Or, the input interface unit and the display unit of the input method system are located in a first computing device and the dictionary is located in a second computing device, and the input method system, according to the information input by the user, obtains the corresponding information from the second computing device and displays the corresponding words on the first computing device.
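The two deployments in claim 25 differ only in where the dictionary lives, which can be illustrated with two interchangeable lookup classes. This is a sketch under assumptions: the class names, the `lookup` method, and the `fetch` callable standing in for a network request to the second computing device are all hypothetical.

```python
from typing import Callable, Dict, Optional

RestrictionInfo = Dict[str, bool]


class LocalDictionary:
    """Dictionary held on the same device as the input interface and display."""

    def __init__(self, entries: Dict[str, RestrictionInfo]):
        self._entries = entries

    def lookup(self, word: str) -> Optional[RestrictionInfo]:
        return self._entries.get(word)


class RemoteDictionary:
    """Dictionary held on a second device; `fetch` stands in for the network call."""

    def __init__(self, fetch: Callable[[str], Optional[RestrictionInfo]]):
        self._fetch = fetch

    def lookup(self, word: str) -> Optional[RestrictionInfo]:
        return self._fetch(word)


# Data that would live on the second computing device in the split deployment.
server_entries = {"word_a": {"limit_standalone": True}}

local = LocalDictionary(server_entries)
remote = RemoteDictionary(server_entries.get)  # simulated request to the second device
same_answer = local.lookup("word_a") == remote.lookup("word_a")
```

Because both classes expose the same `lookup` method, the ranking logic on the first device does not need to know which deployment is in use.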
CNB2007100996440A 2007-05-25 2007-05-25 Method for catching limit word information, optimizing output and input method system Active CN100483417C (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CNB2007100996440A CN100483417C (en) 2007-05-25 2007-05-25 Method for catching limit word information, optimizing output and input method system
PCT/CN2008/071064 WO2008145055A1 (en) 2007-05-25 2008-05-23 The method for obtaining restriction word information, optimizing output and the input method system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2007100996440A CN100483417C (en) 2007-05-25 2007-05-25 Method for catching limit word information, optimizing output and input method system

Publications (2)

Publication Number Publication Date
CN101055588A CN101055588A (en) 2007-10-17
CN100483417C true CN100483417C (en) 2009-04-29

Family

ID=38795424

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007100996440A Active CN100483417C (en) 2007-05-25 2007-05-25 Method for catching limit word information, optimizing output and input method system

Country Status (2)

Country Link
CN (1) CN100483417C (en)
WO (1) WO2008145055A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103455160A (en) * 2012-05-29 2013-12-18 阿里巴巴集团控股有限公司 Method and device for recommending candidate words according to geographic position
CN103869998A (en) * 2012-12-11 2014-06-18 百度国际科技(深圳)有限公司 Method and device for sorting candidate items generated by input method

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100483417C (en) * 2007-05-25 2009-04-29 北京搜狗科技发展有限公司 Method for catching limit word information, optimizing output and input method system
US8407236B2 (en) 2008-10-03 2013-03-26 Microsoft Corp. Mining new words from a query log for input method editors
CN102141868B (en) * 2010-01-28 2013-08-14 北京搜狗科技发展有限公司 Method for quickly operating information interaction page, input method system and browser plug-in
CN102193639B (en) * 2010-03-04 2014-03-12 阿里巴巴集团控股有限公司 Method and device of statement generation
CN102495679A (en) * 2011-12-01 2012-06-13 上海量明科技发展有限公司 Composite spelling input method, word bank and system thereof
CN103365875B (en) * 2012-03-29 2018-05-11 百度在线网络技术(北京)有限公司 A kind of method and apparatus for being used to provide contact object in current application
CN106156056B (en) * 2015-03-27 2020-03-06 联想(北京)有限公司 Text mode learning method and electronic equipment
CN105094368B (en) * 2015-07-24 2018-05-15 上海二三四五网络科技有限公司 A kind of control method and control device that frequency modulation sequence is carried out to candidates of input method
CN105955495A (en) * 2016-04-29 2016-09-21 百度在线网络技术(北京)有限公司 Information input method and device
CN107390896B (en) * 2017-07-21 2019-12-03 深圳市鹰硕技术有限公司 A kind of the dictionary management method and device of input method
CN107424461B (en) * 2017-08-01 2019-12-03 深圳市鹰硕技术有限公司 Information screen method and system
CN108509555B (en) * 2018-03-22 2021-07-23 武汉斗鱼网络科技有限公司 Search term determination method, device, equipment and storage medium
CN108733831B (en) * 2018-05-25 2022-05-17 腾讯音乐娱乐科技(深圳)有限公司 Method and device for processing word stock
CN111381684A (en) * 2018-12-28 2020-07-07 北京搜狗科技发展有限公司 Method and device for shielding gray self-made phrase
CN112083814A (en) * 2020-08-28 2020-12-15 的卢技术有限公司 Word bank generating method based on AI and cloud computing

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1123815C (en) * 1997-07-25 2003-10-08 上海欧姆龙计算机有限公司 Automatic logging method and device for phonetic words relation table in Chinese character input system
CN1203387C (en) * 2001-02-15 2005-05-25 英业达股份有限公司 Method for regulating character frequency
US7478033B2 (en) * 2004-03-16 2009-01-13 Google Inc. Systems and methods for translating Chinese pinyin to Chinese characters
JP2006050160A (en) * 2004-08-03 2006-02-16 Sharp Corp Device, program and recording medium for inputting chinese language
CN100550011C (en) * 2004-11-29 2009-10-14 广东瑞图万方科技有限公司 Set up the method and the corresponding association input system and the method for association input system
CN100424703C (en) * 2006-08-23 2008-10-08 北京搜狗科技发展有限公司 Method for obtaining newly encoded character string, input method system and word stock generation device
CN100483417C (en) * 2007-05-25 2009-04-29 北京搜狗科技发展有限公司 Method for catching limit word information, optimizing output and input method system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103455160A (en) * 2012-05-29 2013-12-18 阿里巴巴集团控股有限公司 Method and device for recommending candidate words according to geographic position
CN103869998A (en) * 2012-12-11 2014-06-18 百度国际科技(深圳)有限公司 Method and device for sorting candidate items generated by input method
CN103869998B (en) * 2012-12-11 2018-05-01 百度国际科技(深圳)有限公司 A kind of method and device being ranked up to candidate item caused by input method

Also Published As

Publication number Publication date
CN101055588A (en) 2007-10-17
WO2008145055A1 (en) 2008-12-04

Similar Documents

Publication Publication Date Title
CN100483417C (en) Method for catching limit word information, optimizing output and input method system
CN108304375B (en) Information identification method and equipment, storage medium and terminal thereof
CN101470732B (en) Auxiliary word stock generation method and apparatus
CN110826331A (en) Intelligent construction method of place name labeling corpus based on interactive and iterative learning
CN101079024B (en) Special word list dynamic generation system and method
CN112818093B (en) Evidence document retrieval method, system and storage medium based on semantic matching
CN103577394B (en) A kind of machine translation method based on even numbers group searching tree and device
CN106021572B (en) The construction method and device of binary feature dictionary
CN104636478A (en) Information query method and device
CN112395385B (en) Text generation method and device based on artificial intelligence, computer equipment and medium
CN104199965A (en) Semantic information retrieval method
CN102043808A (en) Method and equipment for extracting bilingual terms using webpage structure
US9720976B2 (en) Extracting method, computer product, extracting system, information generating method, and information contents
CN112115232A (en) Data error correction method and device and server
CN104866511A (en) Method and equipment for adding multi-media files
CN110853625A (en) Speech recognition model word segmentation training method and system, mobile terminal and storage medium
WO2018213783A1 (en) Computerized methods of data compression and analysis
CN111753514B (en) Automatic generation method and device of patent application text
CN110929518A (en) Text sequence labeling algorithm using overlapping splitting rule
CN112417875B (en) Configuration information updating method and device, computer equipment and medium
CN114385791A (en) Text expansion method, device, equipment and storage medium based on artificial intelligence
CN114139560A (en) Translation system based on artificial intelligence
CN113779987A (en) Event co-reference disambiguation method and system based on self-attention enhanced semantics
CN113240485A (en) Training method of text generation model, and text generation method and device
CN111857688A (en) SQL code automatic completion method, system and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant