CN101276245B - Reminding method and system for coding to correct error in input process - Google Patents

Publication number
CN101276245B
Authority
CN
China
Prior art keywords
candidate item
user
string
coding
equivalent way
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2008101042171A
Other languages
Chinese (zh)
Other versions
CN101276245A (en
Inventor
杨磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN2008101042171A priority Critical patent/CN101276245B/en
Publication of CN101276245A publication Critical patent/CN101276245A/en
Application granted granted Critical
Publication of CN101276245B publication Critical patent/CN101276245B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The present invention provides a method and device for prompting code correction during input. The method comprises the following steps: receiving a code string entered by the user; converting the received code string into corresponding candidate items; determining whether the candidate items include one obtained through confusable-code equivalence; and, if so, prompting code correction information. To help the user enter correct code strings, the invention proactively presents the correct code string whenever a confusion occurs, so that the user actively improves input accuracy during use and depends less on fuzzy-sound matching.

Description

Method and system for prompting code correction during input
Technical field
The present invention relates to the field of computer information input, and in particular to a method and system for prompting code correction during information input.
Background art
Users of languages such as Chinese, Japanese, and Korean generally interact with a computer through an input method program: the user types a code string on the keyboard, the input method program converts it into candidate items in the target language according to its preset standard mapping rules, and the user then confirms the desired item.
However, because of differences in speech habits, region, and so on, a user may type a wrong code string (one that differs from the standard mapping rules applied by the input method program) and therefore fail to obtain the desired candidate items.
Take Chinese pinyin input as an example. China is vast, and different regions have different dialects. Under dialect influence, users, particularly in southern regions, commonly fail to distinguish flat-tongue from retroflex initials (z/zh, s/sh, c/ch), front from back nasal finals (an/ang, en/eng, in/ing), or l from n. In a few regions users also confuse h/f, l/r, eng/ong, and so on. When these initials or finals are involved, input becomes very inconvenient because the user cannot be sure of the character's exact pronunciation. For example, a user who cannot distinguish front and back nasals has difficulty telling whether, under the standard mapping rules of a pinyin input method, the characters "wind" (风) and "branch" (分) are read "fen" or "feng", and can only enter them by trial and error.
To make input easier for these users, many existing input methods provide a "fuzzy sound" function: syllables that some users may confuse are treated as equivalent (the user can choose which syllables are treated as equivalent; see the settings interface in Fig. 1), so that users can type in the way they are used to. For example, for a user who is not used to back nasals, once the two finals are treated as equivalent, typing "fen" returns both "branch" (分) and "wind" (风), and both take part in word-frequency adjustment (more frequently used characters are placed first). This greatly facilitates input for such users; see Fig. 2.
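The fuzzy sound lookup described above can be illustrated with a minimal Python sketch. The pair table, the toy dictionary, and the function names are assumptions made for illustration; the source does not specify how an input method stores or applies its equivalence rules.

```python
# Hypothetical equivalence pairs for front/back nasal finals (taken from the text).
FUZZY_FINAL_PAIRS = [("an", "ang"), ("en", "eng"), ("in", "ing")]

# Toy dictionary (assumed): standard pinyin syllable -> characters.
DICTIONARY = {"fen": ["分"], "feng": ["风", "封"]}


def equivalent_syllables(syllable):
    """Return the syllable plus the variants a fuzzy rule treats as equivalent."""
    variants = [syllable]
    for front, back in FUZZY_FINAL_PAIRS:
        if syllable.endswith(back):
            variants.append(syllable[: -len(back)] + front)
        elif syllable.endswith(front):
            variants.append(syllable[: -len(front)] + back)
    return variants


def lookup(syllable):
    """Collect candidates for the typed syllable and all of its fuzzy variants."""
    candidates = []
    for variant in equivalent_syllables(syllable):
        # Variants that are not real syllables simply find nothing in the dictionary.
        candidates.extend(DICTIONARY.get(variant, []))
    return candidates
```

With this toy data, `lookup("fen")` merges the characters stored under "fen" and "feng" into one list, which is exactly the merging of candidate items that the following paragraphs identify as the source of additional duplicate codes.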
However, as the above shows, supporting fuzzy sounds (that is, treating confusable syllables as equivalent) merges candidate items that originally had different pinyin into one list for the user to choose from (for example, "branch" and "wind"). This aggravates the problem of duplicate-code candidate items and makes choosing characters and words harder. Because of homophones, duplicate codes have always been a problem that pinyin input methods face and need to solve, and fuzzy-sound support clearly makes it worse.
The example above concerns pinyin input only, but the same problem exists for other input methods: any input method that supports confusable-code equivalence has the technical defect of aggravating the duplicate-code problem.
In short, a technical problem that those skilled in the art urgently need to solve is how to reduce duplicate codes, and the user's dependence on fuzzy sounds, while the input method still supports confusable-code equivalence.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and system for prompting code correction during input, which can prompt the user with various kinds of code correction information so as to help the user actively improve the accuracy of typed code strings during use, thereby reducing the additional duplicate codes caused by the input method's support for confusable-code equivalence.
To solve the above problem, the invention discloses a method for prompting code correction during input, comprising: receiving a code string entered by the user; converting the received code string into corresponding candidate items; determining whether the candidate items include one obtained through confusable-code equivalence; and, if so, prompting code correction information.
Preferably, the code correction information can be prompted as follows: when the candidate item is displayed, its corresponding correct code string is displayed as well.
Preferably, when there are multiple candidate items obtained through confusable-code equivalence, the method further comprises: screening these candidate items according to a preset rule, and displaying the prompt only for the qualifying candidate items and their corresponding correct code strings.
Preferably, before the code correction information is prompted, the method further comprises: determining whether the occurrence count or occurrence frequency of the particular candidate item obtained through confusable-code equivalence is greater than or equal to a predetermined threshold and, if so, prompting the code correction information.
Preferably, the occurrence count or frequency is computed for the current user of the input method, or for the input method's entire user base.
Preferably, the method may further comprise: collecting code correction information and generating an error correction record table, which comprises the user's input string, the standard code string, and the corresponding candidate item.
Preferably, whether a candidate item was obtained through confusable-code equivalence can be determined as follows: the code string entered by the user is compared with the standard code string of the candidate item; if they differ, the candidate item is determined to have been obtained through confusable-code equivalence.
Preferably, whether a candidate item was obtained through confusable-code equivalence can be determined as follows: the code string entered by the user is compared with the standard code string of the candidate item; if they differ, it is further determined whether the pair satisfies a normal conversion rule; if it does not, the candidate item is determined to have been obtained through confusable-code equivalence.
Preferably, when a pinyin input method is used, whether a candidate item was obtained through confusable-code equivalence can be determined as follows: while candidate items are being derived from the code string entered by the user, the attributes of the syllable generation rules are recorded; if a specific confusable-code equivalence was used, the candidate item is determined to have been obtained through confusable-code equivalence.
Preferably, whether a candidate item was obtained through confusable-code equivalence can be determined as follows: the code string entered by the user and the standard code string of the candidate item are taken as a mapping pair, and the input method's mapping rule table is searched to see whether the pair matches a specific confusable-code equivalence; if so, the candidate item is determined to have been obtained through confusable-code equivalence.
According to another preferred embodiment of the invention, a device for prompting code correction during input is also disclosed, comprising:
an interface unit, configured to receive the code string entered by the user;
a code conversion unit, configured to convert the received code string into corresponding candidate items;
a confusion judging unit, configured to determine whether the candidate items include one obtained through confusable-code equivalence and, if so, to notify the information display unit; and
an information display unit, configured to prompt code correction information.
Preferably, the code correction information can be prompted as follows: when the candidate item is displayed, its corresponding correct code string is displayed as well.
Preferably, when there are multiple candidate items obtained through confusable-code equivalence, a first screening module may be included between the confusion judging unit and the information display unit, configured to screen these candidate items according to a preset rule and to notify the information display unit to display the prompt only for the qualifying candidate items and their corresponding correct code strings.
Preferably, a second screening module may be included between the confusion judging unit and the information display unit, configured to determine whether the occurrence count or occurrence frequency of the particular candidate item obtained through confusable-code equivalence is greater than or equal to a predetermined threshold and, if so, to notify the information display unit to display the prompt.
Preferably, the device may further comprise an error correction record table generation unit, configured to collect code correction information and generate an error correction record table, which comprises the user's input code string, the standard code string, and the corresponding candidate item.
Preferably, whether a candidate item was obtained through confusable-code equivalence can be determined as follows: the code string entered by the user is compared with the standard code string of the candidate item; if they differ, the candidate item is determined to have been obtained through confusable-code equivalence.
Preferably, whether a candidate item was obtained through confusable-code equivalence can be determined as follows: the code string entered by the user is compared with the standard code string of the candidate item; if they differ, it is further determined whether the pair satisfies a normal conversion rule; if it does not, the candidate item is determined to have been obtained through confusable-code equivalence.
Preferably, when a pinyin input method is used, whether a candidate item was obtained through confusable-code equivalence can be determined as follows: while candidate items are being derived from the code string entered by the user, the attributes of the syllable generation rules are recorded; if a specific confusable-code equivalence was used, the candidate item is determined to have been obtained through confusable-code equivalence.
Preferably, whether a candidate item was obtained through confusable-code equivalence can be determined as follows: the code string entered by the user and the standard code string of the candidate item are taken as a mapping pair, and the input method's mapping rule table is searched to see whether the pair matches a specific confusable-code equivalence; if so, the candidate item is determined to have been obtained through confusable-code equivalence.
Compared with prior art, the present invention has the following advantages:
The invention can reduce duplicate codes at the root: when a given user types code strings more accurately, the input method needs to apply confusable-code equivalence less often, which reduces the user's dependence on fuzzy sounds and the impact of duplicate codes. But how can the user be helped to type correct code strings?
Confusable-code equivalence in effect accommodates the user by merging candidate items that have different code strings (for example, characters with different pronunciations), so that the user never has to distinguish the confused code strings during use and may therefore never learn the accurate code string (for example, the correct pronunciation behind a fuzzy sound). As a result, the user does not actively improve the accuracy of code strings while typing, and the duplicate-code problem persists or is even encouraged.
Therefore, to help the user type correct code strings, the invention proposes the novel approach of automatically presenting the correct code string to the user whenever a confusion occurs, helping the user actively improve code-string accuracy during use and reducing dependence on fuzzy sounds.
Description of drawings
Fig. 1 is a schematic diagram of an existing settings interface for the fuzzy sound function;
Fig. 2 is a schematic diagram of an existing candidate selection interface with the fuzzy sound function applied;
Fig. 3 is a flow chart of the steps of embodiment 1 of the method for prompting code correction during input according to the invention;
Fig. 4 is a schematic diagram of an interface in which the invention displays correction prompts for candidate items obtained through the fuzzy sound function;
Fig. 5 is a flow chart of the steps of embodiment 2 of the method for prompting code correction during input according to the invention;
Fig. 6 is a flow chart of the steps of embodiment 3 of the method for prompting code correction during input according to the invention;
Fig. 7 is a schematic diagram of the pinyin input process;
Fig. 8 is a schematic diagram of a syllable segmentation network;
Fig. 9 is a schematic diagram of another, more complex syllable segmentation network;
Fig. 10 is a structural block diagram of embodiment 1 of the device for prompting code correction during input according to the invention;
Fig. 11 is a structural block diagram of embodiment 2 of the device for prompting code correction during input according to the invention.
Detailed description
To make the above objects, features, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
The invention can be used in many general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, and distributed computing environments that include any of the above systems or devices.
The invention can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The invention can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
Fig. 3 shows embodiment 1 of the method for prompting code correction during input according to the invention, which may comprise:
Step 301: receive the code string entered by the user;
Step 302: convert the received code string into corresponding candidate items;
Step 303: determine whether the candidate items include one obtained through confusable-code equivalence;
Step 304: if so, prompt code correction information.
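Steps 301 to 304 can be sketched end to end as follows. This is an illustrative Python sketch under stated assumptions: the conversion table, the `Candidate` class, and the inline display format are hypothetical, and step 303 uses the simplest possible test (typed code differs from standard code), which is only one of the identification options the description discusses.

```python
from dataclasses import dataclass
from typing import Optional

# Toy conversion table (assumed): typed code -> (character, standard code) pairs,
# with fuzzy en/eng matches already merged in by the input method.
CONVERSION_TABLE = {
    "fen": [("分", "fen"), ("风", "feng")],
}


@dataclass
class Candidate:
    text: str
    standard_code: str
    correction: Optional[str] = None  # correct code string to prompt, if any


def handle_input(typed):
    # Step 301: `typed` is the code string received from the user.
    # Step 302: convert it into candidate items.
    candidates = [Candidate(t, c) for t, c in CONVERSION_TABLE.get(typed, [])]
    for cand in candidates:
        # Step 303: treat a candidate as confusable when its standard code
        # differs from what was typed (simplest test, assumed here).
        if cand.standard_code != typed:
            # Step 304: attach the correct code string as the prompt.
            cand.correction = cand.standard_code
    return candidates


def render(candidates):
    """Format the candidate list, showing the correct code next to flagged items."""
    parts = []
    for index, cand in enumerate(candidates, 1):
        hint = f"({cand.correction})" if cand.correction else ""
        parts.append(f"{index}.{cand.text}{hint}")
    return " ".join(parts)
```

For example, `render(handle_input("fen"))` shows the second candidate together with its standard code "feng", while the first candidate, whose standard code matches the input, is shown without a hint.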
Steps 301 and 302 of this embodiment are well known in existing input methods and are not described further here. Step 302 may use many specific conversion rules; for Chinese character input, for example, these include Wubi, abbreviated pinyin (jianpin), full pinyin (quanpin), double pinyin (shuangpin), and so on. The invention places no limit on this. Each conversion rule may have its own cases in which confusable-code equivalence applies. The description below mainly addresses fuzzy-sound confusion in pinyin input; other kinds of confusion can be handled by analogy.
Step 303 can be implemented in many ways, which are described in detail later. The implementation of step 304 is described first.
Step 304 can also prompt code correction information in several ways, as follows:
Example 1
The code correction information is prompted as follows: when the candidate item is displayed, its corresponding correct code string is displayed with it. Suppose the user types the code string "fen" and the candidate item "wind" (风) is obtained through fuzzy-sound equivalence; according to the invention, the correct code string "feng" of "wind" is then prompted to the user. The display may be visual and may also be audible.
The concrete forms of display are many. Simply, with reference to Fig. 4, the accurate pinyin of the character can be shown alongside it in the candidate list to prompt the user with the correct pronunciation. Alternatively, the user can be prompted through a pop-up message or bubble. Alternatively, an additional display area can be used; for example, a line of information can be added below the candidate window: to enter the characters "wind" (风) and "envelope" (封), use the pinyin "feng".
To avoid disturbing the user's input, an error hint can be given first, leaving the user to choose whether to view the correct code string. For example, the candidate item "wind" can simply be shown in a different color or font to indicate that its pronunciation needs correcting; if the user wants to see it, the user clicks a trigger such as a button or link next to the candidate item, and the input method then shows the correct code string "feng" for "wind".
Example 2
When there are multiple candidate items obtained through confusable-code equivalence, these candidate items can also be screened according to a preset rule, and the prompt displayed only for the qualifying candidate items and their corresponding correct code strings.
For example, with reference to Fig. 4, two of the first five displayed candidate items, "wind" (风) and "envelope" (封), were obtained through fuzzy-sound equivalence, so in principle correction information should be prompted for both. To reduce interference with the user's input, however, preferably only the higher-ranked of the two, "wind" (that is, the first candidate item with a wrong pronunciation), is prompted, and the later "envelope" is not.
As another example, correction information is prompted for the candidate item "wind" only when the user's selection focus reaches it; otherwise no prompt is shown, which makes the prompt more targeted.
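The two screening rules in Example 2 (prompt only the first confusable candidate, or only the candidate under the selection focus) might look like the following Python sketch. The tuple layout and function names are assumptions for illustration, and the non-fuzzy third entry in the sample list is made up to pad the list.

```python
# Each candidate is (character, standard code, obtained_via_fuzzy_match).
# This layout is hypothetical; the source does not prescribe a data structure.
CANDIDATES = [
    ("分", "fen", False),
    ("风", "feng", True),
    ("份", "fen", False),
    ("封", "feng", True),
]


def first_confusable_only(candidates):
    """Rule A: return the index of the highest-ranked confusable candidate only."""
    for index, (_, _, via_fuzzy) in enumerate(candidates):
        if via_fuzzy:
            return [index]
    return []


def focused_only(candidates, focus_index):
    """Rule B: prompt only when the selection focus sits on a confusable candidate."""
    _, _, via_fuzzy = candidates[focus_index]
    return [focus_index] if via_fuzzy else []
```

Both functions return the indices of the candidates that should carry a correction prompt, so the display layer can stay unaware of which rule is active.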
Fig. 5 shows embodiment 2 of the method for prompting code correction during input according to the invention, which may comprise:
Step 501: receive the code string entered by the user;
Step 502: convert the received code string into corresponding candidate items;
Step 503: determine whether the candidate items include one obtained through confusable-code equivalence;
Step 504: if so, further determine whether the occurrence count or occurrence frequency of the particular candidate item obtained through confusable-code equivalence is greater than or equal to a predetermined threshold;
Step 505: if so, prompt code correction information.
The main difference between this embodiment and the one described above (see Fig. 4) is that this embodiment counts the occurrences, or the frequency, of each specific confusion. For fuzzy sounds, for example, it counts how many times or how often the user's input triggers each fuzzy sound, and prompts only for fuzzy-sound candidate items that exceed a certain count or frequency. An occasional confused input need not be prompted, which avoids interfering too much with the user's input.
The occurrence count or frequency in this embodiment may be computed for the current user of the input method, so as to adapt to that user's individual habits, since in practice the fuzzy sounds that need correcting differ from user to user. It may also be computed for the input method's entire user base, that is, as a statistic of habits common to all users.
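The threshold test of steps 504 and 505 can be sketched with a simple counter. The class name, the choice of keying on the (typed, standard) pair, and the threshold value are assumptions for illustration; the source leaves the counting granularity and the threshold open.

```python
from collections import Counter


class ConfusionCounter:
    """Counts how often each (typed, standard) confusion occurs."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, typed, standard):
        """Count one occurrence and report whether a prompt is now warranted."""
        key = (typed, standard)
        self.counts[key] += 1
        return self.counts[key] >= self.threshold


# One instance per user models the per-user variant; a single shared
# instance would model statistics over the whole user base.
counter = ConfusionCounter(threshold=3)
results = [counter.record("fen", "feng") for _ in range(3)]
```

In this run the first two occurrences stay below the threshold and the third one triggers the prompt, so an occasional slip is never flagged.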
Fig. 6 shows embodiment 3 of the method for prompting code correction during input according to the invention, which may comprise:
Step 601: receive the code string entered by the user;
Step 602: convert the received code string into corresponding candidate items;
Step 603: determine whether the candidate items include one obtained through confusable-code equivalence;
Step 604: if so, prompt code correction information;
Step 605: collect code correction information and generate an error correction record table, which may comprise the user's input string, the standard code string, and the corresponding candidate item.
This embodiment collects the errors the user has made (for example, over a period of time) and aggregates them into a table that the user can review and learn from, or that can be put to other uses as statistical data.
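A possible shape for the error correction record table of step 605 is sketched below. The three fields (input string, standard string, candidate item) come from the text; the added count column, the dictionary row format, and the function name are assumptions.

```python
from collections import Counter


def build_record_table(events):
    """Aggregate (input string, standard string, candidate) events into rows.

    Rows are ordered by how often each confusion occurred, most frequent first.
    """
    counts = Counter(events)
    return [
        {"input": typed, "standard": standard, "candidate": cand, "count": n}
        for (typed, standard, cand), n in counts.most_common()
    ]


# Hypothetical events collected over a period of time.
events = [
    ("fen", "feng", "风"),
    ("zong", "zhong", "中"),
    ("fen", "feng", "风"),
]
table = build_record_table(events)
```

Sorting by count puts the user's most persistent confusion at the top of the table, which suits the review-and-learn use the embodiment mentions.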
The core and difficulty of the invention, namely how to recognize that confusable-code equivalence has been applied (for example, how to recognize the use of a fuzzy sound), is described in detail below. Many implementations are possible; for reasons of space only a few examples are given here.
Identification scheme 1
Whether a candidate item was obtained through confusable-code equivalence is determined as follows: the code string entered by the user is compared with the standard code string of the candidate item; if they differ, the candidate item is determined to have been obtained through confusable-code equivalence.
Taking pinyin input as an example, the basic idea of this scheme is to compare the pinyin typed by the user with the correct pronunciation of the character in order to recognize whether a fuzzy sound was used. For example, the user types "fen" and retrieval yields the candidate "wind (风), feng"; by comparing the typed pinyin string "fen" with the correct pinyin string "feng", it can be determined that the user used a fuzzy sound. This scheme relies on the input method having recorded, for each single character, the corresponding user input string and the character's pronunciation.
Preferably, when the user enters several characters at once (a word or a sentence), the input method must be able to segment the pinyin string correctly to find the input string corresponding to each character; existing input method technology can generally handle word or sentence input. Moreover, the input method dictionary can store each character with its correct pronunciation, so when generating a candidate item the input method can record the accurate pronunciation of each character in it.
For example, the user types "fenge" and is offered the option "style" (风格). Through syllable segmentation (for example, into the syllables "fen'ge") the input method knows that the user input string corresponding to the character "wind" (风) is "fen", and from the dictionary it knows that the accurate pronunciation of "wind" is "feng". Comparing these two strings shows whether the user used a fuzzy sound. The simplest way to compare the input string with the standard pinyin string is to check whether they are identical. If the accuracy requirement is low, any difference between the two can be treated as a fuzzy sound, and the correct pronunciation is prompted.
In most cases, for purposes such as word-frequency adjustment and learning new words (for example, a user lexicon), the input method already records the user's typed pinyin string, its segmentation, and the correct pronunciation of each character in the candidate item. So apart from adding a display module, only one comparison step (checking whether the two are identical) needs to be added. This scheme is therefore very simple and easy to implement.
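Identification scheme 1 reduces to a per-character string comparison once the input has been segmented. The Python sketch below assumes the segmentation and the per-character readings are already available from the input method, as the text says they usually are; the function name and data layout are hypothetical.

```python
def find_mismatches(typed_syllables, candidate_readings):
    """Compare each typed syllable with the standard reading of its character.

    typed_syllables: segmentation of the user's input, e.g. ["fen", "ge"].
    candidate_readings: (character, standard pinyin) pairs for the candidate.
    Returns (character, typed, standard) for every position that differs.
    """
    return [
        (char, typed, standard)
        for typed, (char, standard) in zip(typed_syllables, candidate_readings)
        if typed != standard
    ]


# The "fenge" example from the text: input segmented as fen'ge,
# candidate 风格 with dictionary readings feng and ge.
mismatches = find_mismatches(["fen", "ge"], [("风", "feng"), ("格", "ge")])
```

Here only the first character is reported, with typed "fen" against standard "feng". Under this scheme every such difference is treated as a fuzzy sound, which is the low-precision behavior the text describes.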
In practice, however, a typed pinyin string that differs from the standard pinyin string is not necessarily a fuzzy sound; it may, for example, be abbreviated pinyin (jianpin) or double pinyin (shuangpin). To prompt only for fuzzy sounds, the invention provides another identification scheme:
Identification scheme 2
Whether a candidate item was obtained through confusable-code equivalence is determined as follows: the code string entered by the user is compared with the standard code string of the candidate item; if they differ, it is further determined whether the pair satisfies a normal conversion rule; if it does not, the candidate item is determined to have been obtained through confusable-code equivalence. In other words, this scheme uses the known normal conversion rules to filter input string and standard code string pairs, improving the precision of the correction prompt.
Suppose the normal conversion rules include abbreviated pinyin, in which candidate items are converted from incomplete pinyin.
Under the abbreviated-pinyin rule the syllables are not typed in full, so the input string and the standard pinyin also differ. For example, the user may type just "fg" to obtain the candidate item "style" (风格) or "separation" (分隔). For "lattice" (格) or "every" (隔), the user has in effect typed "g" for the pinyin "ge", but this is not a fuzzy sound and should not be prompted, because under abbreviated pinyin "fg" corresponding to "fengge" is a normal conversion. In addition, for syllables whose initial is retroflex (zh, ch, sh), some users habitually type the complete initial; for example, the abbreviation of zhong is zh. That is, under the normal abbreviated-pinyin rule the user can type "zh" to enter "middle" (中, zhong); but if the user types only the single letter "z" to enter it, this should be treated as a fuzzy sound and prompted.
Suppose the normal conversion rules include double pinyin.
Under the double-pinyin rule each syllable is usually represented by two letters; for example, ff represents fen and fg represents feng. If the user enters "wind" (风) with ff while the standard code of "wind" is "fg", the user is considered to have used a fuzzy sound.
That is, the criterion differs between conversion rules; in double-pinyin mode, the user's input should be compared against the standard double-pinyin representation. Preferably, the specific conversion rule can be learned from the input method's state, or inferred from the syllables before the candidate items are displayed. When correction information is shown to the user, either the accurate full-pinyin spelling of the candidate item or its accurate spelling under double pinyin can be displayed.
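Identification scheme 2 adds a filtering step before a difference is declared a fuzzy sound. The following Python sketch covers the two normal conversion rules discussed above. The two-entry double-pinyin table uses only the mappings given in the text (ff for fen, fg for feng); the function names, mode strings, and return labels are assumptions.

```python
# Hypothetical double-pinyin table containing only the two codes from the text.
SHUANGPIN = {"ff": "fen", "fg": "feng"}

RETROFLEX_INITIALS = ("zh", "ch", "sh")


def initial_of(standard):
    """Return the abbreviation the text treats as normal for a syllable."""
    if standard.startswith(RETROFLEX_INITIALS):
        return standard[:2]  # e.g. "zh" for "zhong"
    return standard[:1]      # e.g. "g" for "ge"


def classify(typed, standard, mode="quanpin"):
    """Label a (typed, standard) pair as exact, abbreviation, or confusable."""
    if mode == "shuangpin":
        # Decode the two-letter code first, then compare full spellings.
        typed = SHUANGPIN.get(typed, typed)
    if typed == standard:
        return "exact"
    if mode != "shuangpin" and typed == initial_of(standard):
        return "abbreviation"  # a normal conversion rule, so no prompt
    return "confusable"
```

With these definitions, `classify("g", "ge")` and `classify("zh", "zhong")` are normal abbreviations, while `classify("z", "zhong")` and `classify("ff", "feng", "shuangpin")` are flagged, matching the cases worked through in the text.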
Identification scheme 2 above ensures the accuracy of fuzzy-pinyin prompts to a certain extent and avoids interrupting the user with prompts for inputs that are not fuzzy pinyin. It may still deviate in some cases, for example:
Keystroke fault tolerance. For the user's convenience, some input methods tolerate keystroke errors. For example, Sogou Pinyin allows the user to type "tign" to enter "listen" (ting). This also makes the input string differ from the pinyin string, but it should not be treated as fuzzy pinyin. (Of course, identification schemes 1 and 2 of the present invention can also prompt in this situation; but if the prompt is intended specifically for fuzzy pinyin, a different prompting mode can be used.)
The ong problem. In most cases, when the user's input string and the standard pinyin differ only by a final letter g, the cause is fuzziness between front and back nasal finals (an/ang, en/eng, in/ing). There are exceptions, however. For the user's convenience, most input methods return "same" (tong) when only "ton" is typed. Because "ton" is not a legal syllable, this should not be counted as fuzzy pinyin. Some input methods even allow the user to type only "to" to obtain "same" (tong), which likewise should not be treated as fuzzy pinyin.
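The exception described for the final g can be sketched with a legality check. The syllable set below is a small illustrative subset and the function name is hypothetical; both are assumptions of this example, not part of the patent.

```python
# Illustrative subset of legal pinyin syllables (assumption); a real table
# would list every legal syllable.
LEGAL_SYLLABLES = {"fen", "feng", "min", "ming", "tong"}

def is_nasal_fuzzy(typed: str, standard: str) -> bool:
    """True only if `typed` is `standard` minus its final "g" and is itself legal."""
    return standard == typed + "g" and typed in LEGAL_SYLLABLES

print(is_nasal_fuzzy("fen", "feng"))  # "fen" is a legal syllable
print(is_nasal_fuzzy("ton", "tong"))  # "ton" is not a legal syllable
```

With this check, "fen" for "feng" counts as fuzzy pinyin, whereas "ton" and "to" for "tong" are treated as shorthand and do not trigger a prompt.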
From the above description of identification schemes 1 and 2, it follows that the specific identification scheme depends on the pinyin rules in use and on the precision required of the prompt. If high precision is not required, it suffices to check whether the input string is identical to the standard pinyin. Abbreviated pinyin can usually be handled in part by checking syllable length (the syllables involved in fuzzy pinyin are usually at least 2 characters long, whereas an abbreviation generally contains a single character). If higher precision is required, the differences between the input string and the standard pinyin must be compared in more detail.
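A minimal sketch of the low-precision comparison combined with the length heuristic mentioned above. It assumes the input has already been segmented into fragments aligned with the standard syllables; the function name `needs_prompt` is hypothetical.

```python
from typing import List

def needs_prompt(typed: List[str], standard: List[str]) -> List[int]:
    """Return the positions whose typed fragment differs from the standard
    syllable and is longer than one character.

    One-character fragments are assumed to be abbreviations and are skipped.
    """
    flagged = []
    for i, (fragment, syllable) in enumerate(zip(typed, standard)):
        if fragment == syllable:
            continue              # typed in full, nothing to report
        if len(fragment) == 1:
            continue              # treated as abbreviated pinyin
        flagged.append(i)
    return flagged

print(needs_prompt(["f", "g"], ["feng", "ge"]))     # []
print(needs_prompt(["fen", "ge"], ["feng", "ge"]))  # [0]
```

The heuristic is only a partial solution, as the text says: it would also skip a single "z" typed for "zhong", which the finer rules treat as fuzzy pinyin.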
The present invention next introduces two more preferred identification schemes.
Identification scheme 3 (based on interpretation rules)
With a full-spelling input method, whether a candidate was obtained through an easily confused code equivalence can be determined as follows: during the process of deriving candidates from the coded string entered by the user, the attribute of the syllable generation rule is recorded; if a specific easily confused code equivalence was used, the candidate is determined to have been obtained through an easily confused code equivalence.
Referring to Fig. 7, the general pinyin input process comprises five steps: user input, syllable segmentation, syllable interpretation, candidate generation, and display.
That is, between obtaining the user's input and segmenting it into syllables on the one hand, and generating candidates (for example, by looking up the system dictionary and the user lexicon) on the other, the input method performs a syllable conversion. For example, the user input "fenge" is first segmented into "fen'ge". An input method that supports fuzzy pinyin knows that the typed fen can be interpreted as the syllable fen and also as the syllable feng. The former interpretation rule uses standard pinyin, while the latter uses fuzzy pinyin. Therefore, as long as the attribute of the syllable generation rule is recorded (for example, with a flag bit), it is known whether a given candidate used fuzzy pinyin, and an accurate fuzzy-pinyin correction prompt can be given. Specifically, for example: zongguo --> [zhong, fuzzy] [guo, standard] --> "middle" [zhong, fuzzy] "state" [guo, standard]
Checking the flag bit shows whether a fuzzy-pinyin equivalence was used. Of course, further information can be recorded along with the fuzzy-pinyin attribute, such as the user's original input string, for later use; or a parameter can distinguish which kind of fuzziness the user applied, for subsequent statistics and the like.
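Recording the rule attribute during syllable interpretation might look like the sketch below. The `Interpretation` record, the two rule tables and the function `interpret` are hypothetical names introduced for illustration; only one initial pair and one final pair are enabled, and combined initial-plus-final fuzziness is not handled.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Interpretation:
    syllable: str  # syllable after interpretation
    rule: str      # "standard" or "fuzzy" (the recorded attribute)
    typed: str     # original fragment, kept for later use

# Fuzzy equivalences assumed to be enabled for this sketch.
INITIAL_FUZZY = {"z": "zh"}
FINAL_FUZZY = {"en": "eng"}

def interpret(fragment: str) -> List[Interpretation]:
    """List every interpretation of one typed fragment, tagging the rule used."""
    results = [Interpretation(fragment, "standard", fragment)]
    for src, dst in INITIAL_FUZZY.items():
        if fragment.startswith(src) and not fragment.startswith(dst):
            results.append(Interpretation(dst + fragment[len(src):], "fuzzy", fragment))
    for src, dst in FINAL_FUZZY.items():
        if fragment.endswith(src):
            results.append(Interpretation(fragment[:-len(src)] + dst, "fuzzy", fragment))
    return results

for fragment in ["zong", "guo"]:
    print(fragment, [(i.syllable, i.rule) for i in interpret(fragment)])
```

A candidate built from the interpretation `("zhong", "fuzzy")` then carries the flag with it, so the later prompting step only has to read the `rule` field.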
Syllable segmentation is briefly explained below.
A preferred implementation of syllable segmentation is a network (lattice). For example, the user input "dandan" can be segmented into dan'dan. If the "an/ang" fuzziness is taken into account, this corresponds to several interpretations, including "dan'dan", "dan'dang", "dang'dan" and "dang'dang" (yielding candidates read dan or dang, such as those glossed "only", "take on" and "work as"). For simplicity, the network representation shown in Fig. 8 can be adopted. In Fig. 8, every path from the start point to the end point corresponds to exactly one syllable interpretation (four in total). The network can therefore be regarded as a compressed representation of the syllable segmentation; each edge of a path corresponds to exactly one syllable, and the interpretation rule of that syllable can be attached to it.
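The network for "dandan" can be sketched as a list of positions, each holding its alternative edges. The data layout and the helper names `paths` and `uses_fuzzy` are assumptions of this example; Fig. 8 itself is not reproduced here.

```python
from itertools import product

# One inner list per syllable position; each edge is (syllable, rule).
lattice = [
    [("dan", "standard"), ("dang", "fuzzy")],
    [("dan", "standard"), ("dang", "fuzzy")],
]

def paths(lattice):
    """Enumerate every start-to-end path as a tuple of (syllable, rule) edges."""
    return list(product(*lattice))

def uses_fuzzy(path) -> bool:
    """A path used fuzzy pinyin if any of its edges carries the fuzzy rule."""
    return any(rule == "fuzzy" for _, rule in path)

for path in paths(lattice):
    print("'".join(s for s, _ in path), "fuzzy" if uses_fuzzy(path) else "standard")
```

Two positions with two edges each give the four interpretations named in the text, while the lattice itself stores only four edges; the saving grows with longer inputs.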
A more complex case is shown in Fig. 9: the user types "fangan", for which there are two different segmentations, "fan'gan" and "fang'an" (with candidates glossed "dislike", "return to post", "firm" and "scheme").
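The two segmentations of "fangan" can be found by a simple recursive search over a syllable table. The table below is a tiny illustrative subset and the function `segment` is a hypothetical name; a production input method would use a complete table and a lattice rather than plain recursion.

```python
from typing import List, Set

# Illustrative subset of legal syllables (assumption).
SYLLABLES = {"fan", "fang", "gan", "an"}

def segment(text: str, syllables: Set[str]) -> List[List[str]]:
    """Return every way to split `text` into consecutive legal syllables."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in syllables:
            for rest in segment(text[end:], syllables):
                results.append([head] + rest)
    return results

for split in segment("fangan", SYLLABLES):
    print("'".join(split))
```

Each resulting split can then be expanded with fuzzy alternatives per syllable, as in the "dandan" case.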
Identification scheme 3, based on interpretation rules, offers greater flexibility. For example, when abbreviated pinyin is involved, whether to attach the fuzzy-pinyin attribute can be decided by rule. Consider entering "zhong" with the single letter "z": under initial-consonant abbreviation the syllable is interpreted as fuzzy pinyin, whereas under first-letter abbreviation the interpretation is not fuzzy pinyin. The subsequent decision on whether to prompt then only needs to check the attribute of the syllable interpretation. As another example, when "zhong" is entered with "zhon", the interpretation is not generated by a fuzzy-pinyin rule, so it is not treated as fuzzy pinyin and no prompt is added.
On the other hand, when several prompt rules are in force at once, this identification scheme can provide more information. For example, if the user types "zhegn" to enter "true" (zhen), a fuzzy-pinyin rule (eng --> en) and an input fault-tolerance rule (gn --> ng) are used at the same time. When both attributes are added to the prompt rules, the user can be told simultaneously that the pronunciation is inaccurate and that the keystroke order is incorrect.
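Collecting several rule attributes for one syllable could be sketched as below. Only the two rules named in the text are modelled, and the function `explain` with its string labels is a hypothetical construction for illustration.

```python
from typing import List, Optional

def explain(typed: str, standard: str) -> Optional[List[str]]:
    """Return the rules needed to turn `typed` into `standard`, or None if
    the two rules modelled here cannot explain the difference."""
    rules = []
    current = typed
    # Keystroke fault tolerance: a transposed "gn" is read as "ng".
    if current != standard and current.endswith("gn"):
        current = current[:-2] + "ng"
        rules.append("typo: gn->ng")
    # Fuzzy pinyin: "eng" typed where the standard syllable ends in "en".
    if current != standard and current.endswith("eng") and current[:-1] == standard:
        current = current[:-1]
        rules.append("fuzzy: eng->en")
    return rules if current == standard else None

print(explain("zhegn", "zhen"))
print(explain("zheng", "zhen"))
```

For "zhegn" both attributes are returned, so a prompt can mention the pronunciation and the keystroke order together; for "zheng" only the fuzzy-pinyin attribute is returned.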
Identification scheme 4
Whether a candidate was obtained through an easily confused code equivalence can be determined as follows: taking the coded string entered by the user and the standard coded string of the candidate as a mapping relation, the mapping rule table of the input method is searched to determine whether the relation satisfies a specific easily confused code equivalence; if so, the candidate is determined to have been obtained through an easily confused code equivalence.
The mapping rule table of an input method is generally called the Keymap table; it can record the user's keystroke habits and is generated from rules such as full/double pinyin, fuzzy pinyin and error correction.
For example, for the syllable [feng], full spelling gives the mapping feng --> [feng]; if the fuzzy pinyin en --> eng is involved, there is a mapping fen --> [feng]; if the error correction gn --> ng is involved, there is a mapping fegn --> [feng]; in double pinyin eng is represented by the letter g, so there is a mapping fg --> [feng].
As another example, many people write the syllable xue as xve, so there can be two mapping rules: xue --> [xue] and xve --> [xue].
As another example, if the input method allows hong to be shortened to hon in order to speed up input, there is a mapping hon --> [hong].
As another example, many users are accustomed to typing sohu and sogou to enter "Sohu" and "Sogou", but so is not in fact a legal pinyin syllable; a mapping so --> [sou] can then be added manually. In this way sohu can be interpreted as [sou] [hu], yielding "Sohu".
All these mapping rules together constitute the Keymap table. In use, the table is searched frequently to find the syllable ids to which a character string may correspond. The Keymap table is the basis for converting "the character string entered by the user" into "possible pinyin strings", and this conversion process is syllable segmentation.
Because the Keymap table gathers the various mapping rules, it necessarily contains the mapping rules needed for correction prompts (such as fuzzy pinyin). Identification scheme 4 therefore searches the Keymap table to find which mapping rule the mapping between the user's input string and the standard pinyin string belongs to; if it is one that requires a correction prompt, the correction prompt is given for the corresponding candidate.
Specifically, for the candidate "wind", the Keymap table is searched to find which kind of mapping fen --> [feng] is; for the candidate "middle", the table is searched to find which kind of mapping zong --> [zhong] is. If the mapping belongs to the fuzzy-pinyin mapping rules, the correction prompt is given; if it does not belong to the fuzzy-pinyin mapping rules that require a correction prompt, no prompt is given.
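A Keymap lookup of this kind can be sketched as a dictionary keyed by (typed string, syllable). The entries mirror the mappings listed above; the rule labels, the `PROMPT_RULES` set and the function `should_prompt` are assumptions made for the example.

```python
# (typed string, syllable) -> rule that produced the mapping.
# The rule labels are hypothetical names chosen for this sketch.
KEYMAP = {
    ("feng", "feng"): "standard",
    ("fen", "feng"): "fuzzy",
    ("fegn", "feng"): "typo",
    ("fg", "feng"): "double_pinyin",
    ("xue", "xue"): "standard",
    ("xve", "xue"): "variant_spelling",
    ("hon", "hong"): "shorthand",
    ("so", "sou"): "manual",
    ("zong", "zhong"): "fuzzy",
}

# Rules for which a correction prompt is wanted (assumption: fuzzy pinyin only).
PROMPT_RULES = {"fuzzy"}

def should_prompt(typed: str, syllable: str) -> bool:
    """Look up which rule maps `typed` to `syllable` and check whether it
    is one that requires a correction prompt."""
    return KEYMAP.get((typed, syllable)) in PROMPT_RULES

print(should_prompt("fen", "feng"))    # True
print(should_prompt("fg", "feng"))     # False
print(should_prompt("zong", "zhong"))  # True
```

Adding "typo" to `PROMPT_RULES` would extend the prompt to keystroke errors without touching the lookup itself, which is the flexibility the table-driven approach is meant to give.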
Owing to the advantages of the schemes themselves, identification schemes 3 and 4 achieve higher recognition accuracy on specific confusion rules than identification schemes 1 and 2.
It should be noted that the foregoing description has mainly dealt with fuzzy pinyin in the form of independent initial fuzziness (z/zh, s/sh, c/ch), final fuzziness (an/ang, en/eng, in/ing), or combinations of the two.
In practice, however, more complex situations exist, such as:
Whole-syllable fuzziness. For example, in some regions "hui" is pronounced "fei", or "fei" is pronounced "hui", while other syllables beginning with h or f are still distinguished. Treating the initials h/f as equivalent is therefore not a good approach. In this case the whole syllables may need to be treated as equivalent; such an equivalence mapping rule is still a kind of fuzzy pinyin.
Character-level fuzziness. For example, some Chinese characters have a special pronunciation in certain regions: "wind" (feng) is pronounced "fong", and "me" is pronounced "mo". The equivalence mapping rules "fong" --> "feng" and "mo" --> "me" are then still a kind of fuzzy pinyin.
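The difference between a whole-syllable equivalence and a blanket initial equivalence can be illustrated with a very small sketch. The set below holds only the hui/fei pair from the text, and the function name is hypothetical.

```python
# Unordered pairs of whole syllables treated as equivalent (assumption:
# only the pair named in the text).
WHOLE_SYLLABLE_FUZZY = {frozenset({"hui", "fei"})}

def whole_syllable_fuzzy(typed: str, standard: str) -> bool:
    """True if the two syllables differ and form a listed equivalent pair."""
    return typed != standard and frozenset({typed, standard}) in WHOLE_SYLLABLE_FUZZY

print(whole_syllable_fuzzy("fei", "hui"))  # True
print(whole_syllable_fuzzy("fa", "ha"))    # False
```

Because the rule is keyed on whole syllables, "fa" and "ha" stay distinct, which a general h/f initial equivalence would not guarantee.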
Therefore, fuzzy pinyin as used herein refers broadly to erroneous input caused by inaccurate pronunciation; through its fuzzy-pinyin function the input method tolerates such input errors so that the user can enter Chinese characters smoothly. The correction prompt function of the present invention helps the user gradually become familiar with the correct pronunciation of Chinese characters, thereby improving input accuracy and making the user's Mandarin more standard.
Referring to Fig. 10, embodiment 1 of a device for prompting code correction during input according to the present invention is shown, which may specifically comprise:
An interface unit 1001, configured to receive the coded string entered by the user;
A code conversion unit 1002, configured to convert the received coded string into corresponding candidates;
A confusion judging unit 1003, configured to judge whether the candidates include a candidate obtained through an easily confused code equivalence and, if so, to notify the information display unit;
An information display unit 1004, configured to prompt code correction information; in the simplest case, for example, the corresponding correct coded string is displayed together with the candidate.
Based on the detailed description of the four identification schemes above, the confusion judging unit 1003 can determine whether a candidate was obtained through an easily confused code equivalence by any one of the following four ways, or by any combination of them:
The coded string entered by the user is compared with the standard coded string of the candidate; if they differ, the candidate is determined to have been obtained through an easily confused code equivalence.
Or, the coded string entered by the user is compared with the standard coded string of the candidate; if they differ, it is further judged whether the input satisfies a normal conversion rule; if it does not, the candidate is determined to have been obtained through an easily confused code equivalence.
Or, during the process of deriving candidates from the coded string entered by the user, the attribute of the syllable generation rule is recorded; if a specific easily confused code equivalence was used, the candidate is determined to have been obtained through an easily confused code equivalence.
Or, taking the coded string entered by the user and the standard coded string of the candidate as a mapping relation, the mapping rule table of the input method is searched to determine whether the relation satisfies a specific easily confused code equivalence; if so, the candidate is determined to have been obtained through an easily confused code equivalence.
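The four units of this embodiment can be wired together as plain functions. The sketch below uses the first (plain comparison) way in the judging step; the toy dictionary, the English glosses standing in for candidates, and all function names are assumptions made for illustration.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    text: str      # what the user sees (an English gloss in this sketch)
    standard: str  # standard coded string of the candidate

# Hypothetical toy dictionary: typed string -> candidates it can convert to.
DICTIONARY = {
    "fenge": [Candidate("style", "fengge"), Candidate("separation", "fenge")],
}

def receive(raw: str) -> str:
    """Interface unit: normalise the raw keystrokes."""
    return raw.strip().lower()

def convert(typed: str) -> List[Candidate]:
    """Code conversion unit: look up candidates for the typed string."""
    return DICTIONARY.get(typed, [])

def is_confusable(typed: str, candidate: Candidate) -> bool:
    """Confusion judging unit, using the plain comparison way."""
    return typed != candidate.standard

def display(typed: str, candidates: List[Candidate]) -> List[str]:
    """Information display unit: show the correct code next to flagged candidates."""
    return [f"{c.text} ({c.standard})" if is_confusable(typed, c) else c.text
            for c in candidates]

def run(raw: str) -> List[str]:
    typed = receive(raw)
    return display(typed, convert(typed))

print(run("fenge"))
```

Swapping `is_confusable` for a rule-attribute check or a Keymap lookup would give the other ways without changing the surrounding units.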
Referring to Fig. 11, embodiment 2 of a device for prompting code correction during input is shown, which may specifically comprise:
An interface unit 1101, configured to receive the coded string entered by the user;
A code conversion unit 1102, configured to convert the received coded string into corresponding candidates;
A confusion judging unit 1103, configured to judge whether the candidates include a candidate obtained through an easily confused code equivalence and, if so, to notify the second screening module;
A second screening module 1104, configured to judge whether the occurrence count or occurrence frequency of the particular candidate obtained through an easily confused code equivalence is greater than or equal to a predetermined threshold and, if so, to notify the information display unit;
An information display unit 1105, configured to prompt code correction information; in the simplest case, for example, the corresponding correct coded string is displayed together with the candidate.
Preferably, when there are several candidates obtained through an easily confused code equivalence, the device may further comprise: a first screening module 1106, configured to screen these candidates according to a preset rule and to notify the information display unit to display the prompt only for the qualifying candidates and their corresponding correct coded strings.
To make it easier for the user to review and learn, the device embodiment shown in Fig. 11 may further comprise: a correction record table generation unit 1107, configured to collect code correction information and generate a correction record table, which contains the user's input coded string, the standard string and the corresponding candidate.
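The threshold check of the second screening module and the correction record table can be combined in one small class. The class name `CorrectionLog`, its fields and the default threshold are hypothetical; the text fixes neither a threshold value nor a storage format.

```python
from collections import Counter
from typing import List, Tuple

Record = Tuple[str, str, str]  # (input string, standard string, candidate)

class CorrectionLog:
    """Counts confusable inputs and keeps one record per distinct case."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold       # assumed value, not from the source
        self.counts: Counter = Counter()
        self.records: List[Record] = []

    def observe(self, typed: str, standard: str, candidate: str) -> bool:
        """Record one confusable input; return True once the occurrence
        count has reached the threshold, i.e. a prompt should be shown."""
        key = (typed, standard, candidate)
        self.counts[key] += 1
        if self.counts[key] == 1:
            self.records.append(key)     # first sighting goes into the table
        return self.counts[key] >= self.threshold

log = CorrectionLog(threshold=2)
print(log.observe("fen", "feng", "wind"))  # False
print(log.observe("fen", "feng", "wind"))  # True
print(log.records)
```

Counting per user, as here, corresponds to one of the two options in the claims; counting across the whole user population would need shared storage, which this sketch does not attempt.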
The embodiments in this specification are described progressively; each embodiment emphasizes its differences from the others, and the identical or similar parts of the embodiments may be understood by cross-reference. Because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The method and device for prompting code correction during input provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principle and implementation of the invention, and the above description of the embodiments is intended only to help in understanding the method of the invention and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this description should not be construed as limiting the invention.

Claims (11)

1. A method for prompting code correction during input, characterized by comprising:
Receiving a coded string entered by a user;
Converting the received coded string into corresponding candidates;
Judging whether the candidates include a candidate obtained through an easily confused code equivalence;
If so, prompting the user with the correct coded string of that candidate;
Wherein whether a candidate was obtained through an easily confused code equivalence is determined in one of the following ways:
Comparing the coded string entered by the user with the standard coded string of the candidate and, if they differ, determining that the candidate was obtained through an easily confused code equivalence; or
Comparing the coded string entered by the user with the standard coded string of the candidate and, if they differ, further judging whether the input satisfies a normal conversion rule and, if it does not, determining that the candidate was obtained through an easily confused code equivalence; or
During the process of deriving candidates from the coded string entered by the user, recording the attribute of the syllable generation rule and, if a specific easily confused code equivalence was used, determining that the candidate was obtained through an easily confused code equivalence; or
Taking the coded string entered by the user and the standard coded string of the candidate as a mapping relation, searching the mapping rule table of the input method to determine whether the relation satisfies a specific easily confused code equivalence and, if so, determining that the candidate was obtained through an easily confused code equivalence.
2. The method of claim 1, characterized in that the user is prompted with the correct coded string of the candidate in the following manner:
When the candidate is displayed, its corresponding correct coded string is displayed.
3. The method of claim 2, characterized in that, when there are several candidates obtained through an easily confused code equivalence, the method further comprises:
Screening these candidates according to a preset rule, and displaying the prompt only for the qualifying candidates and their corresponding correct coded strings.
4. The method of claim 1 or 3, characterized in that, before the user is prompted with the correct coded string of the candidate, the method further comprises:
Further judging whether the occurrence count or occurrence frequency of the particular candidate obtained through an easily confused code equivalence is greater than or equal to a predetermined threshold and, if so, prompting the user with the correct coded string of the candidate.
5. The method of claim 4, characterized in that
The occurrence count or occurrence frequency is that of the current user of the input method;
Or, the occurrence count or occurrence frequency is that of the entire user population of the input method.
6. The method of claim 1, characterized by further comprising:
Collecting code correction information and generating a correction record table, the correction record table containing the user's input string, the standard string and the corresponding candidate.
7. A system for prompting code correction during input, characterized by comprising:
An interface unit, configured to receive a coded string entered by a user;
A code conversion unit, configured to convert the received coded string into corresponding candidates;
A confusion judging unit, configured to judge whether the candidates include a candidate obtained through an easily confused code equivalence and, if so, to notify the information display unit;
An information display unit, configured to prompt the user with the correct coded string of that candidate;
Wherein the confusion judging unit determines whether a candidate was obtained through an easily confused code equivalence in one of the following ways:
Comparing the coded string entered by the user with the standard coded string of the candidate and, if they differ, determining that the candidate was obtained through an easily confused code equivalence; or
Comparing the coded string entered by the user with the standard coded string of the candidate and, if they differ, further judging whether the input satisfies a normal conversion rule and, if it does not, determining that the candidate was obtained through an easily confused code equivalence; or
During the process of deriving candidates from the coded string entered by the user, recording the attribute of the syllable generation rule and, if a specific easily confused code equivalence was used, determining that the candidate was obtained through an easily confused code equivalence; or
Taking the coded string entered by the user and the standard coded string of the candidate as a mapping relation, searching the mapping rule table of the input method to determine whether the relation satisfies a specific easily confused code equivalence and, if so, determining that the candidate was obtained through an easily confused code equivalence.
8. The system of claim 7, characterized in that the user is prompted with the correct coded string of the candidate in the following manner:
When the candidate is displayed, its corresponding correct coded string is displayed.
9. The system of claim 8, characterized in that, when there are several candidates obtained through an easily confused code equivalence, the system further comprises, between the confusion judging unit and the information display unit:
A first screening module, configured to screen these candidates according to a preset rule and to notify the information display unit to display the prompt only for the qualifying candidates and their corresponding correct coded strings.
10. The system of claim 6 or 8, characterized by further comprising, between the confusion judging unit and the information display unit:
A second screening module, configured to judge whether the occurrence count or occurrence frequency of the particular candidate obtained through an easily confused code equivalence is greater than or equal to a predetermined threshold and, if so, to notify the information display unit to prompt the user with the correct coded string of the candidate.
11. The system of claim 6, characterized by further comprising:
A correction record table generation unit, configured to collect code correction information and generate a correction record table, the correction record table containing the user's input coded string, the standard string and the corresponding candidate.
CN2008101042171A 2008-04-16 2008-04-16 Reminding method and system for coding to correct error in input process Active CN101276245B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101042171A CN101276245B (en) 2008-04-16 2008-04-16 Reminding method and system for coding to correct error in input process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008101042171A CN101276245B (en) 2008-04-16 2008-04-16 Reminding method and system for coding to correct error in input process

Publications (2)

Publication Number Publication Date
CN101276245A CN101276245A (en) 2008-10-01
CN101276245B true CN101276245B (en) 2010-07-07

Family

ID=39995733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101042171A Active CN101276245B (en) 2008-04-16 2008-04-16 Reminding method and system for coding to correct error in input process

Country Status (1)

Country Link
CN (1) CN101276245B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9348479B2 (en) 2011-12-08 2016-05-24 Microsoft Technology Licensing, Llc Sentiment aware user interface customization
US9378290B2 (en) 2011-12-20 2016-06-28 Microsoft Technology Licensing, Llc Scenario-adaptive input method editor

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102023782A (en) * 2009-09-15 2011-04-20 北京搜狗科技发展有限公司 Method and device for determining modification point in input conversion process
CN102402298A (en) * 2010-09-16 2012-04-04 腾讯科技(深圳)有限公司 Pinyin input method and user word adding method and system of same
CN102478968B (en) * 2010-11-23 2016-02-17 深圳市世纪光速信息技术有限公司 Chinese phonetic input method and Chinese pinyin input system
CN102479174B (en) * 2010-11-23 2016-03-16 盛乐信息技术(上海)有限公司 For Chinese character automatic Verification and error correction system and the method thereof of GBK coding
CN102541281A (en) * 2010-12-22 2012-07-04 张家港市赫图阿拉信息技术有限公司 Method for imputing knotty characters
CN102135814B (en) * 2011-03-30 2017-08-08 北京搜狗科技发展有限公司 A kind of character and word input method and system
CN103064825B (en) * 2011-10-18 2016-03-02 阿里巴巴集团控股有限公司 Fuzzy phoneme is to foundation, method to set up and input method and device thereof and system
CN102495679A (en) * 2011-12-01 2012-06-13 上海量明科技发展有限公司 Composite spelling input method, word bank and system thereof
CN104428734A (en) 2012-06-25 2015-03-18 微软公司 Input method editor application platform
JP6122499B2 (en) 2012-08-30 2017-04-26 マイクロソフト テクノロジー ライセンシング,エルエルシー Feature-based candidate selection
CN103345308B (en) * 2013-06-08 2016-02-24 百度在线网络技术(北京)有限公司 For inputting the method and apparatus of amendment
WO2015018055A1 (en) * 2013-08-09 2015-02-12 Microsoft Corporation Input method editor providing language assistance
CN103699233B (en) * 2013-12-20 2019-04-09 百度在线网络技术(北京)有限公司 Character string input method and input unit
CN103903615B (en) * 2014-03-10 2018-11-09 联想(北京)有限公司 A kind of information processing method and electronic equipment
CN105589570B (en) * 2014-10-23 2019-04-09 北京搜狗科技发展有限公司 A kind of method and apparatus handling input error
CN105892702A (en) * 2014-10-27 2016-08-24 陆海涛 Simplified pinyin Chinese character input system and alphabetical keyboard using same
CN106484131B (en) * 2015-09-02 2021-06-22 北京搜狗科技发展有限公司 Input error correction method and input method device
CN105549760B (en) * 2016-01-27 2018-07-20 百度在线网络技术(北京)有限公司 Data inputting method and device
CN109426354B (en) * 2017-08-25 2022-07-12 北京搜狗科技发展有限公司 Input method, device and device for input
CN109656384B (en) * 2018-12-24 2023-07-18 抖音视界有限公司 Character string input method and device

Also Published As

Publication number Publication date
CN101276245A (en) 2008-10-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant