CN1256650C - Chinese whole sentence input method - Google Patents


Info

Publication number
CN1256650C
Authority
CN
China
Prior art keywords
user
text
pinyin
string
error
Legal status
Expired - Fee Related
Application number
CN 200410000116
Other languages
Chinese (zh)
Other versions
CN1556458A
Inventor
罗迈克
郑方
Current Assignee
Beijing d-Ear Technologies Co., Ltd.
Original Assignee
郑方
Application filed by 郑方
Priority to CN 200410000116
Publication of CN1556458A
Application granted
Publication of CN1256650C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The present invention relates to a whole-sentence input method for Chinese characters and belongs to the technical field of computer Chinese character input methods. A pinyin string or stroke string entered by the user is decoded with a Chinese language model into a character string; the user confirms the decoded string or corrects an error in it, and the affected part is corrected locally; the corrected whole-sentence pinyin or stroke string is decoded again, with every character the user modified kept unchanged in the new result; and these steps are repeated until the user confirms the whole sentence. The invention provides quick reselection of fuzzy pronunciations and a convenient way to correct input errors, an automatic re-segmentation mode for correcting the system's segmentation errors, and selection from alternative characters for correcting the system's character-selection errors. Because the information implied by each user correction is used to re-decode the sentence automatically, other possible errors in the sentence can be corrected quickly, which improves the efficiency and accuracy of error correction.

Description

A Chinese whole-sentence input method
Technical field: The present invention relates to a Chinese whole-sentence input method. It belongs to the technical field of Chinese character input methods for computers (including desktop computers, notebook computers, palmtop computers, PDAs (Personal Digital Assistants), etc.), and in particular concerns Chinese character input methods for wireless communication devices (such as mobile phones and smartphones).
Background art: A Chinese whole-sentence input method (whole-sentence input method for short) enters Chinese characters a whole sentence at a time, so the user can input an entire sentence without selecting each character individually. A whole-sentence input method usually relies on a Chinese language model (language model for short) to predict each Chinese character the user is likely to be entering in the sentence; that is, the language model decodes the user's input string into the most probable Chinese character string. This process is also called the "decoding process" or "search process". The user's input string can encode how characters are pronounced (e.g., pinyin) or how they are written (e.g., strokes).
Two kinds of errors commonly occur in a Chinese whole-sentence input method: 1. errors the user makes while typing (such as incorrect insertions, deletions, substitutions, or wrong order), called user errors for short; 2. errors the whole-sentence input method makes when decoding the input string (such as an incorrect segmentation, or a character that is not the one the user intended), called system errors or decoding errors.
1. User errors: user errors fall into the following two classes, described in turn below.
(1) Fuzzy-sound errors: these occur mostly among people with a strong accent or dialect, especially people from southern China. Fuzzy-sound errors include:
[Figures C20041000011600041 to C200410000116000413: pairs of easily confused pinyin sounds]
and so on.
Existing input methods generally handle fuzzy-sound errors with a setting. The Ziguang (紫光) input method, for example, provides a fuzzy-sound input option; when the user enables it, the input method performs fuzzy matching on the pinyin string the user types, so the user can type "zong" to get 中 (zhong) and "zhong" to get 总 (zong). Candidates found by fuzzy matching are listed after the exact-match candidates.
On devices with limited computing and storage resources, such as PDAs, this approach produces too many candidate characters during input, which lowers both input accuracy and speed.
(2) Input errors: user input errors here refer specifically to wrong letters or strokes entered while typing. They include:
Extra letters or strokes: for example, "he" typed as "hei", or the strokes 一丿丶 typed as 一丿丶丶;
Missing letters or strokes: for example, "hei" typed as "he", or 一丿丶丶 typed as 一丿丶;
Wrong letters or strokes: for example, "nai" typed as "bai", or 一丨丿丶 typed as 丿丶一一.
Of course, more than one of these errors may occur during whole-sentence input (for example, "bin" typed as "ping", or 丨一一一 typed as 丨丨丨一一).
Existing input methods such as Ziguang let the user correct the entered pinyin string at any position. The input box displays the whole pinyin string the user has typed, and the user moves the cursor to the desired position with the arrow keys, where they can correct input errors, for example by adding an omitted letter or deleting a wrong or extra letter.
However, on devices with a small display resolution, such as PDAs, displaying the user's entire pinyin string at once is impractical. Moreover, on touch-screen devices, tapping a point with a stylus is much easier than positioning the cursor with arrow keys.
2. System errors: system errors comprise segmentation errors and decoding errors, described in turn below.
(1) Segmentation errors: the system's segmentation errors are closely tied to the choice of segmentation algorithm. For example, without user intervention the maximum matching algorithm cannot decode 河南 (Henan, he'nan) from "henan", and the minimum matching algorithm cannot decode 平安 ("safe", ping'an) from "pingan". In general, the segmentation errors a whole-sentence input method produces fall into the following classes:
One syllable split into two. For example, 先 (xian) segmented as 西安 (Xi'an, xi'an);
Two syllables merged into one. For example, 西安 (xi'an) merged into 先 (xian);
Two syllables wrongly divided into two different syllables. For example, 河南 (he'nan) wrongly divided as 很安 (hen'an).
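The ambiguity behind these segmentation errors can be illustrated by enumerating every valid syllable split of a pinyin string. The sketch below is not taken from the patent: the syllable set is a tiny hypothetical subset, and `segmentations` and `greedy` are illustrative helpers. `greedy` implements plain forward maximum or minimum matching, reproducing the failures described for "henan" and "pingan".

```python
# Tiny, hypothetical subset of valid pinyin syllables; a real input method uses the full inventory.
SYLLABLES = {"xi", "an", "xian", "he", "hen", "nan", "pin", "ping", "gan"}

def segmentations(s, syllables=SYLLABLES):
    """Every way to split `s` into valid syllables, joined with the ' separator."""
    if not s:
        return [""]
    results = []
    for end in range(1, len(s) + 1):
        head = s[:end]
        if head in syllables:
            for rest in segmentations(s[end:], syllables):
                results.append(head + "'" + rest if rest else head)
    return results

def greedy(s, longest=True, syllables=SYLLABLES):
    """Greedy left-to-right split: maximum matching (longest prefix) or minimum matching (shortest)."""
    parts = []
    while s:
        lengths = range(len(s), 0, -1) if longest else range(1, len(s) + 1)
        for n in lengths:
            if s[:n] in syllables:
                parts.append(s[:n])
                s = s[n:]
                break
        else:
            return None  # no valid syllable starts here: the greedy choice hit a dead end
    return "'".join(parts)

print(segmentations("henan"))          # ["he'nan", "hen'an"]
print(greedy("henan"))                 # hen'an  (maximum matching misses he'nan)
print(greedy("pingan", longest=False)) # pin'gan (minimum matching misses ping'an)
```

Both readings of "henan" are valid syllable sequences, so only context (a language model or the user) can choose between them; a greedy rule fixes one choice in advance.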
For segmentation errors, take the Ziguang input method on the PC as an example: it uses "'" as the pinyin separator and lets the user edit the input string manually to fix a wrong segmentation. The user must first delete the incorrect separator and then insert a separator at the correct position. As noted above, this approach is likewise unsuitable for small-screen devices such as PDAs.
Microsoft's whole-sentence input method handles segmentation errors and input errors differently. It lets the user move the cursor directly onto a wrong Chinese character in the sentence and edit that character, rather than the pinyin string. At the cursor position, the user must first delete the wrong character and then re-enter the correct pinyin. The drawback is that fixing a small input error (for example, a missing letter "g") takes about as long as fixing a large one (for example, changing "bin" to "ping").
(2) Character-selection errors: the accuracy of a language model generally cannot reach 100%, and polyphonic characters and homophones are very common in Chinese. Consequently, the sentence that a language-model-based whole-sentence input method produces when decoding does not always consist entirely of the characters the user intended; we call such decoding errors character-selection errors. For example, if the user types the pinyin for 我要买机器 ("I want to buy a machine"), the result may be 我要卖及其 ("I want to sell and its").
For character-selection errors of this kind, existing whole-sentence input methods usually offer the following solution: 1) the user first selects the wrong character in the sentence; 2) the input method displays other possible candidate characters; 3) the user selects the intended character from them, and the input method updates the selected character. When several characters are wrong, the user must repeat these steps until all characters are correct. In general, the more wrong characters a sentence contains, the longer manual correction takes.
Some whole-sentence input methods also correct other wrong characters automatically after a single manual correction by the user. For example, in Microsoft's pinyin input method, if the user changes 及 in 及其 to 机, the input method automatically changes the second character 其 to 器 (giving 机器, "machine"). However, each such automatic correction only affects related characters within one word and cannot affect or correct characters outside that word.
Summary of the invention: The objective of the present invention is to provide a Chinese whole-sentence input method that overcomes the cumbersome error correction of existing Chinese whole-sentence input methods and makes Chinese input faster and easier.
The Chinese whole-sentence input method proposed by the present invention comprises the following steps:
(1) Decode the pinyin string or stroke string entered by the user with a Chinese language model to obtain a character string;
(2) The user checks the decoded character string. If the user confirms the decoded result, it is the correct Chinese sentence; if not, the user modifies any one error in the character string, and the string is corrected locally according to the user's modification;
(3) Decode the whole-sentence pinyin string or stroke string as modified and corrected above, keeping every character the user modified unchanged in the newly decoded character string. Decoding comprises the following steps:
(a) Generate a lexical tree from the language model; create an empty search path, store it in the path array, and point its lexical-tree pointer at the root of the lexical tree;
(b) Search the corrected whole-sentence pinyin string or stroke string from left to right;
(c) Take a search path from the path array. If the user has selected a character for the current pinyin or stroke, replace the path with one new path; if not, replace it with one or more new paths. In either case, update the lexical-tree pointer and search information in each new path according to the lexical tree;
(d) Check each new path obtained by replacement. If it has reached a leaf of the lexical tree, compute the accumulated log probability of all words in the path from the language model:
accumulated log probability = previous accumulated log probability + current log probability
current log probability = $\ln P(w_n \mid w_{n-2}, w_{n-1})$
and then point the path's lexical-tree pointer back at the root of the lexical tree. In these formulas, $w_n$ is the word stored in the lexical-tree leaf the path points to, i.e., the current word of the decoded text; $w_{n-1}$ and $w_{n-2}$ are the two words preceding the current word in the decoded text; and $P(w_n \mid w_{n-2}, w_{n-1})$ is the language-model probability of $w_n$ following $w_{n-2}, w_{n-1}$, i.e., of the word trigram $(w_{n-2}, w_{n-1}, w_n)$;
(e) Repeat steps (c) and (d) until every path in the path array has been replaced;
(f) Sort all new paths by accumulated log probability from high to low, and keep in the path array as many of the highest-scoring paths as the capacity of the path array allows;
(g) Repeat steps (b) to (f) until the whole-sentence pinyin string or stroke string has been processed;
(4) Repeat steps (2) and (3) until the user confirms the Chinese sentence.
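The decoding steps (a) to (g) can be sketched in Python as a left-to-right beam search. This is a minimal illustration under stated assumptions, not the patented implementation. The lexical tree is a trie keyed by pinyin syllables in which each node stores the words whose pinyin ends there, standing in for the Ф-reachable leaves. The trigram table and all its probabilities are made up, and unseen trigrams get a hypothetical floor probability instead of a real smoothing scheme. A user's character choice (`locks`) is checked when a word is completed. All names (`Node`, `build_tree`, `log_prob`, `decode`, `LEXICON`, `LM`, `ROOT`) are hypothetical.

```python
import math

class Node:
    """Node of a syllable-level lexical tree (a trie keyed by pinyin syllables)."""
    def __init__(self):
        self.children = {}  # syllable -> child Node
        self.words = []     # words whose pinyin ends here (stand-in for the Ф leaves)

def build_tree(lexicon):
    """Build the lexical tree from (word, [syllables]) pairs."""
    root = Node()
    for word, syllables in lexicon:
        node = root
        for syl in syllables:
            node = node.children.setdefault(syl, Node())
        node.words.append(word)
    return root

def log_prob(lm, w2, w1, w, floor=1e-4):
    """ln P(w | w2, w1) from a toy trigram table; unseen trigrams get a floor probability."""
    return math.log(lm.get((w2, w1, w), floor))

def decode(syllables, root, lm, locks=None, beam=5):
    """Left-to-right beam search; `locks` maps syllable index -> character fixed by the user."""
    locks = locks or {}
    # Each path: (accumulated log probability, tree pointer, start index of current word, words so far)
    paths = [(0.0, root, 0, ())]
    last = len(syllables) - 1
    for i, syl in enumerate(syllables):
        new_paths = []
        for score, node, start, hist in paths:
            child = node.children.get(syl)
            if child is None:
                continue  # this path cannot absorb the syllable, so it is dropped
            if i < last:
                new_paths.append((score, child, start, hist))  # stay inside a longer word
            for word in child.words:  # reaching a leaf completes a word
                # Every character of the word must agree with any lock at its position.
                if any(locks.get(start + k, ch) != ch for k, ch in enumerate(word)):
                    continue
                w2 = hist[-2] if len(hist) >= 2 else "<s>"
                w1 = hist[-1] if hist else "<s>"
                new_score = score + log_prob(lm, w2, w1, word)
                new_paths.append((new_score, root, i + 1, hist + (word,)))  # pointer back to root
        new_paths.sort(key=lambda p: p[0], reverse=True)
        paths = new_paths[:beam]  # keep only as many paths as the path array holds
    return "".join(paths[0][3]) if paths else None

# Toy data (hypothetical): one character per syllable, made-up trigram probabilities.
LEXICON = [("中", ["zhong"]), ("国", ["guo"]), ("过", ["guo"]), ("中国", ["zhong", "guo"]),
           ("人", ["ren"]), ("民", ["min"]), ("人民", ["ren", "min"])]
LM = {("<s>", "<s>", "中"): 0.3, ("<s>", "中", "过"): 0.5, ("中", "过", "人民"): 0.1,
      ("<s>", "<s>", "中国"): 0.05, ("<s>", "中国", "人民"): 0.2}
ROOT = build_tree(LEXICON)

print(decode(["zhong", "guo", "ren", "min"], ROOT, LM))                   # 中过人民
print(decode(["zhong", "guo", "ren", "min"], ROOT, LM, locks={1: "国"}))  # 中国人民
```

With this toy model the unconstrained search prefers 中过人民, while locking 国 at syllable index 1 removes every path that puts 过 there and the search returns 中国人民. Mid-word paths are not added on the last syllable, so every surviving path at the end has completed all its words.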
An error in the character string in the above method is either a user error or a system error; a user error is a fuzzy-sound error or an input error, and a system error is a segmentation error or a character-selection error.
There are four ways for the user to modify an error in the character string and correct it locally.
The first is modification and local correction of a fuzzy-sound error, comprising the following steps:
(1) The user selects the wrong character, and the pinyin corresponding to it is displayed;
(2) The user selects the correct pinyin from the fuzzy-sound menu for the pinyin of the wrong character.
The second is modification and local correction of an input error, comprising the following steps:
(1) The user selects the wrong character, and the pinyin or strokes corresponding to it are displayed;
(2) The pinyin or strokes are edited so that they become the pinyin or strokes of the correct character.
The third is modification and local correction of a segmentation error, comprising the following steps:
(1) The user selects two or more adjacent wrong characters;
(2) The segmentation separator is moved within the continuous pinyin string of the selected wrong characters, so that it becomes the continuous pinyin string of the correct characters.
The fourth is modification and local correction of a character-selection error, comprising the following steps:
(1) The user selects the wrong character, and all other characters with the same pinyin or strokes are displayed;
(2) The user selects the correct character from them.
The advantages of the Chinese whole-sentence input method proposed by the present invention are:
1. For users who cannot spell pinyin accurately because of their accent or dialect habits, it provides fast automatic reselection of fuzzy sounds;
2. It provides a convenient way for users to correct input errors made while typing;
3. It provides convenient automatic re-segmentation for correcting segmentation errors caused by system decoding;
4. For character-selection errors caused by system decoding, it provides a correction mode in which candidates can be selected quickly;
5. It makes full use of the information implied by each user correction to re-decode automatically, so that other possible errors in the sentence are corrected quickly, improving the efficiency and accuracy of error correction.
Brief description of the drawings
Fig. 1 is a flow block diagram of the method of the invention.
Fig. 2 shows the input interface used in the method of the invention.
Figs. 3, 4, and 5 show three different embodiments of modification and correction in the method of the invention.
Fig. 6 is a flowchart of decoding the modified and corrected whole-sentence pinyin string or stroke string in the method of the invention.
Fig. 7 is an example of the lexical tree used when decoding the modified and corrected whole-sentence pinyin string or stroke string.
Embodiment
The flow of the Chinese whole-sentence input method proposed by the present invention is shown in Fig. 1. First, the pinyin string or stroke string entered by the user is decoded with a Chinese language model to obtain a character string. The user then checks the decoded string: if the user confirms the decoded result, it is the correct Chinese sentence; if not, the user modifies any one error in the string, and the string is corrected locally according to the user's modification. The modified and corrected whole-sentence pinyin string or stroke string is then decoded again, keeping every character the user modified unchanged in the newly decoded string. This process is repeated until the user confirms the Chinese sentence.
In the above method, the flow for decoding the modified and corrected whole-sentence pinyin string or stroke string is shown in Fig. 6. A lexical tree is generated from the language model; an empty search path is created and stored in the path array, with its lexical-tree pointer pointing at the root of the lexical tree. The corrected whole-sentence pinyin string or stroke string is searched from left to right. A search path is taken from the path array: if the user has selected a character for the current pinyin or stroke, the path is replaced with one new path; if not, it is replaced with one or more new paths. In either case, the lexical-tree pointer and search information in each new path are updated according to the lexical tree. Each new path obtained in this way is checked; if it has reached a leaf of the lexical tree, the accumulated log probability of all words in the path is computed from the language model:
accumulated log probability = previous accumulated log probability + current log probability
current log probability = $\ln P(w_n \mid w_{n-2}, w_{n-1})$
The path's lexical-tree pointer is then pointed back at the root of the lexical tree. In these formulas, $w_n$ is the word stored in the lexical-tree leaf the path points to, i.e., the current word of the decoded text; $w_{n-1}$ and $w_{n-2}$ are the two words preceding the current word in the decoded text; and $P(w_n \mid w_{n-2}, w_{n-1})$ is the language-model probability of $w_n$ following $w_{n-2}, w_{n-1}$, estimated by existing methods;
These replacements are repeated until every path in the path array has been replaced. All new paths are then sorted by accumulated log probability from high to low, and as many of the highest-scoring paths as the capacity of the path array allows are kept. The process is repeated until the whole-sentence pinyin string or stroke string has been processed.
The decoding of the modified and corrected whole-sentence string is described below for a pinyin-based Chinese whole-sentence input method. An example lexical tree, organized by syllables, is shown in Fig. 7, where Ф denotes the empty symbol: an Ф branch matches regardless of the input and consumes no pinyin. Following the arrows in the figure from the root to a leaf yields a pinyin string; this pinyin string corresponds to the word stored in that leaf and is matched against the pinyin string the user entered. The words stored in all the leaves make up the vocabulary of the language model.
The search procedure is illustrated with the user input "zhong guo ren min", in which the user has selected 国 for "guo".
(1) When the search begins there is only one empty path, whose lexical-tree pointer points at the root of the lexical tree, and the program variable recording the accumulated log probability is cleared to 0.
(2) Search the corrected whole-sentence pinyin string or stroke string from left to right, following steps (3) and (4).
(3) Take a search path from the path array. If the user has selected a character for the current pinyin or stroke, replace the path with one new path; if not, replace it with one or more new paths. In either case, update the lexical-tree pointer and search information in each new path according to the lexical tree. For example: (a) Suppose the lexical-tree pointer of the current path points at the root and the first pinyin "zhong" is being matched. Under the root there is only one branch matching the user's "zhong"; following it, there are two further branches that reach leaf nodes without matching any syllable (Ф). The path is therefore replaced with three new paths: the pointer of the first points at the "zhong" node of the lexical tree, the pointer of the second at the leaf 中, and the pointer of the third at the leaf 忠. (b) Suppose the current path's pointer points at the "zhong" node and the input at this position is the user-selected 国. The method checks whether any leaf below the successor nodes of the "zhong" node matches 国. Of the five leaves 中国 (China), 中心 (center), 忠诚 (loyalty), 中, and 忠, only one matches 国, so the path is replaced with one new path. Its pointer would point at the "guo" node, but because "guo" is followed by a Ф, the pointer is set directly to the leaf 中国.
(4) Check each new path obtained by replacement. If it has reached a leaf of the lexical tree, compute the accumulated log probability of all words in the path from the language model:
accumulated log probability = previous accumulated log probability + current log probability
current log probability = $\ln P(w_n \mid w_{n-2}, w_{n-1})$
Then point the path's lexical-tree pointer back at the root of the lexical tree. Here $w_n$ is the word stored in the lexical-tree leaf the path points to, i.e., the current word of the decoded text; $w_{n-1}$ and $w_{n-2}$ are the two words preceding the current word in the decoded text; and $P(w_n \mid w_{n-2}, w_{n-1})$ is the language-model probability of $w_n$ following $w_{n-2}, w_{n-1}$, estimated by existing methods. For example, suppose a path has been processed up to the second syllable, its lexical-tree pointer points at the leaf 国, the only word preceding it in the path is 中, and its accumulated log probability is -2.99. If $P(\text{国} \mid \text{中}) = 0.1$, the accumulated log probability becomes $-2.99 + \ln 0.1 = (-2.99) + (-2.30) = -5.29$.
(5) Repeat steps (3) and (4) until every path in the path array has been replaced.
(6) Sort all new paths by accumulated log probability from high to low, and keep in the path array as many of the highest-scoring paths as the capacity of the path array allows. For example, when the pinyin "ren" is searched, suppose we obtain the following paths and accumulated log probabilities:
Path (a): 中, 国, 人 (-8.34)
Path (b): 中, 国, ren (-5.29)
Path (c): 忠, 国, 人 (-10.56)
Path (d): 忠, 国, ren (-8.60)
Path (e): 中国, 人 (-5.10)
Path (f): 中国, ren (-3.78)
Path (g): 中国人 (-4.90)
If the capacity of the path array is 5, the paths kept in the path array are (f), (g), (e), (b), and (a).
(7) Repeat steps (2) to (6) until the whole pinyin string has been processed, finally obtaining 中国人民 ("the Chinese people").
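The pruning in step (6) can be checked with a few lines of Python. The path labels and scores are those of the "ren" example; the `prune` helper is a hypothetical name. Sorting by accumulated log probability and keeping the top five shows which paths survive: since -8.34 is higher than -8.60, path (a) rather than path (d) takes the fifth slot.

```python
# Paths and accumulated log probabilities from the "ren" example in step (6).
paths = [
    ("a", "中, 国, 人", -8.34),
    ("b", "中, 国, ren", -5.29),
    ("c", "忠, 国, 人", -10.56),
    ("d", "忠, 国, ren", -8.60),
    ("e", "中国, 人", -5.10),
    ("f", "中国, ren", -3.78),
    ("g", "中国人", -4.90),
]

def prune(paths, capacity):
    """Keep the `capacity` paths with the highest accumulated log probability."""
    return sorted(paths, key=lambda p: p[2], reverse=True)[:capacity]

kept = [label for label, _, _ in prune(paths, 5)]
print(kept)  # ['f', 'g', 'e', 'b', 'a']
```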
On mobile devices such as PDAs and mobile phones, the four modification and local-correction methods above require a user interface such as the one in Fig. 2, with a display area at least 6 characters wide (matching the longest pinyin syllables, such as "zhuang").
Embodiments of the invention are described below with reference to the drawings.
Fig. 3 shows an embodiment of modifying a fuzzy-sound error and performing local correction:
The user selects the wrong character, and the pinyin corresponding to it is displayed; the user then selects the correct pinyin from the fuzzy-sound menu for that pinyin.
For example, the user mistakenly types "yizangpiao", and the system decodes it as 亿藏票 ("hundred-million hide ticket"). The user selects the character 藏, and "zang" is displayed in the pinyin area. The user then presses and holds the pinyin area, and a floating menu pops up with three options: "zan", "zhan", and "zhang". After the user selects "zhang", the system updates the original pinyin string to "yizhangpiao". In a system that re-decodes the pinyin string automatically, the updated pinyin string is decoded again, giving the Chinese character string 一张票 ("one ticket").
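The fuzzy-sound menu in this example can be sketched as swapping confusable initials and finals. The pair table is an assumption, since the patent's own list survives only as figure references. With z/zh and an/ang among the pairs, "zang" yields exactly the three menu options zan, zhan, and zhang. A real implementation would also filter candidates against the inventory of valid syllables, which this sketch omits. `fuzzy_menu` and its helpers are hypothetical names.

```python
# Hypothetical confusable pairs (the patent's own table survives only as figure references).
FUZZY_INITIALS = [("z", "zh"), ("c", "ch"), ("s", "sh")]
FUZZY_FINALS = [("an", "ang"), ("en", "eng"), ("in", "ing")]
CONSONANTS = "bpmfdtnlgkhjqxrzcsyw"

def split_syllable(syl):
    """Split a pinyin syllable into (initial, final); zh/ch/sh are tried before single letters."""
    for ini in ("zh", "ch", "sh"):
        if syl.startswith(ini):
            return ini, syl[len(ini):]
    if syl and syl[0] in CONSONANTS:
        return syl[0], syl[1:]
    return "", syl  # zero-initial syllable such as "an"

def alternatives(part, pairs):
    """`part` itself plus its partner in any confusable pair."""
    out = [part]
    for a, b in pairs:
        if part == a:
            out.append(b)
        elif part == b:
            out.append(a)
    return out

def fuzzy_menu(syl):
    """All fuzzy variants of `syl` (excluding `syl` itself), sorted for display."""
    ini, fin = split_syllable(syl)
    menu = {i + f for i in alternatives(ini, FUZZY_INITIALS)
            for f in alternatives(fin, FUZZY_FINALS)}
    menu.discard(syl)
    return sorted(menu)

print(fuzzy_menu("zang"))  # ['zan', 'zhan', 'zhang']
```

Offering the variants on demand for one selected syllable, rather than fuzzy-matching every syllable during input, keeps the candidate list short, which is the point the background section makes about resource-limited devices.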
Fig. 4 shows an embodiment of modifying an input error and performing local correction:
The user selects the wrong character, and the pinyin or strokes corresponding to it are displayed; the pinyin or strokes are then edited so that they become the pinyin or strokes of the correct character.
For example, after the input of user error " nirushuo ", system decodes is " you are as saying ".At this moment, the user selects " you " word, and system will show " ni " in the phonetic zone.Then but the user clicks this pinyin string and makes it to become editing mode.The user changes " bi " into " ni " then, and clicks this pinyin string to finish this modification.Because " bi " is an effective pinyin string, therefore, system is updated to " birushuo " with original pinyin string.For having the automatic pinyin string system of decoding function again, system will decode again to the pinyin string after upgrading, obtain Chinese character string " such as ".
Fig. 5 shows an embodiment of modifying a segmentation error and performing local correction:
The user selects two or more adjacent wrong characters; the segmentation separator is moved within the continuous pinyin string of the wrong characters so that it becomes the continuous pinyin string of the correct characters.
For example, the user's input string is "henansheng"; the system wrongly segments it as "hen'an'sheng" and outputs the Chinese character string 很安生 ("very peaceful"). The user then selects 很安 with the stylus. The system takes the left character's pinyin "hen" and the right character's pinyin "an" and joins them with a pinyin separator into the string "hen'an". A sub-algorithm then checks whether this string has another segmentation: it moves the separator one position to the left, so the pinyin string becomes "he'nan", and decodes this string with a local decoding algorithm. This decoding is correct, so the result is returned to the system, which changes the original pinyin string to "he'nan'sheng". In a system that re-decodes the pinyin string automatically, the input string is decoded again to obtain 河南省 ("Henan Province").
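The re-segmentation step can be sketched as moving the separator between the two selected syllables and keeping the first split whose halves are both valid syllables. The search order (nearest shift first, left before right) and the syllable set are assumptions, and `shift_separator` and `resegment` are hypothetical names. The patent's sub-algorithm additionally decodes the new split locally and accepts it only if that decoding is correct; this sketch replaces that step with a simple validity check.

```python
# Tiny, hypothetical subset of valid pinyin syllables.
SYLLABLES = {"hen", "an", "he", "nan", "sheng"}

def shift_separator(pair, offset, syllables=SYLLABLES):
    """Move the ' separator in a two-syllable string by `offset` (negative = left); None if invalid."""
    left, right = pair.split("'")
    joined = left + right
    cut = len(left) + offset
    if not 0 < cut < len(joined):
        return None  # the separator would fall outside the string
    new_left, new_right = joined[:cut], joined[cut:]
    if new_left in syllables and new_right in syllables:
        return new_left + "'" + new_right
    return None

def resegment(pair, syllables=SYLLABLES):
    """Try separator shifts of increasing distance, left before right; None if no valid split."""
    for dist in range(1, len(pair)):
        for offset in (-dist, dist):
            candidate = shift_separator(pair, offset, syllables)
            if candidate:
                return candidate
    return None

print(resegment("hen'an"))  # he'nan
```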
A combined example of modifying various errors and performing local correction
For example, after the user enters the pinyin string "suijienishuijiaoshichishuijiao", the system decodes it wrongly as a meaningless sentence glossed roughly as "year muddy-water teacher eat sleep".
The user then selects the first character of the sentence ("year") to make a fuzzy-sound selection. The system offers "sui" and "shui"; the user selects "shui", and the system locks this modification and automatically re-decodes, giving a sentence glossed roughly as "hydrolysis muddy-water teacher eat sleep". The user selects the first character again, the system lists all characters with the pinyin "shui", and the user picks 谁 ("who"); the system locks this modification and automatically re-decodes, giving a sentence glossed roughly as "whose muddy-water teacher eat sleep".
After the user corrects the pinyin of the second character to "jiao", the system locks the modification and automatically re-decodes the sentence as "谁叫你睡觉时吃睡觉" (roughly "who told you to eat-sleep while sleeping").
The user then changes the last character of the sentence from "觉" ("feel") to "饺" ("dumpling"). The system locks this modification and re-decodes the whole pinyin string. The output is the correct character string "谁叫你睡觉时吃水饺" ("who told you to eat boiled dumplings in bed").
Thus, although the sentence contained eight incorrect characters in all, the user needed only four modifications to obtain the correct sentence. Without the fuzzy-pronunciation error and the input error, two modifications would have corrected all eight errors, which is a large gain in efficiency.
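Each of the four corrections in this example either rewrites one syllable or locks one character, and both kinds of change become constraints on the next whole-sentence re-decode. The hypothetical sketch below replays the four corrections. `CorrectionState` and the toy `HOMOPHONES` table are illustrative assumptions, not part of the patent. The syllable split "sui jie ni shui jiao shi chi shui jiao" is one reading of the input string.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical toy homophone table; a real system would derive it from its lexicon.
HOMOPHONES: Dict[str, List[str]] = {
    "shui": ["水", "谁", "睡", "税"],
    "jiao": ["叫", "觉", "饺", "教"],
}


@dataclass
class CorrectionState:
    """Constraints that the next whole-sentence re-decode must respect."""
    syllables: List[str]
    locks: Dict[int, str] = field(default_factory=dict)  # position -> locked character

    def replace_pinyin(self, i: int, syllable: str) -> None:
        """Fuzzy-pronunciation or input-error fix: rewrite one syllable."""
        self.syllables[i] = syllable
        self.locks.pop(i, None)  # a character locked under the old pinyin no longer applies

    def lock_character(self, i: int, char: str) -> None:
        """Character-selection fix: pin a character chosen from the homophone list."""
        if char not in HOMOPHONES.get(self.syllables[i], []):
            raise ValueError(f"{char!r} is not a candidate for {self.syllables[i]!r}")
        self.locks[i] = char


state = CorrectionState("sui jie ni shui jiao shi chi shui jiao".split())
state.replace_pinyin(0, "shui")  # 1. fuzzy menu: sui -> shui
state.lock_character(0, "谁")    # 2. choose "who" among the shui homophones
state.replace_pinyin(1, "jiao")  # 3. fix the typing error: jie -> jiao
state.lock_character(8, "饺")    # 4. choose "dumpling" for the last syllable
```

After each step, `state.syllables` and `state.locks` would be handed to the decoder, which keeps locked characters fixed and re-decodes everything else.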

Claims (6)

1. A Chinese whole-sentence input method, characterized in that the method comprises the following steps:
(1) decoding the pinyin string or stroke string entered by the user with a Chinese language model to obtain a character string;
(2) having the user confirm the decoded character string: if the user confirms the decoded result, it is the correct Chinese whole sentence; if the user does not confirm it, the user modifies any one error in the character string, and a local correction is made according to the user's modification;
(3) decoding the whole-sentence pinyin string or stroke string after the above modification and correction, such that the characters modified by the user remain unchanged in the newly decoded character string, wherein the decoding comprises the following steps:
(a) generating a lexical tree according to the language model, creating an empty search path, storing it in the path array, and pointing its lexical-tree pointer at the root of the lexical tree;
(b) processing the corrected whole-sentence pinyin string or stroke string in left-to-right order;
(c) taking a search path out of the path array; if the user has selected a character for the current pinyin or strokes, replacing the path with a single new path; if not, replacing it with one or more new paths according to the lexical tree, while updating the lexical-tree pointer and search information of each new path accordingly;
(d) examining each new path obtained by the above replacement; if it has reached a leaf of the lexical tree, computing the cumulative log probability of all words on the path from the language model as follows:
$$\text{cumulative log probability} = \text{previous cumulative log probability} + \text{current log probability},$$
$$\text{current log probability} = \ln P(w_n \mid w_{n-2}, w_{n-1}),$$
and then resetting the path's lexical-tree pointer to the root of the lexical tree.
Here $w_n$ is the word at the lexical-tree leaf reached by the path, i.e., the current word in the decoded text; $w_{n-1}$ and $w_{n-2}$ are the two words preceding it in the decoded text; and $P(w_n \mid w_{n-2}, w_{n-1})$ is the trigram probability of the word string $(w_{n-2}, w_{n-1}, w_n)$, i.e., the probability that $w_n$ follows $w_{n-2} w_{n-1}$;
(e) repeating steps (c) and (d) until every path in the path array has been replaced;
(f) sorting all new paths in descending order of their cumulative log probability and, according to the path-pool size, keeping only the highest-scoring paths in the path array;
(g) repeating steps (b) through (f) until the whole-sentence pinyin string or stroke string has been processed;
(4) repeating steps (2) and (3) until the Chinese whole sentence is confirmed by the user.
2, the method for claim 1 is characterized in that the mistake in the text strings in the step (2) is user error or system mistake, and user error wherein is fuzzy sound mistake or input error, and system mistake wherein is that mistake is selected in cutting mistake or word select.
3. The method of claim 2, characterized in that the user's modification and local correction of a fuzzy-pronunciation error comprise the following steps:
(1) the user selects the incorrect character, and the pinyin corresponding to that character is displayed;
(2) the user selects the correct pinyin from the fuzzy-pronunciation menu for the pinyin of the incorrect character.
4. The method of claim 2, characterized in that the user's modification and local correction of an input error comprise the following steps:
(1) the user selects the incorrect character, and the pinyin or strokes corresponding to that character are displayed;
(2) the pinyin or strokes are edited until they match the pinyin or strokes of the correct character.
5. The method of claim 2, characterized in that the user's modification and local correction of a segmentation error comprise the following steps:
(1) the user selects two or more adjacent incorrect characters;
(2) the segmentation mark is moved within the continuous pinyin string of the incorrect characters selected by the user, so that the string becomes the continuous pinyin string of the correct characters.
6. The method of claim 2, characterized in that the user's modification and local correction of a character-selection error comprise the following steps:
(1) the user selects the incorrect character, and all other characters with the same pinyin or strokes as that character are displayed;
(2) the user selects the correct character from these characters.
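The decoding procedure in step (3) of claim 1 can be sketched as a small beam search. This is a minimal, hypothetical Python sketch, not the patent's implementation, and it rests on these assumptions:

- It handles pinyin only, not strokes, and assumes one syllable per character.
- It uses a toy lexicon (`LEXICON`) and a toy trigram table (`TRIGRAMS`).
- A flat `UNSEEN` penalty stands in for proper back-off smoothing.
- `locks` is an assumed representation of user-fixed characters. It is enforced by filtering candidate words rather than by a dedicated single-path replacement.

```python
import math
from typing import Dict, List, Optional, Tuple

# Toy data (hypothetical): each word maps to its pinyin syllables, one per character.
LEXICON: Dict[str, List[str]] = {
    "河南": ["he", "nan"], "河": ["he"], "南": ["nan"], "很": ["hen"],
    "安": ["an"], "安生": ["an", "sheng"], "省": ["sheng"], "生": ["sheng"],
}
# Toy table of ln P(w_n | w_{n-2}, w_{n-1}); "<s>" pads the sentence start.
TRIGRAMS: Dict[Tuple[str, str, str], float] = {
    ("<s>", "<s>", "河南"): math.log(0.1), ("<s>", "河南", "省"): math.log(0.5),
    ("<s>", "<s>", "很"): math.log(0.1), ("<s>", "很", "安生"): math.log(0.2),
}
UNSEEN = math.log(1e-6)  # assumed flat penalty instead of real back-off smoothing


class Node:
    """Lexical-tree node: children keyed by syllable, plus the words ending here."""
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.words: List[str] = []


def build_tree(lexicon: Dict[str, List[str]]) -> Node:
    """Step (a): build the lexical tree from the lexicon."""
    root = Node()
    for word, sylls in lexicon.items():
        node = root
        for s in sylls:
            node = node.children.setdefault(s, Node())
        node.words.append(word)
    return root


def decode(sylls: List[str], root: Node, locks: Optional[Dict[int, str]] = None,
           beam: int = 8) -> Optional[str]:
    """Beam search over the syllables; locks maps a syllable index to the
    character the user has fixed there."""
    locks = locks or {}
    # A path is (cumulative log prob, words so far, tree pointer, start of current word).
    paths = [(0.0, ("<s>", "<s>"), root, 0)]
    for i, syl in enumerate(sylls):                      # step (b): left to right
        extended = []
        for score, words, node, start in paths:          # steps (c) and (e)
            child = node.children.get(syl)
            if child is None:
                continue                                 # dead end in the tree
            if child.children:                           # the current word may continue
                extended.append((score, words, child, start))
            for w in child.words:                        # step (d): a word ends here
                if any(w[k - start] != c for k, c in locks.items() if start <= k <= i):
                    continue                             # clashes with a locked character
                lp = TRIGRAMS.get((words[-2], words[-1], w), UNSEEN)
                extended.append((score + lp, words + (w,), root, i + 1))  # pointer back to root
        extended.sort(key=lambda p: p[0], reverse=True)  # step (f): prune to the beam
        paths = extended[:beam]
    finished = [p for p in paths if p[2] is root]        # only paths ending on a word boundary
    if not finished:
        return None
    return "".join(max(finished, key=lambda p: p[0])[1][2:])


TREE = build_tree(LEXICON)
typed = decode(["hen", "an", "sheng"], TREE)                   # "很安生": segmentation as typed
fixed = decode(["he", "nan", "sheng"], TREE)                   # "河南省": after re-segmentation
locked = decode(["he", "nan", "sheng"], TREE, locks={2: "生"})  # "河南生": last character locked
```

Partially matched paths are kept alongside completed words, which mirrors the "one or more new paths" of step (c). A production decoder would also need smoothing and some form of score normalization. Without them, unfinished paths, which have not yet paid their next language-model cost, can crowd finished ones out of the beam.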

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200410000116 CN1256650C (en) 2004-01-05 2004-01-05 Chinese whole sentence input method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200410000116 CN1256650C (en) 2004-01-05 2004-01-05 Chinese whole sentence input method

Publications (2)

Publication Number Publication Date
CN1556458A CN1556458A (en) 2004-12-22
CN1256650C true CN1256650C (en) 2006-05-17

Family

ID=34350347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410000116 Expired - Fee Related CN1256650C (en) 2004-01-05 2004-01-05 Chinese whole sentence input method

Country Status (1)

Country Link
CN (1) CN1256650C (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101388012B (en) * 2007-09-13 2012-05-30 阿里巴巴集团控股有限公司 Phonetic check system and method with easy confusion tone recognition
CN101382844A (en) * 2008-10-24 2009-03-11 上海埃帕信息科技有限公司 Method for inputting spacing participle
CN106648132B (en) * 2009-12-30 2020-08-25 谷歌技术控股有限责任公司 Method and apparatus for character entry
CN102193639B (en) * 2010-03-04 2014-03-12 阿里巴巴集团控股有限公司 Method and device of statement generation
CN102768583B (en) * 2011-05-03 2016-01-20 中国移动通信集团公司 The candidate word filter method of intelligent and portable equipment and the input of whole sentence thereof and device
CN104503597B (en) * 2014-12-19 2017-12-12 北京奇虎科技有限公司 stroke input method, device and system
CN104571584B (en) * 2014-12-30 2017-12-19 北京奇虎科技有限公司 Character input method and device
CN105302336B (en) * 2015-10-30 2019-01-18 北京搜狗科技发展有限公司 A kind of input error correction method and device
CN105653061B (en) * 2015-12-29 2020-03-31 北京京东尚科信息技术有限公司 Entry retrieval and wrong word detection method and system for pinyin input method
CN107678564A (en) * 2017-10-26 2018-02-09 北京百度网讯科技有限公司 For obtaining the method and device of entry
CN109085932B (en) * 2018-08-17 2023-07-25 科大讯飞股份有限公司 Candidate entry adjustment method, device, equipment and readable storage medium
CN110852074B (en) * 2019-11-07 2023-05-16 腾讯科技(深圳)有限公司 Method and device for generating correction statement, storage medium and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103760990A (en) * 2014-01-09 2014-04-30 深圳市欧珀通信软件有限公司 Pinyin input method and pinyin input device
CN103760990B (en) * 2014-01-09 2017-08-04 广东欧珀移动通信有限公司 A kind of phonetics input method and device

Also Published As

Publication number Publication date
CN1556458A (en) 2004-12-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: BEIJING D-EAR TECHNOLOGIES CO., LTD.

Free format text: FORMER OWNER: ZHENG FANG

Effective date: 20130105

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20130105

Address after: 100084 room 1005, B building, Tsinghua Science and Technology Park, Haidian District, Beijing

Patentee after: Beijing d-Ear Technologies Co., Ltd.

Address before: 100084 Haidian District Tsinghua Yuan, Beijing, Tsinghua University, West 14-4-202

Patentee before: Zheng Fang

C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20060517

Termination date: 20140105