CN102103416B - Chinese character input method and device - Google Patents
- Publication number: CN102103416B
- Application number: CN200910261064A
- Authority: CN (China)
Abstract
Embodiments of the invention provide a Chinese character input method and a Chinese character input device that address the slow Chinese character input speed of the prior art. The method comprises: acquiring a pinyin string; splitting the pinyin string according to a dictionary to obtain the pinyin substrings of the pinyin string; acquiring, according to the dictionary, the candidate entries corresponding to the pinyin substrings, together with the occurrence probability of each candidate entry, the occurrence probability of each candidate entry given that another entry occurs, and the characteristics of each candidate entry; calculating the weight of each candidate entry from left to right on the basis of these quantities; and determining the input result according to the weights of the candidate entries. Because the characteristics of the words are taken into account and these characteristics constrain one another, the accuracy of converting a pinyin string into Chinese characters is improved, which in turn speeds up input.
Description
Technical field
The present invention relates to Chinese character input technology, and in particular to a Chinese character input method and device.
Background art
When typing, we use an input method system to record the information we want to express. A large part of that information consists of long sentences, and obtaining the desired sentence by entering its pinyin as a whole requires an important function of the input method system: intelligent sentence composition. The same pinyin string can correspond to several characters, entries, or sentences, and the input method system should offer the user the reading that the pinyin most probably expresses. At present, input method systems rely mainly on the occurrence probability of entries and offer the entry, phrase, or sentence with the highest occurrence probability as the candidate.
When Chinese characters are entered, the candidates offered by the input method system are generally the characters, entries, and English words that occur most often in everyday use, sorted in descending order of occurrence probability. When a long sentence is entered, an intelligent matching algorithm composes the sentence with the highest co-occurrence probability as the candidate. For example, after the pinyin string xian'cheng is entered, the corresponding entries are sorted by their frequency (or probability) of occurrence: "county town" is ranked ahead of "ready-made" and "thread", whereas an entry such as "taking advantage of earlier" occurs so rarely that it is not recorded in the dictionary of the input method system and cannot be selected.
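The frequency-based ranking described above can be sketched in Python as follows. This is only an illustration: the glosses come from the xian'cheng example, while the counts and the function name `rank_candidates` are invented and are not part of the patent.

```python
# Hypothetical occurrence counts for entries that share the pinyin xian'cheng.
# The glosses come from the text above; the numbers are invented.
FREQUENCIES = {
    "ready-made": 90,
    "county town": 120,
    "thread": 60,
}


def rank_candidates(frequencies):
    """Return the entries sorted from most to least frequent."""
    return sorted(frequencies, key=frequencies.get, reverse=True)


print(rank_candidates(FREQUENCIES))
```

An entry that is absent from the table, such as the rare reading mentioned above, can never be offered, which is the limitation the background goes on to discuss.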
When a sentence is entered, the input method system segments the input pinyin and then, according to the frequency with which words occur, finds the sentence with the highest co-occurrence probability. An example is shown in Fig. 1.
As shown in Fig. 1, for the input pinyin string "bushoufanshiqinrao", segmenting the string into pinyin substrings that each correspond to a single Chinese character gives "bu'shou'fan'shi'qin'rao". These pinyin substrings can correspond to single characters such as "do not receive meal be parent around" or "portion receives downer Qin Rao", and so on. The single characters are then combined into words, and each word is identified by a spiral in the figure. As shown in Fig. 1, the pinyin substrings of the words assembled from single characters are "bushou", "fanshi", and "qinrao": "bushou" can correspond to words such as "not receiving", "fanshi" to words such as "every", and "qinrao" to words such as "invasion". The current method uses the probability $P(A_i \mid A_{i-1})$ that two adjacent words A and B occur in succession and the probability $P(A_i)$ that the current entry occurs and, in combination with a hidden Markov model, finds the maximum probability of the whole sentence. The general formula is: [formula missing]. According to this formula, the probability values Weight(S1), Weight(S2), ... can be calculated, and the whole sentence S with the maximum probability P(S) is selected as the output of intelligent sentence composition.
Although the current technology meets the needs of intelligent sentence composition to a certain extent, some problems remain. The current method considers only the frequency of words and the co-occurrence probability of two words; it ignores other relations, such as the attributes of entries. Because the number of entries is huge, the number of word pairs grows quadratically with it, and to fit this mass of relations into limited storage, current input method systems have to discard the less important ones, which reduces the accuracy of intelligent sentence composition to some extent. Moreover, the conditional probabilities between entries and their occurrence frequencies alone cannot solve every problem. As shown in Fig. 1, the input method system quite naturally converts the intended "not invaded and not harassed by everything" into "not receiving every invasion". The user therefore has to correct the input result during input, which slows input down.
Summary of the invention
Embodiments of the invention provide a Chinese character input method and device that address the slow Chinese character input speed of the prior art.
Embodiments of the invention provide a Chinese character input method comprising: obtaining a pinyin string; segmenting the pinyin string according to a dictionary to obtain the pinyin substrings of the pinyin string, the dictionary comprising entries, the pinyin corresponding to each entry, the occurrence probability of each entry, the occurrence probability of each entry given that another entry occurs, the part of speech of each entry, and the conditional probabilities between parts of speech; obtaining from the dictionary the candidate entries corresponding to the pinyin substrings, together with the occurrence probability of each candidate entry, the occurrence probability of each candidate entry given that another entry occurs, and the part of speech of each candidate entry; calculating the weight of each candidate entry from left to right according to the occurrence probability of the candidate entry, the occurrence probability of the candidate entry given that another entry occurs, and the part of speech of the candidate entry; and, among the weights of all candidate entries corresponding to the pinyin substrings that contain the last pinyin substring of the pinyin string, finding the candidate entry with the maximum weight, determining from it each candidate entry corresponding to the pinyin string, and taking the combination of these candidate entries as the input result. The weight is calculated as follows:
$$\mathrm{Weight}(A_i)=\max\Bigl(\mathrm{Weight}(A_{i-1})+\bigl(a\times\log P(A_i\mid A_{i-1})+b\times\log P(A_i)+c\times\log P(\mathrm{Prop}(A_i)\mid\mathrm{Prop}(A_{i-1}))\bigr)\Bigr)$$
where $i = 1, \dots, M$, and $M$ is the number of pinyin substrings obtained when the pinyin string is segmented into substrings that each correspond to a single Chinese character; $A_i$ denotes the entry at position $i$; $\mathrm{Weight}(A_i)$ denotes the weight of entry $A_i$; $a$, $b$, and $c$ are constants; $P(A_i \mid A_{i-1})$ is the probability that $A_i$ occurs given that entry $A_{i-1}$ occurs; $P(A_i)$ is the probability that entry $A_i$ occurs; $\mathrm{Prop}(A)$ is the part of speech of entry $A$; and $P(\mathrm{Prop}(A_i) \mid \mathrm{Prop}(A_{i-1}))$ is the probability that the part of speech $\mathrm{Prop}(A_i)$ of $A_i$ occurs given that the part of speech $\mathrm{Prop}(A_{i-1})$ of $A_{i-1}$ occurs;
The step of determining each candidate entry corresponding to the pinyin string from the candidate entry with the maximum weight comprises: removing from the pinyin string the pinyin substring of the candidate entry with the maximum weight and taking the remaining pinyin string as the current pinyin string; among the weights of all candidate entries corresponding to the pinyin substrings that contain the last pinyin substring of the current pinyin string, finding the candidate entry with the maximum weight; and repeating this until the selected candidate entry covers the first pinyin substring, the candidate entries obtained in this way being the candidate entries corresponding to the pinyin string.
Embodiments of the invention also provide a Chinese character input device comprising: a dictionary, which comprises entries, the pinyin corresponding to each entry, the occurrence probability of each entry, the occurrence probability of each entry given that another entry occurs, the part of speech of each entry, and the conditional probabilities between parts of speech; a first acquiring unit, configured to obtain a pinyin string; a segmentation unit, configured to segment the pinyin string according to the dictionary to obtain the pinyin substrings of the pinyin string; a second acquiring unit, configured to obtain from the dictionary the candidate entries corresponding to the pinyin substrings, together with the occurrence probability of each candidate entry, the occurrence probability of each candidate entry given that another entry occurs, the part of speech of each candidate entry, and the conditional probabilities between parts of speech; and a computing unit, configured to calculate the weight of each candidate entry from left to right according to the occurrence probability of the candidate entry, the occurrence probability of the candidate entry given that another entry occurs, and the part of speech of the candidate entry, the weight being calculated as follows:
$$\mathrm{Weight}(A_i)=\max\Bigl(\mathrm{Weight}(A_{i-1})+\bigl(a\times\log P(A_i\mid A_{i-1})+b\times\log P(A_i)+c\times\log P(\mathrm{Prop}(A_i)\mid\mathrm{Prop}(A_{i-1}))\bigr)\Bigr)$$
where $i = 1, \dots, M$, and $M$ is the number of pinyin substrings obtained when the pinyin string is segmented into substrings that each correspond to a single Chinese character; $A_i$ denotes the entry at position $i$; $\mathrm{Weight}(A_i)$ denotes the weight of entry $A_i$; $a$, $b$, and $c$ are constants; $P(A_i \mid A_{i-1})$ is the probability that $A_i$ occurs given that entry $A_{i-1}$ occurs; $P(A_i)$ is the probability that entry $A_i$ occurs; $\mathrm{Prop}(A)$ is the part of speech of entry $A$; and $P(\mathrm{Prop}(A_i) \mid \mathrm{Prop}(A_{i-1}))$ is the probability that the part of speech $\mathrm{Prop}(A_i)$ of $A_i$ occurs given that the part of speech $\mathrm{Prop}(A_{i-1})$ of $A_{i-1}$ occurs; and a determining unit, configured to: find, among the weights of all candidate entries corresponding to the pinyin substrings that contain the last pinyin substring of the pinyin string, the candidate entry with the maximum weight; remove from the pinyin string the pinyin substring of that candidate entry and take the remaining pinyin string as the current pinyin string; find, among the weights of all candidate entries corresponding to the pinyin substrings that contain the last pinyin substring of the current pinyin string, the candidate entry with the maximum weight; and repeat this until the selected candidate entry covers the first pinyin substring, the candidate entries obtained in this way being the candidate entries corresponding to the pinyin string and their combination being taken as the input result.
Because embodiments of the invention take the part of speech of words into account, and parts of speech constrain one another, this constraint improves the accuracy of converting a pinyin string into Chinese characters and thereby increases input speed.
Description of drawings
Fig. 1 shows the Chinese character segmenting method of prior art;
Fig. 2 shows the Chinese character input method of the embodiment of the invention;
Fig. 3 shows the Chinese character segmenting method in the embodiment of the invention;
Fig. 4 shows the Chinese character input device of the embodiment of the invention.
Embodiment
To help persons skilled in the art understand and implement the invention, embodiments of the invention are now described with reference to the accompanying drawings.
Embodiment one
As shown in Fig. 2, this embodiment provides a Chinese character input method comprising the following steps:
Step 21: obtain a pinyin string.
Step 22: segment the pinyin string according to a dictionary to obtain the pinyin substrings of the pinyin string. The dictionary comprises entries, the pinyin corresponding to each entry, the occurrence probability of each entry, the occurrence probability of each entry given that another entry occurs, the part of speech of each entry, the conditional probabilities between parts of speech, and so on. A pinyin substring may be the pinyin of a single Chinese character or the pinyin of a word.
Step 23: obtain from the dictionary the candidate entries or candidate characters corresponding to each pinyin substring, together with the occurrence probability of each candidate, its occurrence probability given that another entry occurs, and its part of speech. For convenience of description, candidate characters and candidate entries are collectively called candidate entries, and "word" and "entry" denote the same concept.
Step 24: calculate the weight of each candidate entry from left to right according to the occurrence probability of the candidate entry, its occurrence probability given that another entry occurs, and its part of speech. The weight is calculated as follows:
$$\mathrm{Weight}(A_i)=\max\Bigl(\mathrm{Weight}(A_{i-1})+\bigl(a\times\log P(A_i\mid A_{i-1})+b\times\log P(A_i)+c\times\log P(\mathrm{Prop}(A_i)\mid\mathrm{Prop}(A_{i-1}))\bigr)\Bigr)$$
where $i = 1, \dots, M$, and $M$ is the total number of pinyin substrings obtained when the pinyin string is segmented into substrings that each correspond to a single Chinese character; $A_i$ denotes the entry at position $i$; $\mathrm{Weight}(A_i)$ denotes the weight of entry $A_i$; $a$, $b$, and $c$ are constants; $P(A_i \mid A_{i-1})$ is the probability that $A_i$ occurs given that entry $A_{i-1}$ occurs; $P(A_i)$ is the probability that entry $A_i$ occurs; $\mathrm{Prop}(A)$ is the part of speech of word $A$; and $P(\mathrm{Prop}(A_i) \mid \mathrm{Prop}(A_{i-1}))$ is the probability that the part of speech $\mathrm{Prop}(A_i)$ of $A_i$ occurs given that the part of speech $\mathrm{Prop}(A_{i-1})$ of $A_{i-1}$ occurs.
Because the above formula uses the part of speech, the accuracy of sentence composition increases considerably.
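As a rough illustration of the weight formula, the following Python sketch evaluates one step of the recurrence. The function names, the default constants a = b = c = 1, and the handling of zero probabilities are assumptions made for this example only; the patent does not specify them.

```python
import math


def transition_score(p_bigram, p_unigram, p_pos, a=1.0, b=1.0, c=1.0):
    """a*log P(A_i|A_{i-1}) + b*log P(A_i) + c*log P(Prop(A_i)|Prop(A_{i-1}))."""
    if min(p_bigram, p_unigram, p_pos) <= 0.0:
        # log(0) is undefined; this sketch treats the transition as impossible.
        return float("-inf")
    return a * math.log(p_bigram) + b * math.log(p_unigram) + c * math.log(p_pos)


def weight(predecessors, p_unigram, a=1.0, b=1.0, c=1.0):
    """Weight(A_i): best predecessor weight plus the transition score.

    `predecessors` holds one tuple per candidate A_{i-1}:
    (Weight(A_{i-1}), P(A_i | A_{i-1}), P(Prop(A_i) | Prop(A_{i-1}))).
    """
    return max(
        w_prev + transition_score(p_bi, p_unigram, p_pos, a, b, c)
        for w_prev, p_bi, p_pos in predecessors
    )


# Two hypothetical predecessors for an entry with P(A_i) = 0.25.
print(weight([(0.0, 0.5, 0.5), (-1.0, 1.0, 1.0)], 0.25))
```

The maximum is taken over every entry that can directly precede $A_i$, which is how the sketch reads the `max` in the formula.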
Preferably, to calculate the weights of the candidate entries, the candidate entries are arranged in the order in which their pinyin substrings occur in the input. A pinyin string can be segmented in several ways, each of which splits it into a different combination of pinyin substrings. The pinyin substrings produced by all these segmentations can be arranged in a two-dimensional matrix of size N × M, where N is the largest number of possible pinyin substrings that start at any one Chinese character position in the pinyin string, and M is the maximum number of pinyin substrings into which the pinyin string can be segmented, that is, the total number of pinyin substrings obtained when every substring corresponds to a single Chinese character. Each cell of the matrix is called a node, and the column of a node is the position of the first Chinese character of the word that its pinyin substring corresponds to. All nodes of the first row of the matrix therefore exist, whereas some nodes of the other rows may not. Each non-empty node represents one pinyin substring, which according to the dictionary can correspond to one or more words. Connecting each pinyin substring to the pinyin substrings that immediately follow it turns the matrix into a graph.
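The node matrix described above can be sketched in Python as follows. The helper names `split_syllables` and `build_lattice`, the tuple layout `(start, end, pinyin)`, and the tiny pinyin sets are assumptions for illustration; a real dictionary would be far larger.

```python
def split_syllables(pinyin, syllables):
    """Return every split of `pinyin` into syllables drawn from `syllables`."""
    if not pinyin:
        return [[]]
    splits = []
    for size in range(1, len(pinyin) + 1):
        head = pinyin[:size]
        if head in syllables:
            for rest in split_syllables(pinyin[size:], syllables):
                splits.append([head] + rest)
    return splits


def build_lattice(syllables, dictionary_pinyin):
    """Group the dictionary substrings by the character position they start at."""
    columns = []
    for start in range(len(syllables)):
        nodes = []
        for end in range(start + 1, len(syllables) + 1):
            substring = "".join(syllables[start:end])
            if substring in dictionary_pinyin:
                nodes.append((start, end, substring))
        columns.append(nodes)
    return columns


# Hypothetical dictionary keys: four single-character syllables and two words.
SYLLABLES = {"shi", "fen", "gong", "li"}
DICTIONARY_PINYIN = SYLLABLES | {"shifen", "gongli"}

syllable_list = split_syllables("shifengongli", SYLLABLES)[0]
lattice = build_lattice(syllable_list, DICTIONARY_PINYIN)
for column in lattice:
    print(column)
```

In this sketch M is `len(lattice)` and N is the length of the longest column. A node at `(start, end, ...)` links to every node in column `end`, which gives the graph mentioned in the text.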
Step 25: among the weights of all candidate entries corresponding to the pinyin substrings that contain the last pinyin substring of the pinyin string, find the candidate entry with the maximum weight, and determine from it each candidate entry corresponding to the pinyin string.
The candidate entries corresponding to the pinyin string are determined from the candidate entry with the maximum weight as follows: remove from the pinyin string the pinyin substring of the candidate entry with the maximum weight, and take the remaining pinyin string as the current pinyin string; among the weights of all candidate entries corresponding to the pinyin substrings that contain the last pinyin substring of the current pinyin string, find the candidate entry with the maximum weight; repeat this until the selected candidate entry covers the first pinyin substring. The candidate entries obtained in this way are the candidate entries corresponding to the pinyin string, and their combination is taken as the input result.
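The backward selection of step 25 can be sketched in Python as follows. The sketch follows the wording of the step literally: at each stage it picks the heaviest entry among those ending where the current pinyin string ends. The node layout `(start, end, entry)` and the name `trace_back` are assumptions; the weights are the ones listed in the worked example later in this description, using the Table 1 glosses.

```python
# Weights per node, keyed as (start position, end position, entry gloss).
EXAMPLE_WEIGHTS = {
    (0, 1, "be"): 15, (0, 1, "thing"): 13,
    (0, 2, "very"): 31, (0, 2, "time-division"): 30,
    (1, 2, "divide"): 29, (1, 2, "part"): 25,
    (2, 3, "worker"): 47, (2, 3, "attack"): 41, (2, 3, "altogether"): 41,
    (2, 4, "kilometer"): 61, (2, 4, "material gain"): 62,
    (3, 4, "Lee"): 59, (3, 4, "power"): 59,
}


def trace_back(weights, length):
    """Pick entries from the end of the string back to its beginning."""
    result = []
    end = length
    while end > 0:
        # Entries whose pinyin substring ends where the current string ends.
        ending_here = [node for node in weights if node[1] == end]
        # max() raises ValueError if no entry ends here; a sketch-level choice.
        best = max(ending_here, key=weights.get)
        result.append(best[2])
        end = best[0]  # drop the chosen substring from the current string
    result.reverse()
    return result


print(trace_back(EXAMPLE_WEIGHTS, 4))
```

An implementation could instead store, for each entry, the predecessor that produced its maximum weight and follow those links backward; whether the patent intends one or the other is not stated beyond the text above.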
The invention is further described below with a concrete example, using the pinyin string "shifengongli". Suppose the dictionary contains Table 1, Table 2, and Table 3. Table 1 lists the entries, their pinyin, their occurrence probabilities, their parts of speech, and so on; Table 2 lists the occurrence probability of an entry given that another entry occurs; Table 3 lists the conditional probabilities between parts of speech.
Table 1
Entry | Pinyin | Probability of occurrence | Part of speech |
---|---|---|---|
be | shi | 15 | judging verb |
thing | shi | 13 | noun |
divide | fen | 12 | noun |
part | fen | 10 | measure word |
worker | gong | 11 | noun |
attack | gong | 10 | verb |
altogether | gong | 10 | adverb |
Lee | li | 12 | name |
power | li | 10 | noun |
very | shifen | 31 | adverb |
time-division | shifen | 30 | noun |
kilometer | gongli | 28 | noun |
material gain | gongli | 26 | adjective |
Table 2
Entry A | Given entry B | Value of P(A given B) |
---|---|---|
very | be | 12 |
very | very | 5 |
be | very | 3 |
material gain | be | 2 |
Table 3
Part of speech X | Given part of speech Y | Value of P(X given Y) |
---|---|---|
adjective | adverb | 5 |
verb | adverb | 5 |
noun | name | 2 |
adjective | noun | 3 |
noun | noun | 2 |
verb | noun | 1 |
noun | verb | 2 |
verb | verb | 2 |
adjective | verb | 4 |
noun | adjective | 5 |
First, the pinyin string "shifengongli" is segmented into the pinyin substrings "shi", "fen", "gong", and "li", which correspond to single Chinese characters, and the pinyin substrings "shifen" and "gongli", which correspond to words. Each pinyin substring forms a node, and the candidate entries corresponding to each pinyin substring are then looked up in the dictionary; that is, each node can correspond to one or more candidate entries, as follows:
shi can correspond to the candidate entries: be, thing, ...
fen can correspond to the candidate entries: divide, part, ...
gong can correspond to the candidate entries: worker, attack, altogether, ...
li can correspond to the candidate entries: Lee, power, ...
shifen can correspond to the candidate entries: very, time-division, ...
gongli can correspond to the candidate entries: kilometer, material gain, ...
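The lookup from pinyin substring to candidate entries listed above can be sketched in Python as a simple index over the rows of Table 1. The row layout and the name `index_by_pinyin` are assumptions for illustration; the glosses and values are those of Table 1.

```python
from collections import defaultdict

# Rows of Table 1: (entry gloss, pinyin, occurrence value, part of speech).
TABLE_1 = [
    ("be", "shi", 15, "judging verb"),
    ("thing", "shi", 13, "noun"),
    ("divide", "fen", 12, "noun"),
    ("part", "fen", 10, "measure word"),
    ("worker", "gong", 11, "noun"),
    ("attack", "gong", 10, "verb"),
    ("altogether", "gong", 10, "adverb"),
    ("Lee", "li", 12, "name"),
    ("power", "li", 10, "noun"),
    ("very", "shifen", 31, "adverb"),
    ("time-division", "shifen", 30, "noun"),
    ("kilometer", "gongli", 28, "noun"),
    ("material gain", "gongli", 26, "adjective"),
]


def index_by_pinyin(rows):
    """Map each pinyin substring to its candidate entries, in table order."""
    index = defaultdict(list)
    for entry, pinyin, value, part_of_speech in rows:
        index[pinyin].append((entry, value, part_of_speech))
    return dict(index)


CANDIDATES = index_by_pinyin(TABLE_1)
for pinyin, entries in CANDIDATES.items():
    print(pinyin, [entry for entry, _, _ in entries])
```

Each key of `CANDIDATES` corresponds to one node of the example, and the list under it holds that node's candidate entries.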
The nodes are connected into a network according to their order in the pinyin string, as shown in Fig. 3, which clearly reflects the context of each node. Then, using the formula of step 24 together with Table 1, Table 2, and Table 3, the weight of each word in each node can be calculated as follows:
Weight(be) = 15
Weight(thing) = 13
Weight(very) = 31
Weight(time-division) = 30
Weight(divide) = max(Weight(be) + P(divide) + P(noun | verb), Weight(thing) + P(divide) + P(noun | noun)) = 29
Weight(part) = max(Weight(be) + P(part), Weight(thing) + P(part)) = 25
Weight(worker) = max(Weight(divide) + P(worker) + P(noun | noun), Weight(part) + P(worker), Weight(very) + P(worker) + P(noun | adjective), Weight(time-division) + P(worker) + P(noun | noun)) = 47
Weight(attack) = max(Weight(divide) + P(attack) + P(verb | noun), Weight(part) + P(attack), Weight(very) + P(attack), Weight(time-division) + P(attack) + P(verb | noun)) = 41
Weight(altogether) = max(Weight(divide) + P(altogether), Weight(part) + P(altogether), Weight(very) + P(altogether), Weight(time-division) + P(altogether)) = 41
Weight(kilometer) = max(Weight(very) + P(kilometer), Weight(divide) + P(kilometer) + P(noun | noun), Weight(time-division) + P(kilometer) + P(noun | noun), Weight(part) + P(kilometer)) = 61
Weight(material gain) = max(Weight(very) + P(material gain) + P(adjective | adverb), Weight(divide) + P(material gain), Weight(time-division) + P(material gain), Weight(part) + P(material gain) + P(adjective | noun)) = 62
Weight(Lee) = max(Weight(worker) + P(Lee), Weight(altogether) + P(Lee), Weight(attack) + P(Lee)) = 59
Weight(power) = max(Weight(worker) + P(power) + P(noun | noun), Weight(altogether) + P(power), Weight(attack) + P(power)) = 59
The symbol max(a, b, c) denotes the largest of a, b, and c.
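To make the arithmetic concrete, three of these weights are worked out below for the Table 1 entries divide, part, and worker, substituting the values of Table 1 and Table 3. Reading the "judging verb" of Table 1 as a verb when looking up Table 3 is an assumption of this sketch.

```latex
\begin{aligned}
\mathrm{Weight}(\text{divide}) &= \max(15 + 12 + 2,\; 13 + 12 + 2) = \max(29,\, 27) = 29 \\
\mathrm{Weight}(\text{part}) &= \max(15 + 10,\; 13 + 10) = \max(25,\, 23) = 25 \\
\mathrm{Weight}(\text{worker}) &= \max(29 + 11 + 2,\; 25 + 11,\; 31 + 11 + 5,\; 30 + 11 + 2) = \max(42,\, 36,\, 47,\, 43) = 47
\end{aligned}
```

In each sum the first term is the weight of the preceding entry, the second is the Table 1 value of the current entry, and the third, where present, is the Table 3 value for the pair of parts of speech.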
After the weight of each word in each node has been calculated, the word with the maximum weight is sought in the nodes that contain the end of the string, "gongli" and "li". The set of all words in the nodes that contain the end of the string is {kilometer, material gain, Lee, power}, and within this set the entry "material gain" has the maximum weight. From the unique relations between entries determined while the entry weights were calculated, it follows that the sentence composed from the candidate entries in this example is: "very material gain".
Here the number in parentheses after each entry denotes the weight of that entry; the weights have been specially processed so that each is an integer greater than zero.
To determine the sentence corresponding to the pinyin string, the candidate entry with the maximum weight, namely "material gain", is found among the weights of all candidate entries corresponding to the pinyin substrings that contain the last pinyin substring of the pinyin string (that is, "li" and "gongli"). The pinyin substring of this candidate entry is removed from the pinyin string, the remaining pinyin string is taken as the current pinyin string, and the candidate entry with the maximum weight is found in the same way, until the selected candidate entry covers the first pinyin substring. This determines each candidate entry corresponding to the pinyin string, and the combination of these candidate entries is taken as the input result.
To improve efficiency, this embodiment can use the Viterbi algorithm when searching for the whole sentence with the maximum weight, which reduces the problem from exponential to polynomial complexity and greatly improves the efficiency of the program.
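A compact Viterbi-style search over the example nodes might look like the Python sketch below. It is a simplified illustration, not the patented implementation: the score is a plain sum of a Table 1 value and, where listed, a Table 3 value; only a subset of the entries is included; and "judging verb" is treated as "verb". Intermediate weights therefore need not match every value in the worked example.

```python
# Toy data: glosses and values borrowed from Tables 1 and 3, reduced for brevity.
VALUE = {
    "be": 15, "thing": 13, "divide": 12, "part": 10, "worker": 11,
    "power": 10, "very": 31, "time-division": 30,
    "kilometer": 28, "material gain": 26,
}
POS = {
    "be": "verb", "thing": "noun", "divide": "noun", "part": "measure word",
    "worker": "noun", "power": "noun", "very": "adverb",
    "time-division": "noun", "kilometer": "noun", "material gain": "adjective",
}
# Keyed as (part of speech, preceding part of speech).
POS_VALUE = {
    ("adjective", "adverb"): 5, ("noun", "noun"): 2,
    ("noun", "verb"): 2, ("adjective", "noun"): 3,
}
CANDIDATES = {
    "shi": ["be", "thing"], "fen": ["divide", "part"], "gong": ["worker"],
    "li": ["power"], "shifen": ["very", "time-division"],
    "gongli": ["kilometer", "material gain"],
}
# Nodes as (start, end, pinyin), grouped by start position.
LATTICE = [
    [(0, 1, "shi"), (0, 2, "shifen")],
    [(1, 2, "fen")],
    [(2, 3, "gong"), (2, 4, "gongli")],
    [(3, 4, "li")],
]


def score(previous, entry):
    """Additive stand-in for the weighted log terms of the formula."""
    if previous is None:
        return VALUE[entry]
    return VALUE[entry] + POS_VALUE.get((POS[entry], POS[previous]), 0)


def viterbi(lattice, candidates):
    """Return the entry sequence with the largest total weight."""
    length = len(lattice)
    best = {}  # (start, end, entry) -> (weight, best predecessor key)
    for start in range(length):
        if start == 0:
            predecessors = [(None, 0)]
        else:
            predecessors = [(k, w) for k, (w, _) in best.items() if k[1] == start]
        for _, end, pinyin in lattice[start]:
            for entry in candidates.get(pinyin, []):
                options = [
                    (w + score(k[2] if k else None, entry), k)
                    for k, w in predecessors
                ]
                if options:
                    best[(start, end, entry)] = max(options, key=lambda o: o[0])
    finals = [k for k in best if k[1] == length]
    if not finals:
        return []
    key = max(finals, key=lambda k: best[k][0])
    path = []
    while key is not None:
        path.append(key[2])
        key = best[key][1]
    return path[::-1]


print(viterbi(LATTICE, CANDIDATES))
```

Each entry is scored once against the entries that can precede it, so the work grows with the number of node pairs rather than with the number of complete sentences.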
If the part-of-speech information is removed when the weights of the words in each node are calculated, then, because the word frequency of "kilometer" is much greater than that of "material gain", the probability P(S) of the sentence "very kilometer" is higher, and the conventional method obtains the sentence "very kilometer" from the pinyin string "shifengongli". In the embodiment of the invention, however, the part of speech of each word is taken into account. For example, an adverb is not followed by a noun, so P(Prop(kilometer) | Prop(very)) equals 0, whereas P(Prop(material gain) | Prop(very)) is greater than 0. The probability of "very material gain" is therefore greater than that of "very kilometer", and the sentence "very material gain" is finally obtained from the pinyin string "shifengongli" as the candidate. This improves the accuracy of input and, in turn, the input speed.
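The effect of the part-of-speech term can be checked with a few lines of Python. The sketch uses the additive convention of the worked example with the Table 1 and Table 3 values; treating a part-of-speech pair that is absent from Table 3 as contributing 0 is an assumption, as is the function name `sentence_score`.

```python
WORD_VALUE = {"very": 31, "kilometer": 28, "material gain": 26}
PART_OF_SPEECH = {
    "very": "adverb", "kilometer": "noun", "material gain": "adjective",
}
# Keyed as (part of speech, preceding part of speech); value from Table 3.
POS_VALUE = {("adjective", "adverb"): 5}


def sentence_score(words, use_pos):
    """Sum the word values, optionally adding the part-of-speech term."""
    total = 0
    previous = None
    for word in words:
        total += WORD_VALUE[word]
        if use_pos and previous is not None:
            pair = (PART_OF_SPEECH[word], PART_OF_SPEECH[previous])
            total += POS_VALUE.get(pair, 0)  # missing pair contributes nothing
        previous = word
    return total


for use_pos in (False, True):
    print(
        use_pos,
        sentence_score(["very", "kilometer"], use_pos),
        sentence_score(["very", "material gain"], use_pos),
    )
```

Without the part-of-speech term the sentence ending in "kilometer" scores higher; with it, the sentence ending in "material gain" does.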
Embodiment two
As shown in Fig. 4, this embodiment provides a Chinese character input device comprising: a dictionary, which comprises entries, the pinyin corresponding to each entry, the occurrence probability of each entry, the occurrence probability of each entry given that another entry occurs, the part of speech of each entry, and the conditional probabilities between parts of speech; a first acquiring unit, configured to obtain a pinyin string; a segmentation unit, configured to segment the pinyin string according to the dictionary to obtain the pinyin substrings of the pinyin string; a second acquiring unit, configured to obtain from the dictionary the candidate entries corresponding to the pinyin substrings, together with the occurrence probability of each candidate entry, the occurrence probability of each candidate entry given that another entry occurs, and the part of speech of each candidate entry; a computing unit, configured to calculate the weight of each candidate entry from left to right according to the occurrence probability of the candidate entry, the occurrence probability of the candidate entry given that another entry occurs, and the part of speech of the candidate entry; and a determining unit, configured to find, among the weights of all candidate entries corresponding to the pinyin substrings that contain the last pinyin substring of the pinyin string, the candidate entry with the maximum weight, to determine from it each candidate entry corresponding to the pinyin string, and to take the combination of these candidate entries as the input result.
The computing unit uses the following formula:
$$\mathrm{Weight}(A_i)=\max\Bigl(\mathrm{Weight}(A_{i-1})+\bigl(a\times\log P(A_i\mid A_{i-1})+b\times\log P(A_i)+c\times\log P(\mathrm{Prop}(A_i)\mid\mathrm{Prop}(A_{i-1}))\bigr)\Bigr)$$
where $i = 1, \dots, M$, and $M$ is the number of pinyin substrings obtained when the pinyin string is segmented into substrings that each correspond to a single Chinese character; $A_i$ denotes the entry at position $i$; $\mathrm{Weight}(A_i)$ denotes the weight of entry $A_i$; $a$, $b$, and $c$ are constants; $P(A_i \mid A_{i-1})$ is the probability that $A_i$ occurs given that entry $A_{i-1}$ occurs; $P(A_i)$ is the probability that entry $A_i$ occurs; $\mathrm{Prop}(A)$ is the part of speech of word $A$; and $P(\mathrm{Prop}(A_i) \mid \mathrm{Prop}(A_{i-1}))$ is the probability that the part of speech $\mathrm{Prop}(A_i)$ of $A_i$ occurs given that the part of speech $\mathrm{Prop}(A_{i-1})$ of $A_{i-1}$ occurs.
For the working principle of each unit of this embodiment, refer to the description of Embodiment one.
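The way the units hand data to one another can be sketched in Python as a chain of callables. The class name, the constructor parameters, and the stub units below are hypothetical and serve only to show the data flow; real units would consult the dictionary as described in Embodiment one.

```python
class ChineseInputDevice:
    """Chains the units of the device; each unit is a plain callable."""

    def __init__(self, acquire, segment, lookup, compute, determine):
        self.acquire = acquire        # first acquiring unit
        self.segment = segment        # segmentation unit
        self.lookup = lookup          # second acquiring unit
        self.compute = compute        # computing unit
        self.determine = determine    # determining unit

    def run(self):
        pinyin = self.acquire()
        substrings = self.segment(pinyin)
        candidates = self.lookup(substrings)
        weights = self.compute(candidates)
        return self.determine(weights)


# Stub units with hard-coded values, just to exercise the chain.
device = ChineseInputDevice(
    acquire=lambda: "shifen",
    segment=lambda pinyin: [pinyin],
    lookup=lambda substrings: {s: ["very", "time-division"] for s in substrings},
    compute=lambda candidates: {"very": 31, "time-division": 30},
    determine=lambda weights: max(weights, key=weights.get),
)
print(device.run())
```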
Although the invention has been described by way of embodiments, those of ordinary skill in the art will appreciate that many variations and modifications can be made to the invention without departing from its spirit and essence, and that the scope of the invention is defined by the appended claims.
Claims (2)
1. A Chinese character input method, characterized by comprising:
obtaining a pinyin string;
segmenting the pinyin string according to a dictionary to obtain the pinyin substrings of the pinyin string, wherein the dictionary comprises entries, the pinyin corresponding to each entry, the occurrence probability of each entry, the occurrence probability of each entry given that another entry occurs, the part of speech of each entry, and the conditional probabilities between parts of speech;
obtaining from the dictionary the candidate entries corresponding to the pinyin substrings, together with the occurrence probability of each candidate entry, the occurrence probability of each candidate entry given that another entry occurs, and the part of speech of each candidate entry;
calculating the weight of each candidate entry from left to right according to the occurrence probability of the candidate entry, the occurrence probability of the candidate entry given that another entry occurs, and the part of speech of the candidate entry, wherein the weight is calculated as follows:
$$\mathrm{Weight}(A_i)=\max\Bigl(\mathrm{Weight}(A_{i-1})+\bigl(a\times\log P(A_i\mid A_{i-1})+b\times\log P(A_i)+c\times\log P(\mathrm{Prop}(A_i)\mid\mathrm{Prop}(A_{i-1}))\bigr)\Bigr)$$
where $i = 1, \dots, M$, and $M$ is the number of pinyin substrings obtained when the pinyin string is segmented into substrings that each correspond to a single Chinese character; $A_i$ denotes the entry at position $i$; $\mathrm{Weight}(A_i)$ denotes the weight of entry $A_i$; $a$, $b$, and $c$ are constants; $P(A_i \mid A_{i-1})$ is the probability that $A_i$ occurs given that entry $A_{i-1}$ occurs; $P(A_i)$ is the probability that entry $A_i$ occurs; $\mathrm{Prop}(A)$ is the part of speech of entry $A$; and $P(\mathrm{Prop}(A_i) \mid \mathrm{Prop}(A_{i-1}))$ is the probability that the part of speech $\mathrm{Prop}(A_i)$ of $A_i$ occurs given that the part of speech $\mathrm{Prop}(A_{i-1})$ of $A_{i-1}$ occurs;
among the weights of all candidate entries corresponding to the pinyin substrings that contain the last pinyin substring of the pinyin string, finding the candidate entry with the maximum weight; removing from the pinyin string the pinyin substring of that candidate entry and taking the remaining pinyin string as the current pinyin string; among the weights of all candidate entries corresponding to the pinyin substrings that contain the last pinyin substring of the current pinyin string, finding the candidate entry with the maximum weight; and repeating this until the selected candidate entry covers the first pinyin substring, wherein the candidate entries obtained in this way are the candidate entries corresponding to the pinyin string, and the combination of these candidate entries is taken as the input result.
2. A Chinese character input device, characterized by comprising:
a dictionary, which comprises entries, the pinyin corresponding to each entry, the occurrence probability of each entry, the occurrence probability of each entry given that another entry occurs, the part of speech of each entry, and the conditional probabilities between parts of speech;
a first acquiring unit, configured to obtain a pinyin string;
a segmentation unit, configured to segment the pinyin string according to the dictionary to obtain the pinyin substrings of the pinyin string;
a second acquiring unit, configured to obtain from the dictionary the candidate entries corresponding to the pinyin substrings, together with the occurrence probability of each candidate entry, the occurrence probability of each candidate entry given that another entry occurs, the part of speech of each candidate entry, and the conditional probabilities between parts of speech;
a computing unit, configured to calculate the weight of each candidate entry from left to right according to the occurrence probability of the candidate entry, the occurrence probability of the candidate entry given that another entry occurs, and the part of speech of the candidate entry, wherein the weight is calculated as follows:
$$\mathrm{Weight}(A_i)=\max\Bigl(\mathrm{Weight}(A_{i-1})+\bigl(a\times\log P(A_i\mid A_{i-1})+b\times\log P(A_i)+c\times\log P(\mathrm{Prop}(A_i)\mid\mathrm{Prop}(A_{i-1}))\bigr)\Bigr)$$
where $i = 1, \dots, M$, and $M$ is the number of pinyin substrings obtained when the pinyin string is segmented into substrings that each correspond to a single Chinese character; $A_i$ denotes the entry at position $i$; $\mathrm{Weight}(A_i)$ denotes the weight of entry $A_i$; $a$, $b$, and $c$ are constants; $P(A_i \mid A_{i-1})$ is the probability that $A_i$ occurs given that entry $A_{i-1}$ occurs; $P(A_i)$ is the probability that entry $A_i$ occurs; $\mathrm{Prop}(A)$ is the part of speech of entry $A$; and $P(\mathrm{Prop}(A_i) \mid \mathrm{Prop}(A_{i-1}))$ is the probability that the part of speech $\mathrm{Prop}(A_i)$ of $A_i$ occurs given that the part of speech $\mathrm{Prop}(A_{i-1})$ of $A_{i-1}$ occurs;
a determining unit, configured to: find, among the weights of all candidate entries corresponding to the pinyin substrings that contain the last pinyin substring of the pinyin string, the candidate entry with the maximum weight; remove from the pinyin string the pinyin substring of that candidate entry and take the remaining pinyin string as the current pinyin string; find, among the weights of all candidate entries corresponding to the pinyin substrings that contain the last pinyin substring of the current pinyin string, the candidate entry with the maximum weight; and repeat this until the selected candidate entry covers the first pinyin substring, wherein the candidate entries obtained in this way are the candidate entries corresponding to the pinyin string, and the combination of these candidate entries is taken as the input result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910261064A CN102103416B (en) | 2009-12-17 | 2009-12-17 | Chinese character input method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910261064A CN102103416B (en) | 2009-12-17 | 2009-12-17 | Chinese character input method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102103416A (en) | 2011-06-22 |
CN102103416B (en) | 2012-10-10 |
Family
ID=44156251
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200910261064A Active CN102103416B (en) | 2009-12-17 | 2009-12-17 | Chinese character input method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102103416B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102929864B (en) * | 2011-08-05 | 2016-08-17 | 北京百度网讯科技有限公司 | Tone-character conversion method and device |
CN102955770B (en) * | 2011-08-17 | 2017-07-11 | 深圳市世纪光速信息技术有限公司 | Automatic pinyin recognition method and system |
CN103870001B (en) * | 2012-12-11 | 2018-07-10 | 百度国际科技(深圳)有限公司 | Method and electronic device for generating input method candidates |
CN104731364A (en) * | 2015-03-30 | 2015-06-24 | 天脉聚源(北京)教育科技有限公司 | Input method and input method system |
CN105069028B (en) * | 2015-07-16 | 2018-05-25 | 广东小天才科技有限公司 | Pinyin-based Chinese character pushing method and Chinese character pushing device |
CN105653061B (en) * | 2015-12-29 | 2020-03-31 | 北京京东尚科信息技术有限公司 | Entry retrieval and wrong word detection method and system for pinyin input method |
CN105718070A (en) * | 2016-01-16 | 2016-06-29 | 上海高欣计算机系统有限公司 | Pinyin long sentence continuous type-in input method and Pinyin long sentence continuous type-in input system |
CN110245331A (en) * | 2018-03-09 | 2019-09-17 | 中兴通讯股份有限公司 | Sentence conversion method, device, server and computer storage medium |
CN109085932B (en) * | 2018-08-17 | 2023-07-25 | 科大讯飞股份有限公司 | Candidate entry adjustment method, device, equipment and readable storage medium |
CN112987941B (en) * | 2019-12-17 | 2024-02-13 | 北京搜狗科技发展有限公司 | Method and device for generating candidate words |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1165336A (en) * | 1996-05-13 | 1997-11-19 | 上海欧姆龙计算机有限公司 | Chinese character phonetic input system |
CN1322984A (en) * | 2000-05-10 | 2001-11-21 | 微软公司 | Chinese characters inputting method and its apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN102103416A (en) | 2011-06-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102103416B (en) | Chinese character input method and device | |
Sharma et al. | Stemming algorithms: a comparative study and their analysis | |
JP4672418B2 (en) | Efficient capitalization by user modeling | |
US20060206306A1 (en) | Text mining apparatus and associated methods | |
Siivola et al. | On Growing and Pruning Kneser–Ney Smoothed N-Gram Models | |
CN108763196A (en) | A kind of keyword extraction method based on PMI | |
CN101071420A (en) | Method and system for cutting index participle | |
CN102622338A (en) | Computer-assisted computing method of semantic distance between short texts | |
CN109657053B (en) | Multi-text abstract generation method, device, server and storage medium | |
JP2005251206A (en) | Word collection method and system for use in word segmentation | |
CN109522547B (en) | Chinese synonym iteration extraction method based on pattern learning | |
CN102662952A (en) | Chinese text parallel data mining method based on hierarchy | |
US20160283588A1 (en) | Generation apparatus and method | |
WO2023071118A1 (en) | Method and system for calculating text similarity, device, and storage medium | |
CN111460170B (en) | Word recognition method, device, terminal equipment and storage medium | |
CN104298746A (en) | Domain literature keyword extracting method based on phrase network diagram sorting | |
CN106569989A (en) | De-weighting method and apparatus for short text | |
CN101650729A (en) | Dynamic construction method for Web service component library and service search method thereof | |
CN109902290A (en) | A kind of term extraction method, system and equipment based on text information | |
EP3811276A1 (en) | Token matching in large document corpora | |
Gupta | Hybrid algorithm for multilingual summarization of Hindi and Punjabi documents | |
CN110674635A (en) | Method and device for text paragraph division | |
Makazhanov et al. | Spelling correction for kazakh | |
Dianati et al. | Words stemming based on structural and semantic similarity | |
CN101576877A (en) | Fast word segmentation realization method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20230419; Address after: Room 501-502, 5/F, Sina Headquarters Scientific Research Building, Block N-1 and N-2, Zhongguancun Software Park, Dongbei Wangxi Road, Haidian District, Beijing, 100193; Patentee after: Sina Technology (China) Co.,Ltd.; Address before: 100080, International Building, No. 58 West Fourth Ring Road, Haidian District, Beijing, 1510 floor; Patentee before: Sina.com Technology (China) Co.,Ltd. |