Therefore, the object of the present invention is to provide a Chinese character input method with a simple coding scheme. The method effectively improves the input efficiency of Chinese characters while remaining easy to learn and use.
Another object of the present invention is to provide a Chinese character input device that uses the above input method and is likewise both simple and easy to use.
The Chinese character input method of the present invention comprises the following steps:
receiving a Chinese character code entered by the user through an input device;
looking up the Chinese character set corresponding to the entered code in a code-to-character mapping library;
displaying said Chinese character set on a display;
wherein, in the displaying step, said Chinese character set is displayed in order of the score associated with each Chinese character or Chinese character string, the score of said Chinese character being:
In the formula: λ1 and λ2 are weighting coefficients; Uni(A_i) is the usage frequency of Chinese character A_i; Bi(A_i/A_{i-1}) is the probability that Chinese character A_i appears after Chinese character A_{i-1}.
The present invention also provides a Chinese character input device that uses the above input method, comprising:
an input device for entering Chinese character codes;
a code-to-character mapping library for storing the mapping between codes and Chinese characters;
a lookup device for searching said code-to-character mapping library, according to the code entered through said input device, to obtain the corresponding Chinese character set;
a display device for displaying said Chinese character set;
a language model library comprising a usage frequency library, which records the usage frequency of each Chinese character, and a word-formation probability library, which records the probability of a Chinese character forming a word with other Chinese characters;
a display order computing unit for computing the score of each Chinese character in said Chinese character set and outputting the Chinese characters to said display device in order of score, so that the Chinese character set is displayed on said display device in that order, the score of said Chinese character being:
In the formula: λ1 and λ2 are weighting coefficients; Uni(A_i) is the usage frequency of Chinese character A_i; Bi(A_i/A_{i-1}) is the probability that Chinese character A_i appears after Chinese character A_{i-1}.
Other objects, features and advantages of the present invention will become more apparent from the following description of embodiments taken in conjunction with the accompanying drawings.
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the drawings:
Fig. 1 shows a flowchart of the Chinese character input method of the present invention. As shown in Fig. 1, as in conventional input methods, the code of a Chinese character is first entered through the input device (S1). Then, according to the entered code, the corresponding Chinese character set is looked up in the code-to-character mapping library (S2). These two steps are essentially the same as in conventional Chinese character input methods. The coding method used in step S1 may be any existing one, for example full pinyin codes, simplified pinyin codes, or five-stroke codes. The content of the code-to-character mapping library used in step S2 differs according to the coding scheme used. The improvement made by the present invention lies in step S3, in which the Chinese characters in the set found in step S2 are sorted; then, in step S4, the characters are displayed in the order established in step S3. The purpose of the sorting in step S3 is to place the most frequently used (in other words, the most likely) characters among those matching the entered code at the front of the display, so that the user can conveniently select the character to be entered, thereby reducing code length and improving input efficiency.
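The flow of steps S1 to S4 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the use of English glosses as stand-ins for characters, and all codes other than "21154" for "north" are assumptions made for the example, and the scores are made-up numbers.

```python
# Hypothetical code-to-character mapping library for a five-stroke scheme.
# Keys are glosses standing in for characters; only the code for "north"
# ("21154") comes from the description, the other codes are placeholders.
MAPPING_LIBRARY = {
    "north": "21154",
    "on": "211",
    "heart": "4544",
}


def find_candidates(code_prefix, mapping=MAPPING_LIBRARY):
    """Step S2: collect every character whose code starts with the typed codes."""
    return [ch for ch, code in mapping.items() if code.startswith(code_prefix)]


def ranked_candidates(code_prefix, score, mapping=MAPPING_LIBRARY):
    """Steps S2-S3: look up the candidate set, then sort it by descending score."""
    return sorted(find_candidates(code_prefix, mapping), key=score, reverse=True)


# Step S4 would display this list; the scores here are made-up numbers.
demo_scores = {"north": 0.2, "on": 0.5, "heart": 0.4}
display_order = ranked_candidates("2", demo_scores.get)
```

Matching on a code prefix rather than on the full code reflects the way candidates are offered after the first keystroke in the examples that follow.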
The sorting principle in step S3 is as follows: the Chinese characters in the set are ordered by the score associated with each character, so that characters with larger scores are displayed first and characters with smaller scores later. The score associated with a Chinese character is computed as follows:
In the formula: λ1 and λ2 are weighting coefficients; Uni(A_i) is the usage frequency of Chinese character A_i; Bi(A_i/A_{i-1}) is the probability that Chinese character A_i appears after Chinese character A_{i-1}.
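A general form consistent with the definitions above, and with the two-character case Score(A,B)=λ1(Uni(A)+Uni(B))+λ2Bi(B/A) given later in the description, might be written as follows. This is an assumed reading offered for orientation, not a statement of the formula in the original filing:

```latex
\mathrm{Score}(A_1 \dots A_n)
  = \lambda_1 \sum_{i=1}^{n} \mathrm{Uni}(A_i)
  + \lambda_2 \sum_{i=2}^{n} \mathrm{Bi}(A_i / A_{i-1})
```

For n = 2 this reduces to the two-character expression, and for n = 1 the second sum is empty, which agrees with the later remark that the Bi term is 0 when no character precedes the one being entered.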
The present invention is illustrated below by examples that use the five-stroke Chinese character coding method. It should be understood, however, that this is only an example and does not limit the invention; other Chinese character coding schemes may equally be used.
The five-stroke coding scheme is first described briefly. In five-stroke coding, the strokes that make up Chinese characters are classified into five types: horizontal, vertical, left-falling, dot, and fold. These five stroke types are then represented by five numeric keys. The correspondence is shown in the following table.
During coding, a character is encoded in its written stroke order. For example, the Chinese character "north" is encoded in the five-stroke scheme as "21154".
Suppose the Chinese character "north" is to be entered. Its first code, "2", is entered first, indicating that its first stroke is vertical. Once this code "2" has been entered (step S1), step S2 immediately finds in the code-to-character mapping library all Chinese characters whose first code is "2" and forms the Chinese character set from them. That is, all characters whose first stroke is vertical are found, for example "allusion quotation", "on", "fore-telling", "old", "returning", "north", and so on. The conventional approach sorts these characters by some fixed rule, for example by stroke count or by pronunciation, and then displays them. The drawback of such an ordering is that the most commonly used characters are not placed at the front and displayed first. In the above example, sorting by stroke count gives the display order "fore-telling", "on", "interior", "old", "returning", "north", "allusion quotation". If one screen shows 5 characters, the desired character "north" appears only on the second screen, so the user has to page forward every time this character is entered. Yet "north" is more common in Chinese, with a higher usage frequency, than the characters "fore-telling", "interior", and "old" displayed before it. If characters with a higher usage frequency are displayed first, that is, if characters are displayed in descending order of usage frequency, the number of page turns can be reduced significantly. In the above example, displaying by the usage frequency of these characters in Chinese gives the order "on", "north", "interior", "old", "returning", "allusion quotation". "North" then appears on the first screen, and the user only needs to select it directly. This reduces the input code length of Chinese characters and improves input efficiency.
The above explains how Chinese characters are sorted and displayed according to their usage frequency. In addition, characters can also be sorted and displayed according to the likelihood (or probability) of their combining with the previously entered Chinese character.
Continuing the above example, suppose "north" has been entered and the first code of the next character the user wants to enter is "4". Conventionally, all Chinese characters whose first code is "4" are looked up in the code-to-character mapping library to form the Chinese character set; the set found includes, for example, "being", "parent", "head", "forever", "must", "very", "heart", and "capital". The set is then sorted and displayed by stroke count or pinyin as described above; by stroke count, the order is "being", "heart", "head", "forever", "must", "very", "capital", "parent".
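Before the combined ordering is taken up, the frequency-only ordering of the "north" example can be sketched as follows. The stroke-count order is the one listed in the text; the frequency values are hypothetical numbers chosen only so that the sort reproduces the order given in the text, and `paginate` is an illustrative helper name.

```python
def paginate(candidates, per_screen=5):
    """Split an ordered candidate list into screens of `per_screen` items."""
    return [candidates[i:i + per_screen]
            for i in range(0, len(candidates), per_screen)]


# Stroke-count order as listed in the example.
by_stroke_count = ["fore-telling", "on", "interior", "old",
                   "returning", "north", "allusion quotation"]

# Hypothetical usage frequencies (assumed values, not measured data).
usage_frequency = {
    "on": 0.0060, "north": 0.0020, "interior": 0.0015, "old": 0.0008,
    "returning": 0.0005, "allusion quotation": 0.0003, "fore-telling": 0.0001,
}

# Descending usage frequency: the most common character comes first.
by_frequency = sorted(by_stroke_count, key=usage_frequency.get, reverse=True)

screens_stroke = paginate(by_stroke_count)
screens_frequency = paginate(by_frequency)
```

With five candidates per screen, "north" falls on the second screen under stroke-count order and on the first screen under frequency order.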
The drawback of such an ordering has been explained above. Sorting only by usage frequency as described above, however, is also insufficient here. In this example, sorting by usage frequency gives: "being", "newly", "capital", "head", "heart", "very", "parent". Clearly, once the previous character has been entered, the character that follows is, by the regularities of the Chinese language, strongly correlated with it. Determining the display order from usage frequency alone is therefore still inadequate when a previous character has already been entered. For this reason, the present invention also makes full use of the correlation between adjacent Chinese characters when ordering the display. This is the Bi(A_i/A_{i-1}) described above, which represents the probability that Chinese character A_i appears after Chinese character A_{i-1}. In this example, each character in the set therefore has a further parameter, Bi(A_i/A_{i-1}); for example, the probability that "being" appears after "north" is Bi(being/north), and the probability that "capital" appears after "north" is Bi(capital/north). Clearly, by the regularities of Chinese, "capital" is more likely than "being" to follow "north", so Bi(capital/north) is greater than Bi(being/north). In the ordering, "capital" is thus placed before "being", even though the usage frequency of "being" is higher than that of "capital". According to the present invention, therefore, the characters in the set that has been found are ordered by jointly considering two factors: the usage frequency of the character and the probability that the character appears after the previous character. This conforms better to the regularities of the Chinese language, greatly shortens the code length of Chinese characters, and improves input efficiency.
In the above example, the next Chinese character to be entered depends only on the previously entered character, so the above formula can be simplified to:
Score(A,B)=λ1(Uni(A)+Uni(B))+λ2Bi(B/A)
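The simplified formula can be tried out directly on the "north" example. The sketch below is illustrative only: the statistics are hypothetical values picked so that "being" is more frequent overall while "capital" is far more likely after "north", the function name `score_pair` is an assumption, and the weights 0.2 and 0.8 are the preferred values stated later in the description.

```python
LAMBDA_1 = 0.2  # weight of usage frequency (preferred value in the description)
LAMBDA_2 = 0.8  # weight of the after-previous-character probability


def score_pair(prev, cur, uni, bi, lam1=LAMBDA_1, lam2=LAMBDA_2):
    """Score(A, B) = lam1 * (Uni(A) + Uni(B)) + lam2 * Bi(B/A)."""
    return lam1 * (uni[prev] + uni[cur]) + lam2 * bi.get((prev, cur), 0.0)


# Hypothetical statistics; keys of `bi` are (previous, current) pairs.
uni = {"north": 0.002, "being": 0.006, "capital": 0.001}
bi = {("north", "capital"): 0.30, ("north", "being"): 0.001}

# Rank the candidates for the character that follows "north".
ranked = sorted(["being", "capital"],
                key=lambda ch: score_pair("north", ch, uni, bi),
                reverse=True)
```

With these numbers "capital" is ranked ahead of "being" although its usage frequency is lower, which is the behaviour the paragraph above describes.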
According to experimental results, without the ordering of the present invention the user must type more than 7 codes on average to enter one Chinese character, that is, the input code length of a Chinese character exceeds 7. With the ordering of the present invention, the user types on average 3.19 keys to enter one Chinese character, that is, the input code length is 3.19, far less than 7. The effect of the present invention is therefore evident.
The above formula (formula 1) is described further below.
In formula (1), Uni(A_i) is the usage frequency of Chinese character A_i. Its value can be obtained by statistics and training. The general method is to take several texts of general significance as a sample corpus, count the number of times N(A_i) that Chinese character A_i occurs in the corpus, and compute Uni(A_i) by the following formula:
Uni(A_i) = N(A_i)/N0    formula (2)
In the formula, N0 is the total number of characters in the sample corpus.
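Formula (2) amounts to a relative-frequency count. A minimal sketch follows, assuming the sample corpus is available as a single string that has already been stripped of punctuation and whitespace; the function name is illustrative.

```python
from collections import Counter


def unigram_frequencies(corpus):
    """Return Uni(A) = N(A) / N0 for every character A in the corpus."""
    counts = Counter(corpus)   # N(A) for each character
    total = len(corpus)        # N0, the total number of characters
    return {ch: n / total for ch, n in counts.items()}


# Tiny stand-in corpus, for illustration only.
uni_demo = unigram_frequencies("abca")
```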
In formula (1), Bi(A_i/A_{i-1}) is the probability that Chinese character A_i appears after Chinese character A_{i-1}. Its value can likewise be obtained by statistics and training. The general method is to take several texts of general significance as a sample corpus, count the number of times N(A_i, A_{i-1}) that Chinese character A_i occurs after Chinese character A_{i-1} in the corpus, and compute Bi(A_i/A_{i-1}) by the following formula:
Bi(A_i/A_{i-1}) = N(A_i, A_{i-1})/N(A_i)    formula (3)
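Formula (3) can be sketched in the same way as a pair count divided by a character count. The sketch assumes the corpus is one pre-cleaned string and uses an illustrative function name. The denominator follows formula (3) as written, N(A_i); a textbook conditional estimate would divide by N(A_{i-1}) instead, and the choice here is simply to mirror the text.

```python
from collections import Counter


def bigram_probabilities(corpus):
    """Return Bi(cur/prev) = N(cur, prev) / N(cur), keyed by (prev, cur)."""
    char_counts = Counter(corpus)                    # N(A) for each character
    pair_counts = Counter(zip(corpus, corpus[1:]))   # adjacent (prev, cur) pairs
    return {(prev, cur): n / char_counts[cur]
            for (prev, cur), n in pair_counts.items()}


# Tiny stand-in corpus, for illustration only.
bi_demo = bigram_probabilities("abab")
```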
λ1 and λ2 in formula (1) are weighting coefficients. By adjusting these two coefficients, the weights of Uni(A_i) and Bi(A_i/A_{i-1}) in the score can be adjusted, that is, the degree to which usage frequency and word-formation probability influence the score of the character. In general, λ1 and λ2 should satisfy λ1 + λ2 = 1. According to experimental results, λ1 may be 0.1 to 0.3, preferably 0.2, and λ2 may be 0.9 to 0.7, preferably 0.8.
The Chinese character input method of the present invention has been explained above using five-stroke coding as an example, but it should be understood that other coding schemes can also be applied in the present invention, for example the coding scheme of the Microsoft Pinyin input method. The Microsoft Pinyin input method supports whole-sentence input, allowing the user to enter the codes of several Chinese characters consecutively. For example, the user enters the pinyin "woshiyigebing" continuously. Once the pinyin "wo" has been entered, the present invention displays on the screen all Chinese characters whose code is "wo", arranged in descending order of usage frequency. Because the usage frequency of the character "I", that is Uni(I), is the largest, "I" is ranked first, and other characters such as "crouch", "snail", and "holding" follow. Because no Chinese character has been entered before "wo", the value of Bi(I/*) is 0.
At this point, according to the rules of whole-sentence input, the user may continue entering codes without selecting a character, for example by entering "shi". The Chinese character set for the code "shi" includes "Yes", "chamber", "city", "reality", "time", and so on. According to formula (1) of the present invention, the scores of the possible combinations for the code "woshi" are computed, for example Score(I am), Score(my chamber), Score(city), Score(city), Score(crouch and be), Score(bedroom), ..., Score(snail is), ..., and the candidate character strings are displayed in order of score. If the result is Score(bedroom) > Score(I am) > ..., the candidate strings are displayed in the order "1. bedroom, 2. I am, ...".
At this point the user may select a candidate or continue entering codes, for example "yi". The Chinese character set for the code "yi" includes "one", "with", and so on. The scores of the possible combinations for the code "woshiyi" are then computed according to formula (1) of the present invention, for example Score(I am), Score(my chamber one), ..., Score(bedroom one), ..., and the results are displayed in order of score. In this example, Score(I am) > Score(bedroom one) > ..., so the screen shows "1. I be one, 2. bedroom one, 3. ...".
The input of "woshiyigebing", that is, "I am a soldier", is completed in the manner described above.
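The whole-sentence procedure can be sketched as enumerating one candidate per syllable and scoring each resulting string. The sketch below is hedged in several ways: the string score is an assumed generalisation of the two-character formula (a weighted sum of all Uni terms plus a weighted sum of all adjacent Bi terms), the function names are illustrative, the glosses stand in for characters, and every number is hypothetical.

```python
from itertools import product


def score_string(chars, uni, bi, lam1=0.2, lam2=0.8):
    """Assumed string score: lam1 * sum of Uni + lam2 * sum of adjacent Bi."""
    unigram_part = sum(uni.get(c, 0.0) for c in chars)
    bigram_part = sum(bi.get((p, c), 0.0) for p, c in zip(chars, chars[1:]))
    return lam1 * unigram_part + lam2 * bigram_part


def rank_strings(candidate_sets, uni, bi):
    """Score every combination of one candidate per syllable, best first."""
    return sorted(product(*candidate_sets),
                  key=lambda s: score_string(s, uni, bi),
                  reverse=True)


# Hypothetical candidates for "wo" and "shi", with hypothetical statistics.
wo = ["I", "crouch"]
shi = ["yes", "chamber"]
uni = {"I": 0.010, "crouch": 0.0005, "yes": 0.012, "chamber": 0.001}
bi = {("crouch", "chamber"): 0.40,   # stands in for "bedroom"
      ("I", "yes"): 0.30}            # stands in for "I am"

ranking = rank_strings([wo, shi], uni, bi)
```

With these made-up values the pair standing in for "bedroom" ranks first and the pair standing in for "I am" second, as in the case discussed above. Exhaustive enumeration grows quickly with sentence length, so a practical implementation would presumably prune candidates; that is outside what the description states.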
Given its characteristics, the present invention is generally suited to coding schemes with many code collisions, that is, schemes in which many characters share the same code. When five-stroke coding is used, the invention is particularly suited to electronic devices that require Chinese character input but have few input keys (for example, only 10 numeric keys and a few function keys), such as remote controls and Chinese-language mobile phones.
The Chinese character input device that uses the above input method of the present invention is described below.
Fig. 2 shows a structural block diagram of the Chinese character input device according to the present invention. As shown in Fig. 2, the device comprises:
an input device 10, which is generally a keyboard; the keyboard may be a standard Western-language keyboard or a keyboard containing only numeric keys and a few function keys;
a code-to-character mapping library 20, which stores the mapping between codes and Chinese characters;
a lookup device 30, which implements step S2 of the flowchart of Fig. 1: according to the code entered through the input device 10, it searches the code-to-character mapping library 20 and obtains the corresponding Chinese character set;
a language model library 40, which comprises a usage frequency library 41 and a word-formation probability library 42; the usage frequency library 41 stores the usage frequency Uni(A) of each Chinese character, and the word-formation probability library 42 stores the word-formation probability Bi(A_i/A_{i-1}) of a Chinese character with other Chinese characters, that is, the probability (or likelihood) that the character appears after a given character;
a display order computing unit 50, which computes the score of each Chinese character in the set according to the above formula (1), sorts the characters by score, and outputs them to the display device 60 described below;
a display device 60, which displays the Chinese characters output by the display order computing unit 50. An example of the display mode of the display device 60 is shown in Fig. 3. In this example, five-stroke coding is used with the input method of the present invention: the Chinese character code uses the five digits 1 to 5, corresponding respectively to the five stroke types horizontal, vertical, left-falling, dot, and fold. The first code entered by the user represents the first stroke of the character, the second code its second stroke, the third code its third stroke, and so on. When the user enters the first code of a character through the input device 10, for example "2", the display device shows the screen of Fig. 3: the characters are arranged by their scores, and each screen shows 5 candidate characters. The digit in front of each character is its selection number, used to select that character for input. For example, to enter the character "north", the user keys in "7" through the input device 10. If the desired character does not appear on the current screen, there are two options. One is to show the next screen of candidates with the paging function key until the desired character appears. The other is to continue entering the second code; for example, to enter "allusion quotation", the user keys in "5", the second code of "allusion quotation", and then the third code, and so on until the character appears.
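The candidate screen can be sketched as a mapping from selection digits to candidates. The digits 6, 7, 8, 9, 0 used below are an assumption: they avoid the stroke keys 1 to 5 and are consistent with keying "7" to pick the second candidate, but the actual layout of Fig. 3 may differ. The function name and the ranked list are likewise illustrative.

```python
SELECTION_KEYS = "67890"  # assumed selection digits, distinct from stroke keys 1-5


def render_screen(ranked, screen_index=0, keys=SELECTION_KEYS):
    """Return {selection digit: candidate} for one screen of ranked candidates."""
    per_screen = len(keys)
    start = screen_index * per_screen
    page = ranked[start:start + per_screen]
    return dict(zip(keys, page))


# Hypothetical ranked output of the display order computing unit for code "2".
ranked_for_2 = ["on", "north", "interior", "old", "returning", "allusion quotation"]

first_screen = render_screen(ranked_for_2)
second_screen = render_screen(ranked_for_2, screen_index=1)
```

Pressing the paging key corresponds to increasing `screen_index`; entering a further stroke code would instead shrink the ranked list before it is rendered.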
The embodiments of the present invention have been described above fully and in detail. Those skilled in the art should understand that the Chinese character input method and Chinese character input device described above can be implemented in software, in hardware, or in a combination of software and hardware. The embodiments described above are intended only to aid understanding of the present invention and do not limit its scope of protection; all variations and modifications made to the specific embodiments in accordance with the concept of the present invention fall within the scope of the invention, and the scope of protection is defined by the appended claims.