Method and system for implementing simplicity input in a Chinese character coding input method
Technical field
The present invention relates to computer technology, and in particular to a method and system for implementing simplicity (abbreviated pinyin) input in a Chinese character coding input method.
Background technology
To enter Chinese characters into a computer, a user needs Chinese input software. At present, Chinese input software can be divided into keyboard input and non-keyboard input.
1) Keyboard-based Chinese input software uses the keyboard to enter Chinese characters according to a certain coding rule. Hundreds of Chinese character coding schemes have been proposed, and dozens of them run on computers. Essentially all of these coding methods associate the sound, shape, or meaning of a character with specific keys, and then combine those keys differently for different characters to complete the input.
2) Non-keyboard Chinese input software includes handwriting input software, speech input software, OCR input software, and the like.
At present, the most mature and most widely used Chinese input software is keyboard based. In the intelligent simple phoneticizing (Lu's Simple Phoneticizing) input method, the user can enter a Chinese character by typing any prefix of a syllable (provided that the prefix is not itself a complete syllable). For example, entering "zhog" yields the result shown in Fig. 1. In this input method, the software must internally maintain two mapping tables: one from a simplicity string (the character string formed by the initial letters of the syllables of a spelling string) to the spelling strings (Fig. 2), and one from a spelling string to its corresponding candidate words (Fig. 3). As can be seen, the indexes of both tables are sorted in ascending character string order, and on such a sorted array a search algorithm (for example, binary search) runs in logarithmic time.
This scheme has two main shortcomings. 1. Memory usage is high, because two tables must be maintained: the mapping table from simplicity strings to spelling strings, and the mapping table from spelling strings to candidate words. 2. Simplicity expansion (which searches both the simplicity-to-spelling table and the spelling-to-candidate-word table) is slow. The lookup comprises three steps: first, find all spelling strings corresponding to the simplicity string in the simplicity-to-spelling table; second, compare the input pinyin string with each spelling string found and keep those that match; third, find the candidate words of each matching spelling string in the phonetic dictionary. Suppose the simplicity-to-spelling table has N index entries and the spelling-to-candidate-word table has M index entries, so that a simplicity string corresponds to M/N spelling strings on average. The first step then takes log N time, the second step M/N, and the third step (M/N) log M, for a total of log N + M/N + (M/N) log M.
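The three-step lookup of the existing scheme can be sketched as follows. This is only an illustration under stated assumptions, not the code of any actual input method: the table contents, the placeholder candidate words, and the names lookup, matches and prior_art_expand are all hypothetical.

```python
from bisect import bisect_left

# Hypothetical miniature tables, each kept as parallel sorted key/value lists.
# Table 1: simplicity string -> spelling strings.
SIMPLICITY_KEYS = ["gydh", "gydj"]
SIMPLICITY_VALUES = [
    ["gan'ying'dian'he", "gong'ye'dian'han", "gong'yong'dian'hua"],
    ["gao'ya'dian'ji"],
]
# Table 2: spelling string -> candidate words.
# Placeholders stand in where the text names no candidate word.
SPELLING_KEYS = [
    "gan'ying'dian'he",
    "gao'ya'dian'ji",
    "gong'ye'dian'han",
    "gong'yong'dian'hua",
]
SPELLING_VALUES = [
    ["<candidate A>"],
    ["<candidate B>"],
    ["industrial electric welding"],
    ["public telephone"],
]


def lookup(keys, values, key):
    """Binary search in a sorted key list; return the value or an empty list."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return []


def matches(input_syllables, spelling):
    """True if every input syllable is a prefix of the corresponding syllable."""
    parts = spelling.split("'")
    return len(parts) == len(input_syllables) and all(
        full.startswith(typed) for typed, full in zip(input_syllables, parts)
    )


def prior_art_expand(input_syllables):
    simplicity = "".join(s[0] for s in input_syllables)
    # Step 1: all spelling strings for the simplicity string (about log N).
    spellings = lookup(SIMPLICITY_KEYS, SIMPLICITY_VALUES, simplicity)
    # Step 2: keep only the spelling strings that match the input (about M/N).
    matched = [sp for sp in spellings if matches(input_syllables, sp)]
    # Step 3: one dictionary search per matched spelling (up to (M/N) log M).
    return {sp: lookup(SPELLING_KEYS, SPELLING_VALUES, sp) for sp in matched}
```

Calling prior_art_expand(["gon", "y", "dia", "h"]) searches both tables, which is the duplicated storage and duplicated searching criticised above.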
Summary of the invention
The object of the present invention is to provide a method and system for implementing simplicity input in a Chinese character coding input method that overcome the above defects of the prior art.
The technical solution adopted by the present invention is a method for implementing simplicity input in a Chinese character coding input method which, after the input pinyin string has been divided into syllables, comprises the following steps: A1, finding in the phonetic dictionary a node whose syllable initials are identical to the syllable initials of the pinyin string; A2, searching upward and downward in the phonetic dictionary with this node as the center, and finding the spelling strings whose syllables each match the corresponding syllable of the pinyin string; A3, sorting the candidate words corresponding to the matching spelling strings in descending order of word frequency.
Further, the phonetic dictionary stores a mapping table from spelling strings to their corresponding candidate words. The spelling strings are sorted in ascending order of the character string formed by the initial of each syllable in the spelling string; when these character strings are identical, the spelling strings are sorted in ascending order of the spelling strings themselves.
Preferably, in step A2, when the syllable initials of a spelling string found in the upward or downward search are not all identical to the syllable initials of the pinyin string, the upward or downward search stops, and step A3 is then executed.
The method further comprises step A4: displaying the sorted candidate words in the candidate word window.
The invention also discloses a system for implementing simplicity input in a Chinese character coding input method, comprising a phonetic dictionary and a syllabification module. The phonetic dictionary stores a mapping table from spelling strings to their corresponding candidate words. The spelling strings are sorted in ascending order of the character string formed by the initial of each syllable in the spelling string; when these character strings are identical, the spelling strings are sorted in ascending order of the spelling strings themselves.
Further, the system also comprises a search module and a sorting module. The syllabification module divides the input pinyin string into syllables and outputs the result to the search module. After receiving the pinyin string, the search module performs the following steps: B1, obtaining the syllable initials of the pinyin string; B2, finding in the phonetic dictionary a spelling string whose syllable initials are identical to them and taking it as a node; B3, searching upward and downward in the phonetic dictionary with this node as the center, finding the spelling strings whose syllables each match the corresponding syllable of the pinyin string, and sending them to the sorting module.
Further, the sorting module receives the spelling strings from the search module and sorts the candidate words corresponding to those spelling strings in descending order of word frequency.
Preferably, the word frequency information of the candidate words is stored in the phonetic dictionary.
Further, the system also comprises a display module, which displays the candidate words output by the sorting module in the candidate word window.
Preferably, in step B3, when the syllable initials of a spelling string found in the upward or downward search are not all identical to the syllable initials of the pinyin string, the upward or downward search stops.
The beneficial effects of the present invention are: (1) the Chinese character coding input method system no longer needs the mapping table from simplicity strings to spelling strings, which reduces memory usage; (2) the new ordering rule for index nodes greatly improves the speed of simplicity expansion.
Description of drawings
Fig. 1 is a schematic diagram of the result obtained after entering "zhog" in the intelligent simple phoneticizing (Lu's Simple Phoneticizing) input method;
Fig. 2 shows the mapping table from simplicity strings to spelling strings in an existing input method;
Fig. 3 shows the mapping table from spelling strings to their corresponding candidate words in an existing input method;
Fig. 4 is a structural diagram of the system of the present invention for implementing simplicity input in a Chinese character coding input method;
Fig. 5 is a data structure diagram of the phonetic dictionary of the present invention;
Fig. 6 is a flowchart of the method of the present invention for implementing simplicity input in a Chinese character coding input method.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 4, a system 10 for implementing simplicity input in a Chinese character coding input method comprises a syllabification module 101, a search module 102, a sorting module 104, a display module 105, and a phonetic dictionary 103.
The phonetic dictionary 103 stores the mapping table from spelling strings to their corresponding candidate words, together with data such as the word frequency information of the candidate words. Compared with an existing phonetic dictionary, the phonetic dictionary 103 of the present invention omits the mapping table from simplicity strings to spelling strings and stores only the mapping table from spelling strings to candidate words, using a new data structure that reduces memory usage. The spelling strings in this data structure are ordered by two criteria: 1. They are first sorted in ascending order of the character string formed by the initial of each syllable in the spelling string. 2. When these character strings are identical, they are sorted in ascending order of the spelling strings themselves. That is, when the initials are identical, entries are ordered alphabetically by the second letter of the first syllable, then by its third letter if the second letters are also identical, and so on; when the first syllables are identical, entries are ordered by the second letter of the second syllable, and so on.
The index shown in Fig. 5 is sorted according to these two criteria. The left-hand column in the figure ("gydg", "gydg", "gydgs", etc.) is not part of the data structure; it is included only to illustrate how the data structure is sorted. An advantage of this structure is that all spelling strings with identical initials are gathered together in the array, at contiguous positions. Consequently, once a node whose syllable initials all match those of the input pinyin string has been found in the array, all spelling strings corresponding to the simplicity string can be found by expanding to its adjacent nodes.
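The two ordering criteria can be expressed as a single sort key. The sketch below is an illustration only; it assumes that syllables within a spelling string are separated by an apostrophe, and the names sort_key and build_index are hypothetical.

```python
def sort_key(spelling):
    """Key implementing the two ordering criteria for a spelling string."""
    syllables = spelling.split("'")
    # Criterion 1: the string formed by the initial of each syllable.
    initials = "".join(s[0] for s in syllables)
    # Criterion 2: the spelling string itself, compared syllable by syllable.
    return (initials, syllables)


def build_index(spellings):
    """Return the spelling strings in dictionary order."""
    return sorted(spellings, key=sort_key)


# Entries taken from the worked example, given here in arbitrary order.
index = build_index([
    "gong'yong'dian'hua",
    "gao'ya'dian'ji",
    "guang'yin'de'gu'shi",
    "gan'ying'dian'he",
    "gong'ye'dian'han",
])
for entry in index:
    print(sort_key(entry)[0], entry)
```

After sorting, the three entries whose initials are "gydh" sit next to one another, between the "gydgs" entry and the "gydj" entry, which is the contiguity property that the lookup relies on.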
The syllabification module 101 receives the pinyin string sent by the user input device, divides it into syllables, and sends the result to the search module 102. The syllabification may use a dynamic programming algorithm.
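The text does not specify the dynamic programming algorithm, so the following is only one possible sketch. It assumes that every segment must be a complete syllable or a prefix of one, and that the split with the fewest segments is preferred. The syllable inventory SYLLABLES is a tiny hypothetical subset, and split_syllables is a hypothetical name.

```python
# Hypothetical, tiny syllable inventory; a real system would use the full
# table of pinyin syllables.
SYLLABLES = {
    "gan", "gao", "gong", "guang", "gu", "ya", "ye", "yin", "ying", "yong",
    "de", "dian", "shi", "han", "he", "hua", "ji",
}
# Every non-empty prefix of a syllable (including the syllable itself).
PREFIXES = {s[:i] for s in SYLLABLES for i in range(1, len(s) + 1)}


def split_syllables(pinyin):
    """Split pinyin into the fewest segments that are syllable prefixes.

    Returns a list of segments, or None if no such split exists.
    """
    n = len(pinyin)
    # best[i] holds the best split of pinyin[:i] found so far.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = pinyin[start:end]
            if best[start] is None or piece not in PREFIXES:
                continue
            candidate = best[start] + [piece]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate
    return best[n]


print(split_syllables("gonydiah"))
```

With this inventory the input "gonydiah" is divided into "gon", "y", "dia" and "h". Other objectives, such as preferring complete syllables, would be equally compatible with the description.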
The search module 102 obtains all spelling strings that match the pinyin string entered by the user and sends them to the sorting module 104. The matching spelling strings are obtained as follows: first, the syllable initials of the syllabified pinyin string are obtained; next, a spelling string whose syllable initials are all identical to them is found in the phonetic dictionary 103 and taken as the center node; then the phonetic dictionary 103 is searched upward and downward from this center node to find the spelling strings whose syllables each match the corresponding syllable of the pinyin string entered by the user. The search in a given direction stops as soon as it reaches a spelling string whose syllable initials are not all identical to those of the input pinyin string. Here, a match means that each syllable of the input pinyin string is either identical to the corresponding syllable of the spelling string, or is incomplete but is a prefix of that syllable.
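The matching rule can be written as a small predicate. This is a sketch under the assumption that the input and the spelling string must contain the same number of syllables; the function name syllables_match is hypothetical.

```python
def syllables_match(input_syllables, spelling_syllables):
    """True if each input syllable equals, or is a prefix of, the
    corresponding syllable of the spelling string."""
    if len(input_syllables) != len(spelling_syllables):
        return False
    return all(
        full.startswith(typed)
        for typed, full in zip(input_syllables, spelling_syllables)
    )


typed = ["gon", "y", "dia", "h"]
print(syllables_match(typed, ["gong", "ye", "dian", "han"]))
print(syllables_match(typed, ["gan", "ying", "dian", "he"]))
```

The first call succeeds because every typed syllable is a prefix of the corresponding full syllable; the second fails at the first syllable, since "gon" is not a prefix of "gan", even though the initials agree.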
The sorting module 104 obtains from the phonetic dictionary 103 the candidate words corresponding to the spelling strings it receives, sorts these candidate words in descending order of word frequency, and sends them to the display module 105.
The display module 105 displays the received candidate words, together with the pinyin string entered by the user (after syllabification), to the user in the candidate word window; the display effect is as shown in Fig. 1.
The flow of the method of the present invention (shown in Fig. 6) is described below, taking the data structure shown in Fig. 5 as an example:
Step S1: the user enters the pinyin string "gonydiah" with the user input device.
Step S2: the syllabification module 101 divides the string into syllables using a dynamic programming algorithm; the result is "gon'y'dia'h".
Step S3: the search module 102 finds in the phonetic dictionary 103 a node whose syllable initials are all identical to those of the syllabified input pinyin string "gon'y'dia'h". Suppose the node found is "gong'ye'dian'han".
Step S4: the search module 102 expands upward and downward from the node "gong'ye'dian'han", compares the input pinyin string with each spelling string it reaches, and collects the matching spelling strings, until it reaches a spelling string whose syllable initials are not all identical to those of the input pinyin string.
The node "gong'ye'dian'han" has the same syllable initials as the input pinyin string "gon'y'dia'h" and matches it. Expanding upward reaches the node "gan'ying'dian'he"; compared with "gon'y'dia'h", this spelling string does not match, but its syllable initials are identical, so the upward expansion continues. The next node up is "guang'yin'de'gu'shi", which does not match and whose syllable initials differ, so the upward search stops. Expanding downward reaches the node "gong'yong'dian'hua", which matches and has identical syllable initials. The next node down is "gao'ya'dian'ji", which does not match and whose syllable initials differ, so the downward search stops. The spelling strings "gong'ye'dian'han" and "gong'yong'dian'hua" are sent to the sorting module 104.
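Steps S3 and S4 can be sketched as a binary search followed by expansion in both directions. The code below is a minimal illustration, not a definitive implementation: the five dictionary entries are those named in the example, and the names DICTIONARY, INITIALS, find_center and find_matches are hypothetical.

```python
def initials_of(syllables):
    return "".join(s[0] for s in syllables)


# Entries from the example, sorted by (initials, syllables) as the
# dictionary requires.
DICTIONARY = sorted(
    (s.split("'") for s in [
        "gan'ying'dian'he", "gao'ya'dian'ji", "gong'ye'dian'han",
        "gong'yong'dian'hua", "guang'yin'de'gu'shi",
    ]),
    key=lambda syl: (initials_of(syl), syl),
)
INITIALS = [initials_of(syl) for syl in DICTIONARY]


def find_center(target):
    """Binary search for any node whose initials equal target (step S3)."""
    lo, hi = 0, len(INITIALS) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if INITIALS[mid] == target:
            return mid
        if INITIALS[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def find_matches(input_syllables):
    """Expand up and down from the center node (step S4)."""
    target = initials_of(input_syllables)
    center = find_center(target)
    if center < 0:
        return []
    first = last = center
    while first > 0 and INITIALS[first - 1] == target:
        first -= 1  # expand upward while the initials still agree
    while last + 1 < len(INITIALS) and INITIALS[last + 1] == target:
        last += 1  # expand downward while the initials still agree
    return [
        "'".join(syl)
        for syl in DICTIONARY[first:last + 1]
        if all(full.startswith(t) for t, full in zip(input_syllables, syl))
    ]


print(find_matches(["gon", "y", "dia", "h"]))
```

The length of a candidate spelling string need not be checked separately here, because identical initial strings already imply the same number of syllables.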
Step S5 sorts the candidate word of the spelling string correspondence of each coupling from big to small according to word frequency.According to last step, the spelling string of coupling comprises " gong ' ye ' dian ' han " and " gong ' yong ' dian ' hua ".The candidate word of their correspondences is respectively " industrial electric welding " and " public telephone ".Suppose the word frequency of the word frequency of " public telephone " greater than " industrial electric welding ", then ranking results is " public telephone, industrial electric welding ", sends to display module 105.
Step S6: the display module 105 displays the candidate words and the pinyin string entered by the user in the candidate word window.
The time complexity of simplicity expansion in the present invention is discussed below. Suppose the number of simplicity-to-spelling mappings (here, mappings from the character string formed by the syllable initials of a spelling string to that spelling string) is N and the spelling-to-candidate-word table has M index entries, so that a simplicity string corresponds to M/N spelling strings on average. The lookup in the present invention comprises two steps. The first finds in the phonetic dictionary 103 a node whose syllable initials are all identical to those of the input pinyin string, which takes log M time. The second expands upward and downward from this node, compares the input pinyin string with each spelling string reached, and collects the matching ones, which takes M/N time. The two steps together take log M + M/N which, since M > N, is less than the log N + M/N + (M/N) log M required in the prior art.
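The two cost expressions can be compared numerically. The sketch below assumes base-2 logarithms and uses arbitrary illustrative values for M and N, not measurements from any real dictionary; the function names are hypothetical.

```python
import math


def prior_art_cost(m, n):
    """log N + M/N + (M/N) log M, the three-step lookup of the prior art."""
    return math.log2(n) + m / n + (m / n) * math.log2(m)


def invention_cost(m, n):
    """log M + M/N, the two-step lookup of the present invention."""
    return math.log2(m) + m / n


# Illustrative sizes only: M spelling strings, N distinct simplicity strings.
for m, n in [(1_000, 200), (100_000, 20_000), (1_000_000, 100_000)]:
    print(m, n, round(prior_art_cost(m, n), 1), round(invention_cost(m, n), 1))
```

The difference between the two expressions is log N + (M/N - 1) log M, which is positive whenever M > N > 1, so the two-step lookup is cheaper under this cost model for any such sizes.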
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.