Background art
Since the early 1980s, a major new family of Chinese character coding input methods has emerged and remains influential today: radical-based coding. The underlying idea is that the thousands of Chinese characters are all built from strokes that form radicals, and that radicals in turn form characters. If the radicals of the characters are identified and printed on the keys, a character can be entered simply by pressing the keys of the radicals it contains. This coding approach therefore attracted wide attention. From its birth, however, it was also regarded as hard to learn, which gave rise to the bottleneck summed up as "what is easy to learn is not easy to use, and what is easy to use is not easy to learn". In the search for a scheme that is both easy to learn and easy to use, radical-based schemes with differing decomposition methods appeared in large numbers. The decomposition work was carried out spontaneously in different regions and different systems, so the results were diverse and inconsistent; the character "sheep" (羊), for example, had as many as five different decompositions. This situation caused great inconvenience to Chinese character teaching and Chinese character information processing. If radicals were not standardized, the standardization of the spoken and written language would suffer, and the communication and reception of Chinese character information would be seriously affected in both speed and accuracy. For this reason, at the end of the last century the national authorities consolidated the various radical sets circulating in society into Hanzi components and promulgated the "Chinese Character Component Standard of the GB13000.1 Character Set for Information Processing", which took effect on 1 May 1998.
The promulgation of this component standard means that radical-based coding can rely only on Hanzi components. Any other arbitrary decomposition of characters for coding will not be recognized by the national authorities, will be difficult to promote and popularize, and cannot enter primary and secondary schools. Can an easy-to-learn coding scheme be developed using Hanzi components? The GB13000.1 character set Hanzi component standard contains 560 components in total, determined according to etymological relationships. Many single-component characters are themselves etymological source characters, and many of these have a large structure, that is, a high average stroke count; they are referred to here as large components. For example, "hang down" (垂), "ghost" (鬼), "black" (黑), "yellow" (黄), "deer" (鹿), "mouse" (鼠), "I" (我), "elephant" (象), "smoke" (熏), "mediocre" (庸) and "heavy" (重) are all large components. These large components cannot be split further in character coding, and each unsplit large component is coded with only one letter. Which letter should be used, how are users to remember it, and how many duplicate codes will result? These are all problems. How can they be solved?
Traditional shape-based coding solves the problem as follows. Although each Hanzi component corresponds to only one letter, that is, one code, the code length of a character remains adjustable. In a scheme fixed at 4 keys, for example, a character that is itself a component, or that has fewer than 4 components, is padded with stroke codes to increase the code length; conversely, a character with more than 4 components has the surplus components discarded. The currently most widespread Wubi (five-stroke) input method follows this pattern. To make such schemes convenient to use, a chart showing the distribution of Hanzi components over the computer keyboard is generally printed on paper or even engraved on the keyboard. Schemes of this pattern are at present the mainstream input methods among Chinese character shape codes.
This pattern, however, has not solved the problem of being hard to learn. Three things make it complicated: first, memorizing the components and the letter key of each component by etymological relationship; second, the rules for discarding components; third, the rules for appending strokes. Coding by stroke attributes is added on top of coding by component attributes, which further increases the learning difficulty. As a result, the Wubi input method has not achieved universal adoption after more than thirty years of effort. It can be said that, so far, no shape code has been popularized or has entered primary and secondary schools. Adopting Hanzi components directly is therefore not enough to develop an easy-to-learn coding scheme; breaking through this bottleneck requires a different approach.
Hanzi components are the result of splitting whole characters, so the effect of splitting is analyzed first. If characters are not split, each character is an individual unit represented by 1 letter, and thousands of characters must be distributed over 26 keys; although each character then needs only 1 keystroke, such a coding is practically impossible. If each character is split into 2 units, each character is represented by 2 letters. In a sense, therefore, the direct result of splitting is an increase in code length. Because characters share common structures, for example "sand" (沙), "Han" (汉), "you" (汝) and "pool" (池) all yield the same component "氵" when split, the number of individual units drops sharply; duplicate codes are reduced as well, and learning becomes simpler. In other words, increasing the code length effectively reduces the number of individual units and reduces duplicate codes, which tends to simplify Chinese character input. The code length cannot be increased without limit, however: if characters are split all the way down to strokes, the code becomes too long and the result is counterproductive.
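The relationship between code length and the number of units to be distinguished can be illustrated with a small counting sketch. It assumes a 26-letter alphabet and fixed-length codes, and simply asks how many letters are needed before every unit can receive its own code; the function name `min_code_length` is hypothetical, and the counts 560 and 20902 are the component and character totals quoted elsewhere in this document.

```python
def min_code_length(item_count: int, alphabet_size: int = 26) -> int:
    """Return the smallest fixed code length n such that
    alphabet_size ** n >= item_count (one distinct code per item)."""
    if item_count < 1:
        raise ValueError("item_count must be positive")
    length = 1
    capacity = alphabet_size
    while capacity < item_count:
        length += 1
        capacity *= alphabet_size
    return length


if __name__ == "__main__":
    # 26 units fit on single keys; 560 components and 20902 characters do not.
    for count in (26, 560, 20902):
        print(count, "items need at least", min_code_length(count), "letter(s)")
```

This is only a lower bound on code length for duplicate-free coding; it says nothing about how memorable a given assignment of letters would be.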
Since splitting characters increases code length, consider the alternative of not splitting them: the stroke structures associated with letters are identified directly on the plane of the whole character and then represented by the corresponding letters, which yields a multi-letter code for the whole character. This likewise increases the code length. If whole characters are coded directly, however, the large number of characters means that a great deal of time is needed for learning. How can the learning time be shortened? The smallest character-forming unit of a Chinese character is the Hanzi component, and the total number of components is far smaller than the number of whole characters. It is therefore sufficient to identify the letter-associated stroke structures on the plane of each component and represent them with the corresponding letters; the combination of the codes of the components contained in a character is then the multi-letter code of the whole character.
Summary of the invention
The present invention provides a pictographic-code Chinese character input method based on basic character components. The method does not need to split characters by radical; instead, the multi-letter code of a whole character is obtained by multi-letter coding of its basic components. Accordingly, whole characters or basic components are not marked on the letter keys of the computer keyboard; rather, English letters are marked directly on the planar structure of the whole character or of the basic components. The method increases code length and reduces duplicate codes, and is significant for the learning of Chinese character input methods.
Traditional Chinese character coding comprises the following steps. First, whole characters are split into radicals; radicals were of many kinds and are now unified as Hanzi components under the national standard. Second, the correspondence between Hanzi components and the letter keys of the computer is determined. Third, the corresponding coding rules are formulated, including code-taking rules, code-padding rules and so on. In addition, for ease of use and learning, a keyboard chart must be designed showing which components and strokes correspond to each letter key, or the components and strokes are marked directly on the letter keys of the keyboard. In a sense, the essence of this approach is to mark Chinese characters on the letter keys of the computer.
The pictographic-code Chinese character input method based on basic character components of the present invention reverses this thinking. Characters need not be split by radical; the whole character is coded directly with multiple letters through its basic components. That is, whole characters or basic components are not marked on the letter keys of the computer keyboard; instead, English letters are marked directly on the planar structure of the whole character or of the basic components. In a sense, this marks the letter keys of the computer directly on the planar structure of the character.
In the pictographic-code Chinese character input method based on basic character components of the present invention, the character to be entered is split into basic Chinese character components according to the Chinese character stroke order rule table or the Chinese character structure type table;
Each basic component is compiled by pictographic code into a character string composed of English letters, following the mnemonic order given by the stroke order rule table or the structure type table;
The pictographic code compilation method adopted is the compilation method described in patent ZL01127987.7.
The character strings of the basic components are compiled into a character set in the order in which the components occur in the planar structure of the character, as given by the stroke order rule table or the structure type table;
The character set is entered into the computer through the alphabetic keyboard, thereby achieving computer input of the character to be entered.
In the pictographic-code Chinese character input method based on basic character components of the present invention, the basic components are taken from the "GB13000.1 character set Hanzi components"; this component set contains 560 basic components, which can be combined into 20902 Chinese characters;
Mnemonic symbols are first marked on the planar structure of each basic component in the mnemonic order given by the stroke order rule table or the structure type table; each mnemonic symbol is then replaced with the corresponding English letter by pictographic code compilation, producing a letter-marked basic component; the letters of the letter-marked component are then compiled, in the order indicated by the mnemonic symbols, into a character string composed of English letters;
The character strings of the basic components are compiled into a character set in the order in which the components occur in the planar structure of the character, as given by the stroke order rule table or the structure type table;
The character set is entered into the computer through the alphabetic keyboard, thereby achieving computer input of the character to be entered;
Where a basic component carries several mnemonic symbols, they are shown in different colors so that the individual symbols and their order can be distinguished in the planar structure; the order of the mnemonic symbols is indicated by the four colors black, red, green and purple, in that sequence, and is essentially consistent with the order given by the stroke order rule table or the structure type table.
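The marking and compilation of a single component can be sketched as follows. This is a minimal illustration, not the compilation method of patent ZL01127987.7: the mnemonic symbol names (`sym_a`, `sym_b`, `sym_c`) and their letter assignments are placeholders invented for the example, and the function names are hypothetical. The only details taken from the text are the color sequence black, red, green, purple, its reuse when a component has more than four mnemonic symbols, and the rule that letters are joined in the marked order.

```python
# Color sequence that indicates the order of mnemonic symbols on a component.
COLOR_ORDER = ("black", "red", "green", "purple")


def color_for_position(position: int) -> str:
    """Color of the mnemonic symbol at a 0-based position.
    The four colors are reused in the same sequence after the fourth symbol."""
    if position < 0:
        raise ValueError("position must be non-negative")
    return COLOR_ORDER[position % len(COLOR_ORDER)]


def component_string(mnemonics, letter_of):
    """Replace each mnemonic symbol with its letter and join the letters
    in the order in which the symbols are marked on the component."""
    return "".join(letter_of[symbol] for symbol in mnemonics)


if __name__ == "__main__":
    # Placeholder mapping: the real symbol-to-letter table is not reproduced here.
    letter_of = {"sym_a": "j", "sym_b": "u", "sym_c": "z"}
    marked = ["sym_a", "sym_b", "sym_c"]
    for position, symbol in enumerate(marked):
        print(symbol, color_for_position(position), letter_of[symbol])
    print(component_string(marked, letter_of))
```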
In the pictographic-code Chinese character input method based on basic character components of the present invention, the correspondence between the basic components, the mnemonic symbols, the letter-marked basic components and the English letters is shown in Table 1:
Table 1
In the pictographic-code Chinese character input method based on basic character components of the present invention, the Chinese character stroke order rule table is shown in Table 2:
Table 2
In the pictographic-code Chinese character input method based on basic character components of the present invention, the Chinese character structure type table is taken from "Guide to the Use of Language and Script Standards", edited by Li Hangjian and Fei Jinchang, Shanghai: Shanghai Lexicographic Publishing House, July 2001, ISBN 7-5326-0762-3.
The basic structural types of modern Chinese characters can be divided into 11 kinds. Apart from the single-component structure, each type also has several variants. Only some common examples are listed below.
1. Single-component structure
2. Top-bottom structure
3. Left-right structure
4. Enclosure from the left, top and right
5. Enclosure from the left, bottom and right
6. Enclosure from the lower left
7. Enclosure from the top, left and bottom
8. Enclosure from the upper left
9. Enclosure from the upper right
10. Full enclosure
11. Symmetrical structure (also called frame structure)
In the pictographic-code Chinese character input method based on basic character components of the present invention, because Chinese characters can be taken apart and put together, there is no need to mark every character with mnemonic symbols in order to lower the learning threshold and shorten the learning time; it is enough to identify the representative structures of the characters, which is simple and clear. The smallest character-forming unit is the basic component, and basic components can be combined into whole characters. Mnemonic symbols are therefore marked on the planar structure of the basic components and replaced with the corresponding English letters by pictographic code compilation, producing letter-marked basic components; the letters are then compiled into a character string composed of English letters in the order given by the stroke order rule table or the structure type table. The character strings of the basic components are then compiled into a character set in the order in which the components occur in the planar structure of the character, as given by the stroke order rule table or the structure type table. The character set is entered into the computer through the alphabetic keyboard, thereby achieving computer input of the character to be entered.
In the pictographic-code Chinese character input method based on basic character components of the present invention, the columns of Table 1 are as follows. The first column is the serial number of the Hanzi component. The second column is the corresponding basic component. The third column shows the mnemonic symbols marked on the planar structure of the component in the four colors black, red, green and purple; when a component has more than four mnemonic symbols, the same four colors are used again. The colors also express the order of the mnemonic symbols, which is determined according to the stroke order rule table or the structure type table. The fourth column shows the component with each mnemonic symbol converted by pictographic code into the corresponding capital English letter, according to the correspondence between mnemonic symbols and English letters; each capital letter has the same color as the mnemonic symbol it replaces. The fifth column is the character string of English letters corresponding to the component, obtained from the fourth column in the order indicated by the colors. The character string may also be written in lowercase and without color; in the fifth column the capital letters are shown in the same colors as the letters of the preceding column for the convenience of beginners. The sixth column gives example characters, that is, the position of the component within characters; this is part of the original component standard.
In the pictographic-code Chinese character input method based on basic character components of the present invention, the basic components are marked with multiple letters. This is equivalent to marking computer letter keys on the basic component, or in other words to marking one basic component with several letter keys at once, so that one component corresponds to several letters. This necessarily increases the code length of the component, and hence of the character, thereby reducing duplicate codes and lowering the learning difficulty. The increase in code length is not achieved by code padding (padding with strokes). The method of the present invention therefore cannot be replaced by traditional component-based shape-code input methods.
At present there is no Chinese character computer input method on the market that is coded purely by basic components, nor one that retains the full glyph information of the character. In the pictographic-code Chinese character input method based on basic character components of the present invention, the code consists entirely of the character strings of the basic components; it is a holographic input method with recursive association and sentence processing functions, covering 20902 characters and between 50,000 and more than 60,000 phrases. For single characters and phrases alike, the character strings of the basic components are determined from Table 1 and the combination order of the components from Table 3; the computer finds the corresponding character from the character strings of its components, and the character or phrase is then entered. The code is of variable length: the shortest is 1 code, the longest 12 codes, and the average 4.4 codes. Code length is directly related to the number of strokes, so most commonly used characters have codes of 4 or fewer, while rarely used characters, which tend to have more strokes, have relatively longer codes. This helps to disperse duplicate codes, so this coding has few duplicates and matches users' usual input habits. In addition, because this input method is a pictographic code, the user will inevitably encounter characters that he or she cannot write. The input method is therefore provided with a Hanyu Pinyin retrieval system: for any character the user cannot write, pressing the R key allows the character to be retrieved by pinyin.
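The lookup step, in which the computer finds characters from a typed code, can be sketched with a reverse index. This is an illustrative assumption about the data structure, not a description of the actual software: the names `build_index`, `lookup` and `average_code_length` are hypothetical, and the sample holds only the two codes that the embodiment gives for "key" (键) and "dish" (盘). Several characters may share one code, so each code maps to a list of candidates.

```python
from collections import defaultdict

# Sample codes only; a real table would cover the full character set.
SAMPLE_CODES = {"键": "vfeiwl", "盘": "juzuk"}


def build_index(codes):
    """Map each code to the list of characters that carry it.
    A list with more than one entry is a duplicate code."""
    index = defaultdict(list)
    for character, code in codes.items():
        index[code].append(character)
    return dict(index)


def lookup(index, typed):
    """Return the candidate characters for a typed code (empty if none)."""
    return index.get(typed, [])


def average_code_length(codes):
    """Mean code length over all characters in the table."""
    return sum(len(code) for code in codes.values()) / len(codes)


if __name__ == "__main__":
    index = build_index(SAMPLE_CODES)
    print(lookup(index, "juzuk"))
    print(average_code_length(SAMPLE_CODES))
```

With a full code table, `average_code_length` would be the quantity the text reports as 4.4; the two-entry sample here is far too small to reproduce that figure.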
Once the pictographic-code Chinese character input method based on basic character components of the present invention has been mastered, the user can recognize the code on seeing the character. The same effect can therefore be achieved by coding whole characters directly without going through basic components; so although no basic component standard for the large character set has yet been promulgated, whole characters can also be coded in this way. Furthermore, because this coding method follows the national standard, it can unify the input method used by adults with that used in primary and secondary schools; the only difference is that the character library, basic component set and phrase dictionary used in schools are comparatively smaller.
Special note: because patent application documents require pictures to be in black and white and do not allow colored lines, in the second column (mnemonic symbols) and third column (letter-marked basic components) of Table 1 and in the fifth column (mnemonic symbol combinations) of Table 3, the four colors black, red, green and purple are replaced by the four serial numbers 1, 2, 3 and 4 respectively; when a component needs more than four colors, the marking continues with 5, 6, 7 and 8. Marking with colors and marking with serial numbers are equivalent, but the four colors are more intuitive. When serial numbers are used, each number labels a group of connected stroke structures; where structures cross, the continuous stroke structure must be artificially broken so that the groups can be told apart, and the two broken parts are labelled with the same number.
Embodiment
In the pictographic-code Chinese character input method based on basic character components of the present embodiment, the keyboard comprises 26 letter keys, and the basic components are taken from the "GB13000.1 character set Hanzi components"; this component set contains 560 basic components, which can be combined into 20902 Chinese characters. Characters are given multi-letter codes through their basic components, with English letters marked directly on the planar structure of the components. Because basic components have fewer strokes than whole characters, fewer mnemonic symbols are needed to mark them. Mnemonic symbols are first identified on the planar structure of the 560 basic components. To distinguish the mnemonic symbols and show their order, they may be represented in different colors: the four colors black, red, green and purple indicate the marking order, which is determined according to the stroke order rule table or the structure type table. Where a component has more than four mnemonic symbols, the same four colors are reused in order. The mnemonic symbols are then replaced with the corresponding English letters by pictographic code compilation, producing letter-marked basic components, and the letters are compiled into a character string composed of English letters in the order indicated by the colors. The character strings of the basic components are then compiled into a character set in the order in which the components occur in the planar structure of the character, as given by the stroke order rule table or the structure type table. The character set is entered into the computer through the alphabetic keyboard, thereby achieving computer input of the character to be entered.
In the pictographic-code Chinese character input method based on basic character components of the present embodiment, a character is composed of basic components, and the character set of a character is composed of the character strings of its basic components. With reference to the stroke order rule table, the order in which basic components form characters and the resulting character sets are shown in Table 3:
Table 3
In the pictographic-code Chinese character input method based on basic character components of the present embodiment, the 560 basic components can be combined into 20902 characters according to Table 1 and Table 3. For example, to enter the character "key" (键), the basic component in row 26 of Table 1 ("Jin"), the basic component in row 197 ("then") and the basic component in row 283 ("Yin") are found; the character strings of these three components are vf, ei and wl respectively. According to Table 3, the character set of "key" is the concatenation of these three character strings, namely vfeiwl. Likewise, to enter the character "dish" (盘), the basic component "boat" in row 244 of Table 1 and the basic component "ware" in row 114 are found; their character strings are juz and uk respectively, and according to Table 3 the character set of "dish" is their concatenation, namely juzuk. To enter the phrase "keyboard" (键盘), because the present invention is a holographic coding, the character set of "keyboard" is the concatenation of the character sets of "key" and "dish", namely vfeiwljuzuk. Once users have mastered the input method of the present invention, they can recognize which stroke structures in a basic component, and likewise in a whole character, are mnemonic symbols. They no longer need colors to identify the mnemonic symbols, nor do they need to deliberately memorize the composition of the 560 basic components or the positions of those components in the planar structure of the character; on seeing a character they can recognize its character set. In this sense the input method amounts to recognizing the code on seeing the character, that is, multi-letter coding applied directly to the whole character.
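The worked example above can be reproduced with a short sketch. The component strings and Table 1 row numbers are those quoted in the text; the two dictionaries are small stand-ins for Table 1 and Table 3, which are not reproduced in this document, and the function names `encode_character` and `encode_phrase` are hypothetical.

```python
# Stand-in for Table 1: row number of a basic component -> its character string.
COMPONENT_STRINGS = {26: "vf", 197: "ei", 283: "wl", 244: "juz", 114: "uk"}

# Stand-in for Table 3: character -> Table 1 rows of its components, in order.
DECOMPOSITION = {
    "键": (26, 197, 283),  # "key"
    "盘": (244, 114),      # "dish"
}


def encode_character(character: str) -> str:
    """Concatenate the strings of a character's components in table order."""
    return "".join(COMPONENT_STRINGS[row] for row in DECOMPOSITION[character])


def encode_phrase(phrase: str) -> str:
    """A phrase is coded as the concatenation of its characters' codes."""
    return "".join(encode_character(character) for character in phrase)


if __name__ == "__main__":
    print(encode_character("键"))
    print(encode_character("盘"))
    print(encode_phrase("键盘"))
```

Because no padding or truncation rule is involved, the code is plain concatenation at every level: letters into component strings, component strings into characters, characters into phrases.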
The above is only a preferred embodiment of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.