WO1997007449A1

WO1997007449A1 - Method of encoding and inputting complicated or simplified form of Chinese character and keyboard thereof

Info

Publication number: WO1997007449A1
Application number: PCT/CN1996/000069
Authority: WO
Inventors: Dezi Shao; Wuzhou Wei; Heqi Ren; Yonghong Hao; Fucheng Li
Original assignee: Dezi Shao; Wuzhou Wei; Heqi Ren; Yonghong Hao; Fucheng Li
Priority date: 1995-08-16
Filing date: 1996-08-16
Publication date: 1997-02-27
Also published as: AU6731496A; CN1124373A

Abstract

A method for encoding Chinese characters and inputting them into a computer, each Chinese character being composed of one or more character components and each character component being constituted by one or more strokes in the traditional writing order, the method being characterized by the steps of: 1) classifying the plurality of character components forming Chinese characters into 10 groups, marked as the 1st to 10th groups respectively, according to the similarity between each character component and one of the Chinese numerals one to ten; 2) assigning the character components in the 10 groups to number keys 1 to 9 and 0 respectively, such that each character component in the 10 groups corresponds to a number; and 3) dividing a Chinese character to be encoded into at least one character component and inputting the number corresponding to said at least one character component into a computer using a keyboard.

Description

Chinese character input method with traditional and simplified coding and keyboard

Technical field

The invention relates to a Chinese character encoding and input method and a keyboard, and particularly to a method for encoding and inputting Chinese characters (including traditional, simplified, ancient, and variant Chinese characters, as well as Chinese characters used in Korean and Japanese) by using character components, and to a keyboard designed according to the method.

Background technique

At present, there are hundreds of Chinese character coding methods, but not many of them are easy to use.

The Pinyin input method is the earliest Chinese character input method and it is easy to learn. Apart from professional typists, it is still used by a large number of users. However, because Chinese has so many homophones, the duplicate-code rate of the Pinyin input method is very high. Chinese characters include simplified, traditional, archaic and variant forms, tens of thousands in total, of which only a few thousand are in common use. Ordinary users do not know the pronunciation of most infrequently used characters and therefore cannot input them with Pinyin. In addition, there are many Chinese dialects, and the same character is pronounced differently in different regions. Therefore, the Pinyin method is not easy to popularize in non-Mandarin regions.

Compared with the Pinyin input method, shape code input methods have a lower duplicate-code rate and are used by professional typists. The prior art contains many shape code input methods, each with its own characteristics. A shape code input method involves aspects such as root selection, code extraction rules, and character decomposition methods. When inputting, roots or strokes are used as codes. Most methods use a four-code standard. Generally speaking, the more code positions, the lower the duplicate-code rate, but the slower the typing speed.

From the perspective of root selection, the existing methods have not studied the composition and coding of Chinese characters deeply enough. Roots that are used frequently are not selected, while roots that are used less frequently are, and the roots are poorly classified. Because the roots are poorly chosen, the number of roots is large (about 200) and the duplicate-code rate rises. A greater disadvantage is that the decomposition of Chinese characters is ambiguous. In addition, the character decomposition rules are complicated. It takes a user several days, or even tens of days, to memorize the radicals alone, not to mention the memory-intensive and complicated decomposition rules. In short, a user must spend a long time learning an existing shape code input method before using it.

In addition, the existing shape code methods share another drawback: characters with an interspersed structure are broken into small pieces to match the letters or numbers. For example, "heavy", "fruit", and "li" are each disassembled in a different way, and the decompositions are difficult to unify. The user has to memorize how each character of this structure is disassembled, which inevitably increases the burden on users.

Another disadvantage of the prior art shape code input methods is that they cannot encode traditional and simplified Chinese characters in a unified way, not to mention ancient and variant Chinese characters. This makes the application range of most shape code input methods extremely limited.

The most commonly used shape code input method in China is the "Optimizing Wubi font encoding method and keyboard" disclosed in Chinese patent 85100837. According to the patent, the strokes of Chinese characters are summarized into five basic strokes, and 199 basic radicals composed of the five basic strokes are summarized. These radicals are assigned to the letter keys of a standard keyboard and given certain code values. In this way, Chinese characters can be coded with numbers and with letters separately. The disadvantages of this encoding method are: there are too many radicals, which demands a large amount of memorization; the radicals cover a narrow range of characters and cannot serve as a common encoding for traditional and simplified characters; and with numeric key encoding the longest code requires 8 digits, and the number keys and letter keys do not share one set of root codes. In addition, when inputting Chinese characters, a "Last Stroke Cross-Recognition Code" must be added for Chinese characters with fewer than four roots, which affects input speed.

Another shape code input method is disclosed in Chinese Patent No. 87103761, entitled "Chinese Character Pen Order Shape Code Encoding Method". The encoding method represents the shape of each single stroke of a Chinese character with one of 10 shape codes, assigns these ten shape codes to the number keys on a standard English keyboard, and adopts numeric coding. The method has few roots and is easy to learn. However, its encoding is not unique, and it cannot serve as a common encoding for traditional and simplified Chinese characters.

Summary of the Invention

An object of the present invention is to provide a method for encoding Chinese characters and inputting them into a computer. According to this method, simplified, traditional, ancient and variant Chinese characters, as well as the Chinese characters used in Korean and Japanese, can be entered into a computer.

In order to achieve the above purpose, a method for encoding Chinese characters and inputting them into a computer is provided. Each Chinese character is composed of one or more character components, and each character component is composed of one or more strokes in the traditional writing order. The method is characterized by comprising the following steps:

1) The multiple character components that make up Chinese characters are divided into 10 groups, marked as groups 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10, according to which of the Chinese numerals "一", "二", "三", "四", "五", "六", "七", "八", "九", and "十" each component resembles in shape, wherein:

components in which one stroke passes through two other strokes when written are placed in Group 5, and components whose shapes are similar to the Arabic numerals "1" and "7" are placed in Groups 1 and 7, respectively; and

single strokes are placed in the group of components that they resemble in shape;

2) The character components of the 10 groups are assigned to the number keys 1 to 9 and 0 on the keyboard, respectively, so that each character component in the 10 groups corresponds to a number;

3) The Chinese character to be encoded is decomposed into at least one character component, and the number corresponding to the at least one character component is input to a computer through a keyboard.

In the above method of encoding Chinese characters and inputting them into a computer, "character components" (referred to below simply as "characters") are the parts that make up a Chinese character, and each of them consists of one or more strokes in the traditional writing order.

An important feature of the present invention is that character components are selected according to whether the shape of a structure composed of several strokes in the traditional writing order is similar to the shape of one of the Chinese numerals "一" to "十" or of the Arabic numerals "1" and "7". A structure made up of several strokes with such a shape is called a character component. The shapes of the components are therefore highly regular and easy to remember, and characters can be disassembled quickly, which helps improve the speed of Chinese character input.

Another important feature of the present invention is an encoding method for Chinese characters composed of interspersed structures. For components or characters made up of crossed strokes, only the interspersed amount, that is, the number of intersections of the strokes, is used. For every two intersections a code 5 is taken (such a component belongs to the aforementioned Group 5), and the remaining intersections are handled in the same way. If only one intersection remains, a code 0 is taken (such a component belongs to the aforementioned Group 10). For example, the code for "Towel" is 0, the code for "Medium" is 5, the code for "More" is 155, the code for "Feng" is 50, and the code for the traditional Chinese character for "car" is 550. In this way, the number of components to be selected is reduced, and a difficult point in Chinese character decomposition is solved.

After intensive research on the structures of tens of thousands of traditional and simplified Chinese characters, the inventors summarized the characteristics and rules of the glyph shapes and structures of Chinese characters and prescribed the code positions, thereby overcoming the ambiguity in Chinese character decomposition and the limited coverage of character components in the prior art. From 6,763 simplified Chinese characters, more than 40,000 Chinese characters, and more than 5,000 phrases, 89 character components were selected. This is fewer components than in the prior art methods, yet it covers tens of thousands of Chinese characters and removes the barrier between traditional and simplified characters, so that the same method applies to both. The method can also be used for coded input of archaic and variant Chinese characters and of the Chinese characters used in Korean and Japanese. The 89 components are divided into 10 groups that correspond to the numbers 1 to 9 and 0, respectively. Each Chinese character can thus be given an accurate numeric code, which is convenient for applications such as looking up new characters and computer input of Chinese characters. In addition, the components are distributed on the keyboard so that those with the highest frequency are assigned to the middle keys as far as possible, to increase typing speed by exploiting the dexterity of the index fingers.

In addition, when the components are assigned to the keys of the keyboard, the first, middle, and last components are all taken into account, which advantageously reduces the duplicate-code rate. The duplicate-code rate of the Chinese character encoding method of the present invention is only 6%, and the average code length of single Chinese characters is 3.48. When phrases are included, the average code length per Chinese character is 2.48.

Overview of the drawings

Other features and advantages of the present invention will be more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings.

FIG. 1 shows the distribution of 89 characters on 10 number keys of a keyboard according to an embodiment of the present invention;

FIG. 2 shows the distribution in which the 89 characters in FIG. 1 are assigned to letter keys according to another embodiment of the present invention;

FIG. 3 shows the Chinese character glyph shapes and structures used by the method of the present invention.

Best Mode of the Invention

FIG. 1 shows the distribution of 89 characters on 10 number keys of a keyboard according to an embodiment of the present invention. The general rule for selecting characters is that the shape of each character taken is similar to the shape of one of the Chinese numerals "一", "二", "三", "四", "五", "六", "七", "八", "九", and "十". In this way, a total of 10 groups of characters are taken. For some groups of characters, the above rule is slightly changed, as described below.

1) On the number key 1, the stroke characters "一 1" and the radical characters "石" (stone), "王" (king), and "山" (mountain) are assigned. The first stroke of each radical character is similar in shape to the Chinese numeral "一" or the Arabic numeral "1".

2) On the number key 2, the stroke characters "IIII'J;J4w" and the radical characters "月" (moon), "禾" (grain), "鱼" (fish), and "舟" (boat) are assigned. The first stroke of each radical character is a left-falling stroke (撇).

3) On the number key 3, the stroke characters "Sanji" and the radical characters "Ά ,." are assigned. The radical characters are radicals with a dot "、" on the left. The opening of "匚" can face any of four directions: up, down, left, or right.

4) On the number key 4, the stroke characters "Kou corpse-J1 ^" and the radical characters "Rimu Usui ^" are assigned. The shape of these characters is a square, with "" below "", with four corners.

5) On the number key 5, the stroke characters "廿中牛 ^ 戈" and the radical characters "Zhongcai Nu" are assigned. The rule for this group can be summarized as one stroke passing through two strokes, without taking the frame and fold.

6) On the number key 6, the stroke characters "two," and the radical characters "Fangmen ^ guang L" are assigned. The dot of these radicals is in the middle.

7) On the number key 7, the stroke characters "" 工 " and the radical characters "马纟 TP" are assigned. These characters have many corners.

8) On the number key 8, the stroke characters "eight persons" and the radical characters "^ 亇" are assigned. The characters are shaped like the character "八" (eight) or the character "人" (person).

9) On the number key 9, the stroke characters "Little ^ ^, ^" and the radical characters "Fire" are assigned. Characters of the "small" shape have several protruding points.

10) On the number key 0, the stroke characters "Ten Towels" and the radical characters "Civil" are assigned. Their characteristic is that two strokes intersect in a cross shape.

FIG. 2 shows the distribution of the 89 characters of FIG. 1 on the letter keys according to another embodiment of the present invention, wherein the characters assigned to each number key in FIG. 1 are assigned to the letter keys in the sloping keyboard column that corresponds to that number key. The characters on the number key 0 are assigned to the letter keys P and M.
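The column correspondence described above can be sketched as a lookup table. This is only an illustrative sketch: the grouping of letters per number key is taken from the key assignments and command-key groups given in this description, while the names `COLUMN_LETTERS` and `letters_to_digits` are hypothetical and not part of the disclosure.

```python
# Number key -> letter keys in the same sloping keyboard column,
# as listed in this description (hypothetical table name).
COLUMN_LETTERS = {
    "1": "QAZ", "2": "WSX", "3": "EDC", "4": "RFV", "5": "TGB",
    "6": "YHN", "7": "UJ", "8": "IK", "9": "OL", "0": "PM",
}

# Reverse lookup: each letter key maps back to its number key.
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in COLUMN_LETTERS.items()
    for letter in letters
}


def letters_to_digits(letter_code):
    """Convert an alphabetic code into the corresponding numeric code."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in letter_code.upper())


# Codes quoted in the description: 124 (QWF), 20 (WM), 1040 (QPVP).
print(letters_to_digits("QWF"), letters_to_digits("WM"), letters_to_digits("QPVP"))
```

Because several letters share one digit, the conversion only works in this direction; a numeric code does not determine a unique alphabetic code.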

Assign "一、石" on the Q key, "King" on the A key, and "Mountain, 1" on the Z key.

Assign "A J" on the W key, "February Wo" on the S key, and "Fish Boat II Ί J" on the X key.

Assign "3 U" on the E key, "three i-gates \" on the D key, and "ΐ 匚" on the C key.

Assign "日] ^ corpse" on the R key, "port" on the F key, and "mesh □" on the V key.

Assign "Zhong Wu" on the T key, "Medium Only" on the G key, and "Female" on the B key.

Assign "广., 方" on the Y key, "Door ίΤ" on the H key, and "L" on the N key.

Assign "马 Ί 1ί" on the U key and "Factory L> 纟 p" on the J key.

Assign "eight pulls" on the I key and "People-亇" on the K key.

Assign "4i," on the O key and "Small Fire †" on the L key.

Assign "Ten Shi Yu ϊ" on the P key and "Towel" on the M key.

Through the assignments of FIG. 1 and FIG. 2, a character corresponds to both a number and a letter. In this way, the encoding of a Chinese character can be either a numeric code or an alphabetic code, and either can be entered when inputting Chinese characters.

The encoding method of the present invention is described in detail below. In the following description, the encoding of Chinese characters is represented by a numeric code, and the corresponding alphabetic code is placed in parentheses.

I. Coding method of fonts.

1. Text-type characters, which consist of single strokes and small characters, are coded by their single strokes and small characters. For example:

"Shi" is decomposed into "One, J, Mouth", which, as shown in FIG. 1 and FIG. 2, correspond to the numbers 1, 2, 4 and the letters Q, W, and F, respectively. The code of "Shi" is 124 (QWF).

"禾" is decomposed into "J" and "Wood", which correspond to the numbers 2, 0 and the letters W and M, respectively. The code of "禾" is 20 (WM).

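The two decompositions above can be sketched as a table lookup. This is a minimal sketch under stated assumptions: the table holds only the four component-to-key pairs quoted in these two examples, the glyphs "一", "丿", "口", and "木" are assumed readings of "One", "J", "Mouth", and "Wood", and the names `COMPONENT_KEYS` and `encode_components` are hypothetical.

```python
# Component -> (number key, letter key). Only the pairs quoted in the two
# examples above; the glyph identities are assumed readings of the text.
COMPONENT_KEYS = {
    "一": ("1", "Q"),
    "丿": ("2", "W"),
    "口": ("4", "F"),
    "木": ("0", "M"),
}


def encode_components(components):
    """Return (numeric code, alphabetic code) for components in writing order."""
    digits = "".join(COMPONENT_KEYS[c][0] for c in components)
    letters = "".join(COMPONENT_KEYS[c][1] for c in components)
    return digits, letters


print(encode_components(["一", "丿", "口"]))  # the "Shi" example
print(encode_components(["丿", "木"]))        # the "禾" example
```

A full table would need all 89 components of FIG. 1 and FIG. 2, which this sketch does not attempt to reproduce.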
2. Non-text characters are coded stroke by stroke in writing order. If there are fewer than four codes, 2 (or the W key) is appended. If there are more than four codes, the first, second, third, and last codes are taken. For example:

"" is decomposed into ",,", encoded as 662 (HHW),

"才" is decomposed into "one j one", encoded as 1212 (QXQW),

Another character is decomposed into ", 7 1 /,"; taking the first, second, third, and last codes, it is encoded as 6716 (HUZH).

II. Text encoding method.

The text (that is, a whole Chinese character) is encoded according to its structure and glyph shape. The character structures and glyph shapes used in the present invention are shown in FIG. 3. As shown in FIG. 3, Chinese characters can be divided according to their structure into single-structure Chinese characters, two-structure Chinese characters, three-structure Chinese characters, four-structure Chinese characters, and Chinese characters with more than four structures.

Single-structure Chinese characters, such as: upper, ear, big, heaven;

Two-structure Chinese characters, such as: Lei, Ding, Ji, Peng, system, live;

Three-structure Chinese characters, such as: Italian, Chinese, Mo, Chinese, and Chinese;

Four-structure Chinese characters, such as: rich, artemisia, curtain, stall, female;

Chinese characters with more than four structures, such as: green, 棻.

As shown in FIG. 3, Chinese characters can also be divided according to their shape into vertical Chinese characters, horizontal Chinese characters, mixed Chinese characters, and frame Chinese characters.

Vertical Chinese characters, such as: Lei, Ding, Ji, Yi, Yi, Mo, Fu, Artemisia, Curtain, Paste, Artemisia. For a given Chinese character, the structures from top to bottom are called the first structure, the second structure, the third structure, and so on.

Horizontal Chinese characters, such as: Peng, system, live, paste, swell, 凇, stall, female. For a given Chinese character, the structures from left to right are called the first structure, the second structure, the third structure, and so on.

Mixed Chinese characters generally have at least three structures, such as: Zan, Ba, Zhi, Wan, Liu, Pu, Bao; four-structure mixed Chinese characters, such as: 悭, 悭, Yong, ¾, lean; and mixed Chinese characters with more structures, such as: qi, 翦, chew, and suffix. This type consists mostly of traditional Chinese characters.

The general method for encoding text is to follow the stroke writing order. If the strokes can form a character component, the component is taken; if they cannot, only the strokes are taken. Each component is formed from as many strokes as possible.

Hereinafter, a method of encoding characters will be described in detail.

1. Single-structure Chinese characters are coded by single strokes and small characters. For example:

"Ear" is decomposed into "one I twenty" and coded as 1120 (QZSP).

"上" is decomposed into "1 to 1" and coded as 111 (ZQQ).

2. Horizontal or vertical Chinese characters.

Horizontal or vertical Chinese characters have two or more structures. Depending on the glyph shape (horizontal or vertical) and the number of structures, a maximum of four codes is taken.

1) Two-structure Chinese characters: each structure takes two codes. If one structure has fewer than two codes, the other structure takes one more code.

For horizontal Chinese characters, the first structure and the second structure each take their first and last codes.

For example, the first structure of "齄" is "nose" and the second structure is "check". The first structure "nose" is decomposed into "J head ... 1"; its first and last characters are "! 1", so its first and last codes are 21 (WZ). The second structure "check" is decomposed into "Muichiichi"; its first and last characters are "Mu Yi", so its first and last codes are 01 (MQ). Therefore, the code of "齄" is 2101 (WZMQ). Other examples:

The code for "shoes" is 5000 (GPPP) and the code for "amount" is 6418 (YFQK). The code for "Frequency" is 1218 (ZWQK), and the code for "Crane" is 0121 (PQWQ).

Characters of this type are the most numerous, accounting for about 60% of all Chinese characters. The following examples show that when one structure has fewer than two codes, the other structure takes one more code, that is, three codes:

"Staff": 0014 (MPQF);

"Knot": 7014 (JPQF);

"Puzzle": 585 (GIG);

"Juice": 30 (DP);

"悭": 9270 (LXUP).

For vertical Chinese characters, the first structure takes its first two codes, and the second structure takes its first and last codes.

Note that the first structure here takes its first two codes rather than its first and last codes, because this is in line with people's writing habits and can reduce the duplicate-code rate.

For example, the first structure of "Lei" is "rain", and the second structure is "field". The first structure "rain" is decomposed into "one ten ...", and its first two codes are 10 (QP). The second structure "field" is decomposed into "□ 十", and its first and last codes are 40 (VP). So the code for "Lei" is 1040 (QPVP).

For another example, the code for "Gee" is 014 (PQF), the code for "Yes" is 4118 (RQZK), the code for "Tone" is 684 (NIR), and the code for "Home" is 6129 (YQWU).
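The two-structure rule can be sketched as follows. This is a hedged sketch, not the patented implementation: the function name `encode_two_structure` is hypothetical, each structure is passed as the string of its component codes in writing order, and only the case where both structures have at least two codes is covered (the "one more code" case is left out because the description does not state which three codes are taken).

```python
def encode_two_structure(first, second, vertical):
    """Four-digit code for a two-structure character.

    first, second: numeric code strings of the two structures in writing order.
    vertical: True for a vertical character, False for a horizontal one.
    """
    if len(first) < 2 or len(second) < 2:
        raise ValueError("this sketch needs at least two codes per structure")
    # Vertical: first structure takes its first two codes.
    # Horizontal: first structure takes its first and last codes.
    head = first[:2] if vertical else first[0] + first[-1]
    # In both cases the second structure takes its first and last codes.
    tail = second[0] + second[-1]
    return head + tail


# "Lei": rain begins 1, 0 (later codes are not given in the text); field is 4, 0.
print(encode_two_structure("10", "40", vertical=True))
# "齄": only the first and last codes of each structure are given in the text.
print(encode_two_structure("21", "01", vertical=False))
```

With longer inputs the difference between the two shapes becomes visible: a hypothetical first structure "2461" contributes "24" in a vertical character but "21" in a horizontal one.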

2) Three-structure Chinese characters. The general coding principle is that one structure takes two codes and the other two structures take one code each. The first structure takes two codes by preference; if the first structure has fewer than two codes, the second structure takes two codes, and so on.

Where a structure takes one code, the last structure takes its last code and the other structures take their first code.

If the last structure takes two codes, it takes its first two codes.

If another structure takes two codes, it takes its first and last codes for horizontal Chinese characters and its first two codes for vertical Chinese characters.

For example, the horizontal Chinese character "paste" has a first structure "meter", a second structure "old", and a third structure "month".

First, the first structure "meter" takes its first and last characters, both of the many-pointed shape, giving the two codes 99 (LL). Then the second structure "old" takes its first character "ten", giving the first code 0 (P). Finally, the third structure "month" takes its last character "two", giving the last code 2 (S). Thus, the code for "paste" is 9902 (LLPS). Similarly,

The "swell" code is 2082 (SPIW).

The code for "凇" is 3086 (EMIH).

For another example, the vertical Chinese character "meaning" has a first structure of "Li", a second structure of "Sun", and a third structure of "Heart".

First, the first structure "Li" takes its first and second characters, giving the first two codes 68 (NI). Then the second structure "Sun" is itself a single character and gives the first code 4 (R). Finally, the last character of the third structure "Heart" gives the last code 6 (H). Thus, the code for "meaning" is 6846 (NIRH). Other examples:

"Tiger" has the code 1103 (ZQMD); its first structure takes two codes.

The code for "蒌" is 9905 (OLPB); its second structure takes two codes.

The code for "Mo" is 9408 (ORPK); its third structure takes two codes.
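The three-structure rule can be sketched as follows, under one interpretive assumption: a non-final structure that takes two codes takes its first and last codes in a horizontal character and its first two codes in a vertical character, as the worked examples for "paste" and "meaning" suggest. The function name `encode_three_structure` is hypothetical, and the inputs below contain only the digits that the examples state.

```python
def encode_three_structure(structures, vertical):
    """Code for a three-structure character.

    structures: three numeric code strings, one per structure, in order.
    vertical: True for a vertical character, False for a horizontal one.
    """
    if len(structures) != 3:
        raise ValueError("expected exactly three structures")
    # The first structure that has at least two codes takes two codes.
    two_idx = next((i for i, s in enumerate(structures) if len(s) >= 2), None)
    parts = []
    for i, codes in enumerate(structures):
        is_last = i == 2
        if i == two_idx:
            if is_last or vertical:
                parts.append(codes[:2])              # first two codes
            else:
                parts.append(codes[0] + codes[-1])   # first and last codes
        else:
            # One code: last code for the last structure, first code otherwise.
            parts.append(codes[-1] if is_last else codes[0])
    return "".join(parts)


print(encode_three_structure(["99", "04", "2"], vertical=False))  # "paste"
print(encode_three_structure(["68", "4", "6"], vertical=True))    # "meaning"
print(encode_three_structure(["9", "4", "08"], vertical=True))    # "Mo"
```

If no structure has two codes, the sketch simply returns three codes; the description does not say how such a character is handled.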

3) Four-structure Chinese characters: each structure takes one code.

For example, the first, second, third, and fourth structures of "Fu" are "Yikoutian"; each takes one code, and the code of "Fu" is 6144 (YQFV). Similarly,

the code of "high" is 6434 (NFDF), and the code of "jing" is 4649 (RNFL). The four structures of "booth" each take one code, giving 5781 (GUKQ). Similarly, the code of "female" is 1381 (ZDKQ).

4) For characters with more than four structures, the first three structures each take their first code, and the last structure takes its last code.

For example, for "", the first structure is "Guang", the second structure is "4", the third structure is "", and the last structure is "Month"; the first three structures each take their first code, giving 688 (YKK), and the last structure takes its last code 2 (S). Thus, the encoding of "" is 6882 (YKKS).

For another example, the first three structures of "Jia Who" are "ί Gui ^" and the last structure is ""; the first three structures give the first codes 803 (KPC), and the last structure gives the last code 1 (Q). In this way, the code of "Jia Who" is 8031 (KPCQ).

3. Mixed Chinese characters are divided into upper-horizontal, lower-vertical Chinese characters, such as "Zan", "Zhi" and "Fort", and upper-vertical, lower-horizontal Chinese characters, such as "Ba" and "Wan".

1) Upper-horizontal, lower-vertical Chinese characters. The upper, horizontal part takes the first two codes of its structures, and the lower, vertical part takes its first and last codes. If the lower part has only one code, the upper part takes one more code, that is, three codes.

For example, for "Zan", the upper part is "first first" and the lower part is "shell". The upper part is composed of the two structures "first" and "first", which together give 22 (WW). The lower part "shell" gives 38 (DK). The code for "Zan" is 2238 (WWDK).

The code for "tight" is 2779 (XUJL), the code for "reserved" is 7740GUVP), and the code for "pirate" is 3842 (EKFX).

For another example, "Zhi" has "Yaguchi" in the upper part and "Sun" in the lower part. Because the lower part "Sun" has only one code, the upper part "Yaguchi" takes one more code. Therefore, the upper part "Yaguchi" takes the three codes 804 (KPF), and the lower part "Sun" takes the one code 4 (R). The code for "Zhi" is 8044 (KPFR).

In the same way, for "Fort", the upper part takes the three codes 840 (KFM), and the lower part "soil" takes the one code 0 (P). Similarly, the code of "t" is 1174 (ZQUV).

2) Upper-vertical, lower-horizontal Chinese characters. The upper part takes its first two codes, as for vertical Chinese characters, and the two structures of the lower part each take their last code, as for horizontal Chinese characters. If the upper part has only one code, the lower part takes one more code, that is, three codes.

For example, for "Ba", the upper part is "Rain" and the lower part is "Leather Moon". The upper part "Rain" takes its first two codes, 10 (QP). The lower part "Leather Moon" takes the last codes 02 (PS). In this way, the code for "Ba" is 1002 (QPPS).

The code for "Nie" is 1100 (QZMM), and the code for "Jie" is 4207 (FXPU).

For another example, for "Wan", the upper part is "" and the lower part is "Xi Ji". The upper part has only one code, so the lower part takes one more code. Therefore, the upper part takes the one code 6 (Y), and the lower part takes the three codes 263 (WHD). Thus, the code for "Wan" is 6263 (YWHD).

Similarly, for "Pu", the upper part "^" takes the one code 9 (O), and the lower part takes the three codes 365 (DHT). In this way, the code of "Pu" is 9365 (ODHT).

The code for "Silent" is 6190 (YZLM).

4. Frame-shaped Chinese characters, including both full-frame and half-frame characters. When coding, the frame code is taken first and then the internal structure code. For example:

The code for "Country" is 416 (VAH).

The code for "Pu" is 4655 (VHTT). For the reason why the code of "Fu" is 655 (HTT), please refer to the description of "III. Special circumstances" below.

The code for "Identical" is 314 (DQF).

The code for "Doc" is 3808 (CKPK).

The "encoding" is 7657 (JHBU).

III. Special circumstances

In each of the above coding methods, the coding of each structure is generally performed according to the traditional stroke writing order of Chinese characters. However, for coding convenience, there are several special cases below.

1. Structure composed of interspersed strokes.

This interspersed structure is characterized by strokes that intersect within the structure. For example, in the characters "Zhong", "Jin", "Yang", "A", "Yu", "Qu", one vertical stroke "丨" or two vertical strokes "丨丨" cross other horizontal strokes "一". Because such strokes can be split in different ways when the character is decomposed, it is easy to cause confusion and make the code extraction non-unique.

For Chinese characters with this interspersed structure, only the interspersed amount (that is, the number of intersections) is considered, and a box, half box, or corner that is passed through is not coded. When taking codes, one code 5 is taken for every two intersections, and the remaining intersections are handled in the same way. If only one intersection remains, a code 0 is taken. For example,

"Towel" has 1 intersection, and the code is 0;

"Zhong" has 2 intersections, and the code is 5;

"More" is decomposed into two parts; the upper part is coded as 1, and the lower part has 4 intersections, coded as 55, so the code of "More" is 155;

"Feng" has 3 intersections; first take a code 5, and for the remaining one take a code 0. The code of "Feng" is 50;

"車" (the traditional Chinese character for car) has 5 intersections; first take a code 5, leaving 3 intersections, then take another code 5, leaving one intersection, and finally take a code 0. The code is 550.

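The intersection rule above reduces to simple arithmetic: one 5 for each full pair of intersections, followed by a 0 if one intersection is left over. A minimal sketch, with the hypothetical function name `intersection_code`:

```python
def intersection_code(intersections):
    """Code for an interspersed structure with the given number of intersections."""
    if intersections < 1:
        raise ValueError("an interspersed structure has at least one intersection")
    # One "5" per full pair of intersections, plus "0" for a single leftover.
    return "5" * (intersections // 2) + "0" * (intersections % 2)


# Intersection counts 1 to 5 correspond to the five examples in the text.
for n in range(1, 6):
    print(n, intersection_code(n))
```

For "More", the code 1 of the upper part is simply placed in front of the code of the lower part, so `"1" + intersection_code(4)` gives the three-digit code quoted in the text.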
In addition, when other strokes are connected at the lower part after the intersection, those strokes must also be coded in addition to the above method. For example:

"Big" is encoded as 08 (PK),

"乇" is encoded as 203 (WMD).

"Un" is encoded as 59 (TL).

The code for "于" is 102 (QPX).

The code for "child" is 702 (UPX).

If the box is not penetrated, the code is also taken. E.g:

The code for "A" is 45 (VT),

The code for "you" is 54 (TV).

The code for "Qu" is 554 (GGV).

If both horizontal strokes "一" and vertical strokes "丨" are crossed, then both the horizontal and the vertical strokes are coded. For example:

The code for "Jun" is 554 (GBF).

The code for "聿" is 5550 (TGTP).

2. Stroke order adjustment.

In a small number of Chinese characters, the last strokes in the traditional stroke order fall at the top or on the left. To conform to the general stroke order (from top to bottom, left to right) and to make it easier for non-Chinese users to enter Chinese characters, the stroke order of these individual characters is adjusted when encoding. For example,

The code for "dog" is 608 (HPK),

"Fu" is coded as 655 (HTT),

"Child" is coded as 702 (JPX),

The "encoding" is 7657 (JHBU),

The code for "North" is 2123 (SZWD),

The "appropriate" encoding is 6204 (NWPF),

"Pass" is coded as 6765 (NUHT),

The code for "Build" is 7050 (UMTP).

IV. Phrase coding method

Phrases are divided into two-character phrases, three-character phrases, four-character phrases, and multi-character phrases.

1. Two-character phrases: each Chinese character takes its first and last codes.

For example, in "Science", "Subject" gives 20 (SP) and "Learning" gives 92 (LX), so the code of "Science" is 2092 (SPLX).

For another example, in "Labor", the first character takes its first and last codes 97 (OU), and the second character "Move" takes its first and last codes 27 (SU), so the code of "Labor" is 9727 (OUSU).

2. Three-character phrases: the first character takes its first two codes, and the second and third characters each take their first code. For example, in "Academy of Sciences", "Subject" takes its first two codes 20 (SP), "Learning" takes its first code 9 (L), and "Institute" takes its first code 7 (U), so the code is 2097 (SPLU).

3. Four-character phrases: each character takes its first code.

For example, "Family Planning" is coded as 3626 (CHWN).

4. Multi-character phrases: the first three characters each take their first code, and the last character takes its first code.

For example, in "People's Republic of China", the first three characters "Chinese" have the first codes 588 (GKK) and the last character "国" has the first code 4 (V), so the code of "People's Republic of China" is 5884 (GKKV).

The following introduces another phrase encoding method, which applies only to letter codes and not to numeric codes.
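Before turning to the letter-only method, the four phrase rules listed above (items 1 to 4) can be sketched in Python. This is only an illustrative sketch: the function name `phrase_code` and the choice to represent each character as a string of its codes are assumptions made here, not part of the described method. The same logic applies to digit codes and letter codes.

```python
def phrase_code(char_codes):
    """Build a four-code phrase code from per-character code strings.

    char_codes holds one string per character of the phrase, in reading
    order; each string is that character's code sequence (digits or letters).
    This representation is an assumption of the sketch.
    """
    n = len(char_codes)
    if n < 2:
        raise ValueError("a phrase needs at least two characters")
    if any(len(code) == 0 for code in char_codes):
        raise ValueError("every character needs at least one code")

    if n == 2:
        # Rule 1: each character contributes its first and last code.
        return "".join(code[0] + code[-1] for code in char_codes)
    if n == 3:
        # Rule 2: first two codes of the first character, then the first
        # code of the second and third characters.
        if len(char_codes[0]) < 2:
            raise ValueError("the first character needs at least two codes")
        return char_codes[0][:2] + char_codes[1][0] + char_codes[2][0]
    if n == 4:
        # Rule 3: the first code of each character.
        return "".join(code[0] for code in char_codes)
    # Rule 4: first code of the first three characters, then the first
    # code of the last character; the characters in between are ignored.
    return "".join(code[0] for code in char_codes[:3]) + char_codes[-1][0]


# Values taken from the examples in the text.
print(phrase_code(["20", "92"]))          # 2092
print(phrase_code(["SP", "LX"]))          # SPLX
print(phrase_code(["20", "9", "7"]))      # 2097
print(phrase_code(["3", "6", "2", "6"]))  # 3626
```

For phrases of more than four characters only the first three characters and the last one contribute, so the codes of the characters in between can be anything.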

1. Two-word phrases

Code = first code of the first character + command key + first and last codes of the second character.

Here, the command key is derived from the preceding code according to its letter group; it does not depend on the character components themselves.

The rules for deriving the command key in two-word phrase encoding are as follows:

If the previous code is Q, A or Z, the command key is A;

If the previous code is W, S or X, the command key is X;

If the previous code is E, D or C, the command key is C;

If the previous code is R, F or V, the command key is V;

If the previous code is T, G or B, the command key is B;

If the previous code is Y, H or N, the command key is Y;

If the previous code is U or J, the command key is J;

If the previous code is I or K, the command key is I;

If the previous code is O or L, the command key is O;

If the previous code is P or M, the command key is M.

Take "Science" as an example:

The first code of the first character "科" is S (禾),

According to the above rules, the command key is X,

The second character "Learning" has the first code L and the last code X, so

"Science" is coded as SXLX.

For another example, in "Labor", the first code of the first character "Labor" is O, so the command key is O, and the first and last codes of the second character "Move" are SU, so the code of "Labor" is OOSU.
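The two-word rule above can be sketched as a simple table lookup. The names `TWO_WORD_GROUPS`, `TWO_WORD_COMMAND_KEY`, and `two_word_phrase_code` are hypothetical, and the sketch assumes that the command key for the codes O and L is the letter O rather than the digit zero.

```python
# Letter groups and their command keys, as listed in the rules above.
# Each group is one slanted column of letter keys on the keyboard.
TWO_WORD_GROUPS = {
    "QAZ": "A", "WSX": "X", "EDC": "C", "RFV": "V", "TGB": "B",
    "YHN": "Y", "UJ": "J", "IK": "I", "OL": "O", "PM": "M",
}

# Flatten the groups into a per-letter lookup table.
TWO_WORD_COMMAND_KEY = {
    letter: key
    for group, key in TWO_WORD_GROUPS.items()
    for letter in group
}


def two_word_phrase_code(first_code, second_first_last):
    """Return first code + command key + first and last codes of word two.

    first_code: first letter code of the first character.
    second_first_last: first and last letter codes of the second character.
    """
    if first_code not in TWO_WORD_COMMAND_KEY:
        raise ValueError("first_code must be a single letter A-Z")
    if len(second_first_last) != 2:
        raise ValueError("second_first_last must hold exactly two codes")
    return first_code + TWO_WORD_COMMAND_KEY[first_code] + second_first_last


print(two_word_phrase_code("S", "LX"))  # SXLX
print(two_word_phrase_code("O", "SU"))  # OOSU
```

Because every letter belongs to exactly one group, the table covers all 26 letter keys and the lookup never needs a fallback.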

2. Three-word phrases

Code = first code of the first character + first code of the second character + command key + first code of the third character.

Here, the command key is derived as follows:

If the previous code is Q, A, Z, W, S or X, the command key is A;

If the previous code is E, D, C, R, F, or V, the command key is V;

If the previous code is T, G, B, Y, H or N, the command key is Y;

If the previous code is U, J, I, or K, the command key is I;

If the previous code is O, L, P, or M, the command key is O.

For example, in "Organization Department", the first code of the first character "Group" is J and the first code of the second character "Weaving" is J; according to the above rules, the command key is I. The first code of the third character is N, so the code of "Organization Department" is JJIN.

For another example, in "Academy of Science", the first code of "科" is S (禾) and the first code of the second character "学" is L, so the command key is O. The first code of the third character "Yuan" is U, so the code of "Academy of Science" is SLOU.

3. Four-character phrases

Code = first code of the first character + first code of the second character + first code of the third character + command key.

Here, the command key is derived by the same rule as in three-word phrase encoding. The code of a four-character phrase does not depend on the fourth character.

For example, the first codes of the first three characters of "Family Planning" are C, H, and W. According to the above rules, the command key derived from W is A, so the code of "Family Planning" is CHWA.

For another example, the code for "educational work" is PNQA, and the code for "Beijing, China" is GVSA.
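The three-word and four-character rules share one command-key table, which can be sketched as follows. The names `WIDE_GROUPS`, `WIDE_COMMAND_KEY`, `three_word_phrase_code`, and `four_char_phrase_code` are hypothetical, and the sketch again assumes that the command key for O, L, P, and M is the letter O.

```python
# Wider letter groups used for three-word and four-character phrases,
# as listed in the three-word rules above.
WIDE_GROUPS = {
    "QAZWSX": "A",
    "EDCRFV": "V",
    "TGBYHN": "Y",
    "UJIK": "I",
    "OLPM": "O",
}

# Flatten the groups into a per-letter lookup table.
WIDE_COMMAND_KEY = {
    letter: key
    for group, key in WIDE_GROUPS.items()
    for letter in group
}


def three_word_phrase_code(first, second, third):
    """First codes of words one and two, command key, first code of word three.

    The command key is derived from the code just before it (word two).
    """
    return first + second + WIDE_COMMAND_KEY[second] + third


def four_char_phrase_code(first, second, third):
    """First codes of characters one to three, then the command key.

    The command key is derived from the third code; the fourth character
    of the phrase does not contribute.
    """
    return first + second + third + WIDE_COMMAND_KEY[third]


print(three_word_phrase_code("J", "J", "N"))  # JJIN
print(three_word_phrase_code("S", "L", "U"))  # SLOU
print(four_char_phrase_code("C", "H", "W"))   # CHWA
```

In both rules the command key depends only on the code immediately before it, so a single lookup table serves both functions.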

4. Phrases of more than four characters

The first three characters each take their first code, and the last character takes its first code.

For example, in "International Trade Promotion Committee", the first three characters "International Trade" have the first codes VUJ and the last character "Hui" has the first code K, so the entire phrase is coded as VUJK. "People's Republic of China" is encoded as GKKV.

V. Encoding of Traditional Chinese Characters

The aforementioned encoding methods also apply to traditional Chinese characters; that is, they still use the same 89 character components and the same code-taking rules. However, because some character components have traditional forms, those forms should be coded as a whole. These character components and their corresponding traditional forms are:

The corresponding traditional form of "鱼" (2) is "魚" (2),

The corresponding traditional form of "讠" (3) is "言" (3),

The corresponding traditional form of "门" (6) is "門" (4),

The corresponding traditional form of "马" (7) is "馬" (7),

The corresponding traditional form of "钅" (8) is "金" (8),

The corresponding traditional form of "饣" (8) is "食" (8),

The corresponding traditional form of "纟" (7) is "糹" (7),

The corresponding traditional form of "L" (6) is "3L" (6),

The corresponding traditional form of "艹" (9) is "++" (9).

Among them, "Yan" (3), "Jin" (8), and "Shi" (8) are coded as 3, 8, and 8 when they are radicals. When not acting as a radical, the code is obtained by stroke.

Examples of the encoding of traditional Chinese characters are as follows:

"" is a vertical three-structure Chinese character: the first structure takes the code 27, the second structure takes its first code, 4, and the third structure takes its last code, 8, so the character is encoded as 2748;

"劳" is a mixed vertical-structure Chinese character with a code of 9977;

"" is a horizontal two-structure Chinese character with a code of 8015.

To further improve the speed of Chinese character input, short codes can be defined for high-frequency characters; for example, some characters can be entered by pressing one key, some by pressing two keys, and so on. Such designs are varied and need not be described here.

Industrial applicability

The invention can be applied to any information processing system involving Chinese characters. With the Chinese character encoding method of the present invention, not only simplified Chinese characters but also traditional, ancient, and variant Chinese characters can be encoded and entered, as can the Chinese characters used in Korean and Japanese.

Although the best embodiment of the present invention has been described above, it should be understood that those skilled in the art can make various modifications and changes without departing from the scope and essence of the present invention. The scope of the invention is defined by the appended claims.

Claims

1. A method for encoding Chinese characters and inputting them into a computer, wherein each Chinese character is composed of one or more character components, and each character component is composed of one or more strokes in the traditional writing order, the method being characterized by the following steps:

1) dividing the plurality of character components that make up Chinese characters into 10 groups, marked as groups 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10, according to which of the Chinese numerals "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", and "ten" each character component resembles in shape;

2) assigning the character components of the 10 groups to the numeric keys 1 to 9 and 0 on the keyboard, respectively, so that each character component in the 10 groups corresponds to a number; and

3) decomposing the Chinese character to be encoded into at least one character component, and inputting the number corresponding to the at least one character component into a computer through the keyboard.

2. The method for encoding Chinese characters and inputting them into a computer according to claim 1, characterized in that said step 1) further comprises the following steps:

Divide the stroke characters "一 I" and radical characters "Shi, Wang, Shan" into the 1st group;

Divide the stroke characters "二 II -J} {" and radical characters "月禾鱼舟" into the 2nd group;

Divide the stroke characters "三 ^ 匚" and radical characters "Γ" into the 3rd group;

Divide the stroke characters "□ corporal mouth" and radical characters "日 ϋ 素口 ^" into the 4th group;

Divide the stroke characters "廿 noon" and radical characters "worm 4 women" into the 5th group;

Divide the stroke characters "、,，", "" and radical characters "Fang Menguang L" into the 6th group;

Divide the stroke characters "7 Plants" and radical characters "马纟 p" into the 7th group;

Divide the stroke characters "eight persons," and radical characters "[φ t" into the 8th group;

Divide the stroke characters "small, 1 ',, ^ !" and radical characters "fire" into the 9th group;

Divide the stroke characters "10 towels and 10" and radical characters "Civil" into the 10th group.

3. The method for encoding Chinese characters and inputting them into a computer according to claim 2, characterized in that said step 3) further comprises the steps:

For Chinese characters containing an interspersed structure, according to the intersections formed by the crossing strokes in the interspersed structure, the number 5 is entered for every two intersections, likewise for the remaining intersections, and so on, until only one intersection remains, for which the number 0 is entered.

4. The method for encoding Chinese characters and inputting them into a computer according to claim 1, further comprising the following steps:

Assigning the character components of the 10 groups to the letter keys on the keyboard that lie in the same slanted column as the numeric keys 1 to 9 and 0, respectively, so that each character component in the 10 groups corresponds to a letter;

Decomposing the Chinese character to be encoded into at least one character component, and inputting the letter corresponding to the at least one character component into a computer through the keyboard.

5. The method for encoding Chinese characters and inputting them into a computer according to claim 4, characterized in that the step of assigning the character components of the 10 groups to the letter keys that lie in the same slanted column as the numeric keys 1 to 9 and 0 further comprises the following steps:

Assign "一、石" to the Q key,

Assign "King" to the A key,

Assign "mountain, 1" to the Z key,

Assign "Tj" to the W key,

Assign "二月禾" to the S key,

Assign "Fish Boat II 'J" to the X key,

Assign "3 U" to the E key,

Assign "three 'gate" to the D key,

Assign "匚" to the C key,

Assign "Sun Corpse," to the R key,

Assign "t t XL C" to the F key,

Assign "eye mortar □" to the V key,

Assign "bug cow" to the T key,

Assign "中才廿" to the G key,

Assign "female /" to the B key,

Assign "广 、、方" to the Y key,

Assign "door," to the H key,

Assign "L 丄" to the N key,

Assign "Ma 7 V" to the U key,

Assign "factory I ³" to the J key,

Assign "Towel" to the M key,

Assign "eight pulls" to the I key,

Assign "person t" to the K key,

Assign "^ U," to the O key,

Assign "小小火 †" to the L key,

Assign "十土于 ί" to the P key.

6. The method for encoding Chinese characters and inputting them into a computer according to claim 5, characterized in that said step 3) further comprises the steps:

For Chinese characters containing an interspersed structure, according to the intersections formed by the crossing strokes in the interspersed structure, one of the letters T, G, and B is entered for every two intersections, likewise for the remaining intersections, and so on, until only one intersection remains, for which one of the letters P and M is entered.

7. The method for encoding Chinese characters and inputting them into a computer according to claim 1, characterized in that said step 3) comprises:

If the Chinese character to be encoded is itself a character component, it is decomposed into single strokes and smaller character components.

If the Chinese character to be encoded is a single-structure Chinese character, it is decomposed into single strokes and smaller character components.

If the Chinese character to be encoded is a two-structure Chinese character, two codes are taken from each of the two structures.

If the Chinese character to be encoded is a three-structure Chinese character, two codes are taken from one of the three structures, and one code is taken from each of the other two structures.

If the Chinese character to be encoded is a four-structure Chinese character, one code is taken from each of the four structures.

If the Chinese character to be encoded is a Chinese character with more than four structures, one code is taken from each of the first three structures, and one code is taken from the last structure.

13. The method for encoding Chinese characters and inputting them into a computer according to any one of claims 2, 5, 7 to 12, further comprising the following steps:

When the Chinese character to be encoded is a traditional Chinese character:

Assign the character component "魚" to the 2nd group,

Assign the character component "言" to the 3rd group,

Assign the character component "門" to the 4th group,

Assign the character component "馬" to the 7th group,

Assign the character component "金" to the 8th group,

Assign the character component "食" to the 8th group,

Assign the character component "糹" to the 7th group,

Assign the character component "++" to the 9th group.