WO1997007449A1 - Method of encoding and inputing complicated or simplified form of chinese character and keyboard thereof - Google Patents

Publication number
WO1997007449A1
WO1997007449A1 PCT/CN1996/000069 CN9600069W WO9707449A1 WO 1997007449 A1 WO1997007449 A1 WO 1997007449A1 CN 9600069 W CN9600069 W CN 9600069W WO 9707449 A1 WO9707449 A1 WO 9707449A1
Authority
WO
WIPO (PCT)
Prior art keywords
characters
key
assign
code
chinese
Prior art date
Application number
PCT/CN1996/000069
Other languages
French (fr)
Chinese (zh)
Inventor
Dezi Shao
Wuzhou Wei
Heqi Ren
Yonghong Hao
Fucheng Li
Original Assignee
Dezi Shao
Wuzhou Wei
Heqi Ren
Yonghong Hao
Fucheng Li
Application filed by Dezi Shao, Wuzhou Wei, Heqi Ren, Yonghong Hao, Fucheng Li
Priority to AU67314/96A (AU6731496A)
Publication of WO1997007449A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/018 Input/output arrangements for oriental characters

Definitions

  • The invention relates to a Chinese character encoding and input method and a keyboard, and in particular to a method of encoding and inputting Chinese characters (including traditional, simplified, archaic and variant Chinese characters, as well as the Chinese characters used in Korean and Japanese) by means of character components, and to a keyboard designed according to the method.
  • The Pinyin input method is the earliest input method, and it is easy to learn. Apart from professional typists, it is still used by a large number of users. However, because Chinese has so many homophones, the duplicate-code rate of the Pinyin input method (the share of codes that map to more than one character) is very high.
  • Chinese characters include simplified, traditional, archaic and variant forms, totalling tens of thousands, of which only a few thousand are in common use. For most infrequently used characters, ordinary users do not know the pronunciation at all and so cannot enter them with Pinyin. In addition, Chinese has many dialects, and the same character is pronounced differently in different regions. The Pinyin method is therefore hard to popularize in non-Mandarin regions.
  • Shape-code input methods have a lower duplicate-code rate and are used by professional typists.
  • A shape-code input method involves root selection, code-extraction rules and character-decomposition methods.
  • During input, roots or strokes are used as codes.
  • Most methods take four codes as the standard. In general, the more code positions, the fewer duplicate codes, but the slower the typing speed.
  • Another drawback of prior-art shape-code input methods is that the encodings of traditional and simplified Chinese characters cannot be unified, let alone those of archaic and variant characters. This greatly limits the range of application of most shape-code input methods.
  • An object of the present invention is to provide a method of encoding Chinese characters and inputting them into a computer. With this method, simplified, traditional, archaic and variant Chinese characters, as well as the Chinese characters used in Korean and Japanese, can be entered into a computer.
  • A method of encoding Chinese characters and inputting them into a computer is provided.
  • Each Chinese character is composed of one or more character components, and each character component is composed of one or more strokes in the traditional writing order.
  • The method is characterized by comprising the following steps:
  • Character components are the parts that make up a Chinese character; each component consists of one or more strokes in the traditional writing order.
  • An important feature of the present invention is that character components are selected according to whether the shape of a structure formed by several strokes in the traditional writing order resembles one of the ten Chinese numerals "一" to "十" or the Arabic numerals "1" and "7". A structure of several strokes having such a shape is called a character component. The shapes of the components are therefore highly regular and easy to remember, and characters can be decomposed quickly, which helps to increase the speed of Chinese character entry.
  • Another important feature of the present invention is the method of encoding Chinese characters that contain interspersed structures. For a component or character formed by crossing strokes, only the crossing count is used. The crossing count is the number of points at which strokes intersect. For every full two intersections, one code 5 is taken (the component belongs to group 5), and so on for the remaining intersections. If only one intersection remains, one code 0 is taken (the component belongs to group 10). For example, the code of "巾" is 0, the code of "中" is 5, the code of "更" is 155, the code of "丰" is 50, and the code of "車", the traditional form of "车", is 550. This reduces the number of components that have to be selected and resolves the difficult cases in character decomposition.
  • When the components are assigned to keys, first, middle and last components are all taken into account, which advantageously reduces the duplicate-code rate.
  • The duplicate-code rate of the Chinese character encoding method of the present invention is only 6%, and the average code length of a single Chinese character is 3.48. When phrases are included, the average code length per Chinese character is 2.48.
  • FIG. 1 shows the distribution of the 89 character components over the 10 number keys of a keyboard according to an embodiment of the present invention.
  • FIG. 2 shows the distribution of the 89 character components of FIG. 1 over the letter keys according to another embodiment of the present invention.
  • FIG. 3 shows the Chinese character shapes and structures used by the method of the present invention.
  • The general rule for selecting character components is that the shape of a selected component resembles one of the Chinese numerals "一", "二", "三", "四", "五", "六", "七", "八", "九" and "十". In this way, 10 groups of components are selected. For some groups the rule is modified slightly, as described below.
  • ⁇ ,. Radical characters are radicals with a little ",” on the left.
  • the opening of " ⁇ " can be in four directions: up, down, left, and right.
  • FIG. 2 shows the distribution of the 89 character components of FIG. 1 over the letter keys according to another embodiment of the present invention, in which the components assigned to each number key in FIG. 1 are assigned to the several letter keys that lie in the sloping column of the keyboard corresponding to that number key. The components on the number key 0 are assigned to the letter keys P and M.
  • A character component thus corresponds to both a digit and a letter.
  • A code can be either a numeric code or an alphabetic code, so Chinese characters can be entered with either.
  • The encoding method of the present invention is described in detail below. In the following description, the code of a Chinese character is given as a numeric code, with the corresponding alphabetic code in parentheses. I. Encoding of character components.
  • Text type consisting of single stroke and small type, code based on single stroke and small type. Such as:
  • A character is encoded according to its structure and glyph shape.
  • the character structure and glyph used in the present invention are shown in FIG.
  • According to their structure, Chinese characters can be divided into single-structure, two-structure, three-structure and four-structure characters, and characters with more than four structures.
  • Three-structure Chinese characters such as: Italian, Chinese, Mo, Chinese, and Chinese;
  • Chinese characters with more than four structures such as: green, ⁇ .
  • Chinese characters can be divided into vertical Chinese characters, horizontal Chinese characters, and mixed characters according to the shape of the characters.
  • each structure is called a first structure, a second structure, a third structure, etc.
  • Horizontal Chinese characters such as: Peng, system, live, paste, swell, ⁇ , stall, female.
  • each structure is called the first structure, the second structure, the third structure,....
  • Mixed Chinese characters are generally at least three-structured Chinese characters, such as: Zan, Ba, Zhi, Wan, Liu, Pu, Bao; four-knot mixed Chinese characters, such as: ⁇ , ⁇ , Yong, 3 ⁇ 4, lean; Mixed-shaped Chinese characters, such as: qi, ⁇ , chew, and suffix.
  • This type is mostly traditional Chinese characters.
  • The general method for encoding a character is to follow the stroke writing order. If the strokes can form a character component, the component is taken; if they cannot, only the individual strokes are taken. Components are formed from as many strokes as possible.
  • "耳" (ear) is decomposed into "一 丨 二 十" and coded as 1120 (QZSP).
  • Horizontal or vertical Chinese characters have two or more structures. Depending on the glyph shape (horizontal or vertical) and the number of structures, at most four codes are taken.
  • the first structure of "side” is “nose” and the second structure is “cha”.
  • the first structure "nose” is decomposed into “J head ... 1", the first and last characters are “! 1", then the first and last two codes are 21 (WZ).
  • the second structure "check” is decomposed into “Muichiichi” , The first and last word is “Mu Yi”, then the first and last two codes are 01 (MQ). Therefore, the code of " ⁇ " is 2101 (WZMQ).
  • the code for "shoes” is 5000 (GPPP) and the code for "amount” is 6418 (YFQK).
  • the encoding of "Frequency” is 1218 (ZWQK), and the encoding of "Crane” is 0121 (PQWQ).
  • Here the first structure takes its first two codes rather than its first and last codes, because this matches people's writing habits and reduces the duplicate-code rate.
  • The first structure of "雷" (lei) is "雨" (rain), and the second structure is "田" (field).
  • The first structure "雨" is decomposed into "一 十", and its first two codes are 10 (QP).
  • The second structure "田" is decomposed into " ⁇ ⁇ ", and its first and last codes are 40 (VP). The code for "雷" is therefore 1040
  • The code for "Gee" is 014 (PQF)
  • the code for “Yes” is 4118 (RQZK)
  • the code for "Tone” is 684 (NIR)
  • the code for "Home” is 6129 (YQWU.)
  • For three-structure Chinese characters, the general coding principle is that one structure contributes two codes and the other two structures contribute one code each.
  • The first structure is considered first for the two codes. If the first structure has fewer than two codes, the second structure contributes two codes instead, and so on.
  • The last structure contributes its last code, and the other structures contribute their first codes.
  • The horizontal Chinese character "糊" (paste) has a first structure "米", a second structure "古" and a third structure "月".
  • For the first structure "米", the first and last components are both multi-protrusion shapes, so its two codes are 99 (LL).
  • For the second structure "古", the first component is "十", so its first code is 0 (P).
  • For the third structure "月", the last component is "二", so its last code is 2 (S).
  • The code for "糊" is therefore 9902 (LLPS).
  • the "swell” code is 2082 (SPIW).
  • the code for " ⁇ " is 3086 (EMIH).
  • The vertical Chinese character "意" (meaning) has a first structure "立", a second structure "日" and a third structure "心".
  • For the first structure "立", the first and second components are taken, and the first two codes are 68 or NI.
  • The second structure "日" is itself a character component; its first code is 4 or 1.
  • The last component of the third structure "心" is ",", and the last code is 6 or
  • the code for " ⁇ " is 9905 (OLPB), and its second structure takes two codes.
  • The first, second, third and fourth structures of "Fu" are given as "Yikoutian"; each contributes one code, so the code of "Fu" is 6144 (YQFV).
  • The code for "high" is 6434 (NFDF).
  • The code for "jing" is 4649 (RNFL). The four structures of "booth" each contribute one code, giving 5781 (GUKQ). Similarly, the code for "female" is 1381 (ZDKQ).
  • the first structure is “Guang”
  • the second structure is “4"
  • the third structure is “”
  • the last structure is "Month”
  • the first three structures each take the first code as 688 (YKK)
  • the last structure has a tail code of 2 (S).
  • the encoding of "" is 6882 (YKKS).
  • the first three knots of "Jia Who" are " ⁇ Gui ⁇ ", the last structure is "", the first three structures each have a first code of 803 (KPC), and the last structure has a last code of 1 ( Q).
  • the code of "Jia Who” is 8031 (KPCQ).
  • Upper-lower vertical Chinese characters.
  • The upper part takes the first two codes of its structure, and the lower part takes its first and last codes. If the lower part has only one code, one more code is taken from the upper part, that is, three codes from the upper part.
  • the first “first” is composed of two structures “first” and “first”, each of which is 22 (WW).
  • the lower part of the shell is 38 (DK).
  • the code for "Like” is 2238 (WWDK).
  • the code for "tight” is 2779 (XUJL), the code for "reserved” is 7740GUVP), and the code for "pirate” is 3842 (EKFX).
  • The code for "Nie" is 1100 (QZMM), and the code for "Jie" is 4207 (FXPU).
  • the upper part is “” and the lower part is “Xi Ji".
  • There is only one code for the upper part, so one more code is taken from the lower part. The upper part "" therefore contributes the single code 6 (Y).
  • the "Wan” code is 6263 (YWHD).
  • Identical code is 314 (DQF).
  • the code for "Doc” is 3808 (CKPK).
  • The code for "encoding" is 7657 (JHBU). III. Special cases
  • An interspersed structure is characterized by strokes that cross one another within the structure. For example, in the characters "Zhong", "Jin", "Yang", "A", "Yu" and "Qu", one vertical stroke "丨" or two vertical strokes "丨丨" cross other horizontal strokes "一". Because such strokes can be split in different ways when the character is decomposed, confusion easily arises and the code is no longer unique.
  • is encoded as 203 (WMD).
  • the code for " ⁇ " is 102 (QPX).
  • the code for "child” is 702 (UPX).
  • the code for "you” is 54 (TV).
  • the code for "Qu” is 554 (GGV).
  • the code for "Jun” is 554 (GBF).
  • the code for " ⁇ " is 5550 (TGTP).
  • the stroke order of a small number of Chinese characters falls on the top or left.
  • the individual stroke order is omitted. Make adjustments. E.g,
  • the "encoding” is 7657 (JHBU),
  • the code for "North” is 2123 (SZWD).
  • the appropriate encoding is 6204 (NWPF).
  • Phrases are divided into two-character, three-character, four-character and multi-character phrases. 1. Two-character phrases: each Chinese character takes its first and last codes.
  • each word takes the first code.
  • “Family Planning” is coded 3626 (CHWN :).
  • the first three characters are given the first code and the last one is given the first code. For example, "People's Republic of China", the first three characters "Chinese” each have a first code of 588
  • The command key is selected from a group of letters according to the preceding code, and has nothing to do with the text constraint.
  • the command key is X
  • the command key is C
  • the command key is V
  • the command key is B
  • the command key is Y
  • the command key is 0;
  • the command key is M.
  • the command key is X
  • The selection rule for the command key is:
  • the command key is A
  • the command key is V
  • the command key is Y
  • The selection rule for the command key is the same as the selection rule for the command key in three-character phrase encoding.
  • the encoding of the four-character phrase has nothing to do with the fourth character.
  • the first three digits of "family planning" are CHW respectively.
  • the command key derived from W is A.
  • the "family planning" code is CHWA.
  • the code for "educational work” is PNQA
  • the code for "Beijing, China” is GVSA.
  • the first three characters take the first code, and the last one takes the first code.
  • is a mixed vertical Chinese character with a vertical code of 9977;
  • the invention can be applied to any information processing system involving Chinese characters.
  • With the Chinese character encoding method of the present invention, not only simplified Chinese characters but also traditional, archaic and variant Chinese characters, and even the Chinese characters used in Korean and Japanese, can be encoded and entered.
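The paired codes quoted above, such as 1120 (QZSP) and 9902 (LLPS), can be cross-checked with a short sketch. The column table below is an assumption inferred from those pairs and from the statement that key 0 maps to P and M; FIG. 2 itself is not reproduced here, and the names `DIGIT_TO_LETTERS`, `letters_to_digits` and `is_consistent` are hypothetical.

```python
# Assumed layout: each number key owns the sloping column of letter keys
# beneath it on a QWERTY keyboard; key 0 takes P and M as stated in the text.
DIGIT_TO_LETTERS = {
    "1": "QAZ", "2": "WSX", "3": "EDC", "4": "RFV", "5": "TGB",
    "6": "YHN", "7": "UJ", "8": "IK", "9": "OL", "0": "PM",
}

# Invert the table so every letter points back to its digit.
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in DIGIT_TO_LETTERS.items()
    for letter in letters
}


def letters_to_digits(alpha_code: str) -> str:
    """Convert an alphabetic code to the numeric code it stands for."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in alpha_code.upper())


def is_consistent(numeric_code: str, alpha_code: str) -> bool:
    """Check that a quoted numeric/alphabetic pair agrees under the assumed layout."""
    return letters_to_digits(alpha_code) == numeric_code


print(letters_to_digits("QZSP"))      # 1120
print(is_consistent("9902", "LLPS"))  # True
```

Under this assumed layout, a pair that fails the check (for example a numeric code garbled in extraction) can be flagged for comparison against the original drawings.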

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

A method for encoding Chinese characters and inputting them into a computer, each Chinese character being composed of one or more character components and each character component being constituted by one or more strokes according to the traditional writing order, the method being characterized by the steps of: 1) classifying the plurality of character components forming Chinese characters into 10 groups, marked as the 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th and 10th groups respectively, according to the similarity between each character component and one of the Chinese numerals one to ten; 2) assigning the character components in the 10 groups to the number keys 1 to 9 and 0 respectively, such that each character component in the 10 groups corresponds to a number; and 3) dividing a Chinese character to be encoded into at least one character component and inputting the number corresponding to said at least one character component into a computer using a keyboard.

Description

Method of encoding and inputting traditional and simplified Chinese characters, and keyboard therefor
Technical Field
The invention relates to a Chinese character encoding and input method and a keyboard, and in particular to a method of encoding and inputting Chinese characters (including traditional, simplified, archaic and variant Chinese characters, as well as the Chinese characters used in Korean and Japanese) by means of character components, and to a keyboard designed according to the method.
Background Art
At present there are several hundred Chinese character encoding input methods, but few of them are convenient to use.
The Pinyin input method is the earliest input method, and it is simple and easy to learn. Apart from professional typists, it is still used by the great majority of users. However, because Chinese has a great many homophones, the duplicate-code rate of the Pinyin input method is very high. Chinese characters include simplified, traditional, archaic and variant forms, totalling no fewer than several tens of thousands, of which only a few thousand are in common use. For most of the uncommon characters, ordinary users do not know the pronunciation at all and so cannot enter them with Pinyin. In addition, Chinese has many dialects, and the same character is pronounced differently in different regions. The Pinyin input method is therefore hard to popularize in non-Mandarin regions.
Compared with the Pinyin input method, shape-code input methods have a lower duplicate-code rate and are used by professional typists. There are many prior-art shape-code input methods, each with its own characteristics. A shape-code input method involves root selection, code-extraction rules and character-decomposition methods. During input, roots or strokes are used as codes, and most methods take four codes as the standard. In general, the more code positions, the fewer duplicate codes, but the slower the typing speed.
As regards root selection, existing methods have not studied the composition and coding of Chinese characters deeply enough. Roots with a high frequency of use are not selected while roots with a low frequency of use are, and the roots are poorly classified. Because the roots are unsuitably selected, the number of roots increases (to about 200) and duplicate codes multiply; a greater drawback is that the decomposition of a character becomes ambiguous. In addition, the decomposition rules are rather complicated. A user needs several days or even several tens of days just to memorize the roots, not to mention the numerous and complicated decomposition rules. In short, a user must study an existing shape-code input method for a long time before being able to use it.
In addition, existing shape-code input methods share a further undesirable practice: a character with an interspersed (crossing) structure is broken into small pieces to fit letters or digits. For characters such as "重", "果" and "里", it is hard to agree on where to split the character and into how many parts. Decomposing interspersed characters in this way requires rote memorization, which inevitably adds to the user's burden.
Prior-art shape-code input methods have another common drawback: the encodings of traditional and simplified Chinese characters cannot be unified, let alone those of archaic and variant characters. This greatly limits the range of application of most shape-code input methods.
The most widely used shape-code input method in China is the "Optimized five-stroke (Wubi) character-form encoding method and keyboard" disclosed in Chinese patent 85100837. According to that patent, the strokes of Chinese characters are reduced to five basic strokes, 199 basic roots composed of the five basic strokes are summarized, and these roots are assigned to the letter keys of a standard keyboard and given certain code values. In this way, Chinese characters can be encoded either with digits or with letters. The drawbacks of this encoding method are: there are too many roots, so the memory load is large; the encoding of its roots has a narrow scope and cannot be applied uniformly to traditional and simplified characters; when number keys are used, the longest code requires 8 digits, so the number keys and letter keys cannot share one set of root codes; and when a character has fewer than four roots, a "last-stroke and character-form cross identification code" must be added, which makes decomposition harder and slows down input.
Another shape-code input method is disclosed in Chinese patent 87103761, entitled "Chinese character stroke-order shape-code encoding input method". In this method the stroke shapes of the single strokes of Chinese characters are represented by 10 shape codes, which are assigned to the number keys of a standard English keyboard, and numeric coding is used. The method has few roots and is easy to learn. However, its encoding is not unique, and traditional and simplified characters cannot share one set of codes.
Summary of the Invention
An object of the present invention is to provide a method of encoding Chinese characters and inputting them into a computer. With this method, simplified, traditional, archaic and variant Chinese characters, as well as the Chinese characters used in Korean and Japanese, can be input into a computer.
To achieve the above object, there is provided a method of encoding Chinese characters and inputting them into a computer, wherein each Chinese character is composed of one or more character components and each character component is composed of one or more strokes in the traditional writing order, the method being characterized by comprising the following steps:
1) classifying the plurality of character components that make up Chinese characters into 10 groups, labelled groups 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10, according to which of the Chinese numerals "一", "二", "三", "四", "五", "六", "七", "八", "九" and "十" the shape of each component resembles, and
assigning to group 5 the components in which one stroke passes through two other strokes when written, assigning to groups 1 and 7 respectively the components whose shapes resemble the Arabic numerals "1" and "7",
assigning to group 9 the components whose strokes form a shape with multiple protruding ends;
2) assigning the components of the 10 groups to the number keys 1 to 9 and 0 of a keyboard respectively, so that each component in the 10 groups corresponds to one digit;
3) decomposing the Chinese character to be encoded into at least one character component, and inputting the digit or digits corresponding to the at least one component into the computer via the keyboard.
In the above method of encoding Chinese characters and inputting them into a computer, a "character component" is a part from which Chinese characters are built; each component is composed of one or more strokes in the traditional writing order.
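Steps 2) and 3) amount to a table lookup followed by concatenation, which the following minimal sketch illustrates. It assumes a tiny subset of the 89-component table: the key 1 and key 2 radical components are those listed for FIG. 1 in this document, "十" on key 0 follows from group 10 mapping to key 0, and the decomposition of "耳" into 一, 丨, 二, 十 is a reading of the example coded 1120 elsewhere in this document. The names `COMPONENT_TO_DIGIT` and `encode_components` are hypothetical.

```python
# Illustrative subset of the component table (assumption: not the full
# 89-component assignment, which is given only in FIG. 1).
COMPONENT_TO_DIGIT = {
    "一": "1", "丨": "1", "石": "1", "王": "1", "山": "1",  # group 1 -> key 1
    "二": "2", "月": "2", "禾": "2", "鱼": "2", "舟": "2",  # group 2 -> key 2
    "十": "0",                                              # group 10 -> key 0
}


def encode_components(components):
    """Return the numeric code for components listed in writing order.

    Raises KeyError for a component that is not in the table.
    """
    return "".join(COMPONENT_TO_DIGIT[c] for c in components)


# Assumed decomposition of "耳" into four components.
print(encode_components(["一", "丨", "二", "十"]))  # 1120
```

The sketch covers only the lookup; choosing which components to take from each structure of a character is governed by the separate code-extraction rules.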
An important feature of the present invention is that character components are selected according to whether the shape of a structure formed by several strokes in the traditional writing order resembles one of the ten Chinese numerals "一" to "十" or the Arabic numerals "1" and "7". A structure of several strokes having such a shape is called a character component. The shapes of the components are therefore highly regular and easy to memorize, and characters can be decomposed quickly, which helps to increase input speed.
Another important feature of the present invention is the method of encoding Chinese characters that contain interspersed structures. For a component or character formed by crossing strokes, only the crossing count is used. The crossing count is the number of points at which strokes intersect. For every full two intersections, one code 5 is taken (the component belongs to group 5 above), and so on for the remaining intersections. If only one intersection remains, one code 0 is taken (the component belongs to group 10 above). For example, the code of "巾" is 0, the code of "中" is 5, the code of "更" is 155, the code of "丰" is 50, and the code of "車", the traditional form of "车", is 550. This reduces the number of components that have to be selected and resolves the difficult cases in character decomposition.
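The crossing-count rule can be written as a small function, sketched here under stated assumptions: the intersection counts used in the examples (1 for "巾", 2 for "中", 3 for "丰", 5 for the crossing part of "車") are inferred from the codes quoted in the text rather than stated there, and the function name `crossing_code` is hypothetical.

```python
def crossing_code(intersections: int) -> str:
    """Code for a crossing structure with the given number of intersections.

    Each full pair of intersections yields one code 5 (group 5);
    a single leftover intersection yields one code 0 (group 10).
    """
    if intersections < 1:
        raise ValueError("a crossing structure has at least one intersection")
    return "5" * (intersections // 2) + "0" * (intersections % 2)


for n in (1, 2, 3, 5):
    print(n, crossing_code(n))
# 1 0
# 2 5
# 3 50
# 5 550
```

The code 155 quoted for "更" would then correspond to a leading component coded 1 followed by a four-intersection crossing, though that split is likewise an inference.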
After in-depth study of the structure of tens of thousands of traditional and simplified Chinese characters, the inventors summarized the features and rules of character shape and structure and prescribed the positions from which codes are taken, thereby overcoming the ambiguity of character decomposition and the limitation that a set of components covers only a few characters. From 6,763 simplified characters, more than 40,000 traditional characters and more than 5,000 phrases, 89 character components were selected. Compared with the prior art this is the smallest number of components, yet it covers tens of thousands of characters without regard to traditional or simplified form, so that a single method and a single set of components are used throughout. The method can also be used for encoding and inputting archaic and variant Chinese characters and the Chinese characters of Korean and Japanese. The 89 components are divided into 10 groups corresponding to the digits 1 to 9 and 0, so that an exact numeric code can be given for every Chinese character, which is convenient for applications such as looking up unfamiliar characters and computer character entry. In addition, the components are distributed on the keyboard so that the components that form characters most frequently are assigned as far as possible to the middle keys, exploiting the dexterity of the index fingers to increase typing speed.
In addition, when the components are assigned to the keys of the keyboard, first components, middle components and last components are all taken into account, which advantageously reduces the duplicate-code rate. The duplicate-code rate of the encoding method of the invention is only 6%, and the average code length of a single Chinese character is 3.48; when phrases are included, the average code length per Chinese character is 2.48.
Brief Description of the Drawings
通辻以下结合附图对本发明实施方式的描述,本发明的其他特 征和优点将会更加明显。  Other features and advantages of the present invention will be more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings.
图 1是根据本发明一种实施方式所逸 89个字件在键盘的 10 个数字键上的分布困;  FIG. 1 shows the distribution of 89 characters on 10 number keys of a keyboard according to an embodiment of the present invention;
图 2是根据本发明另一种实施方式将图 1中的 89个字件分配 到字母键上的分布困;  FIG. 2 is a distribution problem in which the 89 characters in FIG. 1 are assigned to letter keys according to another embodiment of the present invention; FIG.
图 3示出了本发明的方法所利用的汉字字型与结构。 本发明的最佳实施方式  Figure 3 shows the Chinese character font and structure used by the method of the present invention. Best Mode of the Invention
FIG. 1 shows the distribution of the 89 selected components over the 10 digit keys of a keyboard according to one embodiment of the invention. The general selection rule is that the shape of a component resembles one of the Chinese numerals "一", "二", "三", "四", "五", "六", "七", "八", "九", and "十". This yields 10 groups of components. For some groups the rule is modified slightly, as described below.
1) Digit key 1 carries the stroke components "一 丨" and the radical components "石, 王, 山". The first stroke of each radical component resembles the Chinese numeral "一" or the Arabic numeral "1".
2) Digit key 2 carries the stroke components "二 II 'J ; J 4 w" and the radical components "月 禾 鱼 舟". The first stroke of each radical component is a left-falling stroke (撇).
3) Digit key 3 carries the stroke components "三 匚 ," and the radical components "Ά ,". The radical components are radicals with a dot "、" on the left side. The opening of "匚" may face up, down, left, or right.
4) Digit key 4 carries the stroke components "口尸 - J1 ^" and the radical components "日目臼口^". The components are square in shape; the " " beneath "合" has four corners.
5) Digit key 5 carries the stroke components "廿中牛 ^ 戈" and the radical components "虫才女". The rule for this group can be summarized as one stroke passing through two strokes; the frames and bends that are passed through are not counted.
6) Digit key 6 carries the stroke components "二 、 、" and the radical components "方 门^广 L". The dot "、" of each radical component sits exactly in the middle.
7) Digit key 7 carries the stroke components " Ί厂" and the radical components "马纟 TP". Most of these shapes are bent corners.
8) Digit key 8 carries the stroke components "八人、" and the radical components "^ 牵亇". The components are shaped like "八" or "人".
9) Digit key 9 carries the stroke components "小小 ^ ^、 ^" and the radical components "火†". The "小"-shaped strokes form shapes with several protruding ends.
10) Digit key 0 carries the stroke components "十巾 十弋" and the radical components "土木". Their common feature is two strokes that intersect to form a cross.
FIG. 2 shows the distribution of the 89 components of FIG. 1 over the letter keys according to another embodiment of the invention. The components assigned to each digit key in FIG. 1 are distributed over the letter keys that form a diagonal column with that digit key on the keyboard. The components on digit key 0 are assigned to the letter keys P and M.
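The column layout described above fixes which digit each letter key stands for. The following Python sketch writes that correspondence down and converts a letter code into the equivalent digit code. The grouping of letters per digit is read off the key listing and the worked examples in this description; the names `KEY_COLUMNS`, `LETTER_TO_DIGIT`, and `letters_to_digits` are illustrative and do not appear in the original.

```python
# Letter keys grouped by the digit key whose components they share.
# Grouping inferred from the FIG. 2 description (digit 0 maps to P and M).
KEY_COLUMNS = {
    "1": "QAZ", "2": "WSX", "3": "EDC", "4": "RFV", "5": "TGB",
    "6": "YHN", "7": "UJ", "8": "IK", "9": "OL", "0": "PM",
}

# Invert the table so that each letter points at its digit.
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in KEY_COLUMNS.items()
    for letter in letters
}


def letters_to_digits(letter_code: str) -> str:
    """Convert a letter code such as 'QWF' into its digit code."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in letter_code.upper())
```

For instance, `letters_to_digits("QWF")` gives `"124"`, the pair of codes stated for "石" in the text. The reverse direction is not a function, because each digit is shared by two or three letter keys.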
Q key: "一、石"
A key: "王"
Z key: "山、丨"
W key: "A J"
S key: "二月禾"
X key: "鱼舟 II Ί J"
E key: "3 U"
D key: "三 i门 \ ,,"
C key: "ΐ匚"
R key: "日]^尸"
F key: "口" and the glyphs of figure imgf000008_0001
V key: "目臼□"
T key: "虫午"
G key: "中才廿"
B key: "女"
Y key: "广 . 、方"
H key: "门 ίΤ、"
N key: "L"
U key: "马 Ί 1ί"
J key: "厂 L> 纟 p"
M key: "巾木"
I key: "八 牵"
K key: "人 -亇"
O key: "4i、"
L key: "小 火†"
P key: "十土于 ϊ"
Through the assignments of FIG. 1 and FIG. 2, each component corresponds both to a digit and to a letter. The code of a character can therefore be either a digit code or a letter code, and a character can be entered by typing either one.
The encoding method of the invention is described in detail below. In the following description, character codes are given as digit codes, with the corresponding letter codes in parentheses.
I. Encoding of components
1. Character components (components that are themselves characters) consist of single strokes and smaller components and are coded by those strokes and smaller components. For example:
"石" decomposes into "一, 丿, 口", which, as shown in FIG. 1 and FIG. 2, correspond to the digits 1, 2, 4 and the letters Q, W, F; the code of "石" is therefore 124 (QWF).
"禾" decomposes into "丿, 木", which correspond to the digits 2, 0 and the letters W, M; the code of "禾" is therefore 20 (WM).
2. Non-character components are coded stroke by stroke. A component with fewer than four codes is completed by entering 2 (or the W key); for a component with more than four codes, the first, second, third, and last codes are taken. For example:
" " decomposes into "、 、" and is coded 662 (HHW);
"才" decomposes into "一 j 一" and is coded 1212 (QXQW);
a component that decomposes into "、 7 1 / 、" takes its first, second, third, and last codes and is coded 6716 (HUZH).
II. Encoding of characters
Characters are encoded according to their structure and shape. The character structures and shapes used by the invention are shown in FIG. 3. As shown in FIG. 3, characters can be divided by structure into single-structure, two-structure, three-structure, and four-structure characters and characters with more than four structures.
Single-structure characters, e.g. 上, 耳, 大, 天;
Two-structure characters, e.g. 雷, 定, 吉, 彭, 制, 活;
Three-structure characters, e.g. 意, 茗, 莫, 糊, 凇;
Four-structure characters, e.g. 富, 蒿, 幕, 摊, 雌;
Characters with more than four structures, e.g. 青, 棻.
As shown in FIG. 3, characters can also be divided by shape into vertical, horizontal, mixed, and framed characters.
Vertical characters, e.g. 雷, 定, 吉, 意, 茗, 莫, 富, 蒿, 幕, 膏, 蒿. For such a character the structures are called, from top to bottom, the first structure, the second structure, the third structure, and so on.
Horizontal characters, e.g. 彭, 制, 活, 糊, 膨, 凇, 摊, 雌. For such a character the structures are called, from left to right, the first structure, the second structure, the third structure, and so on.
Mixed characters generally have at least three structures, e.g. 赞, 霸, 智, 宛, 留, 蒲, 堡; four-structure mixed characters, e.g. 悭, 琬, 雍, ¾, 靠; mixed characters with more than four structures, e.g. 器, 翦, 嚼, 綴. This last type occurs mostly among traditional characters.
The general method for encoding a character is to proceed in stroke-writing order. Where strokes can form a component, the component is taken. Where they cannot, only the individual strokes are taken. Components are formed from as many strokes as possible.
The character encoding method is described in detail below.
1. Single-structure characters are coded by single strokes and small components. For example:
"耳" decomposes into "一 丨 二 十" and is coded 1120 (QZSP).
"上" decomposes into "丨 一 一" and is coded 111 (ZQQ).
2. Horizontal or vertical characters.
Horizontal and vertical characters have two or more structures. Depending on the shape (horizontal or vertical) and the number of structures, at most four codes are taken.
1) Two-structure characters: each structure contributes two codes. If one structure has fewer than two codes, the other structure contributes one more.
For horizontal characters, the first and second structures each contribute their first and last codes.
For example, the first structure of "齄" is "鼻" and the second structure is "查". The first structure "鼻" decomposes into "丿 目 ... 丨"; its first and last components are "丿" and "丨", so its first and last codes are 21 (WZ). The second structure "查" decomposes into "木 日 一"; its first and last components are "木" and "一", so its first and last codes are 01 (MQ). The code of "齄" is therefore 2101 (WZMQ). Further examples:
"鞋" is coded 5000 (GPPP), "额" is coded 6418 (YFQK), "频" is coded 1218 (ZWQK), and "鹤" is coded 0121 (PQWQ).
Characters of this kind are the most numerous, about 60% of all characters. The following examples show that when one structure has fewer than two codes, the other structure contributes one more, that is, three codes:
"拮": 0014 (MPQF);
"结": 7014 (JPQF);
"拼": 585 (GIG);
"汁": 30 (DP);
"悭": 9270 (LXUP).
For vertical characters, the first structure contributes its first two codes and the second structure contributes its first and last codes.
Note that the first structure contributes its first two codes rather than its first and last codes because this matches writing habits and lowers the duplicate-code rate.
For example, the first structure of "雷" is "雨" and the second structure is "田". The first structure "雨" decomposes into "一 十 ...", so its first two codes are 10 (QP). The second structure "田" decomposes into "□ 十", so its first and last codes are 40 (VP). The code of "雷" is therefore 1040 (QPVP).
As further examples, "吉" is coded 014 (PQF), "是" is coded 4118 (RQZK), "音" is coded 684 (NIR), and "家" is coded 6129 (YQWU).
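The two-structure rules above can be sketched in Python once each structure has already been decomposed into a list of digit codes (the decomposition itself is not attempted here). The function names are illustrative. Which three codes a structure supplies when it has to contribute "one more" is not spelled out in the text; the sketch assumes first, second, and last. The borrowing rule is omitted for vertical characters, since the text only demonstrates it on horizontal ones.

```python
def _digits(codes):
    """Join a list of digit codes into a code string."""
    return "".join(str(c) for c in codes)


def horizontal_two_structure(first, second):
    """Each structure gives its first and last code. If one structure has a
    single code, the other gives up to three (first, second, last: assumed)."""
    def head_tail(codes):
        return [codes[0], codes[-1]]

    def up_to_three(codes):
        if len(codes) <= 3:
            return list(codes)
        return [codes[0], codes[1], codes[-1]]

    if len(first) < 2:
        return _digits(list(first) + up_to_three(second))
    if len(second) < 2:
        return _digits(up_to_three(first) + list(second))
    return _digits(head_tail(first) + head_tail(second))


def vertical_two_structure(upper, lower):
    """Upper structure gives its first two codes, lower its first and last.
    The one-extra-code borrowing rule is left out of this sketch."""
    tail = list(lower) if len(lower) < 2 else [lower[0], lower[-1]]
    return _digits(list(upper[:2]) + tail)
```

With "查" taken as `[0, 4, 1]` (木, 日, 一) and "鼻" as any list that starts with 2 and ends with 1, `horizontal_two_structure` reproduces the code 2101 given for "齄"; with "雨" starting `[1, 0, ...]` and "田" as `[4, 0]`, `vertical_two_structure` reproduces 1040 for "雷".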
2) Three-structure characters. The general principle is that one structure contributes two codes and the other two structures contribute one code each. The first structure has priority for contributing two codes; if the first structure has fewer than two codes, the second structure contributes two, and so on.
A structure that contributes one code gives its last code if it is the last structure and its first code otherwise.
If the last structure contributes two codes, they are its first and last codes.
If one of the other structures contributes two codes, they are its first and last codes for a horizontal character and its first two codes for a vertical character.
For example, the horizontal character "糊" has the first structure "米", the second structure "古", and the third structure "月".
First, the first and last components of the first structure "米" both have the shape with several protruding ends, so its first and last codes are 99 (LL). Next, the first component of the second structure "古" is "十", so its first code is 0 (P). Finally, the last component of the third structure "月" is "二", so its last code is 2 (S). The code of "糊" is therefore 9902 (LLPS). Similarly,
"膨" is coded 2082 (SPIW).
"凇" is coded 3086 (EMIH).
As a further example, the vertical character "意" has the first structure "立", the second structure "日", and the third structure "心".
First, the first and second components of the first structure "立" give its first two codes, 68 (NI). Next, the second structure "日" is itself a single component, so its first code is 4 (R). The last component of the third structure "心" is "、", so its last code is 6 (H). The code of "意" is therefore 6846 (NIRH). Further examples:
"虎" is coded 1103 (ZQMD); its first structure contributes two codes.
"蒌" is coded 9905 (OLPB); its second structure contributes two codes.
"莫" is coded 9408 (ORPK); its third structure contributes two codes.
3) Four-structure characters: each structure contributes one code.
For example, the first, second, third, and fourth structures of "富" are "宀 一 口 田"; each contributes one code, so "富" is coded 6144 (YQFV). Similarly,
"高" is coded 6434 (NFDF). "景" is coded 4649 (RNFL). The four structures "4又 ( i" of "摊" each contribute one code, giving 5781 (GUKQ). Likewise, "雌" is coded 1381 (ZDKQ).
4) Characters with more than four structures: the first three structures contribute their first codes and the last structure contributes its last code.
For example, for " ", the first structure is "广", the second structure is " 4 ", the third structure is " ", and the last structure is "月". The first three structures contribute their first codes, 688 (YKK), and the last structure contributes its last code, 2 (S). The code of " " is therefore 6882 (YKKS).
As a further example, the first three structures of "佳谁" are "ί圭^" and the last structure is " ". The first three structures contribute their first codes, 803 (KPC), and the last structure contributes its last code, 1 (Q). The code of "佳谁" is therefore 8031 (KPCQ).
3. Mixed characters are further divided into upper-horizontal, lower-vertical characters, such as "赞", "智", and "堡", and upper-vertical, lower-horizontal characters, such as "霸" and "宛".
1) Upper-horizontal, lower-vertical characters: the upper part is treated as a horizontal character and the first code of each of its two structures is taken; the lower part is treated as a vertical character and its first and last codes are taken. If the lower part has only one code, the upper part contributes one more, that is, three codes.
For example, "赞" has the upper part "先先" and the lower part "贝". The upper part "先先" consists of the two structures "先" and "先", whose first codes are 22 (WW). The first and last codes of the lower part "贝" are 38 (DK). The code of "赞" is therefore 2238 (WWDK).
"紧" is coded 2779 (XUJL), "留" is coded 7740 (GUVP), and "盗" is coded 3842 (EKFX).
As a further example, "智" has the upper part "矢口" and the lower part "日". Because the lower part "日" has only one code, the upper part "矢口" contributes one more. The upper part "矢口" therefore gives three codes, 804 (KPF), and the lower part "日" gives one code, 4 (R). The code of "智" is therefore 8044 (KPFR).
Likewise, for "堡" the upper part "4 呆" gives three codes, 840 (KFM), and the lower part "土" gives one code, 0 (P). Similarly, " t" is coded 1174 (ZQUV).
2) Upper-vertical, lower-horizontal characters: the upper part is treated as a vertical character and its first two codes are taken; the lower part is treated as a horizontal character and the last code of each of its structures is taken. If the upper part has only one code, the lower part contributes one more, that is, three codes.
For example, "霸" has the upper part "雨" and the lower part "革月". The first two codes of the upper part "雨" are 10 (QP). The last codes of the two structures of the lower part "革月" are 02 (PS). The code of "霸" is therefore 1002 (QPPS).
"聂" is coded 1100 (QZMM) and "羁" is coded 4207 (FXPU). As a further example, "宛" has the upper part "宀" and the lower part "夕已". Because the upper part "宀" has only one code, the lower part "夕已" contributes one more. The upper part "宀" therefore gives one code, 6 (Y), and the lower part "夕已" gives three codes, 263 (WHD). The code of "宛" is therefore 6263 (YWHD).
Likewise, for "蒲" the upper part "艹" gives one code, 9 (O), and the lower part "浦" gives three codes, 365 (DHT). The code of "蒲" is therefore 9365 (ODHT).
"寂" is coded 6190 (YZLM).
4. Framed characters, which include fully framed and half-framed characters: the code of the frame is taken first, then the codes of the inner structure. For example:
"国" is coded 416 (VAH),
"圃" is coded 4655 (VHTT); for the reason why "甫" is coded 655 (HTT), see "III. Special cases" below,
"同" is coded 314 (DQF),
"医" is coded 3808 (CKPK),
"成" is coded 7657 (JHBU).
III. Special cases
In the encoding methods above, codes are generally taken from each structure in the traditional stroke order of the character. For convenience of encoding, however, the following special cases apply.
1. Structures formed by interpenetrating strokes.
Such structures contain strokes that cross one another. For example, in "中", "巾", "央", "甲", "由", and "曲", one vertical stroke "丨" or two vertical strokes pass through horizontal strokes "一". Because the strokes can be grouped into components in more than one way, decomposition is easily confused and the resulting code is not unique.
For characters with such structures, only the number of crossings (that is, the number of intersection points) is considered; the boxes, half-frames, and bends that are passed through are not coded. One code 5 is taken for every two crossings, and the remaining crossings are handled in the same way. If a single crossing is left over, one code 0 is taken. For example,
"巾" has one crossing and is coded 0;
"中" has 2 crossings and is coded 5;
"更" is split into an upper and a lower part; the upper part is coded 1 and the lower part has 4 crossings, which give the two codes 55, so "更" is coded 155;
"丰" has 3 crossings; one code 5 is taken first, and the one remaining crossing gives 0, so "丰" is coded 50;
"車" (the traditional form of 车) has 5 crossings; one code 5 is taken first, leaving 3 crossings, then another code 5, leaving one crossing, and finally one code 0, so "車" is coded 550.
In addition, when other strokes are attached below the crossing, those strokes are coded as well, in addition to the codes taken by the method above. For example:
"大" is coded 08 (PK),
"乇" is coded 203 (WMD),
"未" is coded 59 (TL),
"于" is coded 102 (QPX),
"子" is coded 702 (UPX).
A box that is not passed through completely is also coded. For example:
"甲" is coded 45 (VT),
"由" is coded 54 (TV),
"曲" is coded 554 (GGV).
If both the horizontal strokes "一" and the vertical strokes "丨" are crossed, both the horizontal and the vertical strokes are coded. For example:
"君" is coded 554 (GBF),
"聿" is coded 5550 (TGTP).
2. Stroke-order adjustment.
In a small number of characters the last stroke in the stroke order falls at the top or on the left. To conform to the general stroke order (top to bottom, left to right) and to make it easier for non-Chinese users to enter characters, the order of these individual strokes is adjusted slightly for encoding. For example,
"犬" is coded 608 (HPK),
"甫" is coded 655 (HTT),
"子" is coded 702 (UPX),
"成" is coded 7657 (JHBU),
"北" is coded 2123 (SZWD),
"适" is coded 6204 (NWPF),
"通" is coded 6765 (NUHT),
"建" is coded 7050 (UMTP).
IV. Encoding of phrases
Phrases are divided into two-character phrases, three-character phrases, four-character phrases, and longer phrases.
1. Two-character phrases: each character contributes its first and last codes.
For example, in "科学", the first and last codes of "科" are 20 (SP) and the first and last codes of "学" are 92 (LX), so "科学" is coded 2092 (SPLX).
As a further example, in "劳动", the first and last codes of "劳" are 97 (OU) and the first and last codes of "动" are 27 (SU), so "劳动" is coded 9727 (OUSU).
2. Three-character phrases: the first character contributes its first and last codes, and the second and third characters each contribute their first code.
For example, in "科学院", the first and last codes of "科" are 20 (SP), the first code of "学" is 9 (L), and the first code of "院" is 7 (U), so "科学院" is coded 2097 (SPLU).
3. Four-character phrases: each character contributes its first code.
For example, "计划生育" is coded 3626 (CHWN).
4. Longer phrases: the first three characters each contribute their first code, and the last character contributes its first code.
For example, in "中华人民共和国", the first codes of the first three characters "中华人" are 588 (GKK) and the first code of the last character "国" is 4 (V), so "中华人民共和国" is coded 5884 (GKKV).
Another phrase-encoding method is described below. It applies only to letter codes, not to digit codes.
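Setting the letter-only method aside for a moment, the four digit-code phrase rules listed in items 1 to 4 above can be sketched in Python. The sketch assumes that the first and last code of every character in the phrase are already known and passed in as `(first, last)` pairs; the function name is illustrative.

```python
def phrase_digit_code(chars):
    """chars: list of (first_code, last_code) pairs, one per character."""
    n = len(chars)
    if n < 2:
        raise ValueError("a phrase needs at least two characters")
    if n == 2:
        # Both characters give their first and last codes.
        picked = [chars[0][0], chars[0][1], chars[1][0], chars[1][1]]
    elif n == 3:
        # First character: first and last code; the others: first code.
        picked = [chars[0][0], chars[0][1], chars[1][0], chars[2][0]]
    else:
        # Four characters: first code of each. Longer phrases: first codes
        # of the first three characters plus the first code of the last one.
        picked = [c[0] for c in chars[:3]] + [chars[-1][0]]
    return "".join(str(d) for d in picked)
```

For "科学", with "科" as `(2, 0)` and "学" as `(9, 2)`, the sketch returns 2092, the digit code stated in item 1.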
1. Two-character phrases
Code = first code of the first character + command key + first and last codes of the second character.
The command key is chosen from a set of letters according to the preceding code and is independent of the structure of the characters.
The rules for choosing the command key in two-character phrase codes are as follows:
If the preceding code is Q, A, or Z, the command key is A;
If the preceding code is W, S, or X, the command key is X;
If the preceding code is E, D, or C, the command key is C;
If the preceding code is R, F, or V, the command key is V;
If the preceding code is T, G, or B, the command key is B;
If the preceding code is Y, H, or N, the command key is Y;
If the preceding code is U or J, the command key is J;
If the preceding code is I or K, the command key is I;
If the preceding code is O or L, the command key is O;
If the preceding code is P or M, the command key is M.
Taking "科学" as an example,
the first code of the first character "科" is S (禾),
by the rules above the command key is X,
the first code of the second character "学" is L and its last code is X, so
"科学" is coded SXLX.
As a further example, in "劳动" the first code of the first character "劳" is O, the command key is O, and the first and last codes of the second character "动" are SU, so "劳动" is coded OOSU.
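The two-character letter method amounts to a table lookup followed by concatenation. The Python sketch below is a minimal illustration; the table mirrors the ten rules listed above, and the names `TWO_CHAR_COMMAND`, `two_char_command_key`, and `two_char_phrase_code` are illustrative.

```python
# Command key for two-character phrases, keyed by the group of letters
# that the preceding code belongs to.
TWO_CHAR_COMMAND = {
    "QAZ": "A", "WSX": "X", "EDC": "C", "RFV": "V", "TGB": "B",
    "YHN": "Y", "UJ": "J", "IK": "I", "OL": "O", "PM": "M",
}


def two_char_command_key(previous: str) -> str:
    """Look up the command key that follows the letter code `previous`."""
    for group, key in TWO_CHAR_COMMAND.items():
        if previous in group:
            return key
    raise ValueError(f"not a letter code: {previous!r}")


def two_char_phrase_code(first_head: str, second_head: str, second_tail: str) -> str:
    """First code of character 1 + command key + first and last code of character 2."""
    return first_head + two_char_command_key(first_head) + second_head + second_tail
```

With the first code S for "科" and the codes L and X for "学", `two_char_phrase_code("S", "L", "X")` produces SXLX, as in the example.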
2. Three-character phrases
Code = first code of the first character + first code of the second character + command key + first code of the third character.
The rules for choosing the command key are:
If the preceding code is Q, A, Z, W, S, or X, the command key is A;
If the preceding code is E, D, C, R, F, or V, the command key is V;
If the preceding code is T, G, B, Y, H, or N, the command key is Y;
If the preceding code is U, J, I, or K, the command key is I;
If the preceding code is O, L, P, or M, the command key is O.
For example, in "组织部" the first code of the first character "组" is J (纟) and the first code of the second character "织" is J (纟); by the rules above the command key is I, and the first code of the third character "部" is N, so "组织部" is coded JJIN.
As a further example, in "科学院" the first code of the first character "科" is S (禾), the first code of the second character "学" is L, the command key is O, and the first code of the third character "院" is U, so "科学院" is coded SLOU.
3. Four-character phrases
Code = first code of the first character + first code of the second character + first code of the third character + command key.
The rules for choosing the command key are the same as those for three-character phrase codes. The code of a four-character phrase does not depend on the fourth character.
For example, the first codes of the first three characters of "计划生育" are C, H, and W; by the rules above, W gives the command key A, so "计划生育" is coded CHWA.
As further examples, "教育工作" is coded PNQA and "中国北京" is coded GVSA.
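The three- and four-character letter methods share one command-key table and differ only in where the command key is placed. A minimal Python sketch, with illustrative names, assuming the first code of each character is already known:

```python
# Command key for three- and four-character phrases, keyed by the group
# of letters that the preceding code belongs to.
LONG_COMMAND = {
    "QAZWSX": "A", "EDCRFV": "V", "TGBYHN": "Y", "UJIK": "I", "OLPM": "O",
}


def long_command_key(previous: str) -> str:
    """Look up the command key that follows the letter code `previous`."""
    for group, key in LONG_COMMAND.items():
        if previous in group:
            return key
    raise ValueError(f"not a letter code: {previous!r}")


def three_char_phrase_code(h1: str, h2: str, h3: str) -> str:
    """First codes of characters 1 and 2, command key, first code of character 3."""
    return h1 + h2 + long_command_key(h2) + h3


def four_char_phrase_code(h1: str, h2: str, h3: str) -> str:
    """First codes of characters 1 to 3, then the command key; the fourth
    character does not enter the code."""
    return h1 + h2 + h3 + long_command_key(h3)
```

`three_char_phrase_code("S", "L", "U")` gives SLOU for "科学院", and `four_char_phrase_code("C", "H", "W")` gives CHWA for "计划生育", in line with the examples.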
4. Phrases of more than four characters
The first three characters each contribute their first code, and the last character contributes its first code.
For example, in "国际贸易促进委员会" the first codes of the first three characters "国际贸" are VUJ and the first code of the last character "会" is K, so the whole phrase is coded VUJK. As a further example, "中华人民共和国" is coded GKKV.
V. Encoding of traditional characters
The encoding methods described above also apply to traditional Chinese characters: the same 89 components (字件) and the same code-selection rules are used. However, some components themselves have traditional forms, and a component in its traditional form should be coded as a whole. These components and their corresponding traditional forms are:
"鱼" (2), whose traditional form is "魚" (2),
"讠" (3), whose traditional form is "言" (3),
"门" (6), whose traditional form is "門" (4),
"马" (7), whose traditional form is "馬" (7),
"钅" (8), whose traditional form is "金" (8),
"饣" (8), whose traditional form is "食" (8),
"纟" (7), whose traditional form is "糹" (7),
"辶" (6), whose traditional form is "⻍" (6),
"艹" (9), whose traditional form is "⺿" (9).
Of these, "言" (3), "金" (8), and "食" (8) take the codes 3, 8, and 8 respectively when they serve as radicals. When they do not serve as radicals, they are coded stroke by stroke.
Examples of encoding traditional Chinese characters:
"[…]" is a vertical three-structure character. The first structure contributes its first and second codes, 27; the second structure contributes its first code, 4; the third structure contributes its last code, 8. The code for the character is therefore 2748;
"勞" is a mixed-type character, horizontal in its upper part and vertical in its lower part, and its code is 9977;
"[…]" is a horizontal two-structure character, and its code is 8015.
To further increase input speed, short codes can be defined for high-frequency characters, so that some characters are entered with one keystroke, some with two, and so on. Many such designs are possible and need not be detailed here.
Industrial Applicability
The invention can be used in any information processing system involving Chinese characters. The encoding method of the invention can encode and enter not only simplified Chinese characters but also traditional, ancient, and variant Chinese characters, and even the Chinese characters used in Korean and Japanese.
Although the best mode for carrying out the invention has been described above, it should be understood that those skilled in the art can make various modifications and changes without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims

1. A method for encoding Chinese characters and inputting them into a computer, wherein each Chinese character is composed of one or more components (字件) and each component is composed of one or more strokes in traditional writing order, the method being characterized by comprising the following steps:
1) dividing the components that make up Chinese characters into 10 groups, labeled groups 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10, according to which of the Chinese numerals "一", "二", "三", "四", "五", "六", "七", "八", "九", and "十" each component most closely resembles in shape, and
assigning components in which one stroke crosses two other strokes when written to group 5, and assigning components whose shapes resemble the Arabic numerals "1" and "7" to groups 1 and 7 respectively,
assigning components whose strokes form a shape with several protruding ends to group 9;
2) assigning the components of the 10 groups to the numeric keys 1 to 9 and 0 of a keyboard respectively, so that each component in the 10 groups corresponds to one digit;
3) decomposing the Chinese character to be encoded into at least one component and entering the digit corresponding to the at least one component into the computer through the keyboard.
2. The method for encoding Chinese characters and inputting them into a computer according to claim 1, characterized in that step 1) further comprises the following steps:
assigning the stroke components "一 I " and the radical components "石,王,山" to group 1;
assigning the stroke components "二 II -J ) { " and the radical components "月禾鱼舟" to group 2;
assigning the stroke components "三 ^ 匚 " and the radical component " Γ " to group 3;
assigning the stroke components "□尸口 " and the radical components "日 ϋ臼口 ^" to group 4;
assigning the stroke components "廿中 午 " and the radical components "虫 4女" to group 5;
assigning the stroke components "丄 、,"、" and the radical components "方门 广 L " to group 6;
assigning the stroke components " "7厂 " and the radical components "马纟 p " to group 7;
assigning the stroke components "八人、 " and the radical components " [ φ t " to group 8;
assigning the stroke components "小、 1'、 、^ !" and the radical component " 火 " to group 9;
assigning the stroke components "十巾 十 " and the radical components "土木 ί " to group 10.
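The grouping and digit assignment described above can be sketched in Python as a lookup table. This is a minimal illustration, not the patent's full table: it contains only the components that are legible in the text above, the names `GROUPS` and `components_to_digits` are hypothetical, and the decomposition of a character into components is assumed to be supplied by the caller.

```python
# Legible components from the ten groups; garbled glyphs are left out.
GROUPS = {
    1: "一石王山",
    2: "二月禾鱼舟",
    3: "三匚",
    4: "尸口日臼",
    5: "廿中虫女",
    6: "方门广",
    7: "厂马纟",
    8: "八人",
    9: "小火",
    10: "十巾土木",
}

# Groups 1 to 9 sit on the numeric keys 1 to 9; group 10 sits on the 0 key.
COMPONENT_TO_DIGIT = {
    component: str(group % 10)
    for group, components in GROUPS.items()
    for component in components
}


def components_to_digits(components):
    """Map an already-decomposed sequence of components to its digit string."""
    return "".join(COMPONENT_TO_DIGIT[c] for c in components)
```

For instance, the component sequence 禾, 口 maps to the digits 2 and 4. How many of those digits enter the final code of a character depends on the structure rules of the method, which this sketch does not model.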
3. The method for encoding Chinese characters and inputting them into a computer according to claim 2, characterized in that step 3) further comprises the step of:
for a Chinese character formed by an interleaved structure, counting the intersection points formed by the crossing strokes of the interleaved structure and entering the digit 5 for every two intersection points, continuing in the same way with the remaining intersection points until only one intersection point is left, and then entering the digit 0.
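The intersection-counting rule above amounts to one digit 5 per pair of intersection points plus one digit 0 for a leftover single point. A minimal Python sketch, assuming the number of intersection points is already known (the function name is hypothetical):

```python
def crossing_codes(intersections: int) -> str:
    """Digits for an interleaved structure with the given number of crossings."""
    if intersections < 0:
        raise ValueError("the number of intersection points cannot be negative")
    # Every full pair of intersection points yields a 5; a single leftover
    # intersection point yields a 0.
    pairs, leftover = divmod(intersections, 2)
    return "5" * pairs + "0" * leftover
```

Under this reading, three intersection points give "50" and four give "55".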
4. The method for encoding Chinese characters and inputting them into a computer according to claim 1, characterized by further comprising the following steps:
assigning the components of the 10 groups to the letter keys that form diagonal columns with the numeric keys 1 to 9 and 0 on the keyboard, so that each component in the 10 groups corresponds to one letter;
decomposing the Chinese character to be encoded into at least one component and entering the letter corresponding to the at least one component into the computer through the keyboard.
5. The method for encoding Chinese characters and inputting them into a computer according to claim 4, characterized in that the step of assigning the components of the 10 groups to the letter keys that form diagonal columns with the numeric keys 1 to 9 and 0 on the keyboard further comprises the following steps:
assigning "一、石" to the Q key,
assigning "王" to the A key,
assigning "山、 1 " to the Z key,
assigning " Tj " to the W key,
assigning "二月禾" to the S key,
assigning "鱼舟 II 'J " to the X key,
assigning " 3 U " to the E key,
assigning "三 '门 " to the D key,
assigning " 匚 " to the C key,
assigning "日 尸" to the R key,
assigning "口 t^ XL C " to the F key,
assigning "目臼□" to the V key,
assigning "虫牛" to the T key,
assigning "中才廿" to the G key,
assigning "女 /" to the B key,
assigning "广 、、方 " to the Y key,
assigning "门 、 " to the H key,
assigning " L丄" to the N key,
assigning "马 7 V" to the U key,
assigning "厂 έ I3 " to the J key,
assigning "巾木" to the M key,
assigning "八 牵" to the I key,
assigning "人 t " to the K key,
assigning " ^ U、 " to the O key,
assigning "小小火† " to the L key,
assigning "十土于 ί " to the P key.
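Read together with the ten digit groups, the key assignments above suggest that each letter key stands for the same group as the numeric key heading its diagonal column, with M joining P for group 10. The Python sketch below encodes that reading; the table and function names are hypothetical, and the placement of the W, E, and O keys is an assumption, because the components assigned to them are not legible above.

```python
# Letter keys per numeric key; group 10 sits on the 0 key together with P and M.
LETTER_COLUMNS = {
    "1": "QAZ",
    "2": "WSX",  # W assumed from its keyboard position
    "3": "EDC",  # E assumed from its keyboard position
    "4": "RFV",
    "5": "TGB",
    "6": "YHN",
    "7": "UJ",
    "8": "IK",
    "9": "OL",   # O assumed from its keyboard position
    "0": "PM",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in LETTER_COLUMNS.items() for letter in letters
}


def letters_to_digits(code: str) -> str:
    """Convert a component code typed on letter keys to its numeric-key form."""
    return "".join(LETTER_TO_DIGIT[letter] for letter in code.upper())
```

A lookup like this makes it easy to check that a letter code and a digit code refer to the same sequence of component groups.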
6. The method for encoding Chinese characters and inputting them into a computer according to claim 5, characterized in that step 3) further comprises the step of:
for a Chinese character formed by an interleaved structure, counting the intersection points formed by the crossing strokes of the interleaved structure and entering one of the letters T, G, and B for every two intersection points, continuing in the same way with the remaining intersection points until only one intersection point is left, and then entering one of the letters P and M.
7. The method for encoding Chinese characters and inputting them into a computer according to claim 1, characterized in that step 3) comprises:
if the Chinese character to be encoded is itself a component, decomposing it into single strokes and smaller components.
8. The method for encoding Chinese characters and inputting them into a computer according to claim 1, characterized in that step 3) comprises:
if the Chinese character to be encoded is a single-structure character, decomposing it into single strokes and smaller components.
9. The method for encoding Chinese characters and inputting them into a computer according to claim 1, characterized in that step 3) comprises:
if the Chinese character to be encoded is a two-structure character, taking two codes from each of the two structures.
10. The method for encoding Chinese characters and inputting them into a computer according to claim 1, characterized in that step 3) comprises:
if the Chinese character to be encoded is a three-structure character, taking two codes from one of the three structures and one code from each of the other two structures.
11. The method for encoding Chinese characters and inputting them into a computer according to claim 1, characterized in that step 3) comprises:
if the Chinese character to be encoded is a four-structure character, taking one code from each of the four structures.
12. The method for encoding Chinese characters and inputting them into a computer according to claim 1, characterized in that step 3) comprises:
if the Chinese character to be encoded has more than four structures, taking one code from each of the first three structures and one code from the last structure.
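The structure-count rules above all yield a four-code result, which the following Python sketch illustrates. The claims state how many codes each structure contributes but not which ones, so the positions chosen here are assumptions: the first two codes for a two-code contribution, the first code for a one-code contribution, and, for three-structure characters, the pattern of the 2748 example in the description (first two codes, first code, last code). The function name is hypothetical, and each structure is represented by its full code string.

```python
def select_codes(structures):
    """Pick four codes from a character's structures, given their code strings."""
    n = len(structures)
    if n < 2:
        raise ValueError("this sketch covers only characters with 2+ structures")
    if n == 2:
        # Two codes from each structure (assumed: the first two).
        return structures[0][:2] + structures[1][:2]
    if n == 3:
        # Assumed pattern from the 2748 example: 2 codes, first code, last code.
        return structures[0][:2] + structures[1][0] + structures[2][-1]
    if n == 4:
        # One code from each structure (assumed: the first).
        return "".join(s[0] for s in structures)
    # More than four structures: first three structures plus the last one.
    return "".join(s[0] for s in structures[:3]) + structures[-1][0]
```

With the hypothetical structure codes "273", "41", and "98", the three-structure branch returns "2748", matching the shape of the example in the description.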
13. The method for encoding Chinese characters and inputting them into a computer according to any one of claims 2, 5, and 7 to 12, characterized by further comprising the following steps:
when the Chinese character to be encoded is a traditional Chinese character:
assigning the component "魚" to group 2,
assigning the component "言" to group 3,
assigning the component "門" to group 4,
assigning the component "馬" to group 7,
assigning the component "金" to group 8,
assigning the component "食" to group 8,
assigning the component "糹" to group 7,
assigning the component "⺿" to group 9.
PCT/CN1996/000069 1995-08-16 1996-08-16 Method of encoding and inputing complicated or simplified form of chinese character and keyboard thereof WO1997007449A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU67314/96A AU6731496A (en) 1995-08-16 1996-08-16 Method of encoding and inputing complicated or simplified form of chinese character and keyboard thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN 95108761 CN1124373A (en) 1995-08-16 1995-08-16 Input method and keyboard for Chinese character of original complex form and simplified form coding
CN95108761.4 1995-08-16

Publications (1)

Publication Number Publication Date
WO1997007449A1 true WO1997007449A1 (en) 1997-02-27

Family

ID=5076879

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN1996/000069 WO1997007449A1 (en) 1995-08-16 1996-08-16 Method of encoding and inputing complicated or simplified form of chinese character and keyboard thereof

Country Status (3)

Country Link
CN (1) CN1124373A (en)
AU (1) AU6731496A (en)
WO (1) WO1997007449A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107300988A (en) * 2017-09-04 2017-10-27 张新伟 The pictograph letter method of Chinese character coding and its computer Chinese input with keyboard
CN112083813A (en) * 2019-10-25 2020-12-15 钱文威 Chinese character input method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107656628A (en) * 2017-11-08 2018-02-02 河南水天环境工程有限公司 A kind of input method of Chinese character based on touch-screen virtual numeric keypad

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1102488A (en) * 1993-11-05 1995-05-10 张绍贤 Computer entering method for Chinese numerals and its keyboard
CN1102894A (en) * 1994-08-05 1995-05-24 李善成 Chinese character-type digital coding method

Also Published As

Publication number Publication date
AU6731496A (en) 1997-03-12
CN1124373A (en) 1996-06-12


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BB BG BR BY CA CH CZ DE DK EE ES FI GB GE HU IL IS JP KE KG KP KR KZ LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG US UZ VN AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA