GB2158276A - Phonetic encoding system of Chinese characters - Google Patents

Phonetic encoding system of Chinese characters

Info

Publication number
GB2158276A
Authority
GB
United Kingdom
Prior art keywords
chinese
characters
vowels
character
codes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB08409538A
Other versions
GB8409538D0 (en)
GB2158276B (en)
Inventor
Jin-Kai Li
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lu Ping
Original Assignee
Lu Ping
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lu Ping
Priority to GB08409538A (patent GB2158276B)
Publication of GB8409538D0
Publication of GB2158276A
Application granted
Publication of GB2158276B
Priority to SG801/88A (patent SG80188G)
Priority to HK16/89A (patent HK1689A)
Expired

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B41 PRINTING; LINING MACHINES; TYPEWRITERS; STAMPS
    • B41J TYPEWRITERS; SELECTIVE PRINTING MECHANISMS, i.e. MECHANISMS PRINTING OTHERWISE THAN FROM A FORME; CORRECTION OF TYPOGRAPHICAL ERRORS
    • B41J3/00 Typewriters or selective printing or marking mechanisms characterised by the purpose for which they are constructed
    • B41J3/01 Typewriters or selective printing or marking mechanisms characterised by the purpose for which they are constructed for special character, e.g. for Chinese characters or barcodes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/53 Processing of non-Latin text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

26 Latin letters are used to represent the 21 consonants and 37 vowels of Chinese, so that a character is spelt with two Latin letters. A consonant and a vowel that are both represented by the same Latin letter are combined to give a character, which is used as the name of that Latin letter. Fan-qui (a traditional Chinese phonetic method) is then carried out with the characters that name the Latin letters. In this way, the obstacle of phonetic interference from dialects can be removed to a great extent. Stroke shape codes are used to distinguish homonyms, and codes alternate letters and digits so that the ending symbol can be dropped. Brevity codes are used for frequently used characters.

Description

SPECIFICATION
Phonetic encoding system of Chinese characters
The subject matter of this invention relates to a system for encoding Chinese characters phonetically, in which the allotted codes are made as short as possible and are suited as far as possible not only to the standard pronunciation but also to the pronunciation of the many Chinese dialects, which differ greatly from one another.
The phonetic rules of Chinese differ greatly from those of Occidental languages. The ordinary method of phoneticizing Chinese characters with Latin letters gives long spellings, e.g. "zhuang", "chuang" and "shuang", each with 6 letters. What is more, Chinese contains many greatly differing dialects, and the standard pronunciation is not yet widespread. It is therefore difficult for those from southern-dialect areas (including overseas Chinese who speak southern dialects) to manage phonetic spelling by standard pronunciation. Up to now, no phonetic encoding system has been available that both shortens phonetic spelling and suits the various dialects as far as possible.
A system for encoding Chinese characters has previously been filed under number GB 2 100 899 A. The present invention takes that system a step forward by unifying stroke shape encoding and phonetic encoding.
This invention is characterized by: using 26 Latin letters which stand for 21 consonants and 37 vowels in Chinese; naming each Latin letter by way of reading out the consonant and the vowel, both represented by this same Latin letter, to obtain the pronunciation of the Chinese character which is used for naming the Latin letter.
Stroke shape identification codes are used to differentiate homonyms which could arise in phonetic spelling. Brevity codes may be allotted for frequently used characters.
The invention is described in detail as follows:
(1) The correlation between Chinese characters and Chinese consonants and vowels:
Referring to the table in Fig. 1, Latin letters are listed in the top line, Chinese consonants in the second, Chinese vowels in the third, and in the bottom line are the Chinese characters that serve as the names of the related Latin letters. The 21 Latin letters represent not only the 21 consonants but also the corresponding vowels in the third line. For example, if the phonetic syllable of a character is "bb", the first "b" is pronounced according to its original articulation, and the second "b" is pronounced according to the articulation of its corresponding vowel "ai". This is what is meant by one letter representing both a consonant and a vowel. In Fig. 1, all consonants have the same role as in "The Scheme for the Chinese Phonetic Alphabet" except v, w and y, which represent the zh, ch and sh of the Scheme. The vowels represented by the 21 Latin letters are shown in Fig. 1, in which the characters in the fourth line are the names given to the related Latin letters; each name is also the phonetic syllable obtained by reading out the consonant and the vowel that are both represented by the letter named.
The five Latin letters, a, o, e, i, u in Fig. 2 are simple vowels.
So far, 21 out of 37 vowels in Chinese are represented by the consonants themselves, another 5, by the five Latin vowels. The remaining 11 will be represented by the Latin letters which have been used for representing vowels, thus all 21 consonants and 37 vowels in Chinese are represented by 26 Latin letters.
For illustrating how the remaining 11 vowels are represented by the already-used letters, it is necessary to point out the theory on which the correlation between consonants and vowels in Chinese is based. The Chinese syllable is of a consonant-vowel structure, i.e. a consonant and a vowel construct a syllable, with the consonant before the vowel. Thus, a Latin letter is in a position to appear twice consecutively to stand for a consonant and a vowel at the same time with the former representing a consonant and the latter, a vowel.
Whenever a Chinese character is represented by two letters, the first must be a consonant and the second a vowel. This is the "double spelling with consonant and vowel" for Chinese character syllables.
As a rule, Chinese consonants and vowels stand in a mutually complementary relationship: e.g. the vowels "uan", "uen", "ong(ueng)" can never be combined with the consonants "j", "q", "x", whereas "üan", "ün", "iong(üeng)" can only be combined with "j", "q", "x". This law makes it possible to use "z", "c", "s" to represent both "uan", "uen", "ong(ueng)" and "üan", "ün", "iong(üeng)".
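The complementary relationship described above can be sketched in a few lines of Python. This is an illustration only, not the patent's own table: it assumes that the shared letters pair up with the vowels in the order the text lists them, and it covers only the first and last letter of the trio ("z" and "s"). The names `PALATALS`, `SHARED` and `vowel_reading` are hypothetical.

```python
# Consonants that, per the text, take only the "ü"-type vowels.
PALATALS = {"j", "q", "x"}

# Vowel letter -> (reading after other consonants, reading after j/q/x).
# Assumption: letters and vowels correspond in the order the text lists them.
SHARED = {
    "z": ("uan", "üan"),
    "s": ("ong", "iong"),
}

def vowel_reading(consonant: str, vowel_letter: str) -> str:
    """Pick the vowel that a shared letter stands for after a given consonant."""
    plain, palatal = SHARED[vowel_letter]
    return palatal if consonant in PALATALS else plain

print(vowel_reading("j", "z"))  # üan
print(vowel_reading("d", "z"))  # uan
```

Because the two vowel sets never follow the same consonant, the preceding consonant alone settles which vowel is meant, so one letter can carry both without ambiguity.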
Another 8 Latin letters which appear thrice for representation are shown in Fig. 3.
In addition, "er" is pronounced '' )Q " in Chinese.
Again, the Latin letter "i" is used to represent the vowel in "zi", "ci", "si" and in "zhi", "chi", "shi", "ri".
Ultimately, by using 26 Latin letters to represent all the consonants and vowels of Chinese, the foundation is laid for encoding Chinese characters by double spelling with consonant and vowel.
(2) Rules for Spelling:
The code of each Chinese character consists of two letters, the former for the consonant and the latter for the vowel, as shown in Fig. 4. Take two characters for example.
The first should be spelt phonetically "jiang" according to "The Scheme for the Chinese Phonetic Alphabet". By the above-explained rule of using one letter to represent both a consonant and a vowel, its phonetic spelling can be reduced to "jq". For the same reason, the second is simplified from "wang" to "uq", in which only vowels appear, with the vowel "u" taking the place of the missing consonant. Such a syllable is called a zero-consonant syllable.
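As a rough illustration of double spelling, the following Python sketch expands a two-letter code back into its spelling under the Scheme. The two tables are deliberately partial: they hold only the assignments the text itself states (b as the vowel "ai", q as the vowel "iang", and v, w, y as zh, ch, sh), since the full table of Fig. 1 is not reproduced here. The names `CONSONANT_ROLE`, `VOWEL_ROLE` and `decode` are hypothetical, and zero-consonant codes such as "uq" are not handled.

```python
# Partial tables: only entries stated in the text; Fig. 1 holds the full set.
CONSONANT_ROLE = {"b": "b", "j": "j", "v": "zh", "w": "ch", "y": "sh"}
VOWEL_ROLE = {"b": "ai", "q": "iang"}

def decode(code: str) -> str:
    """Expand a two-letter code into its Scheme spelling."""
    if len(code) != 2:
        raise ValueError("a double-spelling code has exactly two letters")
    first, second = code
    # The first letter is read in its consonant role, the second in its vowel role.
    return CONSONANT_ROLE[first] + VOWEL_ROLE[second]

print(decode("bb"))  # bai
print(decode("jq"))  # jiang
```

The same letter "b" is looked up in a different table depending on its position, which is how 26 letters can cover both the consonants and the vowels.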
In order to remove to a great extent the obstacle to the phonetic spelling of Chinese characters that arises from the phonetic interference of dialects, this invention adopts a method of pronouncing a character with two other characters that name the related Latin letters, namely a new form of Fan-qui (a traditional Chinese phonetic method). By Fan-qui we mean taking the consonant of the first character and the vowel of the second and reading them out together to get the pronunciation of the third character. In this way the phonetic interference of dialects can be eliminated to a great extent.
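The Fan-qui step can be sketched as a small Python function that takes the consonant of one syllable and the vowel of another. The list of 21 consonants is that of the Scheme; the example letter names are assumptions, because Fig. 1 is not reproduced here: "qiang" is inferred from the naming rule together with the "jq" example, and "ji" is a purely hypothetical stand-in for the name of the letter j. The function names are hypothetical as well.

```python
# The 21 consonants of the Scheme; two-letter ones first so "zh" wins over "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]

def split_syllable(syllable: str) -> tuple[str, str]:
    """Split a syllable into (consonant, vowel); the consonant may be empty."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-consonant syllable

def fan_qui(first: str, second: str) -> str:
    """Combine the consonant of `first` with the vowel of `second`."""
    consonant, _ = split_syllable(first)
    _, vowel = split_syllable(second)
    return consonant + vowel

# "ji" is a hypothetical letter name; "qiang" is inferred, not quoted.
print(fan_qui("ji", "qiang"))  # jiang
```

A speaker of a dialect would feed in the dialectal readings of the same two name characters and obtain the dialectal reading of the same third character, which is the point the specification makes about dialect interference.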
There is a close correlation between standard (spoken) Chinese and the various dialects. If one reads out the first and second characters used for Fan-qui in standard pronunciation, one will pronounce the resulting third character in the standard manner; if one reads them out in one's own dialect, one will pronounce the third character dialectally. However much standard Chinese differs from the various dialects, one obtains the same character by way of Fan-qui, which solves the awkward problem that the different pronunciations of standard Chinese and the dialects pose for the phonetic spelling and encoding of Chinese characters.
Fig. 5 shows the correlation between standard Chinese and Cantonese and how the interference of dialectal sound is removed. It shows clearly that the character obtained by Fan-qui of two other characters in standard pronunciation is the same character as that obtained by Fan-qui of those two characters in a Cantonese accent, although this character can be pronounced in the standard way or in the Cantonese way.
Encoding Chinese characters phonetically according to "The Scheme for the Chinese Phonetic Alphabet" is constrained by dialectal pronunciations, which makes it difficult to arrive at a unified phonetic encoding system for Chinese characters. Besides, it is not easy for those who cannot speak standard Chinese to master such a phonetic method. With the method of Fan-qui with consonant and vowel, those from dialectal areas can easily carry out 90% of all Chinese phonetic spelling in their own dialectal pronunciation and still get the same characters that standard phonetic spelling would give. For the minority of odd characters, one more reference source can be used as a remedy in character-information processors.
(3) Character Shape Identification Codes:
One can tell a Chinese homonym from others by its different shape, and the simplest way is to encode Chinese characters by their stroke shapes. The invention of this kind of system has already been claimed. According to statistical data, a stroke shape code of four digits is enough to identify characters that share the same code. Concretely, the stroke shapes of a multi-component character are divided into two groups, and two strokes are taken from each group to form a 4-digit stroke shape code; for a one-component character, four strokes are taken consecutively in order from top to bottom and from left to right. Fig. 6 exemplifies the character shape identification codes of the homonyms of the syllable "bb(bai)"; those outside the curves are effective codes, whereas those within are ineffective (i.e. useless) codes, which can be omitted in storage or encoding. The characters can be differentiated in that way.
(4) Brevity Codes:
According to information theory, the briefest codes should be given to the most frequently used characters. One Latin letter and a 1-digit stroke shape code can be used for encoding 208 (26 X 8) Chinese characters. 8 brevity codes are shown in Fig. 6.
According to statistics, these 208 characters make up more than 50% of the total use frequency of Chinese characters.
One Latin letter code and two stroke shape codes can be used for encoding 1664 characters, while two letter codes and one stroke shape code can be adopted to encode 5408 characters. That is to say, about 7,000 characters can be allotted a 3-digit code.
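The capacity figures quoted above follow from simple counting, which the short Python check below reproduces. It assumes 8 distinct stroke shape digits, as implied by "208 (26 X 8)"; the variable names are hypothetical.

```python
LETTERS = 26
STROKE_DIGITS = 8  # assumption implied by "208 (26 X 8)"

one_letter_one_digit = LETTERS * STROKE_DIGITS          # brevity codes
one_letter_two_digits = LETTERS * STROKE_DIGITS ** 2    # letter + 2 stroke digits
two_letters_one_digit = LETTERS ** 2 * STROKE_DIGITS    # 2 letters + 1 stroke digit

# All codes that are exactly three symbols long.
three_symbol_codes = one_letter_two_digits + two_letters_one_digit

print(one_letter_one_digit)   # 208
print(one_letter_two_digits)  # 1664
print(two_letters_one_digit)  # 5408
print(three_symbol_codes)     # 7072
```

The two kinds of three-symbol code together give 7072 combinations, which is the round figure of 7,000 in the text.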
Statistical data show that 1690 characters make up 97% of total use frequency, and 2393, 99%.
If codes are formed with Latin letters and digits alternately (letters before numerals), the ending symbol that is indispensable in common encoding (including telegraphic coding) can be omitted. With this encoding system, only 2.5 codes are used for each character on average, whereas telegraphic coding uses 5 codes per character on average. The former is 50% shorter than the latter. By using this encoding system, character-information processors (including computers, teleprinters, typewriters and dial telephones, etc.) can work easily, accurately, and more rapidly.
This system for encoding Chinese characters phonetically can be used for compiling Chinese dictionaries and character indexes, as well as for establishing Chinese character information processing systems in computers, teleprinters and typewriters of large, medium or small size.
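Why the ending symbol becomes unnecessary can be seen from a small Python sketch that splits an unbroken stream of codes. It rests on assumptions not spelled out in the text: every code is taken to be one or two lower-case letters followed by one to four digits, so that a letter appearing after a digit marks the start of the next character. The pattern `CODE`, the function `split_codes` and the sample stream are all hypothetical.

```python
import re

# Assumed code shape: 1-2 letters, then 1-4 stroke shape digits.
CODE = re.compile(r"[a-z]{1,2}[0-9]{1,4}")

def split_codes(stream: str) -> list[str]:
    """Split a run of codes without separators into per-character codes."""
    codes = CODE.findall(stream)
    if "".join(codes) != stream:
        raise ValueError("stream is not a sequence of letter-then-digit codes")
    return codes

print(split_codes("b3jq25a1"))  # ['b3', 'jq25', 'a1']

# The saving quoted in the text: 2.5 symbols on average against 5.
saving = 1 - 2.5 / 5
print(saving)  # 0.5
```

Since each code switches from letters to digits exactly once, the switch back to a letter is itself the boundary, and no separate terminator has to be transmitted.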
Brief description of the drawings:
Figure 1: Consonant-vowel table. Shows the correlation between Latin letters and Chinese consonants, vowels, and the related character names.
Figure 2: Simple vowels table. Shows the correlation between Latin letters and simple Chinese vowels and the related character names.
Figure 3: Letters which appear thrice. Shows the correlation between Latin letters which appear thrice and the represented vowels.
Figure 4: Examples of Chinese phonetic spelling. Gives Chinese characters, consonants and vowels which are used for phonetic spelling and are compared with original Chinese phonetic alphabets.
Figure 5: Fan-qui of Chinese characters. Shows the characters used for Fan-qui and the third ones obtained by way of Fan-qui in both standard Chinese and Cantonese.
Figure 6: Stroke shape identification codes table. Gives the stroke shape identification codes of the homonyms.
Figure 7: Table of brevity codes of Chinese characters. Gives brevity codes of Chinese characters, composed by one Latin letter and one digit.

Claims (10)

1. A phonetic encoding system of Chinese characters, in which a limited number of notational letters are used for representing consonants and vowels of Chinese characters, and the stroke shape identification codes are used for differentiating homonymous characters.
2. A method as claimed in Claim 1 in which 26 Latin letters are used for representing 21 consonants and 37 vowels in Chinese and any Chinese syllable can be double spelt with consonant and vowel.
3. A method as claimed in the above two Claims in which 21 consonants of Latin letters stand for both 21 consonants and 21 vowels of Chinese characters, 5 Latin vowels for 5 simple Chinese vowels, the remaining 11 Chinese vowels are represented by 11 Latin letters which have already been used for representing Chinese consonants and vowels, thus using 26 Latin letters representing 21 consonants and 37 vowels of Chinese characters.
4. A method as claimed in Claims 1 to 3 in which any Chinese syllable is composed of two letters, the former is a consonant, the latter, a vowel, thus forming the "double spelling with consonant and vowel".
5. A method as claimed in any of the preceding Claims in which "double spelling with consonant and vowel" is equal to using two characters for Fan-qui, which are the names of their related Latin letters; according to the correlation between standard Chinese pronunciation and various dialectal sounds, when Fan-qui is done in standard pronunciation, the character thus acquired is pronounced standardly; when Fan-qui is done in dialectal pronunciation, the character thus obtained has a dialectal pronunciation; despite the difference in pronunciations, the character obtained by two different ways is just the same one.
6. A method as claimed in Claim 1 to 5 in which homonymous characters are differentiated by stroke shape codes of Chinese characters, i.e. stroke shape identification codes.
7. A method as claimed in any of the preceding Claims in which brevity codes are allotted to frequently used characters; 3-digit codes are given to majority of Chinese characters; thus it is possible to encode all Chinese characters.
8. A method as claimed in Claims 1 to 7 in which alternative encoding with letter codes and digit codes is used and an ending symbol every two characters can be omitted.
9. Dictionaries, codes and character indexes compiled by a method as claimed in Claims 1 to 8.
10. Character-information processors of large, medium or small size, apparatus such as computers, teleprinters, typewriters and dial telephones using the phonetic encoding system of Chinese characters as claimed in Claims 1 to 8, and adapted for Chinese character information processing.

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB08409538A GB2158276B (en) 1984-04-12 1984-04-12 Phonetic encoding system of chinese characters
SG801/88A SG80188G (en) 1984-04-12 1988-11-29 Phonetic encoding system of chinese characters
HK16/89A HK1689A (en) 1984-04-12 1989-01-05 Phonetic encoding system of chinese characters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB08409538A GB2158276B (en) 1984-04-12 1984-04-12 Phonetic encoding system of chinese characters

Publications (3)

Publication Number Publication Date
GB8409538D0 (en) 1984-05-23
GB2158276A (en) 1985-11-06
GB2158276B (en) 1988-04-27

Family

ID=10559580

Family Applications (1)

Application Number Title Priority Date Filing Date
GB08409538A Expired GB2158276B (en) 1984-04-12 1984-04-12 Phonetic encoding system of chinese characters

Country Status (3)

Country Link
GB (1) GB2158276B (en)
HK (1) HK1689A (en)
SG (1) SG80188G (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2185838B (en) * 1985-08-29 1990-02-28 Yoshinori Shinoto Selection system for ideographic characters
WO2007043979A1 (en) * 2005-10-14 2007-04-19 Subramaniam Athirubarani Method and system for teaching spelling and pronunication of a language

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1398882A (en) * 1968-07-02 1975-06-25 Leban C Method and means for transcribing ideographic or other non- alphabetic characters
GB1600841A (en) * 1978-04-12 1981-10-21 Kuo Kuang Hu Method and apparatus for reproducing desired ideographs

Also Published As

Publication number Publication date
HK1689A (en) 1989-01-13
SG80188G (en) 1989-03-23
GB8409538D0 (en) 1984-05-23
GB2158276B (en) 1988-04-27

Legal Events

Date Code Title Description
732 Registration of transactions, instruments or events in the register (sect. 32/1977)
PCNP Patent ceased through non-payment of renewal fee

Effective date: 19930412

728C Application made for restoration (sect. 28/1977)
728A Order made restoring the patent (sect. 28/1977)
PCNP Patent ceased through non-payment of renewal fee

Effective date: 19950412