CN1043015A - Pronunciation and form compatible Chinese coding scheme of dual-purpose information-exchange code

Pronunciation and form compatible chinese coding scheme of dual-purpose information-exchange code Download PDF

Info

Publication number
CN1043015A
Authority
CN
China
Legal status
Withdrawn
Application number
CN 89108345
Other languages
Chinese (zh)
Inventor
林宇威
Current Assignee
Individual
Original Assignee
Individual
Priority date
1989-11-02
Filing date
1989-11-02
Publication date
1990-06-13

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The pronunciation and form compatible Chinese coding scheme of dual-purpose information-exchange code is an encoding scheme for compiling a Chinese character information-exchange code that is similar in form to the GB code but represents both the sound and the shape of a Chinese character at the same time. It represents the shape of a character with the full code (14 binary bits) and its pronunciation with part of that code (the last 11 binary bits). Because the sound code and the shape code are compatible, the scheme makes it possible to greatly simplify equipment that transmits and converts the sound and shape information of Chinese characters. The code is fairly regular and relatively easy to memorize, and it can also serve as a sound-shape input code whose users do not need to know the Chinese phonetic alphabet (pinyin).

Description

Pronunciation and form compatible Chinese coding scheme of dual-purpose information-exchange code
The pronunciation and form compatible Chinese coding scheme of dual-purpose information-exchange code is an encoding scheme for compiling a Chinese character information-exchange code that is similar in form to the GB code but represents both the sound and the shape of a Chinese character at the same time.
As is well known, a Chinese character has two features, shape and sound. However, the general-purpose codes currently used for exchanging Chinese character information, the telegraph code (post and telecommunications code) and the GB code, are both exchange codes for character shapes only and cannot also be used to exchange pronunciation information directly. Moreover, there is as yet no general exchange code for Chinese character pronunciation. At present the Chinese phonetic alphabet (pinyin) is generally used to mark pronunciation. Marking and inputting pronunciation with pinyin is feasible, but pinyin is not suitable as a pronunciation exchange code: it is not a fixed-length numeric code, its code words become very long when converted to binary, its efficiency in transmitting pronunciation information is too low, and it is hard to make compatible with existing code systems and hardware. To meet the needs of the growing field of Chinese speech processing, it is necessary to establish an efficient exchange code for the pronunciation of Chinese characters in current use that is compatible with existing code systems and hardware.
Establishing an efficient, general pronunciation exchange code alone, however, is of limited value. Such a word sound code could meet the needs of some simple Chinese speech processing equipment, but not the needs of Chinese natural-language input, output and processing equipment that must frequently convert between the shape and the sound of characters. If such equipment used a graphemic code and a word sound code that were unrelated to each other, the hardware and software needed for shape → sound or sound → shape conversion would still be very complex.
The best way to solve the problem is clearly to try to establish for Chinese characters a dual-purpose information-exchange code that is compatible in pronunciation and form, that is, an exchange code that can represent both the shape and the pronunciation of a character. This is a rather attractive idea; if it can be realized, it will undoubtedly bring great benefit and convenience to the transmission and conversion of the sound and shape information of Chinese characters.
The purpose of this patent is to explore approaches and methods for establishing such a pronunciation and form compatible Chinese dual-purpose information-exchange code, and to propose a preliminary concrete coding scheme for reference.
As is well known, Chinese character pronunciation is characterized by homophony. Among the 6763 commonly used level-1 and level-2 Chinese characters (GB2312-80 standard) there are only 1301 distinct pronunciations. A pronunciation corresponds to as few as one character and to as many as 60 homophones (for example the sound "yì"). Using this property, we can build a pronunciation and form compatible dual-purpose code of the form "word sound code + homophone sequence number". For example, the simplest such code can be constructed from 6 decimal digits X6X5X4X3X2X1, where X6X5X4X3 is the word sound code, ranging from 0001 to 1301 and giving the sequence number of the pronunciation, and X2X1 is the homophone sequence number, ranging from 01 to 60. The shape of a character is encoded by the full code, and its pronunciation is represented by the word sound code. However, a dual-purpose code built in this simple way has code words that are too long (3 bytes are needed) and uses the code space very inefficiently, so it is clearly not an ideal code for general adoption.
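The size argument for the simple 6-digit scheme can be checked with a little arithmetic. The following Python sketch is illustrative only: the helper name `naive_code` is hypothetical and does not appear in the patent, and the utilization figure assumes the code space is counted as 1301 × 60 slots.

```python
# Sketch of the simple 6-digit scheme X6X5X4X3X2X1 described above.
# naive_code is a hypothetical helper name, not taken from the patent.

SOUND_COUNT = 1301      # distinct pronunciations in GB2312-80 (from the text)
MAX_HOMOPHONES = 60     # largest homophone group (from the text)
CHAR_COUNT = 6763       # level-1 and level-2 characters (from the text)


def naive_code(sound_index: int, homophone_index: int) -> int:
    """Combine a sound index (1..1301) and a homophone index (1..60)."""
    if not 1 <= sound_index <= SOUND_COUNT:
        raise ValueError("sound index out of range")
    if not 1 <= homophone_index <= MAX_HOMOPHONES:
        raise ValueError("homophone index out of range")
    return sound_index * 100 + homophone_index


largest = naive_code(SOUND_COUNT, MAX_HOMOPHONES)   # 130160
bits_needed = largest.bit_length()                  # 17 bits
bytes_needed = -(-bits_needed // 8)                 # ceiling division: 3 bytes
# Assumption: utilization measured against 1301 * 60 possible slots.
utilization = CHAR_COUNT / (SOUND_COUNT * MAX_HOMOPHONES)

print(largest, bits_needed, bytes_needed, round(utilization, 3))
```

Under these assumptions fewer than one slot in ten is occupied, which is consistent with the statement that the code space is poorly used.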
To establish a pronunciation and form compatible dual-purpose code of practical significance, we want it to have the same form as the GB code. Our effort should therefore be directed at compressing the 6-decimal-digit dual-purpose code above into 4 decimal digits, and at giving the corresponding binary code a two-byte, 7-bits-per-byte form like that of the GB code (14 binary bits in all), so that it can also be compatible with the ASCII code.
At first sight this goal seems impossible to achieve, and indeed the author explored several schemes without success. The scheme introduced below is the only one the applicant has found so far that basically satisfies the above requirements. It is put forward for reference, as a modest starting point intended to draw out better proposals.
For convenience of description, we split the pronunciation and form compatible Chinese dual-purpose information-exchange code into two forms, the graphemic code and the word sound code (in reality they are one code). Both are codes of 4 decimal digits and are compatible with each other. Compatible here means that the last three digits of the two codes (a4a3a2 and A4A3A2) are identical, while the first digit (a1 and A1) differs somewhat but can be converted by a simple method. The structure and meaning of the two codes and the relationship between them are introduced first.
The decimal forms of the graphemic code and the word sound code are a4a3a2a1 and A4A3A2A1. Their ranges and mutual relationship are: a1 = 0 to 9, a4a3a2 = 000 to 999; A1 = 0 or 6, A4A3A2 = a4a3a2.
The binary forms of the graphemic code and the word sound code, and the relationship between them, are:
Figure 891083456_IMG2
The values of a1 in the decimal graphemic code and of A1 in the decimal word sound code are obtained by converting the first 4 bits of the second byte of the corresponding binary code according to the BCD (binary-coded decimal) code with weights "6-4-2-1". The "6-4-2-1" BCD correspondence is: 0000→0, 0001→1, 0010→2, 0011→3, 0100→4, 0101→5, 1000→6, 1001→7, 1010→8, 1011→9. Because the first three bits of the binary word sound code are always 0, A1 can take only the two values 0 and 6. The values of a4a3a2 and A4A3A2 equal the decimal number represented by the ten bits formed by the first byte together with the last three bits of the second byte of the corresponding binary code.
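The "6-4-2-1" table quoted above can be written down directly and checked against its weights. This is a minimal Python sketch; the names `BCD_6421`, `WEIGHTS` and `decode_6421` are illustrative and not part of the patent.

```python
# The "6-4-2-1" weighted BCD table quoted in the text (digit -> 4-bit code).
BCD_6421 = {
    0: 0b0000, 1: 0b0001, 2: 0b0010, 3: 0b0011, 4: 0b0100,
    5: 0b0101, 6: 0b1000, 7: 0b1001, 8: 0b1010, 9: 0b1011,
}
WEIGHTS = (6, 4, 2, 1)


def decode_6421(nibble: int) -> int:
    """Return the decimal digit for one of the ten valid 4-bit codes."""
    if nibble not in BCD_6421.values():
        raise ValueError(f"{nibble:04b} is not a valid 6-4-2-1 code")
    bits = [(nibble >> shift) & 1 for shift in (3, 2, 1, 0)]
    return sum(weight * bit for weight, bit in zip(WEIGHTS, bits))


# Every table entry equals the weighted sum of its bits.
assert all(decode_6421(code) == digit for digit, code in BCD_6421.items())
```

Only the ten listed patterns are accepted; a pattern such as 0110 also sums to 6 but is not in the table, so the sketch rejects it.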
From the code meanings and ranges above it follows that:
1. The graphemic code has 10000 code positions in total, and the word sound code has 2000;
2. The word sound codes A4A3A2A1 fall into two classes, A1 = 0 and A1 = 6, each occupying 1000 code positions. Each code of the class A1 = 0 represents one large homophone region, and each code of the class A1 = 6 represents one small homophone region. Each large homophone region can hold 6 homophones; each small homophone region can hold 4;
3. All characters in the same homophone region share the same a4a3a2 in their graphemic codes. The value of a1 runs from 0 to 5 in a large homophone region and from 6 to 9 in a small one, according to the order of the homophones within the region;
4. All characters in the same homophone region have the same word sound code. A4A3A2 in the word sound code equals a4a3a2 in the region's graphemic codes; A1 in the word sound code is always 0 for a large homophone region and always 6 for a small one;
5. The rule for deriving the decimal word sound code from the decimal graphemic code is: A4 = a4, A3 = a3, A2 = a2; for A1, when a1 = 0 to 5, A1 = 0, and when a1 = 6 to 9, A1 = 6. Examples:
Figure 891083456_IMG3
This shows that the pronunciation of all homophones in a homophone region can be represented by the pronunciation of the first character of that region (the representative character); their word sound code equals the word sound code of the representative character (which also equals that character's graphemic code);
6. The rule for deriving the binary word sound code from the binary graphemic code is even simpler: only the first 3 bits b3, b2, b1 of the second byte of the graphemic code need to be set to 0, for example:
Figure 891083456_IMG4
It follows that, although the graphemic code used to represent the shape of a character in the sound-shape code needs, like the GB code, 14 binary bits, the word sound code used to represent its pronunciation needs only 11 binary bits (the first three bits are always 0 and can therefore be dropped).
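Rules 5 and 6 can be sketched together in Python. The bit layout used here is an assumption, because the figures showing the binary form are not reproduced in this text: the 10-bit value of a4a3a2 is placed in the high bits and the "6-4-2-1" field b4 b3 b2 b1 in the low four bits, with b4 carrying weight 6. All function names are illustrative.

```python
def sound_code_decimal(graphemic: int) -> int:
    """Rule 5: keep a4a3a2, map a1 in 0..5 to 0 and a1 in 6..9 to 6."""
    if not 0 <= graphemic <= 9999:
        raise ValueError("graphemic code must have 4 decimal digits")
    prefix, a1 = divmod(graphemic, 10)
    return prefix * 10 + (0 if a1 <= 5 else 6)


def encode_6421(digit: int) -> int:
    """6-4-2-1 code of a digit: 0..5 -> 0000..0101, 6..9 -> 1000..1011."""
    return digit if digit <= 5 else 0b1000 | (digit - 6)


def pack14(graphemic: int) -> int:
    """ASSUMED layout: 10-bit value of a4a3a2, then the 4-bit a1 field."""
    prefix, a1 = divmod(graphemic, 10)
    return (prefix << 4) | encode_6421(a1)


def sound_code_binary(packed: int) -> int:
    """Rule 6: clear b3, b2, b1 (the three low bits in the assumed layout)."""
    return packed & ~0b111


def unpack14(packed: int) -> int:
    """Inverse of pack14 for valid codes."""
    nibble = packed & 0b1111
    a1 = 6 * (nibble >> 3) + (nibble & 0b111)
    return (packed >> 4) * 10 + a1


# Graphemic code 1873 and its word sound code under both rules.
print(sound_code_decimal(1873), unpack14(sound_code_binary(pack14(1873))))
```

In this assumed layout the three cleared bits carry no information, so `packed >> 3` is the 11-bit word sound code; it takes exactly 2000 distinct values, matching the 2000 code positions stated above.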
The reason this conversion rule works is that the values of a1 and A1 are converted with a BCD code weighted "6-4-2-1". With a BCD code of other weights (such as the commonly used "8-4-2-1" code), such a simple conversion rule could not be obtained.
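One way to see the role of the weights is to clear the three low bits of each digit's 4-bit code under both weightings and look at how the ten digits are grouped. This Python sketch is an illustration of that reading, not a statement from the patent; the function names are hypothetical.

```python
from collections import defaultdict


def encode_6421(digit: int) -> int:
    """6-4-2-1 code: 0..5 -> 0000..0101, 6..9 -> 1000..1011."""
    return digit if digit <= 5 else 0b1000 | (digit - 6)


def encode_8421(digit: int) -> int:
    """Ordinary 8-4-2-1 BCD: the digit's own binary value."""
    return digit


def split_by_mask(encode) -> dict:
    """Group digits 0..9 by the code left after clearing the 3 low bits."""
    groups = defaultdict(list)
    for digit in range(10):
        groups[encode(digit) & 0b1000].append(digit)
    return dict(groups)


print(split_by_mask(encode_6421))  # groups of 6 and 4 digits
print(split_by_mask(encode_8421))  # groups of 8 and 2 digits
```

With "6-4-2-1" the mask yields the 6/4 split that the scheme relies on (0 to 5 and 6 to 9); with "8-4-2-1" the same mask would split the digits 8/2 instead.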
By arranging several thousand commonly used Chinese characters into homophone regions in the alphabetical order of their Chinese phonetic alphabet (pinyin) syllables, one obtains the code table and the code book of the pronunciation and form compatible Chinese dual-purpose information-exchange code, in the form shown in Table one and Table two. In the arrangement, a group of homophones numbering no more than 6, or no more than 4, is placed in a single large or small homophone region respectively; a group of more than 6 is placed in several adjacent homophone regions. The code table is the full version: it lists in detail the shape and pronunciation of every character and the binary and decimal forms of its graphemic code and word sound code; its first column, "sound order", is the sequence number of the character's pronunciation (0 to 1301) and aids retrieval. The code book is an abridged table for users, 100 pages in all (areas 00 to 99), giving only the shape, the pronunciation, and the decimal graphemic and word sound codes of each character. Because this scheme divides the large and small homophone regions on a 6-character/4-character basis, more than 928 pronunciations occupy only one homophone region (and so have only one word sound code), which is more than 71% of the total of 1301 pronunciations. This is why the scheme adopts the 6-character/4-character division.
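The placement rule for one group of homophones can be sketched as follows. The text only says that groups of more than 6 go into "several adjacent" regions; the greedy split below (fill 6-character regions first, then one final region) is an assumption made for illustration, and `regions_needed` is a hypothetical name. The sketch returns region capacities only and does not assign actual code values.

```python
from typing import List

LARGE = 6   # capacity of a large region (a1 = 0..5)
SMALL = 4   # capacity of a small region (a1 = 6..9)


def regions_needed(homophone_count: int) -> List[int]:
    """Return the capacities of the regions used by one pronunciation."""
    if homophone_count < 1:
        raise ValueError("a pronunciation has at least one character")
    regions = []
    remaining = homophone_count
    # Assumption: oversized groups fill 6-character regions first.
    while remaining > LARGE:
        regions.append(LARGE)
        remaining -= LARGE
    # The rest fits a single region: small if 4 or fewer, else large.
    regions.append(SMALL if remaining <= SMALL else LARGE)
    return regions


# Share of pronunciations that fit one region, using the text's figures.
single_region_share = 928 / 1301

print(regions_needed(3), regions_needed(5), regions_needed(7))
print(round(single_region_share, 3))
```

A real code book would also have to respect the fact that each value of a4a3a2 provides exactly one 6-character and one 4-character region, which this sketch does not model.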
To standardize the code and raise the utilization of the code space, a pronunciation with many homophones has to occupy several homophone regions and therefore corresponds to several word sound codes. This is unavoidable, but it has little effect on practical applications. In shape → sound conversion, one pronunciation corresponding to several word sound codes causes no difficulty, because all of those word sound codes can be made to represent the same pronunciation. In sound → shape conversion it does cause some difficulty, but because the homophone regions occupied by one pronunciation in this scheme are adjacent, and the corresponding word sound codes change in sequence, the added difficulty in searching for homophones is not great.
The pronunciation and form compatible Chinese dual-purpose information-exchange code can be used as an exchange code and an internal code, and also as an input code.
Table two pronunciation and form compatible Chinese dual-purpose information-exchange code code book
(the 19th page)
Figure 891083456_IMG6
Used as an internal code and exchange code, it can simplify equipment for transmitting and converting the sound and shape information of Chinese characters. For example, transmitting pronunciation information with the word sound code takes only 11 bits per character. Shape → sound conversion with the sound-shape code needs only a memory of pronunciation waveforms or waveform parameters with a 2K address space (11 address lines), whereas with the GB code or the telegraph code 16K or 64K is needed; the former is only 1/8 or 1/32 of the latter. For sound → shape conversion, because the sound and shape codes are mutually compatible and highly regular, the required hardware and software may also be greatly simplified.
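The address-space comparison is simple powers-of-two arithmetic, shown here as a short Python check. The 14-bit and 16-bit widths are inferred from the 16K and 64K figures in the text rather than stated there.

```python
from fractions import Fraction

ADDRESS_BITS = {
    "word sound code": 11,   # stated in the text (11 address lines)
    "GB code": 14,           # inferred from 16K = 2**14
    "telegraph code": 16,    # inferred from 64K = 2**16
}

sizes = {name: 2 ** bits for name, bits in ADDRESS_BITS.items()}
ratio_gb = Fraction(sizes["word sound code"], sizes["GB code"])
ratio_telegraph = Fraction(sizes["word sound code"], sizes["telegraph code"])

print(sizes)
print(ratio_gb, ratio_telegraph)   # 1/8 and 1/32
```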
Used as an input code, it offers a sound-shape input method to people who do not know the Chinese phonetic alphabet. Because it is more regular, this code is easier to memorize than the GB code and the telegraph code, and manual and machine retrieval and searching are easier to implement.
Below we give a few possible application examples:
A hand-held Chinese character speaking device that needs only a small decimal keypad could be made for foreign travelers. A handbook of common sentences would be compiled for them, with the word sound code of each character printed beside each sentence. The user only has to key the word sound codes of a sentence into the machine in order, and the machine will speak the sentence automatically. A mute person who has memorized the word sound codes of several thousand characters could also use it to converse freely with others.
By printing the binary word sound code, hidden or visible, beneath or near each character of a talking Chinese reading text, a Chinese automatic reading machine can be built that does not need very complex character shape recognition technology. Such a machine only has to recognize binary codes, so it is easier to build and cheaper, and could serve as a first step toward a Chinese automatic reading machine. The same method could be used to produce reading materials and reading machines for the blind.
By using the sound-shape code in place of the telegraph code, telegrams carrying both the sound and the shape of Chinese characters can be sent.
With the word sound code, a wired or wireless Chinese telephone system with a very narrow bandwidth could be built. Such a system could be used for services that do not need to identify the speaker, such as official communication and broadcasting, or battlefield communication.
With the sound-shape code, a standard whole-character index typing method could be established. The user first enters the representative character of a group of homophones, or its word sound code (for example "east" or "1870"); the machine immediately displays all the homophones of that homophone region (six characters, rendered in this translation as "east, thrush, Dong, winter, rub-a-dub, radon"); the user then selects the character wanted (for example "winter") and keys it in.
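The lookup step of this typing method follows directly from the region rules given earlier: a word sound code ending in 0 covers six consecutive graphemic codes, and one ending in 6 covers four. A minimal Python sketch, with the hypothetical helper name `region_graphemic_codes`:

```python
from typing import List


def region_graphemic_codes(sound_code: int) -> List[int]:
    """List the graphemic codes that share the given word sound code."""
    if not 0 <= sound_code <= 9999:
        raise ValueError("word sound code must have 4 decimal digits")
    last_digit = sound_code % 10
    if last_digit == 0:
        size = 6        # 6-character region, a1 = 0..5
    elif last_digit == 6:
        size = 4        # 4-character region, a1 = 6..9
    else:
        raise ValueError("a word sound code must end in 0 or 6")
    return [sound_code + offset for offset in range(size)]


print(region_graphemic_codes(1870))
print(region_graphemic_codes(1876))
```

A machine implementing the method would map each returned graphemic code to its character through the code book and display the candidates for selection; the code book contents themselves are not reproduced here.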
Going further, the sound-shape code could be used to build a semi-automatic voice-retrieval input system for Chinese characters. The typist only has to speak the sound of a character (for example "dōng") to the machine; the machine's built-in automatic speech recognition device finds the word sound code of the sound dōng and displays all the homophone regions of that sound, and the typist then selects a character and keys it in.
Using the sound-shape code as the internal code of a Chinese natural-language input system may, as stated above, greatly simplify the required hardware and software.
……
In the pronunciation and form compatible dual-purpose exchange code introduced in this patent there are only 2000 word sound codes, so it is somewhat difficult to accommodate all 6763 level-1 and level-2 Chinese characters together with other graphic symbols. The author has tried compiling one version, which can hold only about 6500 Chinese characters plus some common punctuation marks, graphics and symbols and the upper- and lower-case English and Latin letters. How to select characters and include more of them when compiling the code book therefore still needs further exploration by those working on the coding.

Claims (2)

1. A pronunciation and form compatible Chinese coding scheme of dual-purpose information-exchange code (1), characterized in that:
a. the pronunciation and form compatible Chinese coding scheme of dual-purpose information-exchange code (1) can be used to compile a pronunciation and form compatible Chinese dual-purpose information-exchange code (2) that is similar in form to the GB code but represents both the sound and the shape of a Chinese character at the same time;
b. the pronunciation and form compatible Chinese dual-purpose information-exchange code (2) can be split into a graphemic code a4a3a2a1 and a word sound code A4A3A2A1 that are compatible with each other; both are codes of 4 decimal digits, whose ranges and mutual relationship are: a1 = 0 to 9, a4a3a2 = 000 to 999; A1 = 0 or 6, A4A3A2 = a4a3a2;
c. the binary graphemic code and binary word sound code of the pronunciation and form compatible Chinese dual-purpose information-exchange code (2), and the relationship between them, are:
Figure 891083456_IMG1
d. the values of a1 in the decimal graphemic code a4a3a2a1 and of A1 in the decimal word sound code A4A3A2A1 are obtained by converting the first 4 bits of the second byte of the corresponding binary code according to the BCD (binary-coded decimal) code with weights "6-4-2-1" (the "6-4-2-1" BCD correspondence is: 0000→0, 0001→1, 0010→2, 0011→3, 0100→4, 0101→5, 1000→6, 1001→7, 1010→8, 1011→9); because the first three bits of the binary word sound code are always 0, A1 can take only the values 0 and 6; the values of a4a3a2 and A4A3A2 in the graphemic code and the word sound code equal the decimal number represented by the ten bits formed by the first byte together with the last three bits of the second byte of the corresponding binary code;
e. the graphemic code has 10000 code positions in total, and the word sound code has 2000;
f. the word sound codes A4A3A2A1 fall into two classes, A1 = 0 and A1 = 6, each occupying 1000 code positions; each code of the class A1 = 0 represents one large homophone region, and each code of the class A1 = 6 represents one small homophone region; each large homophone region can hold 6 homophones, and each small homophone region can hold 4;
g. all characters in the same homophone region share the same a4a3a2 in their graphemic codes; the value of a1 runs from 0 to 5 in a large homophone region and from 6 to 9 in a small one, according to the order of the homophones within the region;
h. all characters in the same homophone region have the same word sound code; A4A3A2 in the word sound code equals a4a3a2 in the region's graphemic codes; A1 in the word sound code is always 0 for a large homophone region and always 6 for a small one;
i. the rule for deriving the decimal word sound code from the decimal graphemic code is: A4 = a4, A3 = a3, A2 = a2; for A1, when a1 = 0 to 5, A1 = 0, and when a1 = 6 to 9, A1 = 6; this shows that the pronunciation of all homophones in a homophone region can be represented by the pronunciation of the first character of that region (the representative character), and their word sound code equals the word sound code of the representative character (which also equals that character's graphemic code);
j. the rule for deriving the binary word sound code from the binary graphemic code is: setting the first 3 bits b3, b2, b1 of the second byte of the graphemic code to 0 yields the binary word sound code; it follows that the binary form of the word sound code is in fact a binary code 11 bits long (the first 3 bits are always 0 and can therefore be dropped).
2. The pronunciation and form compatible Chinese coding scheme of dual-purpose information-exchange code (1) according to claim 1, characterized in that the code table (Table one) and the code book (Table two) of the pronunciation and form compatible Chinese dual-purpose information-exchange code are used to encode several thousand Chinese characters, which are arranged into homophone regions in the alphabetical order of their Chinese phonetic alphabet syllables; in the arrangement, a group of homophones numbering fewer than 6 or 4 is placed in a single large or small homophone region respectively, and a group of more than 6 is placed in several adjacent homophone regions.
CN 89108345 1989-11-02 1989-11-02 Pronunciation and form compatible chinese coding scheme of dual-purpose information-exchange code Withdrawn CN1043015A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 89108345 CN1043015A (en) 1989-11-02 1989-11-02 Pronunciation and form compatible chinese coding scheme of dual-purpose information-exchange code

Publications (1)

Publication Number Publication Date
CN1043015A true CN1043015A (en) 1990-06-13

Family

ID=4857542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 89108345 Withdrawn CN1043015A (en) 1989-11-02 1989-11-02 Pronunciation and form compatible chinese coding scheme of dual-purpose information-exchange code

Country Status (1)

Country Link
CN (1) CN1043015A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100410852C (en) * 2002-12-27 2008-08-13 佳能株式会社 Character processing method, device and storage medium
CN100433044C (en) * 2004-04-22 2008-11-12 微软公司 Coded pattern for an optical device and a prepared surface

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WW01 Invention patent application withdrawn after publication