CN101923399A - Encoding method of computer Chinese character encoded texts capable of being used as input codes and internal codes - Google Patents

Encoding method of computer Chinese character encoded texts capable of being used as input codes and internal codes

Publication number
CN101923399A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010193813
Other languages
Chinese (zh)
Inventor
范显镔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual

Abstract

The invention relates to an encoding method for computer Chinese character encoded text that can serve as both input code and internal code. Words written in Chinese characters are encoded as strings of English letters made up of pinyin ideographic codes, rear light codes and external name codes. The pinyin ideographic code of each Chinese character comprises 1 to 4 characters: the first letter represents the initial (sound), the second letter represents the final (rhyme), the third letter represents tone and meaning, and the fourth character is a digit. The rear light code, used for Chinese characters pronounced with a neutral (unstressed) tone, is represented by one or two letters. External name codes are spelled with the internationally used Latin letters. The symbol strings of the encoded text can be typed directly into a computer with the English keys of the keyboard and, on output, are restored to Chinese characters by decoding. The encoded text can therefore be used as both input code and internal code: it is easy to type, has no duplicate codes, avoids conversion between different internal codes and the garbled text produced when e-mail is transmitted, allows the number of Chinese characters in the source character set to be expanded without limit, and is precise, readable and efficient in representing Chinese.

Description

An encoding method for computer Chinese character text that can be used as both input code and internal code
[Technical field]
The present invention relates to encoding methods for Chinese characters on computers, and in particular to an encoding method for computer Chinese character text that can be used as both input code and internal code.
[Background art]
With the spread of computers, operating and using a computer has become part of everyday life, and doing so requires entering Chinese characters through a Chinese character input method. Many input methods have been developed for this purpose, each based on a Chinese character encoding. A computer Chinese character encoding comprises an input code and an internal code. The internal code is the national standard code held in computer memory, i.e. GB 2312 (the Chinese national standard for the simplified Chinese character set). The input code is the code typed to enter a character, and it differs from one input method to another.
A study of the various Chinese character input methods shows that they share one problem: none of them eliminates duplicate codes. Take the Wubi ("five-stroke") input method as an example. To enter the character 祖 ("ancestor"), whose region-position code is 5570 and whose national standard (GB) code is 5766H (the letter H marks a hexadecimal number), Wubi specifies the input code PYEG, so the keys P, Y, E, G must be struck in that order, four keystrokes in all. The input program then looks up the internal code of 祖 from the input code PYEG, which is 5766H + 8080H = D7E6H (hexadecimal, where D is 13 and E is 14), and 祖 is displayed on the screen. Why not store the input code PYEG directly in memory to represent 祖? One reason is that PYEG occupies 4 bytes, twice as many as the internal code. The main reason, however, is that the hundreds of computer Chinese character encodings currently used in China all inevitably have duplicate codes. In Wubi, for example, the input code of each of the characters 赢, 嬴, 蠃 and 羸 is YNKY. When one of them is to be entered, the input program displays all of them and asks the user to choose one by a mouse click or a numeric keystroke. According to the choice, the corresponding internal code (D3AE, D9F8, D9F9 or D9FA respectively) is then stored in memory. Otherwise, on decoding, reading YNKY would not determine which of the four characters to output. Wubi therefore converts the input code into GBK code: because it cannot avoid duplicate codes, it has no choice but to adopt a method that sacrifices efficiency. The same is generally true of other input methods.
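The arithmetic in this example can be checked with a short sketch. The function name `region_to_internal` is a hypothetical helper introduced here for illustration, and Python's built-in `gb2312` codec is used only to confirm which character each internal code denotes.

```python
def region_to_internal(region: int, position: int) -> int:
    """Convert a region-position pair to the two-byte internal code."""
    # Adding 20H to each byte gives the national standard (GB) code.
    gb_code = ((region + 0x20) << 8) | (position + 0x20)
    # Adding 8080H sets the most significant bit of both bytes.
    return gb_code + 0x8080


internal = region_to_internal(55, 70)  # region-position code 5570
print(hex(internal))
print(internal.to_bytes(2, "big").decode("gb2312"))

# The four internal codes listed above for the shared input code YNKY.
for code in ("D3AE", "D9F8", "D9F9", "D9FA"):
    print(code, bytes.fromhex(code).decode("gb2312"))
```

Four distinct internal codes behind one input code is exactly why the input program has to ask the user to choose.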
Every fully developed language in the world is made up of words, and Chinese is no exception. To make written language easier for people or computers to analyze and understand, text should be written word by word: the syllables of each word are written together, and words are separated from one another by spaces or punctuation marks. This is word-based joined writing. Making a Chinese internal code support joined writing is not easy, because efficient computer Chinese character codes are not all of equal length, and writing variable-length codes together often blurs the boundaries between syllables. For example:
安 → an, 西 → xi, 先 → xian
The codes of these three Chinese characters contain no duplicates, yet when the source text "xian" is decoded into Chinese characters it can be rendered either as 西安 (Xi'an) or as 先 ("first"). The syllable boundary is ambiguous. Another example:
安 → an, 感 → gan, 观 → guan, 广 → guang
The codes of these four Chinese characters contain no duplicates either, yet when the source text "guangan" is decoded it can be rendered either as 广安 (Guang'an) or as 观感 ("impression"). The syllable boundary is again ambiguous, so the code is not uniquely decodable.
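The boundary ambiguity in these two examples can be reproduced with a minimal sketch that enumerates every way of cutting a string into known codes. The function name `segmentations` is an assumption made for illustration and is not part of the patent.

```python
def segmentations(text, syllables):
    """Return every way to split `text` into codes drawn from `syllables`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in syllables:
            # Keep this prefix and split the remainder recursively.
            for rest in segmentations(text[end:], syllables):
                results.append([head] + rest)
    return results


print(segmentations("xian", {"an", "xi", "xian"}))
print(segmentations("guangan", {"an", "gan", "guan", "guang"}))
```

Each call returns two segmentations, which is what "not uniquely decodable" means here: the code table has no duplicates, but the concatenated string still has more than one reading.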
The existing Scheme for the Chinese Phonetic Alphabet (Hanyu Pinyin) does not solve this problem. It only proposes adding an apostrophe as a syllable-dividing mark where necessary. Nor do the various computer Chinese character encodings released since the 1970s. They either use fixed-length codes (which of course cannot confuse syllable boundaries) or append a space to every character code, and so lose considerable coding efficiency.
Statistics from a large body of English-Chinese translations show that one Chinese character takes about 3.7 letters when translated into English. We may therefore say that if an alphabetic script for Chinese matches English in practicality and readability and has an average code length below 3.7, its efficiency exceeds that of English. A script for Chinese that is more efficient than English inside the computer would greatly help us catch up with the world's advanced level in science, technology and culture. Achieving an average code length below 3.7, however, is quite difficult. According to statistics, the average code length of the Scheme for the Chinese Phonetic Alphabet is already 3.1. To mark the tone of each syllable without placing diacritics above the letters, suppose the tone is marked with a suffix 1, 2, 3 or 4. The average code length then rises to 4.1, and homophones are still not distinguished. For some syllables, the Xinhua Dictionary lists dozens of Chinese characters with the same sound and the same tone, so one or two more digits are needed to distinguish homophones, and the average code length then exceeds 5.1. These figures are enough to show that an average code length below 3.7 is not easy to achieve.
In addition, in e-mail communication, the conversion steps required by multi-byte binary internal codes (internal codes for short) are another visible defect of existing Chinese character encodings.
If the mail is in Chinese, the sender types input codes and the input program converts them into internal codes. An internal code occupies two or three bytes (1 byte equals 8 binary digits), and the most significant bit of each byte is 1. Data with the high bit set cannot pass through some gateways that only allow characters through, so a base64 conversion must be performed before transmission. The conversion lengthens the data by 1/3 and produces the transmission code, which can then be sent over the network. After receiving the mail data from the network, the recipient first performs the inverse base64 conversion to turn the transmission code back into internal code, and the operating system's character conversion program then turns the internal code into Chinese characters for output. The process is shown schematically below:
Chinese characters → input code → internal code → transmission code → network → transmission code → internal code → Chinese characters.
The transfer process for English mail is:
English → network → English
The transmission of Chinese character internal codes thus involves many steps and loses efficiency. Worse, the two communicating parties may use different internal codes, which requires yet another conversion from one internal code to the other, so Chinese e-mail communication cannot match English in efficiency.
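The internal code to transmission code step can be illustrated with Python's standard `base64` module. This is a sketch only: the sample sentence is chosen for illustration, and the `gb2312` codec stands in for whichever internal code a given system uses.

```python
import base64

text = "知识就是力量"                      # six characters, chosen for illustration
internal = text.encode("gb2312")           # internal code: two bytes per character
transmission = base64.b64encode(internal)  # 7-bit-safe transmission code

# Every byte of the internal code has its most significant bit set.
high_bits_set = all(byte & 0x80 for byte in internal)
print(len(internal), len(transmission), high_bits_set)

# The receiving side reverses both conversions.
restored = base64.b64decode(transmission).decode("gb2312")
print(restored == text)
```

Twelve bytes of internal code become sixteen bytes of transmission code, which is the one-third growth described above.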
Furthermore, Chinese characters lack readability. The Scheme for the Chinese Phonetic Alphabet without tone marks is not readable either, because tone is crucial phonetic information in Chinese pronunciation. The total number of Chinese characters is not fixed, and the source character set of any particular implementation is a subset. Many existing encodings avoid duplicate codes when the source character set is limited to 4,000 or 5,000 characters, but cannot do so when it is enlarged to 8,000 or more than 10,000 characters. Their code space is therefore limited.
In summary, the existing Chinese character encodings in China suffer from inconsistent input and internal codes, universal duplicate codes and low coding efficiency, which has left computer Chinese character encoding in China fragmented and confused.
[Summary of the invention]
The present invention aims to solve the above problems by providing an encoding method for computer Chinese character text that can serve as both input code and internal code, is easy to type into a computer, has no duplicate codes, avoids conversion between different internal codes and garbled text in transmitted e-mail, allows the number of Chinese characters in the source character set to be expanded without limit, and represents Chinese precisely, readably and efficiently.
To achieve this, the invention provides an encoding method for computer Chinese character text that can be used as both input code and internal code. In this method, words written in Chinese characters are encoded as strings of English letters made up of pinyin ideographic codes, rear light codes and external name codes. The pinyin ideographic code of each Chinese character comprises 1 to 4 characters, of which the first letter represents the initial, the second letter the final, the third letter the tone and meaning, and the fourth character is a digit. The rear light code, for those Chinese characters pronounced with a neutral tone, is represented by one or two letters. External name codes are spelled with the international Latin alphabet. The symbol strings of the encoded text can be typed directly with the English keys of the computer keyboard and are restored to Chinese characters by decoding on output.
Each pinyin ideographic code entry comprises a source character and a target string: the source character is a Chinese character and the target string comprises 1 to 4 English letters. The pinyin ideographic codes of the Chinese characters in current use form the pinyin ideographic code database, which is sorted in lexicographic order of the target strings. The Chinese characters in the database are represented by their GBK codes.
In the first letter of the pinyin ideographic code, the initials zh, ch and sh (as in "zhi", "chi" and "shi") are represented by v, w and y respectively. In the third letter, which represents tone and meaning, the six letters a, b, c, d, e, f represent the 1st tone, the six letters g, h, i, j, k, l the 2nd tone, the six letters m, n, o, p, q, r the 3rd tone, and the eight letters s, t, u, v, w, x, y, z the 4th tone. The third character may also be a digit, in which case 0, 1, 5 represent the 1st tone, 2, 6 the 2nd tone, 3, 7 the 3rd tone, and 4, 8, 9 the 4th tone.
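The tone assignment for the third character can be written as a small lookup. This is a sketch under the letter and digit groups stated above. The names `TONE_LETTERS`, `TONE_DIGITS` and `tone_of` are hypothetical, introduced here for illustration.

```python
# Letter and digit groups for the third character, as listed in the text.
TONE_LETTERS = {1: "abcdef", 2: "ghijkl", 3: "mnopqr", 4: "stuvwxyz"}
TONE_DIGITS = {1: "015", 2: "26", 3: "37", 4: "489"}


def tone_of(third_char: str) -> int:
    """Return the tone (1 to 4) signalled by the third character of a code."""
    if len(third_char) != 1:
        raise ValueError("expected a single character")
    for tone in (1, 2, 3, 4):
        if third_char in TONE_LETTERS[tone] or third_char in TONE_DIGITS[tone]:
            return tone
    raise ValueError(f"not a tone character: {third_char!r}")


for code in ("zun", "goi", "zis", "ai8"):
    print(code, tone_of(code[2]))
```

Several letters share each tone so that the same position can also separate homophones, which is why the text calls it the "tone and meaning" letter.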
The rear light codes of those Chinese characters pronounced with a neutral tone, together with the target strings of the pinyin ideographic codes, form the rear light code database.
Each external name code entry comprises a source string and the international Latin spelling of the foreign name, where the source string is the Chinese translation of the foreign name. The common Chinese translations of foreign names and their corresponding international Latin spellings form the external name database.
The decoding method for the computer Chinese character text comprises:
a. taking the encoded text to be decoded as the source file, and the Chinese character or sequence of Chinese characters obtained by decoding as the target file;
b. reading from the source file an English letter string of the encoded text entered from the keyboard, and first searching the external name database for a target string that matches it exactly; if one is found, outputting the matching Chinese characters; if not, searching the rear light code database to determine whether the string is a rear light word;
c. if it is a rear light word, outputting the matching Chinese characters; for a word found in neither the external name database nor the rear light code database, reading one syllable at a time in sequence, finding the corresponding Chinese character in the pinyin ideographic code database and outputting it, which completes the decoding of one word of the encoded text;
d. repeating steps a to c until the whole source file has been decoded.
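Steps a to d can be sketched as a three-level lookup. The three dictionaries below are tiny illustrative stand-ins for the databases, filled with codes quoted elsewhere in this document and with Chinese characters reconstructed from the translated examples. The rule used by `read_syllables` (three characters plus an optional trailing digit) is an assumption, since the text does not spell out how one syllable is delimited.

```python
# Illustrative stand-ins for the three databases (not the real contents).
EXTERNAL_NAMES = {"Canada": "加拿大", "Belgrade": "贝尔格莱德"}
REAR_LIGHT_WORDS = {"vicy": "知识", "lhiy": "粮食", "livlh": "力量"}
PINYIN_CODES = {"zun": "祖", "goi": "国", "ung": "文", "zis": "字"}


def read_syllables(word):
    """Yield one code at a time: 3 characters plus a trailing digit if present."""
    i = 0
    while i < len(word):
        end = i + 3
        if end < len(word) and word[end].isdigit():
            end += 1
        yield word[i:end]
        i = end


def decode_word(word):
    if word in EXTERNAL_NAMES:        # step b: exact match on an external name
        return EXTERNAL_NAMES[word]
    if word in REAR_LIGHT_WORDS:      # steps b and c: rear light word
        return REAR_LIGHT_WORDS[word]
    # step c: otherwise decode syllable by syllable
    return "".join(PINYIN_CODES[code] for code in read_syllables(word))


def decode_text(text):
    """Step d: repeat for every space-separated word of the source."""
    return "".join(decode_word(word) for word in text.split())


print(decode_text("zungoi ungzis vicy Canada"))
```

An unknown code raises `KeyError` in this sketch. A real decoder would need a policy for that case, which the text does not describe.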
The contribution of the invention is that it effectively solves the problems of existing Chinese character encodings: inconsistent input and internal codes, duplicate codes in every input method, and low coding efficiency. Compared with the prior art, the invention has the following features:
One, the input code is identical to the internal code, and the internal code can be read as pinyin
The invention achieves a goal that China's scientific and technical community has pursued ever since it had computers of its own, but had never reached. Although there are many input codes today, none is free of duplicate codes, so they can only be converted into internal codes that cannot be read and are inconvenient to process. The encoded text of the invention is both input code and internal code and can be read as pinyin, fulfilling a long-held hope.
Two, practicality
Because the encoded text of the invention consists of ASCII characters, it can pass through every gateway that English text can pass through, so it need not be converted into a transmission code. Transmitting it is as simple and effective as sending English e-mail.
Three, accuracy
The encoded text of the invention is more precise than any other encoding. It distinguishes not only homophones written with different characters but also the different readings of polyphonic characters, e.g. 长 (chang2) → wgj, 长 (zhang3) → vgn, 行 (hang2) → hgk, 行 (xing2) → xyh, which shows that it expresses Chinese more precisely than Chinese characters do. It can also distinguish whether the final syllable of a polysyllabic word is stressed or unstressed, which again shows that it is more precise than Chinese characters. In addition, it can represent both simplified and traditional characters, e.g. 觉 → jcj, 覺 → jcj9, a further sign of its precision.
Four, expandability
The encoded text of the invention has unlimited code space, so the source character set can be expanded without limit. The source character set currently contains more than 13,550 characters.
Five, word-based joined writing
In the encoded text of the invention the characters within a word are written together, yet no error can occur when it is translated back into Chinese characters. The codes are of variable length: a few have length 1 or 2, and most have length 3 or 4. The encoded text has a very particular structure, however, so that codes of length 3 or 4 keep clear boundaries when written together and no boundary confusion arises. A code of length 1 or 2 (a light code) normally ends a word and is followed by a space or punctuation mark. In the few cases where a neutral-tone syllable is not at the end of a word, it is followed by an underscore. In other words, the encoded text solves the syllable-boundary problem of variable-length codes in a very effective way.
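One way to read this structure is that a word can be cut without any dictionary: full codes are three characters plus an optional digit, and whatever one or two letters remain at the end form the light code. The sketch below implements that reading. The function name `split_word` and the exact cutting rule are assumptions, and the rare underscore case is not handled.

```python
def split_word(word):
    """Split one encoded word into full codes and an optional trailing light code."""
    parts = []
    i = 0
    while len(word) - i >= 3:
        end = i + 3
        # A digit in fourth position belongs to the current code.
        if end < len(word) and word[end].isdigit():
            end += 1
        parts.append(word[i:end])
        i = end
    if i < len(word):
        parts.append(word[i:])  # 1 or 2 leftover letters: the light code
    return parts


for word in ("zungoi", "vicy", "livlh", "jcj9"):
    print(word, split_word(word))
```

Unlike the "xian" and "guangan" cases in the background section, each of these words has exactly one split under this rule.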
Six, average code length below 3
While preserving practicality, accuracy and readability, the encoded text of the invention uses several means to reduce the average code length: one letter for the initial, one letter for the final, one letter for both tone and meaning, an occasional additional character to distinguish homophones, 1 or 2 letters for a neutral-tone syllable, the concept of the "unstressed word" together with an abbreviated spelling and a decoding method for such words, and so on. The combined effect of these innovations is to reduce the average code length to 2.8. That is, the encoded text is about 24% more efficient than English letters.
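The figures quoted in this document fit together as simple arithmetic. The sketch below only restates them. The constant names are chosen for illustration, and the single extra digit for homophones is the lower bound given in the background section.

```python
ENGLISH_LETTERS_PER_CHARACTER = 3.7  # letters needed per Chinese character in English
PINYIN_PLAIN = 3.1                   # Scheme for the Chinese Phonetic Alphabet, no tones
ENCODED_TEXT = 2.8                   # average code length stated for the invention

pinyin_with_tone = PINYIN_PLAIN + 1           # one suffix digit for the tone
pinyin_with_homophone = pinyin_with_tone + 1  # at least one more digit for homophones

# Relative saving of the encoded text against the English figure.
saving = (ENGLISH_LETTERS_PER_CHARACTER - ENCODED_TEXT) / ENGLISH_LETTERS_PER_CHARACTER
print(f"{pinyin_with_tone:.1f} {pinyin_with_homophone:.1f} {saving:.0%}")
```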
Seven, solving the interchange between print and braille Chinese text
Interchange between Chinese print and braille text is a universally acknowledged difficulty. If the encoded text is used as Chinese braille in special education for the blind, this difficulty can be solved effectively.
[Embodiment]
The following embodiment further explains and illustrates the invention and does not limit it in any way.
In the encoding method of the invention, words written in Chinese characters are encoded as strings of English letters made up of pinyin ideographic codes, rear light codes and external name codes. Such a string can be typed directly with the English keys of the computer keyboard and can serve as the internal code of computer Chinese characters.
Each pinyin ideographic code entry comprises a source character and a target string, where the source character is a Chinese character and the target string comprises 1 to 4 English letters. The pinyin ideographic codes of the Chinese characters in current use form the pinyin ideographic code database, which is sorted in lexicographic order of the target strings. The Chinese characters in the database are represented by their GBK codes. In a target string, the first letter represents the initial, the second letter the final, the third letter the tone and meaning, and the fourth character is a digit. More specifically, in the first letter the initials zh, ch and sh (as in "zhi", "chi" and "shi") are represented by v, w and y respectively. In the third letter, a, b, c, d, e, f represent the 1st tone, g, h, i, j, k, l the 2nd tone, m, n, o, p, q, r the 3rd tone, and s, t, u, v, w, x, y, z the 4th tone.
To illustrate further how a pinyin ideographic code is formed, in this embodiment each code is written as a transform, one transform per Chinese character, for example:
祖 → zun, 国 → goi, 的 → d, 文 → ung, 字 → zis, where the Chinese character to the left of the arrow is the source character and the string to the right is the target string. Some Chinese characters also have a fourth character, which can only be a digit and does not affect pronunciation.
The transforms of more than 13,550 Chinese characters are collected in the pinyin ideographic code database, sorted in lexicographic order of the target strings. Table 1 shows some of the pinyin ideographic code transforms.
Table 1
Code    Chinese character (machine-translated gloss)
aa
aa0     Breathe out
aa5     A word used for translation
aaa     Ah
aae
aaf     salts down
aai
aak     Sha
aao
aas
agf     Dirty
agf9    Dirty
agh     High
agi     Ang
agt     Big belly
agx     Ang
ai0     Hey
ai1     Einsteinium
ai4     Ai
ai49    Ai
ai8     Astatine
ai9     Ai
aia     Xi
aib     Sound of sighing
aic
……      ……
As Table 1 shows, the pinyin ideographic code database has two fields: the source Chinese character (left side of the transform) and the target string (right side). The "Chinese character" is in fact stored as its GBK code. In a computer system that uses the encoded text of the invention, GBK codes of Chinese characters appear only in the pinyin ideographic code database and in the two databases described below. Everywhere else, Chinese characters are replaced by their target strings. In a computer that uses the encoded text of the invention, it is the encoded text that represents Chinese.
The rear light code is the code of those Chinese characters pronounced with a neutral tone, and is represented by one or two letters. Chinese has a considerable number of rear light words, whose final syllable is unstressed (toneless), for example:
知识 → vicy, 粮食 → lhiy, 力量 → livlh
The encoding method of the invention collects 1,541 rear light words. These, together with the target strings of the pinyin ideographic codes, form the rear light code database. Table 2 shows some of the rear light code transforms.
Table 2
Rear light code    Stressed-reading code    Chinese word (machine-translated gloss)
aafza                                       Yan Za
agfzg                                       dirt
aisrn                                       Lover
anlwk                                       quail
aojmo                                       endures mill
aoxzl                                       is regretful poor
b8age                                       starling
babda              babdaa                   smacks one's lips
babgi              babgib                   smacks one's lips
bacjr                                       fawns on
bacvg                                       palm
bacla                                       crust draws
bafla                                       scar
baolm                                       monopolizes
baoy                                        wushu
baoyb                                       handle
bazdl              bazdlv                   overbearing
bdchy                                       breaks off with the fingers and thumb and draws
bditx                                       daytime
bdizy                                       is fair and clear
bdpbu                                       controls
bdpht                                       pendulum is drawn
bdpvi                                       pendulum is controlled
bdpye              bdpyev                   furnishes
……                 ……                       ……
The external name code is the code used for foreign country names, place names and personal names, and is spelled with the international Latin alphabet, as in the following transforms:
加拿大 → Canada, 贝尔格莱德 → Belgrade, 玛丽亚 → Maria.
Here the Chinese characters to the left of the arrow (the source string) are the common Chinese translation of the foreign name, and the string to the right is its international Latin spelling. The common Chinese translations of foreign names and their Latin spellings form the external name database. Table 3 shows some external name transforms.
Table 3
External name    Chinese translated name (machine-translated gloss)
Agana            Agana
Alaska           Alaska
Albania          Albania
Alexander        Alexandria
Alger            Algiers
Algeria          Algeria
Alofi            Alofi
Amman            Amman
Amsterdam        Amsterdam
Andorra          Andorra
Angola           Angola
Ankara           Ankara
Apia             Apia
Arabian          Arab
Babylon          Babylon
……               ……
The symbol strings of the encoded text can be typed directly with the English keys of the computer keyboard, and together with the three databases above they serve as input code and internal code at once, the two being fully identical. They are stored in computer memory and restored to Chinese characters by decoding on output.
Text encoded by the method of the invention is stored in the computer as strings of English letters and represents written Chinese. It is easy for the computer to process and can also be read as pinyin. The encoded text can be decoded by the computer and output as Chinese characters.
The decoding method for the computer Chinese character text comprises:
a. taking the encoded text to be decoded as the source file, and the Chinese character or sequence of Chinese characters obtained by decoding as the target file;
b. reading from the source file an English letter string of the encoded text entered from the keyboard, and first searching the external name database for a target string that matches it exactly; if one is found, outputting the matching Chinese characters; if not, searching the rear light code database to determine whether the string is a rear light word;
c. if it is a rear light word, outputting the matching Chinese characters; for a word found in neither the external name database nor the rear light code database, reading one syllable at a time in sequence, finding the corresponding Chinese character in the pinyin ideographic code database and outputting it, which completes the decoding of one word of the encoded text;
d. repeating steps a to c until the whole source file has been decoded.
In principle, typing the encoded text of the invention needs no software assistance, just like typing English. In the early stage of use, however, when users are not yet familiar with the encoded text and cannot remember the target string of every Chinese character, character-prompting input software is necessary. The software shows a prompt window on the left of the screen whose content follows the operator's keystrokes, displaying at every moment the transforms of candidate Chinese characters for reference. For example:
To enter the character 祖, the operator keys in the initial letter z, and the prompt window shows the transforms of characters whose initial is z. After the final letter u is keyed in, the prompt window shows the transforms of all homophones of 祖. If the operator remembers that the transform is 祖 → zun, he can ignore the prompt window and type the three letters directly, and the character is entered. If he has forgotten the letter in the tone-and-meaning position, he can look up the transform of 祖 in the prompt window and then type its third letter. Each letter typed is shown on the screen, and each time the target string of a Chinese character is completed, the character is shown beneath it. The encoded text and the Chinese characters are displayed one above the other as follows:
zungoi ustsvx! vicy jus yis livlh.
祖国万岁!知识就是力量。 (Long live the motherland! Knowledge is power.)
Only the upper line, the encoded text, is stored in computer memory. The Chinese characters below it are the result of decoding the encoded text and serve to verify that the input is correct.
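Because the database is kept in lexicographic order of the target strings, the prompt window described above can be sketched as a prefix search using Python's standard `bisect` module. The four entries and the function name `candidates` are illustrative assumptions, not the real database or software.

```python
import bisect

# (target string, source character) pairs in lexicographic order of target string.
DATABASE = sorted([("goi", "国"), ("ung", "文"), ("zis", "字"), ("zun", "祖")])
CODES = [code for code, _ in DATABASE]


def candidates(prefix):
    """Return the transforms whose target string starts with the typed prefix."""
    lo = bisect.bisect_left(CODES, prefix)
    # "\x7f" sorts after every letter and digit used in the codes.
    hi = bisect.bisect_right(CODES, prefix + "\x7f")
    return DATABASE[lo:hi]


print(candidates("z"))   # after the initial letter is typed
print(candidates("zu"))  # after the final letter is typed
```

Each keystroke narrows the slice, which matches the description of the window updating as the operator types.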

Claims (6)

1. An encoding method for computer Chinese character encoded text usable as both input code and internal code, characterized in that the method encodes words represented by Chinese characters into English-letter symbol strings composed of pinyin ideographic codes, rear light codes, and external name codes; the pinyin ideographic code corresponding to each Chinese character comprises 1 to 4 English letters, of which the first letter represents the sound (initial), the second letter represents the rhyme (final), the third letter represents tone and meaning, and the fourth is a digit; the rear light code corresponding to a Chinese character with a light (unstressed) pronunciation is represented by one or two letters; the external name code is spelled in the internationally used Latin letters; the symbol strings of the encoded text can be input into the computer directly through the English keys of a computer keyboard and, on output from the computer, are restored to Chinese characters by decoding.
2. The encoding method for computer Chinese character encoded text usable as both input code and internal code as claimed in claim 1, characterized in that the pinyin ideographic code comprises a source character and a target string; the source character is a Chinese character and the target string comprises 1 to 4 English letters; the pinyin ideographic codes of the Chinese characters in current use form a pinyin ideographic code database; the database is sorted in alphabetical order of the target strings, and the Chinese characters in the database are represented in GBK.
3. The encoding method for computer Chinese character encoded text usable as both input code and internal code as claimed in claim 2, characterized in that, in the first letter of the pinyin ideographic code, the initials of the characters for "know", "Chi", and "poem" (zh, ch, sh) are represented by v, w, and y respectively; in the third letter, which represents tone and meaning, the six letters a, b, c, d, e, f represent the 1st tone, the six letters g, h, i, j, k, l represent the 2nd tone, the six letters m, n, o, p, q, r represent the 3rd tone, and the eight letters s, t, u, v, w, x, y, z represent the 4th tone.
4. The encoding method for computer Chinese character encoded text usable as both input code and internal code as claimed in claim 2, characterized in that the rear light codes corresponding to the Chinese characters with a light pronunciation, together with the target strings of the pinyin ideographic codes, form a rear light code database.
5. The encoding method for computer Chinese character encoded text usable as both input code and internal code as claimed in claim 1, characterized in that the external name code comprises a source string and the internationally used Latin spelling of the external name; the source string is the Chinese translated name of the external name; the Chinese translated names of commonly used external names and their corresponding internationally used Latin spellings form an external name database.
6. The encoding method for computer Chinese character encoded text usable as both input code and internal code as claimed in claim 1, characterized in that the decoding method for the encoded text comprises:
a. the encoded text to be decoded is the source file, and the Chinese characters or Chinese character sequences obtained by decoding form the target file;
b. an English-letter symbol string of the encoded text, as input through the computer keyboard, is read from the source file; the external name database is searched first for an exactly matching target string; if one is found, the Chinese characters matching the encoded text are output; if not, the rear light code database is searched to determine whether the symbol string is a rear-light word;
c. if it is a rear-light word, the Chinese characters matching the encoded text are output; for a word found in neither the external name database nor the rear light code database, one syllable is read at a time in sequence, and the corresponding Chinese character is found in the pinyin ideographic code database and output, which completes the decoding of one encoded word;
d. steps a to c are repeated until the whole source file has been decoded.
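The lookup order of claim 6 (external name database first, then rear light code database, then syllable-by-syllable lookup in the pinyin ideographic code database) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the three dictionaries are tiny hypothetical stand-ins, their rows are illustrative (only "zun" appears in the description), and the greedy longest-match syllable split is an assumption, since the text does not say how syllable boundaries are found.

```python
import re

# Hypothetical miniature databases; the patent's real tables are not given.
EXTERNAL_NAMES = {"London": "伦敦"}          # illustrative row
REAR_LIGHT = {"vicy": "知识"}                # illustrative row
PINYIN_CODES = {"zun": "祖", "goi": "国"}    # "zun" from the description; "goi" illustrative


def decode_word(word):
    """Decode one encoded word using the lookup order of steps b and c."""
    if word in EXTERNAL_NAMES:               # step b: exact match on external names
        return EXTERNAL_NAMES[word]
    if word in REAR_LIGHT:                   # steps b/c: rear-light word
        return REAR_LIGHT[word]
    out, i = [], 0
    while i < len(word):                     # step c: one syllable at a time
        for size in (4, 3, 2, 1):            # assumption: greedy longest match
            piece = word[i:i + size]
            if piece in PINYIN_CODES:
                out.append(PINYIN_CODES[piece])
                i += len(piece)
                break
        else:
            out.append(word[i])              # unknown letter passes through unchanged
            i += 1
    return "".join(out)


def decode_text(text):
    """Step d: decode every alphanumeric run, leaving punctuation as it is."""
    return re.sub(r"[A-Za-z0-9]+", lambda m: decode_word(m.group()), text)
```

With these stand-in tables, decode_text("zungoi!") decodes the two syllable codes and keeps the exclamation mark; a word missing from all three tables is returned unchanged rather than raising an error, which is a design choice of this sketch only.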
CN 201010193813 2010-06-07 2010-06-07 Encoding method of computer Chinese character encoded texts capable of being used as input codes and internal codes Pending CN101923399A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010193813 CN101923399A (en) 2010-06-07 2010-06-07 Encoding method of computer Chinese character encoded texts capable of being used as input codes and internal codes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010193813 CN101923399A (en) 2010-06-07 2010-06-07 Encoding method of computer Chinese character encoded texts capable of being used as input codes and internal codes

Publications (1)

Publication Number Publication Date
CN101923399A true CN101923399A (en) 2010-12-22

Family

ID=43338371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010193813 Pending CN101923399A (en) 2010-06-07 2010-06-07 Encoding method of computer Chinese character encoded texts capable of being used as input codes and internal codes

Country Status (1)

Country Link
CN (1) CN101923399A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1043576A (en) * 1988-12-22 1990-07-04 深圳软件技术有限公司应用软件开发部 Input and output method for chinese spelling code
GB2256953A (en) * 1991-06-20 1992-12-23 Henry Gao Computer system chinese input method and the related keyboard structure
CN1224866A (en) * 1998-01-24 1999-08-04 吴鸿春 Pictophonetic code Chinese character input method

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103455808A (en) * 2013-08-22 2013-12-18 黑龙江大学 Sending device of machine character reading code and coding method
CN103455808B (en) * 2013-08-22 2017-03-29 黑龙江大学 The dispensing device of machine character read code and coded method
CN105955501A (en) * 2016-04-15 2016-09-21 北京理工大学 Chinese character information flow visualization method based on man-machine interaction
CN105955501B (en) * 2016-04-15 2018-10-12 北京理工大学 Chinese character information stream method for visualizing based on human-computer interaction
WO2018228101A1 (en) * 2017-06-14 2018-12-20 佛山辞荟源信息科技有限公司 Chinese meaning based chinese encoding method and system, and medium device
CN109086257A (en) * 2017-06-14 2018-12-25 佛山辞荟源信息科技有限公司 Language coding processing method and system based on Chinese meaning

Similar Documents

Publication Publication Date Title
KR101435265B1 (en) Method for disambiguating multiple readings in language conversion
US5903861A (en) Method for specifically converting non-phonetic characters representing vocabulary in languages into surrogate words for inputting into a computer
CN102298582A (en) Data searching and matching method and system
JP5502814B2 (en) Method and system for assigning diacritical marks to Arabic text
CN104008123A (en) Native-script and cross-script Chinese name matching
CN101520693A (en) Method and system for rapidly inputting bulk information
CN101923399A (en) Encoding method of computer Chinese character encoded texts capable of being used as input codes and internal codes
CN101882006B (en) Zero-memory simple sub-character splitting input method
CN101727195A (en) Various information input method of Chinese phonetics codes
CN110516125B (en) Method, device and equipment for identifying abnormal character string and readable storage medium
CN101706688A (en) Method for inputting Chinese numbers
RU2008128245A (en) COMPUTER IMPLEMENTED METHOD FOR CODING NUMERICAL DATA AND METHOD FOR CODING DATA STRUCTURES FOR TRANSMISSION IN A TELECOMMUNICATION SYSTEM BASED ON THE ABOVE METHOD FOR CODING NUMERICAL DATA
CN102053955B (en) Method and system for inputting symbols
CN100533359C (en) Oracle spelling and component disintegration and input method
KR100629862B1 (en) The korean transcription apparatus and method for transcribing convert a english language into a korea language
CN114757154A (en) Job generation method, device and equipment based on deep learning and storage medium
CN100458668C (en) Input method for Chinese character of first pronunciation
CN100535836C (en) Method and system for restoring candidat word order for Chinese input method
CN102004557A (en) Stroke-order voice-code Chinese character input technical scheme
CN111428509A (en) Latin letter-based Uygur language processing method and system
CN101901062B (en) Computer Chinese character information processing method based on phoneme encoding
Abudena Proposal to encode Quranic marks used in Quran published in Libya
TWI581130B (en) Pinyin display device and method
KR101080880B1 (en) Automatic loanword-to-korean transliteration method and apparatus
TWI747275B (en) Braille conversion method for electronic device and computer program product thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20101222