CN114185440A

CN114185440A - Chinese character datamation input and output method

Info

Publication number: CN114185440A
Application number: CN202111511715.XA
Authority: CN
Inventors: 史颖
Original assignee: Individual
Current assignee: Individual
Priority date: 2021-12-06
Filing date: 2021-12-06
Publication date: 2022-03-15

Abstract

The invention discloses a Chinese character datamation input and output method. Input keyboard characters are received and stored, the characters are divided according to coding rules to determine the codes representing individual Chinese characters or words, a coding rule database is queried to determine the Chinese characters or words corresponding to the codes, and those Chinese characters are output, so that target characters or sentences can be input on an electronic device. Under the coding rule, each Chinese character is represented by a combination of four Western capital letters denoting its row, column, vertical, and order, which establishes an identifiable one-to-one mapping between Chinese characters and ASCII codes. Chinese characters are thereby fully encoded in ASCII, operations such as sorting and searching can be performed on them, garbled Chinese text and system crashes are avoided when a system processes information, and the accuracy, security, and reliability of computer text processing are greatly improved. Each letter combination represents exactly one Chinese character and its pronunciation, and the input code is itself the internal machine code of the Chinese character, which enables touch-typing keyboard input and machine-readable input of Chinese characters from marked cards.

Description

Chinese character datamation input and output method

Technical Field

The application relates to the technical field of Chinese information processing, in particular to a Chinese character input and output method.

Background

How language characters are represented in computers and their networks is very important. The data a computer processes is in fact binary; the computer can only recognize the two states 0 and 1. A central problem in the development of computing has therefore been character processing: how to convert character symbols into binary data, and how to assign each piece of information or data a unique binary code.

The world's first electronic computer was invented by Americans against the background of Western culture. English has only 26 letters, and the symbols Americans use in daily life number no more than about 100. On this basis a set of rules was formulated: the American Standard Code for Information Interchange, i.e., the ASCII code.

The most widely used international data transmission code is currently ASCII. Standard ASCII encodes each character in 7 bits; since a byte has 8 bits, it is a single-byte code. Each ASCII character occupies the lower 7 bits of a byte, and the most significant bit is unused and typically set to 0. Because ASCII is encoded with 7 bits, it can express a total of 2⁷ (128) characters, including the upper- and lower-case Latin letters, Arabic numerals, punctuation marks, and some special characters. Although ASCII is a national standard of the United States, it is so widely used that the International Organization for Standardization (ISO) adopted it as an international standard, known as ISO 646.

At present, a computer uses the byte as the smallest operable unit of information, and ASCII codes are operable single bytes whose high bit is 0. In computer communication, 7-bit ASCII characters are also called "safe characters" because each byte can be verified as correctly transmitted. The basic working principle is as follows: of the eight bits in each byte, seven carry the data and one is used for parity checking. If the parity check finds an error, the received byte has suffered a transmission error, and the program can request the byte again until the parity check indicates that it was received correctly.
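The parity mechanism described above can be illustrated with a minimal sketch. It assumes even parity stored in the high bit; the text does not say whether even or odd parity is meant, and the function names are hypothetical.

```python
def add_even_parity(code7: int) -> int:
    """Pack a 7-bit value into a byte whose high bit makes the count of 1 bits even.

    Even parity is an assumption; the description only says "parity checking".
    """
    if not 0 <= code7 < 128:
        raise ValueError("expected a 7-bit value")
    parity = bin(code7).count("1") % 2
    return (parity << 7) | code7


def check_even_parity(byte: int) -> bool:
    """Return True when the byte holds an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("A"))   # 'A' is 0b1000001: two 1 bits, parity bit 0
corrupted = sent ^ 0b00000100      # one bit flipped in transit

print(check_even_parity(sent))       # True: byte accepted
print(check_even_parity(corrupted))  # False: receiver would request the byte again
```

A single parity bit detects any odd number of flipped bits but cannot detect an even number of flips, which is why it only supports retransmission rather than correction.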

At present, Western languages are represented in computers by ASCII codes. Chinese has far too many characters to be represented in a single byte, so a Chinese character can only be represented by several bytes bound together, with the high bit of each byte set to 1 to distinguish it from ASCII. The two, three, or four bytes of a Chinese character must not be split or misaligned, or a string of garbled text results. This is like binding two or three cars together on a highway: movement is greatly restricted, to say nothing of information security. Moreover, the core information systems of globally important fields exclude non-ASCII codes in order to operate safely and reliably, for example the core systems of global finance, global air traffic control, large databases, and global domain name resolution. This is also why Chinese computer systems often suffer crashes, garbled text, and question marks or black boxes in Chinese text. Furthermore, if a global network control center were to set the high bit of every byte in the information stream to 0, all ASCII platforms would continue to operate normally, whereas non-ASCII platforms would crash instantly.

The internal-code representation of Chinese characters as bundled multi-byte sequences therefore has several problems. First, a non-ASCII multi-byte internal code only identifies a Chinese character; it does not support operations such as numeric sorting and retrieval. Second, bundling multiple bytes with the high bit set to 1 creates serious potential safety hazards. Some operating systems ignore the most significant bit of a character code, and others use extended (8-bit) ASCII. In the latter, a byte whose most significant bit is 1 may be either an extended ASCII character or part of a Chinese character, and if the computer cannot distinguish the two correctly, garbled text appears. Third, the existing approach forces continual transcoding in Chinese computer processing. Common Chinese input methods rely on a separate input code, such as a phonetic code or the five-stroke (Wubi) code. After the input code is received, the input code conversion module of the Chinese operating system must convert it into the internal code before the character can be stored and processed. The input code therefore inevitably differs from the stored code, which increases system overhead and introduces errors.
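The high-bit distinction discussed above can be observed directly. The following sketch uses Python's built-in `ascii`, `utf-8`, and `gbk` codecs; the helper name `high_bits` is hypothetical, and the four-letter string `WOMA` is one of the codes quoted later in this description.

```python
def high_bits(data: bytes) -> list[int]:
    """Return the most significant bit of every byte."""
    return [b >> 7 for b in data]


ascii_bytes = "WOMA".encode("ascii")   # four capital letters, one byte each
utf8_bytes = "我".encode("utf-8")       # three bytes: E6 88 91
gbk_bytes = "我".encode("gbk")          # two bytes: CE D2

print(high_bits(ascii_bytes))  # [0, 0, 0, 0]
print(high_bits(utf8_bytes))   # [1, 1, 1]
print(high_bits(gbk_bytes))    # [1, 1]
```

The example only shows the bit patterns for this one character; it does not by itself demonstrate any of the failure modes the text attributes to multi-byte encodings.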

Disclosure of Invention

In view of this, the invention provides a Chinese character datamation input and output method to solve the problems of the existing internal-code representation of Chinese characters.

According to an aspect of an embodiment of the present invention, there is provided a chinese character datamation input method, including:

receiving and storing input keyboard characters;

dividing English characters according to a coding rule, and determining codes representing all Chinese characters or words;

inquiring a coding rule database obtained according to coding rules, determining Chinese characters or terms corresponding to the codes, and outputting Chinese characters corresponding to the codes;

the encoding rule is as follows:

each Chinese character is represented by a combination of four Western capital letters with independently computable weighted positions, namely the row, column, vertical, and order, wherein:

the row is the pinyin initial of the Chinese character, represented by 23 Western capital letters; a syllable without an initial uses its first letter as the row; the letters I, V, and U are reserved for special use, and rows I, V, and U represent Chinese punctuation marks or non-Chinese characters; the correspondence between the Western capital letters and the initials is as follows:

the column is the pinyin final of the Chinese character, which is represented by 26 western capital letters, and the correspondence between the western capital letters and the final is as follows:

the vertical is the pinyin tone of the Chinese character, represented by 26 Western capital letters; the correspondence between the Western capital letters and the tones is as follows:

the first tone (yin ping) is represented by the sequence ABC DEF, the second tone (yang ping) by GHI JKL, the third tone (shang sheng) by MNO PQR, the fourth tone (qu sheng) by TUV WXYZ, and the neutral tone by S, wherein ABC, GHI, MNO, and TUV are the front tone letter groups and DEF, JKL, PQR, and WXYZ are the rear tone letter groups, and pinyin whose initial is ch, sh, or zh or whose final is ü uses only the rear tone letters; characters with the same sound and the same tone are sorted and grouped according to a predetermined ordering by usage frequency, stroke count, and stroke order, with every 26 characters forming one group: the characters ranked 1 to 26 form the first group and take the first of the corresponding tone letters as their vertical, the characters ranked 27 to 52 form the second group and take the second of the corresponding tone letters as their vertical, and so on, so that the characters in each group take the corresponding tone letter as the vertical of each character;

the order is the sequence code among characters with the same sound and tone, represented by 26 Western capital letters; after such characters have been sorted and grouped according to the predetermined usage frequency, stroke count, and stroke order, the characters belonging to the same group, i.e., those sharing the same vertical, are labeled A to Z according to the sorting result as the order of each character, and are thus represented and distinguished by the order combined with the vertical.

Furthermore, when characters with the same sound and tone are sorted and grouped according to the predetermined usage frequency, stroke count, and stroke order, the traditional form of a character is placed in the position immediately after the corresponding simplified character, so that a computer can conveniently convert between traditional and simplified characters by arithmetic.

Further, the encoding rule further includes:

the characters within a Chinese word are connected by a "-" character, suffix characters are attached by a "'" character, erhua (r-coloring) is represented by "'E", and words are separated by a space;

in Chinese text, a string of four or fewer Western capital letters is preceded by a "-" character to distinguish it from Chinese character codes.

Preferably, the coding rule database contains the tone-changed pronunciations of modern Mandarin that are not marked in writing.

Furthermore, the four Western capital letters with independently computable weighted positions mean that each capital letter is an independent arithmetic unit, so that mathematical operations can be carried out on the ASCII code of each capital letter.

Preferably, the encoding rule further comprises:

predetermined common characters are represented by one-letter, two-letter, and three-letter brevity codes, each of which corresponds to its four-letter full code for arithmetic processing; the one-letter brevity code is the row of the Chinese character, the two-letter brevity code is its row and column, and the three-letter brevity code is its row, column, and vertical.

Further, the encoding rule further includes:

two-, three-, and four-character words of the forms 'XX-XX', 'XX-XX-XX', and 'XX-XX-XX-XX' are standard words; each character in a standard word is represented by its two-letter brevity code; a standard word is represented as a whole and is not constrained by the characters that its two-letter brevity codes would denote individually, and it is treated as a single word when parsed by a computer; non-standard words are represented by any combination of brevity codes.

Furthermore, the three-letter brevity code of each Chinese character is a pronunciation code: the pronunciation of the character can be spelled out from the row, column, and vertical of its letter combination; and the form XXXR, built on the three-letter brevity code, is the pronunciation code for the erhua (retroflex) pronunciation of the character.

According to another aspect of the embodiments of the present invention, there is provided an output method for chinese speech datamation, the method including:

receiving voice information;

identifying syllables, tones and sentences of the voice information;

according to the syllables, tones, and sentence breaks of the voice information, determining the corresponding Chinese character and sentence codes by computing and querying a coding rule database, and then outputting the codes and/or the Chinese text corresponding to the codes, with partial tone marks and sentence-break marks;

the coding rule database adopts the coding rules.

According to another aspect of the embodiments of the present invention, there is provided an output method of chinese character encoding voice broadcast, the method including:

receiving Chinese character codes with partial tones and sentence reading marks;

inquiring the coding rule database to determine the sounding characteristics of corresponding Chinese characters and sentences, and outputting corresponding accurate unambiguous voice information;

the coding rule database adopts the coding rules.

The invention has at least the following beneficial effects:

The input method provided by the invention receives and stores input keyboard characters, divides the English characters according to the coding rules, determines the codes representing the individual Chinese characters or words, queries a coding rule database to determine the Chinese characters or words corresponding to the codes, and outputs them, so that target characters or sentences can be input on an electronic device. Under the coding rule, each Chinese character is represented by four weighted Western capital letters denoting its row, column, vertical, and order. The invention thereby splits and fully quantizes each Chinese character digitally: each letter representing the character can independently take part in computer operations. This solves the problem that the traditional internal code of Chinese characters cannot be operated on, so that Chinese characters no longer merely play the role of a typewriter in the computer (input, storage, typesetting, and printing) but can take part in the computer's analysis, sorting, retrieval, calculation, reasoning, and control of information. The invention establishes an identifiable one-to-one mapping between Chinese characters and ASCII codes and thus realizes complete ASCII coding of Chinese characters, which can be recognized and read directly. Unlike the traditional internal code, in which two, three, or four bytes cannot be split or misaligned and the high bit of each byte is 1, it does not cause garbled Chinese text or even system crashes when a system processes information, and it need not be excluded from important information processing systems, which greatly improves the precision, security, and reliability with which a computer processes information. In addition, the input method distinguishes polyphonic characters, homophones, and simplified and traditional forms, so each letter combination represents exactly one Chinese character and its pronunciation. The input code is itself the internal machine code of the Chinese character: a character need not be converted into an internal code by the "input code conversion module" of a Chinese operating system and can be stored and processed directly. This avoids any difference between the input code and the stored code, reduces system overhead, and increases system stability, while also enabling touch-typing keyboard input and machine-readable input of Chinese characters from marked cards.

Drawings

FIG. 1 is a diagram illustrating a Chinese character datamation input method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram illustrating an output method for Chinese speech datamation according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an output method for Chinese character encoding voice broadcast according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The first embodiment is as follows:

in this embodiment, as shown in fig. 1, a method for inputting chinese characters by datamation is provided, the method comprising:

Step S101, receiving and storing input keyboard characters, where the keyboard may be a physical keyboard or a virtual keyboard, i.e., a keyboard provided by a software program such as that on a mobile phone or tablet; the stored input code characters can be used by a computer for computation and communication.

Step S102, dividing the English characters according to the coding rule, and determining the codes representing the individual Chinese characters or words.

Step S103, inquiring a coding rule database obtained according to coding rules, determining Chinese characters or terms corresponding to the codes, and outputting Chinese characters corresponding to the codes.
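Steps S101 to S103 can be sketched as follows. This is a simplified illustration, not the method of the embodiment: it assumes the input consists only of four-letter full codes with no separators, and `CODE_DB`, `split_codes`, and `decode` are hypothetical names. The three codes in the table are ones quoted elsewhere in this description; the rest of the table layout is assumed.

```python
# Hypothetical excerpt of a coding rule database (code -> character).
CODE_DB = {"WOMA": "我", "NIMC": "你", "MAMA": "马"}


def split_codes(keys: str, width: int = 4) -> list[str]:
    """S102 (simplified): cut the stored key string into fixed-width full codes."""
    keys = keys.upper()
    if len(keys) % width:
        raise ValueError("input is not a whole number of full codes")
    return [keys[i:i + width] for i in range(0, len(keys), width)]


def decode(keys: str) -> str:
    """S103: look up every code and output the corresponding characters."""
    return "".join(CODE_DB[code] for code in split_codes(keys))


stored = "womanimc"          # S101: keystrokes as received and stored
print(split_codes(stored))   # ['WOMA', 'NIMC']
print(decode(stored))        # 我你
```

A fuller implementation would also have to handle brevity codes and the "-", "'" and space delimiters described later in this embodiment.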

The encoding rule is as follows:

the first tone (yin ping) is represented by the sequence ABC DEF, the second tone (yang ping) by GHI JKL, the third tone (shang sheng) by MNO PQR, the fourth tone (qu sheng) by TUV WXYZ, and the neutral tone by S, wherein ABC, GHI, MNO, and TUV are the front tone letter groups and DEF, JKL, PQR, and WXYZ are the rear tone letter groups, and pinyin whose initial is ch, sh, or zh or whose final is ü uses only the rear tone letters; characters with the same sound and the same tone are sorted and grouped according to a predetermined ordering by usage frequency, stroke count, and stroke order, with every 26 characters forming one group: the characters ranked 1 to 26 form the first group and take the first of the corresponding tone letters as their vertical, the characters ranked 27 to 52 form the second group and take the second of the corresponding tone letters as their vertical, and so on, so that the characters in each group take the corresponding tone letter as the vertical of each character;
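The tone letter groups above partition the 26 capital letters, so the tone can be read back from the vertical letter alone. A minimal sketch, with hypothetical names `TONE_GROUPS` and `tone_of`, and with the neutral tone reported as tone 0 by convention of this example only:

```python
# Tone number -> (front letter group, rear letter group), as listed in the rule.
TONE_GROUPS = {
    1: ("ABC", "DEF"),
    2: ("GHI", "JKL"),
    3: ("MNO", "PQR"),
    4: ("TUV", "WXYZ"),
}
NEUTRAL = "S"


def tone_of(vertical: str) -> tuple[int, str]:
    """Map a vertical letter to (tone number, letter group)."""
    if len(vertical) != 1 or not "A" <= vertical <= "Z":
        raise ValueError("vertical must be one capital letter")
    if vertical == NEUTRAL:
        return 0, "neutral"
    for tone, (front, rear) in TONE_GROUPS.items():
        if vertical in front:
            return tone, "front"
        if vertical in rear:
            return tone, "rear"
    raise AssertionError("unreachable: the groups cover every letter")


print(tone_of("A"))  # (1, 'front')
print(tone_of("J"))  # (2, 'rear')
print(tone_of("S"))  # (0, 'neutral')
```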

Specifically, for example, the code of 安 (ān) is ABAA; its row is A because a syllable without an initial uses its first letter as the row.

Specifically, codes whose row is I, V, or U represent punctuation marks or non-Chinese characters: letter combinations beginning with I, V, or U are reserved as the code space for punctuation marks and non-Chinese characters. The non-Chinese characters include the characters of countries in the Chinese-character cultural sphere, such as Japan and Korea, so that ideographic characters worldwide are all brought into a single one-dimensional linear character space on which mathematical operations such as sorting and retrieval can be performed.

Specifically, the final ai is represented by column Z. Assigning ai to the last letter of the alphabet keeps the finals e, i, o, and u on their own letters: the Western letter E represents the final e, I represents the final i, O represents the final o, and U represents the final u.

Specifically, for example, there are 33 Chinese characters pronounced [zī]. The first 26 are coded ZIAA, ZIAB, ZIAC, and so on through ZIAZ; they form the first group, so their vertical is A, and they are distinguished by the order letters A to Z. The characters ranked 27 to 33 are coded ZIBA, ZIBB, ZIBC, ZIBD, ZIBE, ZIBF, and ZIBG; they form the second group, so their vertical is B, and they are likewise distinguished by the order letters A to G.
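The grouping rule in this example is a division by 26: the quotient selects the vertical letter and the remainder selects the order letter. A minimal sketch, in which `vertical_and_order` is a hypothetical helper and `tone_letters` is the string of letters available as verticals for the syllable's tone (for example "ABC" for a first-tone syllable using the front group):

```python
import string


def vertical_and_order(rank: int, tone_letters: str) -> str:
    """Return vertical + order letters for the rank-th character (1-based)
    among characters sharing the same sound and tone."""
    if rank < 1:
        raise ValueError("rank is 1-based")
    group, position = divmod(rank - 1, 26)
    if group >= len(tone_letters):
        raise ValueError("more groups than available tone letters")
    return tone_letters[group] + string.ascii_uppercase[position]


# The [zī] example: row Z, column I, first tone, front letter group ABC.
for rank in (1, 26, 27, 33):
    print(rank, "ZI" + vertical_and_order(rank, "ABC"))
```

For ranks 1, 26, 27, and 33 this yields ZIAA, ZIAZ, ZIBA, and ZIBG, matching the codes given in the text.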

Specifically, for example, a character used as a modal particle is read in the neutral tone; in that case its code is MASA, where the vertical S denotes the neutral tone.

Specifically, row C corresponds to the initial c or ch, row S to the initial s or sh, row Z to the initial z or zh, and column U to the final u or ü. If the code did not indicate which pinyin is meant, decoding would be ambiguous. To avoid this, pinyin with the initial ch, sh, or zh or the final ü uses DEF, JKL, PQR, and WXYZ as tone letters; other pinyin causes no ambiguity and may use all the tone letters. For example, the code of "property" (cái) is CZGE, which uses G to represent the second tone; the code of "firewood" (chái) is CZJA, which uses J to represent the second tone of a retroflex initial; nú is encoded as NUGA, which uses G for the second tone; and the code of "woman" (nǚ) is NUPA, which uses P to represent the third tone of ü.
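The disambiguation can be sketched as a check on the vertical letter. This rests on an assumption the text implies but does not state outright: that plain c, s, z and plain u never use the rear tone letters. The names below are hypothetical, and the neutral-tone letter S is treated as "plain" only because the text gives no rule for it.

```python
REAR_LETTERS = set("DEFJKLPQRWXYZ")
RETROFLEX = {"C": "ch", "S": "sh", "Z": "zh"}


def resolve_initial(code: str) -> str:
    """For rows C, S, Z: decide between the plain and the retroflex initial."""
    row, vertical = code[0], code[2]
    if row not in RETROFLEX:
        raise ValueError("only rows C, S and Z are ambiguous")
    return RETROFLEX[row] if vertical in REAR_LETTERS else row.lower()


def resolve_u_final(code: str) -> str:
    """For column U: decide between the finals u and ü."""
    row, column, vertical = code[0], code[1], code[2]
    if column != "U":
        raise ValueError("only column U is ambiguous")
    if row in RETROFLEX:
        # In pinyin, z/c/s/zh/ch/sh never combine with ü, so a rear letter
        # here can only be marking the retroflex initial.
        return "u"
    return "ü" if vertical in REAR_LETTERS else "u"


print(resolve_initial("CZGE"), resolve_initial("CZJA"))  # c ch
print(resolve_u_final("NUGA"), resolve_u_final("NUPA"))  # u ü
```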

Further, characters with the same sound and tone are sorted and grouped according to a predetermined ordering by usage frequency, stroke count, and stroke order; specifically, they are sorted and grouped according to the Table of General Standard Chinese Characters issued by the State Council in 2013. The Table divides its characters into three levels: the first-level list is the set of common characters, containing 3,500 characters; the second-level list is the set of less common characters, containing 3,000 characters; and the third-level list is the set of uncommon characters, containing 1,605 characters. Within each list, characters are arranged by stroke count, and characters with the same stroke count are arranged by stroke order (horizontal, vertical, left-falling, dot), so that within each usage-frequency band the characters run from simple to complex. Because the selection and classification of the three levels are the result of large-scale statistics, the content of the Table is relatively stable, and it is a sound basis for the vertical grouping and order-code sorting of characters with the same sound and tone. It ensures that, in the coding rule database corresponding to this coding rule, pronunciations are arranged in pinyin order and characters with the same sound and tone are arranged from common to uncommon and from simple to complex forms. Some of the Chinese character codes in the character library of the coding rule database are as follows:

The character marked "@" is included in the Table of General Standard Chinese Characters but cannot currently be displayed on a computer. The selected codes show that pronunciations are arranged in pinyin order and that characters with the same sound and tone are arranged from common to uncommon and from simple to complex forms, which shows that the coding method is systematic.

Meanwhile, in the character library of the coding rule database, the code corresponding to [hǎo] is HDMA and the code corresponding to [hào] is HDTC, for example. Polyphonic characters that share one written form are thus represented independently, and each pronunciation of such a character has its own code, which improves the accuracy and precision of Chinese expression.

Specifically, when characters with the same sound and tone are sorted and grouped according to the predetermined usage frequency, stroke count, and stroke order, the traditional form of a character is placed in the position immediately after the corresponding simplified character, so that a computer can conveniently convert between traditional and simplified characters by arithmetic. For example, the code of the simplified "horse" (马) is MAMA and the code of the traditional "horse" (馬) is MAMB, so the computer can obtain the code of the traditional form directly by adding 1 to the order of the simplified form. In addition, when one simplified character has several traditional forms, each traditional form is paired with its own copy of the simplified character, so that conversion between traditional and simplified characters cannot go wrong. For example, the simplified character tái has three traditional forms. It is first encoded as an independent character, TZGA, and then each simplified and traditional pair is encoded in turn, so that the coding rule database contains the pairs TZGB and TZGC, TZGD and TZGE, and TZGF and TZGG. In this way, however many traditional forms a character has, conversion between traditional and simplified characters can be carried out directly by the same rule.
As another example, lǐ in the sense of a unit of distance (as in "kilometer") has no traditional form, whereas lǐ in the sense of "inside" does. The two senses are therefore encoded as independent characters, and the position immediately after the "inside" lǐ holds its traditional form. In the coding rule database, LIMD represents lǐ as in "kilometer" and LIME represents lǐ as in "inside".
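The simplified-to-traditional conversion described above amounts to incrementing the order letter. A minimal sketch follows; `traditional_code` is a hypothetical helper, and it assumes the caller already knows that the given code is a simplified character with a traditional counterpart and that the pair does not straddle a group boundary (the text does not say how that case is handled).

```python
def traditional_code(simplified_code: str) -> str:
    """Return the code one order position after a simplified character's code."""
    if len(simplified_code) != 4:
        raise ValueError("expected a four-letter full code")
    head, order = simplified_code[:3], simplified_code[3]
    if order == "Z":
        raise ValueError("no following position within this group")
    return head + chr(ord(order) + 1)


print(traditional_code("MAMA"))  # MAMB
print(traditional_code("TZGB"))  # TZGC
```

Because the relationship is purely positional, the reverse conversion is the same operation with -1 instead of +1.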

Further, the encoding rule further includes:

The characters within a Chinese word are connected by a "-" character, suffix characters are attached by a "'" character, erhua (r-coloring) is represented by "'E", and words are separated by a space. In Chinese text, a string of four or fewer Western capital letters is preceded by a "-" character to distinguish it from Chinese character codes. Both "-" and "'" are ordinary ASCII characters. For example, the sentence "He is our good role model" corresponds to the code T'S W-M'D H BC-YC. As another example, the code for "tree root" (with erhua) is SU-GG'E and the code for "key-off" is KT-MG'E, so that erhua is expressed independently and is distinguished from the code used when "child" (ér) is a character in its own right; for example, the code for "daughter" is NU-E. In addition, if four or fewer Western capital letters need to be entered in a Chinese text, a "-" symbol must be typed before them to distinguish them from Chinese character codes. Because words are themselves separated by spaces, this "-" is not confused with the "-" that connects the characters of a word. For example, the sentence "I work at IBM" corresponds to the code W'ZZ -IBM GR-SI SC-BB, so the system does not take IBM for a Chinese character code when it processes the input string.

On this basis, in step S102 suffix characters and words can be distinguished accurately, the English characters can be divided accurately, and the codes representing each Chinese character or word can be determined together with the sentence-break information contained in the Chinese sentence, which makes semantic processing by a computer more convenient.
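The delimiter rules can be sketched as a small tokenizer. This is an illustration under assumptions rather than the embodiment's parser: the function name and the output shape are hypothetical, and the sample string assumes that a Western abbreviation such as -IBM stands as its own space-separated token.

```python
import re


def parse_sentence(code: str) -> list[tuple]:
    """Split a sentence code into words, then into characters and suffixes.

    Returns ("western", text) for a "-"-prefixed Western string and
    ("chinese", [(code, role), ...]) otherwise, with role one of
    "char", "suffix", or "erhua".
    """
    words = []
    for token in code.split():            # words are separated by spaces
        if token.startswith("-"):         # leading "-" marks Western capitals
            words.append(("western", token[1:]))
            continue
        pieces = re.split(r"([-'])", token)   # keep delimiters to classify pieces
        parts = [(pieces[0], "char")]
        for delim, piece in zip(pieces[1::2], pieces[2::2]):
            if delim == "-":
                role = "char"
            elif piece == "E":
                role = "erhua"            # "'E" denotes erhua
            else:
                role = "suffix"
            parts.append((piece, role))
        words.append(("chinese", parts))
    return words


for word in parse_sentence("W'ZZ -IBM GR-SI SC-BB"):
    print(word)
print(parse_sentence("SU-GG'E"))
print(parse_sentence("NU-E"))
```

Note how SU-GG'E yields an "erhua" role for E, whereas NU-E keeps E as an ordinary character of the word, mirroring the distinction drawn in the text.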

Further, the coding rule database contains the tone-changed pronunciations of modern Mandarin that are not marked in writing. For example, the character "one" (一) has only the dictionary pronunciation [yī], but its tone changes in several situations. When "one" is used alone or as the last character of a word or sentence, it is read in the first tone, as in "unity", "one nine", and the like. When "one" comes before a fourth-tone syllable, it changes to the second tone, as in "one", "once", "the same", and the like. When "one" comes before a first-, second-, or third-tone syllable, it changes to the fourth tone, as in "one time", "one kind", "one sheet", and the like. In addition, in spoken language "one" is often read as [yāo]. The tone-changed and colloquial pronunciations of such characters are therefore added to the coding rule database: "one" has first-tone, second-tone, and fourth-tone pronunciations and the pronunciation [yāo], with the codes YIAA, YIGA, YITA, and YDAA respectively. Similar cases include "seven" (七), "eight" (八), and "not" (不).
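The tone-change rules for "one" stated above translate into a small selection function. A minimal sketch, assuming the caller already knows the tone of the following syllable; `yi_code` is a hypothetical name, and the colloquial reading YDAA is left out because the text gives no rule for when it applies.

```python
from typing import Optional

# Codes for the three tonal readings of "one", as listed in the text.
YI_CODES = {1: "YIAA", 2: "YIGA", 4: "YITA"}


def yi_code(next_tone: Optional[int]) -> str:
    """Pick the code for "one" from the tone (1-4) of the following syllable.

    Pass None when "one" stands alone or ends a word or sentence.
    """
    if next_tone is None:
        return YI_CODES[1]      # read in the first tone
    if next_tone == 4:
        return YI_CODES[2]      # before a fourth tone: second tone
    if next_tone in (1, 2, 3):
        return YI_CODES[4]      # before tones 1-3: fourth tone
    raise ValueError("next_tone must be 1-4 or None")


print(yi_code(None), yi_code(4), yi_code(3))  # YIAA YIGA YITA
```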

Furthermore, the four Western capital letters with independently computable weighted positions mean that each capital letter is an independent arithmetic unit on which mathematical operations can be carried out using its ASCII code. In other words, a one-to-one mapping is established between Chinese characters and ASCII codes, realizing complete ASCII coding of Chinese characters, so that garbled text does not arise when a computer processes Chinese character information, and the precision, security, and reliability with which the computer processes Chinese characters are greatly improved.

For example, the code of "I" is WOMA. The ASCII code of the character W is 87, of O is 79, of M is 77 and of A is 65. Subtracting 9 from W gives N, subtracting 6 from O gives I, adding 0 to M leaves M, and adding 2 to A gives C, so this operation converts the code WOMA of "I" into NIMC, the code of "you".
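The arithmetic in this example can be reproduced directly on the ASCII values. The following is a minimal sketch; the function name `shift_code` and the per-letter offset list are illustrative choices, not part of the described method.

```python
def shift_code(code: str, offsets: list[int]) -> str:
    """Add one integer offset to the ASCII value of each letter of a code."""
    if len(code) != len(offsets):
        raise ValueError("need exactly one offset per letter")
    shifted = []
    for letter, offset in zip(code, offsets):
        value = ord(letter) + offset
        # Stay inside the capital-letter range A (65) to Z (90).
        if not ord("A") <= value <= ord("Z"):
            raise ValueError(f"{letter!r} with offset {offset} leaves the range A-Z")
        shifted.append(chr(value))
    return "".join(shifted)


print([ord(c) for c in "WOMA"])            # [87, 79, 77, 65]
print(shift_code("WOMA", [-9, -6, 0, 2]))  # NIMC
```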

Similarly, because each of the four capital letters is an independently computable unit, the coding method can be used to judge whether a poem rhymes. Rhyming requires the final (vowel part) and the tone of the last character of each line to be the same; that is, a poem rhymes if the column letters of the codes of the line-final characters are identical and their vertical letters belong to the same tone letter group. For example, in the poem "Bright moonlight before my bed, I took it for frost on the ground; I raise my head to gaze at the bright moon, I lower it and think of my hometown", "light" has the code GLAA, "frost" has the code SLDC and "hometown" has the code XLAA. All three characters have the column letter L, and their vertical letters all belong to the first-tone letter group, so the three line endings can be determined to rhyme.
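The rhyme test just described can be sketched as follows. The sketch assumes that the second letter of a code is the column letter, that the third is the vertical letter, and that the tone letter groups are those listed in claim 1; the function names are hypothetical.

```python
# Tone letter groups from claim 1, in the order first, second, third, fourth, neutral.
TONE_LETTER_GROUPS = ["ABCDEF", "GHIJKL", "MNOPQR", "TUVWXYZ", "S"]


def tone_group(vertical: str) -> int:
    """Return the index of the tone letter group containing the vertical letter."""
    for index, letters in enumerate(TONE_LETTER_GROUPS):
        if vertical in letters:
            return index
    raise ValueError(f"not a tone letter: {vertical!r}")


def lines_rhyme(final_codes: list[str]) -> bool:
    """True if all line-final codes share a column letter and a tone group."""
    columns = {code[1] for code in final_codes}
    tones = {tone_group(code[2]) for code in final_codes}
    return len(columns) == 1 and len(tones) == 1


# Codes of "light", "frost" and "hometown" from the poem above.
print(lines_rhyme(["GLAA", "SLDC", "XLAA"]))  # True
```

Because only two letters of each code are compared, the check needs no dictionary lookup, which is the point the description makes about independently computable positions.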

Further, the encoding rule further includes:

in the word library of the coding rule database, common characters are represented by one-digit, two-digit and three-digit brevity codes, and each brevity code corresponds to its own four-digit full code for computation. The one-digit brevity code is the row letter of the Chinese character, the two-digit brevity code is its row and column letters, and the three-digit brevity code is its row, column and vertical letters. For example, "he" (TAAA) can be written as T, the character with full code TCGB can be written as TC, and "lie down" (TCMC) can be written as TCM.
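A brevity code can be pictured as a key into a small table that yields the full code used for computation. The sketch below holds only the three pairs mentioned in this paragraph; the table name `BREVITY_TO_FULL` and the function `expand` are hypothetical, and the real library would hold many more entries.

```python
# Brevity code -> four-letter full code, using only the pairs given in the text.
BREVITY_TO_FULL = {
    "T": "TAAA",
    "TC": "TCGB",
    "TCM": "TCMC",
}


def expand(code: str) -> str:
    """Return the four-letter full code for a brevity code or a full code."""
    if len(code) == 4:
        return code  # already a full code
    try:
        return BREVITY_TO_FULL[code]
    except KeyError:
        raise KeyError(f"unknown brevity code: {code!r}") from None


print(expand("T"), expand("TC"), expand("TCM"))  # TAAA TCGB TCMC
```

In each of the three pairs the brevity code is a prefix of its full code, which agrees with the row, row-column and row-column-vertical construction described above.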

Specifically, in the word library of the coding rule database, the corresponding relationship between the one-digit brevity code and the Chinese character is as follows:

Specifically, in the word library of the coding rule database, the correspondence between the two-digit brevity codes and the Chinese characters is as follows:

In particular, because "ten", "hour" and "hundred million" are very common, it is practical to represent them with two-digit brevity codes. The otherwise unused two-digit codes EA, EB and EC are therefore used to represent "ten", "hour" and "hundred million" respectively. Like the other brevity codes, they correspond to the original four-digit full codes of these characters, so the full codes can still be used in computation.

Further, two-character, three-character and four-character words written in the forms "XX-XX", "XX-XX-XX" and "XX-XX-XX-XX" are standard words. Each character of a standard word is represented by a two-digit brevity code, and the standard word is a single representational unit that is not constrained by the individual two-digit brevity codes it contains; non-standard words may be represented by any combination of codes. For example, "Xinhua society" can be expressed as XP-HJ-SE, a standard word that stands as an independent whole and is not limited by the characters corresponding to its two-digit codes. When the computer analyzes a standard word it treats it as a whole, which improves the efficiency of processing Chinese character information. By contrast, "we" can be expressed as W-M, "stretcher" as DBAB-JJTE and "talent" as R-CZG; these are non-standard words, which can be written with any combination of codes and which the computer analyzes character by character. The invention therefore records common words as standard words in the standard word library of the coding rule database; some of the standard words in that library are coded as follows:

Based on this, when the coding rule database built according to the coding rules is queried in step S103 to determine the Chinese character or word corresponding to a code, a standard word is resolved by searching the standard word library of the coding rule database and outputting the Chinese characters corresponding to the standard word code, while individual Chinese characters and non-standard words are resolved by searching the word library of the coding rule database and outputting the Chinese character corresponding to each code.
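The lookup order in step S103 can be sketched as a two-stage decision: try the whole token against the standard word library first, and fall back to character-by-character analysis otherwise. The set `STANDARD_WORDS` below contains only the one standard word named in the text, and the function name and return shape are assumptions made for illustration.

```python
# Toy standard word library holding only the example from the description.
STANDARD_WORDS = {"XP-HJ-SE"}


def analyse_word(token: str, standard_words: set[str] = STANDARD_WORDS):
    """Classify one space-delimited token and list the units to look up."""
    if token in standard_words:
        # A standard word is looked up as a single unit.
        return ("standard", [token])
    # Otherwise each hyphen-separated code is looked up on its own.
    return ("non-standard", token.split("-"))


print(analyse_word("XP-HJ-SE"))   # ('standard', ['XP-HJ-SE'])
print(analyse_word("DBAB-JJTE"))  # ('non-standard', ['DBAB', 'JJTE'])
print(analyse_word("R-CZG"))      # ('non-standard', ['R', 'CZG'])
```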

Further, the encoding rule further includes:

the three-digit brevity code of each Chinese character is a pronunciation code; that is, when the computer performs simultaneous interpretation and voice broadcast of the letter combinations input from the keyboard, the pronunciation of each Chinese character can be spelled out from the row, column and vertical letters of its letter combination. The form XXXR, built on the three-digit brevity code, is the pronunciation code for the retroflex pronunciation of the Chinese character. All three-digit brevity codes are extracted from the word library of the coding rule database as pronunciation codes and, together with all retroflex pronunciation codes XXXR, form a new pronunciation code library, so that an unambiguous mapping can be established between Chinese characters and their pronunciations.

The input method provided by the invention achieves "text with sound", which meets modern society's need for unified speech information and standardized communication. Because written Chinese characters carry no pronunciation attribute, the same character is read differently in different parts of the country; the method provided by the invention expresses and fixes the pronunciation of each Chinese character, including its four tones, directly and completely. The "text with sound" function greatly improves the ability to learn, use and extend Chinese, and also greatly expands a computer's ability to process text.

Further, the word library in the coding rule database can theoretically accommodate 26⁴ (456,976) characters, which is enough to contain all Chinese characters developed to date, including rare characters and variant characters.

In the input method provided by this embodiment, a target character or sentence can be input on an electronic device by receiving and storing the input keyboard characters, dividing the English characters according to the coding rules, determining the code representing each Chinese character or word, querying the coding rule database to determine the Chinese character or word corresponding to each code, and then outputting it. Under the coding rules, each Chinese character is represented by a combination of four western capital letters with weighted positions (row, column, vertical and sequence); the characters of a Chinese word are connected by "-", suffix characters are connected by "'", the retroflex sound is represented by "E", and characters and words are separated by spaces. The input method thus fully decomposes and digitizes each Chinese character: every letter representing a character can take part in computation independently, which solves the problem that traditional Chinese character internal codes cannot be computed with. Chinese characters in a computer are then no longer limited to typewriter-like roles (input, storage, typesetting and printing) but can take part in the computer's parsing, sorting, retrieval, computation, reasoning and control of information.

The embodiment of the invention establishes a directly recognizable one-to-one mapping between Chinese characters and ASCII codes and thereby encodes Chinese characters entirely in ASCII. Traditional Chinese character internal codes use two, three or four bytes that cannot be split or misaligned, with the high bit of each byte set to 1, and this can cause garbled Chinese text or even a system crash during information processing; codes under the present method do not have this problem and need not be forcibly excluded from important information processing systems. The accuracy, security and reliability of computer information processing are thus greatly improved, and Chinese character information can be conveniently input, stored, transmitted and controlled across computers, digital devices and networks.

In addition, the coding rules of the invention represent polyphonic characters, homophonic characters and simplified and traditional forms separately, so that each letter combination represents exactly one Chinese character and its pronunciation. Words and sentences can be reasonably broken during input, so the computer can accurately understand the meaning of a sentence, the input sentence is unambiguous, and Chinese characters can be touch-typed in the true sense. At the same time, the input code is itself the Chinese character internal code: an input character can be stored and processed without being converted to an internal code by the input code conversion module of a Chinese operating system, which reduces system overhead and improves system stability. The computer can also spell out the pronunciation of each Chinese character from the row, column and vertical letters of its letter combination, so that when it performs simultaneous interpretation of a keyboard-entered sequence of letter combinations it can accurately translate and broadcast what the person entering the text meant.

Furthermore, with the input method provided by the embodiment of the invention, students learning Hanyu Pinyin can master the correspondence between pinyin and the row, column, vertical and sequence letters in a single class and quickly reach touch-typing proficiency, so the method is simple and easy to learn. The invention also allows Chinese character input that is currently done by computer, such as in examinations and form filling, to be carried out on 26-letter fill-in cards that are read and processed by machine, which greatly increases the speed of information processing and saves a large amount of labour.

Example two:

In this embodiment, as shown in Fig. 2, an output method for Chinese speech datamation is provided. The method includes:

S201, receiving voice information.

S202, identifying the syllables, tones and sentence readings of the voice information.

S203, according to the syllables, tones and sentence readings of the voice information, determining the codes of the corresponding Chinese characters and sentences by calculation and by querying the coding rule database, and then outputting the codes and/or the Chinese text, with partial tone and sentence-reading marks, corresponding to the codes.

The encoding rule database may employ the encoding rules described in the first embodiment.

The calculation refers to accurately distinguishing the pronunciation difference of suffix characters and words and accurately judging the sentence break according to syllables, tones and sentence reading information of the voice information.

For example, the spoken sentence "such talents are what we need" has two meanings:

① such a person / is what we need;

② such talents / are what we need.

When the sentence is spoken according to each of the two meanings, the computer can identify the syllable, tone and sentence-reading information, connect the characters of a Chinese word with "-", connect suffix characters with "'", and separate words with spaces, so that the code corresponding to the voice information is output:

① such a person / is what we need: ZE-YC'D R CZG-S W-M XU-YD'D;

② such talents / are what we need: ZE-YC'D R-CZG'S W-M XU-YD'D.

Therefore, the method can determine word breaks and sentence breaks in Chinese speech when outputting the coded information, so that the output coded information is unambiguous.
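How the delimiters carry the segmentation can be seen by parsing the two code strings for this sentence. The sketch below assumes the delimiter roles read off the examples in this document (a space between words, "-" between the characters of a word, and "'" before a suffix character); the function name `parse_sentence` and the dictionary layout are hypothetical.

```python
def parse_sentence(encoded: str) -> list[dict]:
    """Split an encoded sentence into words, each with characters and suffixes."""
    words = []
    for token in encoded.split():           # words are separated by spaces
        stem, *suffixes = token.split("'")  # suffix characters follow an apostrophe
        words.append({"chars": stem.split("-"), "suffixes": suffixes})
    return words


first = parse_sentence("ZE-YC'D R CZG-S W-M XU-YD'D")
second = parse_sentence("ZE-YC'D R-CZG'S W-M XU-YD'D")

print(first[1], first[2])
# {'chars': ['R'], 'suffixes': []} {'chars': ['CZG', 'S'], 'suffixes': []}
print(second[1])
# {'chars': ['R', 'CZG'], 'suffixes': ['S']}
```

The two strings differ only in their delimiters, yet they parse into five words in the first reading and four in the second, which is where the two meanings diverge.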

Similarly, the phrase "Wuhan City Yangtze River Bridge" is ambiguous, both because it can be segmented in different ways and because the character "long" (read chang or zhang) is itself polyphonic: it can refer either to the Yangtze River Bridge in Wuhan or to a person, the mayor of Wuhan. When the phrase is spoken according to each of the two meanings, the computer can identify the syllable, tone and sentence-reading information, connect the characters of a Chinese word with "-", connect suffix characters with "'", and separate words with spaces. Because the readings of a polyphonic character are represented separately in the coding rules, the codes and Chinese characters output for the voice information distinguish the two meanings:

① Wuhan City / Yangtze River Bridge

WU-HB-SI CC-JLA DA-QM, which refers to a bridge;

② Wuhan Mayor / Jiang Daqiao

WU-HB SI-ZC JLA-DA-QM, which refers to a person.

Similarly, in the phrase "classmates in the first building of the Beida library", the character "one" has more than one tone and the phrase is ambiguous: when "one" is read in the first tone the phrase refers to classmates on the first floor of the library, and when "one" is read in the fourth tone it refers to classmates in one building of the library. When the phrase is spoken according to each of the two meanings, the computer can identify the syllable, tone and sentence-reading information, connect the characters of a Chinese word with "-", connect suffix characters with "'", and separate words with spaces. Because the tones of Chinese characters are represented separately in the coding rules, the codes output for the voice information distinguish the two meanings:

① classmates on the first floor of the Beida library

BF-DA TU-SU-GV YI-LT'D TR-XW, which refers to classmates on the first floor of the library;

② classmates in one building of the Beida library

BF-DA TU-SU-GV YITA-LT'D TR-XW, which refers to classmates in one building of the library.

Similarly, in a sentence about a debt of RMB 2 million yuan (written as 200 ten-thousands), the first character can be read [huán] or [hái], and the two readings express diametrically opposite meanings, so distinguishing them is very important. When the sentence is spoken according to each of the two meanings, the computer can identify the syllable, tone and sentence-reading information, connect the characters of a Chinese word with "-", connect suffix characters with "'", and separate words with spaces. Because the readings of Chinese characters are represented separately in the coding rules, the codes output for the voice information distinguish the two meanings:

① read as [huán]: the debt of RMB 2 million yuan is repaid

HVG QKT RG-MP-BI 200WB-YV, indicating that the debt of RMB 2 million yuan has been repaid;

② read as [hái]: RMB 2 million yuan is still owed

HZ QKT RG-MP-BI 200WB-YV, indicating that RMB 2 million yuan is still owed.

Similarly, "how much" (literally "more-less") can be treated either as a whole word or as a character followed by a suffix character, and the two readings lead to different understandings of the sentence. When the sentence is spoken according to each of the two meanings, the computer can identify the syllable, tone and sentence-reading information, connect the characters of a Chinese word with "-", connect suffix characters with "'", and separate words with spaces, so that the codes output for the voice information distinguish the two meanings:

① in winter, wear as much as you can

DR-TK YI-FU NH CVDB DY-SD CVDB DY-SD, where "more" and "less" are joined into a single word, and the sentence means to wear as much as possible;

② in summer, wear as little as you can

XJ-TK YI-FU NH CVDB DY'SD CVDB DY'SD, where "less" is a suffix character, and the sentence means to wear as little as possible.

The technical effects of the present embodiment are the same as those of the first embodiment, and are not described herein again.

Example three:

In this embodiment, as shown in Fig. 3, an output method for voice broadcast of Chinese character codes is provided. The method includes:

step S301, receiving Chinese character codes with partial tones and sentence reading marks.

Step S302, inquiring the coding rule database to determine the sound production characteristics of the corresponding Chinese characters and sentences, and outputting the corresponding accurate unambiguous voice information.

For example, the sentence "such talents are what we need" has two meanings depending on how words and sentences are broken at input, and the corresponding syllables, tones and sentence readings differ when the codes are translated and broadcast by voice. When the Chinese character codes are input according to the two meanings, the computer can parse the codes according to the coding rules, determine the pronunciation features of the corresponding Chinese characters and sentences, and output different voice information:

①ZE-YC'D R CZG-S W-M XU-YD'D

such a person / is what we need;

②ZE-YC'D R-CZG'S W-M XU-YD'D

such talents / are what we need.

Similarly, the phrase "Wuhan City Yangtze River Bridge" is ambiguous, both because it can be segmented in different ways and because the character "long" (read chang or zhang) is itself polyphonic: it can refer either to the Yangtze River Bridge in Wuhan or to a person, the mayor of Wuhan. Correspondingly, the syllables, tones and sentence readings differ when the codes are translated and broadcast by voice. When the Chinese character codes are input according to the two meanings, the computer can parse the codes according to the coding rules and determine the pronunciation features of the corresponding Chinese characters and sentences, so that different voice information is output:

①WU-HB-SI CC-JLA DA-QM

Wuhan City / Yangtze River Bridge, which refers to a bridge;

②WU-HB SI-ZC JLA-DA-QM

Wuhan Mayor / Jiang Daqiao, which refers to a person.

Similarly, in the phrase "classmates in the first building of the Beida library", the character "one" has more than one tone and the phrase is ambiguous: when "one" is read in the first tone the phrase refers to classmates on the first floor of the library, and when "one" is read in the fourth tone it refers to classmates in one building of the library. When the Chinese character codes are input according to the two meanings, the computer can parse the codes according to the coding rules and determine the pronunciation features of the corresponding Chinese characters and sentences, thereby outputting different voice information:

①BF-DA TU-SU-GV YI-LT'D TR-XW

classmates on the first floor of the Beida library; "one" is output in the first tone, indicating classmates on the first floor of the library;

②BF-DA TU-SU-GV YITA-LT'D TR-XW

classmates in one building of the Beida library; "one" is output in the fourth tone, indicating classmates in one building of the library.

Similarly, in a sentence about a debt of RMB 2 million yuan (written as 200 ten-thousands), the first character can be read [huán] or [hái], and the two readings express diametrically opposite meanings, so distinguishing them is very important. When the Chinese character codes are input according to the two meanings, the computer can parse the codes according to the coding rules, determine the pronunciation features of the corresponding Chinese characters and sentences, and output different voice information:

①HVG QKT RG-MP-BI 200WB-YV

the reading [huán] is output, indicating that the debt of RMB 2 million yuan has been repaid;

②HZ QKT RG-MP-BI 200WB-YV

the reading [hái] is output, indicating that RMB 2 million yuan is still owed.

Similarly, "how much" (literally "more-less") can be treated either as a whole word or as a character followed by a suffix character, and the two readings lead to different understandings of a sentence. When the Chinese character codes are input according to the two meanings, the computer can parse the codes according to the coding rules, determine the pronunciation features of the corresponding Chinese characters and sentences, and output different voice information:

①DR-TK YI-FU NH CVDB DY-SD CVDB DY-SD

in winter, wear as much as you can; here "more" and "less" are joined into a single word that refers to a quantity and is read as a whole, and the sentence means to wear as much as possible;

②XJ-TK YI-FU NH CVDB DY'SD CVDB DY'SD

in summer, wear as little as you can; here "less" is a suffix character of "more" and is stressed when spoken, and the sentence means to wear as little as possible.

Further, when the computer translates and broadcasts a keyboard-entered sequence of letter combinations, it can recognize words, suffix characters and sounds from the "-" and "'" symbols in the received input and arrange the letter combinations into sentences according to grammar and semantics. When the translation result is broadcast by voice, words and sentences are broken reasonably and no ambiguous sentence is produced: for example, it is clear whether "Wuhan City Yangtze River Bridge" refers to a person or a bridge, and whether the RMB 2 million yuan has been repaid or is still owed, so that the input Chinese data is of real value.

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A Chinese character datamation input method is characterized by comprising the following steps:

receiving and storing input keyboard characters;

the encoding rule is as follows:

the first tone (yin ping) is represented by the letter sequence ABCDEF, the second tone (yang ping) by the sequence GHIJKL, the third tone (shang) by the sequence MNOPQR, the fourth tone (qu) by the sequence TUVWXYZ, and the neutral tone by S, wherein ABC, GHI, MNO and TUV are the front tone letter groups and DEF, JKL, PQR and WXYZ are the rear tone letter groups, and pinyin with the consonants ch, sh or zh, or with a "Hu" vowel, uses only the rear tone letters; characters with the same sound and the same tone are ordered and grouped according to a predetermined order of character usage frequency, stroke count and stroke order, with 26 positions per group: the Chinese characters in the first 26 positions form the first group and take the first letter of the corresponding tone letters as their vertical letter, the Chinese characters in positions 27-52 form the second group and take the second letter of the corresponding tone letters as their vertical letter, and so on, so that the Chinese characters of each group take the corresponding tone letter as the vertical letter of each character;

2. The method of claim 1, wherein when homophonic homonyms are ordered and grouped according to the predetermined frequency order of use of the Chinese characters, the number of strokes and the order of strokes, a character corresponding to a traditional Chinese character is positioned at the next position of a corresponding simplified Chinese character, thereby facilitating the operation conversion between the traditional Chinese character and the simplified Chinese character by a computer.

3. The method of claim 1, wherein the encoding rule further comprises:

4. The method of claim 1, wherein the coding rule database contains character pronunciations of modern Mandarin that are not marked in the standard readings of the characters.

5. The method of claim 1, wherein the four-digit weight-bit independently computable western capital letters means that each capital letter is an independently computable unit, and mathematical operations can be performed using ASCII code corresponding to each capital letter.

6. The method of claim 1, wherein the encoding rule further comprises:

7. The method of claim 6, wherein the encoding rule further comprises:

8. The method of claim 6, wherein the three-digit brevity code of each Chinese character is a pronunciation code, and the pronunciation of the Chinese character can be spelled out according to the row, column and longitudinal combination of the corresponding letter combination of each Chinese character; and XXXR corresponding to the three-digit brevity code is represented as a pronunciation code corresponding to the Chinese character retroflex pronunciation.

9. An output method for Chinese speech datamation, the method comprising:

receiving voice information;

identifying syllables, tones and sentence readings of the voice information;

the coding rule database employs the coding rules described in claim 1.

10. An output method for Chinese character coding voice broadcast is characterized by comprising the following steps:

the coding rule database employs the coding rules described in claim 1.