CN111611773A

CN111611773A - Digital coding method for Chinese and foreign languages and its use

Info

Publication number: CN111611773A
Application number: CN201910855221.XA
Authority: CN
Inventors: 侯景华; 侯朋太
Original assignee: Individual
Current assignee: Individual
Priority date: 2019-02-24
Filing date: 2019-08-30
Publication date: 2020-09-01

Abstract

The invention relates to a digital coding method for Chinese characters and foreign-language characters, and to its applications, in the field of character coding. Eleven symbols, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and "·", are used as code elements. Chinese characters are split into 6 kinds of single strokes, and three kinds of etymons are defined, coded 0, 1 and 9 respectively. The single strokes outside etymons are combined in stroke order into groups of two wherever possible, and the 11 symbols serve as the code elements for coding the characters. Korean syllables are split into initial consonants, vowels and final consonants. These are arranged in groups of no more than 11, and symbols derived from the 11 code elements represent each group. Every Chinese character and every Korean syllable can be coded with no more than three code elements. The rules are easy to learn, easy to remember, intuitive and simple to operate. Once the characters are sorted by their digital codes of at most three digits, a "single-character search table" for Chinese characters can be compiled. This table replaces the four-corner number method, the "radical index table" and the "stroke-order index table". It fills the gap left by the absence of a three-code character search for Chinese characters, making lookup convenient and fast.

Description

Digital coding method for Chinese and foreign languages and its use

The invention belongs to the technical field of character coding.

Background art: techniques for encoding the characters (or words) of Chinese and of other languages are widely used for character-code input on computers and for character retrieval in dictionaries and similar reference works. For example, the radical index table of the Xinhua Dictionary requires the user first to count the strokes of the radical component and then the strokes of the remaining parts. This is cumbersome, and 400 characters are difficult to look up this way. With the development of digital technology, schemes that code characters with numbers have appeared, such as the five-stroke digital code and the four-corner number method for Chinese characters. The five-stroke digital code is novel and practical, but it codes a single character with as many as six codes (each character is represented by 6 digits), which is clearly too long, and it defines many (more than 6) etymons as code elements. The four-corner number method is widely used in Chinese dictionaries. In practice, each character receives an additional digit at the end besides its four digits, so its code is effectively five digits long, and lookup is slow because many characters share the same code. Moreover, the four-corner method assigns the 10 digits 0 to 9 to 10 stroke structures. This amounts to defining 10 etymons, too many to remember easily.
The object of the present invention is to develop a method for encoding Chinese characters and the characters (or words) of other languages using individual digits in different arrangements. The aim is for each character (or word) to have a shorter code, preferably no more than three code elements, and for fewer characters to share a code, so that lookup is faster than with the four-corner number method.

Summary of the invention: the present invention is implemented as follows.

The basic features of the invention are as follows. In coding both Chinese and foreign languages, the symbols used are drawn from the 11 symbols "1, 2, 3, 4, 5, 6, 7, 8, 9, 0, ·"; the last one is read "dot" and is written like a decimal point. That is, these 11 symbols are the code symbols of the invention. The invention is further characterized in that every single Chinese character is coded with no more than three code elements, and every Korean syllable (single character) is likewise coded with no more than three code elements.

1. Some concepts of the invention. To clarify the innovation of the invention, the concepts involved in coding Chinese characters and words are described below.

1.1 A stroke is a single continuous line written within a character and is the most basic element into which a character is split. The invention is characterized in that the single strokes forming Chinese characters are divided into 6 types: horizontal, vertical, left-falling, right-falling, turning and dot. "Horizontal" includes the rising stroke, as in the last stroke of certain characters. "Right-falling" does not include the dot. "Turning" covers every stroke in a Chinese character that bends or turns, and every bend counts as a turn. It includes the vertical hook of "strong", the vertical-rising stroke of "long" and strokes such as the last stroke of "sharp". The sixth type is the dot, e.g. the first stroke of "heart" and of "Beijing", and the last stroke of "heart".

1.2 An etymon is one of a small number of fixed multi-stroke components within a character. The etymons are specified as follows.

The first type of etymon is "small" and "wood". For example, the upper part of "tip" is the etymon "small". The upper part of "jie", the lower part of "shelf" and the left part of "root" are the etymon "wood". When a character contains "wood", the complete "wood" is taken as the etymon in preference to "small".

The second type of etymon is the box, which covers both the "mouth" in a character and a rectangular frame. For example, the upper part of "foot", the lower part of "No" and the left part of "eat" are mouth-shaped. A frame is the enclosing box structure of characters such as "target", "day", "country", "cause" and "field". However, a mouth or frame used as an etymon must not be crossed by other strokes. For example, the mouth in "middle" and in "worm" is crossed by other strokes and does not form a box etymon. Likewise, characters such as "wine", "base", "electricity", "first", "second", "third", "fourth" and "sixth" do not contain a box or mouth etymon.

The third type of etymon is "person", which includes "eight", and "gold", which includes its side-radical form. For example, the topmost part of "full" and "score" is the etymon "person" or "eight". The last part of "identification" is the etymon "gold", and the left part of "iron" is the "gold" radical.

The invention is characterized in that, when a character is split and an etymon structure is encountered, the strokes forming the etymon are coded as one code element. This is equivalent to one group of single strokes split from the other strokes. An etymon cannot be broken into single strokes. If adjacent strokes form an etymon structure, the complete etymon is split off in preference, so that a character is split into single strokes and etymons.

1.3 Representative numbers of strokes and etymons. A character is split into single strokes in stroke order, with complete etymons taken in preference. The symbols 1, 2, 3, 4, 5 and "·" represent, in order, the single strokes horizontal, vertical, left-falling, right-falling, turning and dot. These six symbols are the strokes' representative numbers.

The etymons "small" and "wood" are also represented by 1. The etymons "mouth" and "frame" are represented by 0. The etymons "person", "eight", "gold" and the "gold" radical are all represented by 9.

1.4 A code element is the basic unit of a code. The invention uses the 11 symbols 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 and "·" as code elements.

A Chinese character is coded subject to the further character-coding rules below. Except for enclosing-type characters, it is split into single strokes in its original stroke order. The split strokes are combined two at a time into groups wherever possible, the representative numbers of the strokes in each group are added, and the sum is expressed as one code element. When two turning strokes are combined, their sum is represented by "1". For example, in coding a character whose two split strokes are both turning strokes, the sum is written 1.

The invention is characterized by the following dot rules, which apply after a character has been split into single strokes.

When a dot is combined with a stroke other than a dot, its representative number is 4, the same as the right-falling stroke, rather than "·". For example, in coding "Jing", the first stroke (a dot) and the second stroke (a horizontal) are combined. The dot counts as 4 and the horizontal as 1, so the sum 5 is the code element.

If a dot is not combined with another stroke but forms a group on its own, its code element remains "·". For example, in "pill" the first stroke (left-falling) and the second stroke (turning) combine to 3 + 5 = 8. The third stroke, a dot, is not combined with any other stroke, so it is coded "·".

Two dots combined into a group are also represented by "·". For example, the first two strokes of "river" are both dots, so the first code of "river" is "·". By the same principle, the first and second strokes of "cut", "fast" and "river" are combined and represented by the code element "·".
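The grouping rules of sections 1.3 and 1.4 can be sketched as a small Python function. This is a minimal, non-authoritative sketch. The stroke and etymon names and the function `code_element` are our own labels. A group is given either as an etymon name or as a tuple of one or two strokes. A lone dot is assumed to be written "·", as the dot rules above indicate.

```python
# Sketch of the stroke-group rules (sections 1.3 and 1.4); names are illustrative.
DOT = "·"

# Representative numbers of the single strokes. A dot counts as 4 (like a
# right-falling stroke) only when it is grouped with a non-dot stroke.
STROKE_VALUE = {
    "horizontal": 1,      # includes the rising stroke
    "vertical": 2,
    "left_falling": 3,
    "right_falling": 4,
    "turning": 5,
    "dot": 4,
}

# Etymon codes: "small"/"wood" -> 1, "mouth"/"frame" -> 0,
# "person"/"eight"/"gold"/gold radical -> 9.
ETYMON_CODE = {
    "small": "1", "wood": "1",
    "mouth": "0", "frame": "0",
    "person": "9", "eight": "9", "gold": "9", "gold_radical": "9",
}


def code_element(group):
    """Return the code element of one group.

    `group` is either an etymon name (str) or a tuple of one or two
    single-stroke names, in stroke order.
    """
    if isinstance(group, str):
        return ETYMON_CODE[group]
    if len(group) not in (1, 2):
        raise ValueError("a stroke group holds one or two single strokes")
    if all(stroke == "dot" for stroke in group):
        return DOT                # a lone dot, or two dots combined
    total = sum(STROKE_VALUE[stroke] for stroke in group)
    if total == 10:               # only turning + turning reaches 10
        return "1"                # the text writes this sum as "1"
    return str(total)
```

For instance, `code_element(("dot", "horizontal"))` returns "5", the first code of "Jing". `code_element(("left_falling", "turning"))` returns "8", the first group of "pill".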

2. Scheme of the invention for coding single Chinese characters

All Chinese characters are coded in their original standard stroke order. The exception is enclosing-type characters, which are coded in the stroke order newly specified by the invention. Strokes cannot be reused. Strokes are taken as the rules specify and the remaining strokes are left unused: coding stops as soon as the required strokes have been used up in order.

The invention divides characters into two types: compound characters and whole characters. It is characterized in that compound characters are of three kinds: left-right, top-bottom and enclosing. The left-right kind includes left-middle-right characters, e.g. "old", "outer", "chi", "eel" and "lake". The top-bottom kind includes top-middle-bottom characters, e.g. "most", "bi" and "Yi". Enclosing characters include, for example: the Chinese, pool, house, peer, di, tetro, sickness, surplus, oxygen, room, region, cause, captive, magic, watch, plant, cross, send, build, musk, murder and letter.

Structurally, some strokes of an enclosing character fully or partly enclose its other strokes. This gives:
- full enclosures such as "country" and "cause";
- enclosures from above such as "close", "quart", "den" and "same";
- upper-left enclosures such as "sickness" and "see";
- upper-right enclosures such as "two" and "planting";
- top-left-bottom enclosures such as "region";
- lower-left enclosures such as "over", "send" and "delay";
- left-bottom-right enclosures such as "murder" and "letter".

2.1 Coding whole characters. The invention is characterized in that characters other than the left-right, top-bottom and enclosing types are coded as whole characters.

First, following the standard stroke order, the code element of the first group of strokes is taken as the first code. The first group is normally the combination of the first and second strokes, or an etymon beginning with the first stroke; e.g. the first code of "sufficient" is the etymon "mouth". If the strokes after the first stroke form an etymon, the first stroke alone is taken as a group. For example, in "Yu" the first group is the single horizontal "one" because a "mouth" follows it, so the first code is that of "one". This gives the head code of the whole character.

From the remaining strokes, the next group of strokes or etymon gives the second code. The tail code is formed by combining the penultimate and last strokes of what remains, and if the final strokes form an etymon, that etymon is the tail code. In three cases the penultimate stroke is not combined with the last stroke, and only the last stroke is taken as the tail code:
- the penultimate stroke forms an etymon with other strokes, as in "denier";
- the last stroke is the only stroke remaining, as in "pill";
- the last stroke is separated from the preceding strokes, as in "armor" or "dog".

Finally, the head code, second code and tail code are concatenated, as far as the strokes allow, to form the code of the whole character. A whole character has at most three codes.

To make coding more intuitive and clear, note the following.
- Characters whose components are not separated by any gap are all whole characters.
- Characters whose front and rear parts are close together but not separated are still coded as whole characters, e.g. JIAOJIAN, YI, JIAOGUO, WANGZHU, HONGYU, YU, HUANG and FANG.
- Strokes joined in a double-stroke structure are regarded as not separated. Characters containing such structures are treated as whole characters, like other common single-body characters.

2.2 Coding compound characters. The components of a compound character are separated by at least one gap; a character with no gap is necessarily an inseparable whole character. For the left-right type (including left-middle-right), the leftmost part is the front part and the remaining strokes are the rear part. Because a top-to-bottom gap runs between the front and rear parts of a left-right character, the character is divided at that gap. The left part formed by the cut is the front part and the rest is the rear part. For example, cutting "cold" gives its two-dot radical as the front part and "order" as the rear part. This way of dividing a compound character into front and rear parts is called the "one-cut method".

Similarly, top-bottom characters (including top-middle-bottom) are divided with the "one-cut method". The cut runs from left to right along the first horizontal gap that crosses the whole character below its top edge. All strokes above the cut are the front part, and the remaining strokes are the rear part. For example:
- cutting "xiao" gives the bamboo radical as the front part and "death" as the rear part;
- the front part of "forbid" is "forest";
- the front part of "Bin" is "king" and "white" side by side, and its rear part is "stone".

For an enclosing-type character, including one enclosed from below, the enclosing part is taken as the front part and the enclosed strokes as the rear part. For example:
- the front part of "disease" is its sickness radical;
- the front part of "district" is its three-sided frame;
- the front part of "cause" and "country" is the mouth;
- the front part of "fierce", "hit", "go" and "letter" is in each case the enclosing stroke group.

In short, left-right and top-bottom characters are coded from front to back in standard stroke order. Enclosing-type characters are coded in a different order: the enclosing part is treated as the front and written first, then the enclosed part. This applies not only to enclosures on the leftmost or topmost side of a character. Even after the whole character has been divided into front and rear parts, if the rear part is itself an enclosing structure, its enclosing strokes must be taken first when the rear part is coded. For example, in coding "banquet", the front part is the bamboo radical and the rear part is "delay". The enclosure of "delay" is taken first when the rear part is coded, so the code is 496, not 459. Similarly, the code of "tent" is 393, not 389, which would follow strict stroke order.

A top-bottom, left-right or enclosing character is first divided into front and rear parts according to its structure. Coding then proceeds in three steps:
1. The front part's first code is taken in the same way as for a whole character; its other strokes are not used.
2. The rear part is coded, again as for a whole character, to give two codes: its first code (the middle code of the character) and its tail code.
3. The codes of the front and rear parts are concatenated into the character's first code, second code and tail code, at most three in all. This is the code of the compound character.

If there are not enough strokes, the second code is also the tail code. For example, for "wu" the first code is 3, and the second code, 0, is also the tail code.

Worked example, "cold". The first code comes from the front part (the two-dot radical): 4 + 1 = 5. Next the rear part "order" is coded. Its first code is the etymon "person", 9. Its tail code combines the last stroke (a dot, 4) with the penultimate stroke (a turning stroke, 5): 4 + 5 = 9. So "cold" is coded 599.

Worked example, "da". The enclosing part is the front part. Its first group of strokes (the first stroke, a dot, combined with the second, a turning stroke) gives the code element 9, which is the first code. The first group of strokes of the rear part "big" (horizontal 1 + left-falling 3) gives the second code, 4. The remaining stroke, a right-falling stroke, forms a group by itself with the code element 4. So "da" is coded 944.

2.3 Coding when the last stroke is separated. To make coding intuitive and fast, Chinese character coding specifies the following rule. When the last stroke of the tail code is separated from the preceding stroke, the tail code is simply the representative number of that single stroke, whether the tail code is the second or the third code.

For example, the first part of "pin" is "wood", so its first code is 1, and its second code is likewise 1. The last stroke of "pin" is a left-falling stroke separated from the preceding stroke, so the tail code is 3, and "pin" is coded 113.

For "stream", the first code combines the first two strokes (two dots) and is represented by "·". The first two strokes of the rear part give the middle code 5. The last stroke is separated and, as a single turning stroke, is coded 5. Connecting the first, middle and tail codes, "stream" is coded ·55.
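The concatenation step for compound characters can be sketched and checked against the worked examples "cold" (599), "da" (944) and "stream" (·55). The helper names are hypothetical. The stroke groups passed in are our reading of those examples; in particular, the two strokes behind the middle code 5 of "stream" are assumed to be a dot and a horizontal. Splitting a character into groups is assumed to have been done already.

```python
# Hypothetical sketch: concatenating code elements of a compound character.
DOT = "·"
VALUE = {"horizontal": 1, "vertical": 2, "left_falling": 3,
         "right_falling": 4, "turning": 5, "dot": 4}
ETYMON = {"small": "1", "wood": "1", "mouth": "0", "frame": "0",
          "person": "9", "eight": "9", "gold": "9"}


def element(group):
    """Code element of an etymon name or a tuple of one or two strokes."""
    if isinstance(group, str):
        return ETYMON[group]
    if all(stroke == "dot" for stroke in group):
        return DOT                       # lone dot or two dots
    total = sum(VALUE[stroke] for stroke in group)
    return "1" if total == 10 else str(total)


def compound_code(front_first, rear_first, rear_tail=None):
    """Head code (front part) + middle and tail codes (rear part), at most three.

    Pass rear_tail=None when the rear part is too short for a separate tail
    code; its first code then doubles as the tail code.
    """
    groups = [front_first, rear_first]
    if rear_tail is not None:
        groups.append(rear_tail)
    return "".join(element(group) for group in groups)


# Stroke groups read off the worked examples in sections 2.2 and 2.3.
cold = compound_code(("dot", "horizontal"), "person", ("turning", "dot"))
da = compound_code(("dot", "turning"), ("horizontal", "left_falling"), ("right_falling",))
stream = compound_code(("dot", "dot"), ("dot", "horizontal"), ("turning",))
print(cold, da, stream)  # 599 944 ·55
```

The separated-last-stroke rule needs no special case here: a lone stroke passed as a one-element tuple already yields its own representative number.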

2.4 Two-way processing. The invention adopts a two-way processing method for coding all Chinese characters. This spares beginners the difficulty of characters whose strokes or front part can be read in two ways. When the coding scheme is designed, both codes that may arise for such a character are included. The principle is the same as in a dictionary where one character can be looked up under two different radicals.

3. Coding of Chinese words and phrases

The invention is also characterized in that brevity codes are defined for Chinese words and phrases on the basis of the single-character codes.

3.1 Brevity codes for two-character words. Take the first three codes of the first character and the first two codes of the second character, giving at most 5 codes. If the first character has fewer than three codes, take up to three codes from the second character. If the strokes are still insufficient, use however many codes are available. For example, the two-character word "one-to-one" is coded "11".

3.2 Brevity codes for three-character words. If the first character has only one code, the second character contributes two codes, and so on in sequence. Three-character words are far fewer than two-character words, so their brevity code has only 4 codes.

3.3 Brevity codes for words of four or more characters, and for idioms and phrases: concatenate the first codes of the first, second and third characters and the first code of the last character.
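Rules 3.1 and 3.3 translate directly into string slicing, because every code element, including "·", is a single character. The sketch below is hypothetical and takes the single-character codes as inputs. Rule 3.2 is not sketched because its full rule is not stated. The sample inputs reuse the character codes 599, 944, ·55 and 113 from section 2 purely as placeholders; they are not real words.

```python
# Hypothetical sketch of the word brevity codes in sections 3.1 and 3.3.
# Codes are strings of code elements; "·" is a single character, so slicing works.


def two_char_code(first, second):
    """Rule 3.1: up to three codes of the first character, then codes of the
    second character (at most three) until the total reaches five."""
    head = first[:3]
    return head + second[:min(3, 5 - len(head))]


def long_word_code(chars):
    """Rule 3.3: first codes of the 1st, 2nd, 3rd and last characters."""
    if len(chars) < 4:
        raise ValueError("rule 3.3 covers words of four or more characters")
    return "".join(code[0] for code in (chars[0], chars[1], chars[2], chars[-1]))


print(two_char_code("1", "1"))                          # 11
print(two_char_code("599", "944"))                      # 59994
print(long_word_code(["599", "944", "·55", "113"]))    # 59·1
```

A two-code first character lets the second character contribute three codes (`two_char_code("59", "944")` gives "59944"). A one-code first character yields only four codes in total, matching "use however many codes are available".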

3.4 Coding self-made words. When the coding is built into Chinese input software for computers, users can define brevity codes for their own words at any time. When creating a self-made word, the user assigns just two digit symbols according to rules the user has established. This can give distinct external codes for more than 1300 self-made words. A leader symbol is typed to enter the self-made word record. When typing on a computer, a leader symbol is also typed before the brevity code of a self-made word or of some common punctuation marks; for example, the "/" or "Tab" key can be set as the leader.

3.5 A space is added as a new punctuation mark. Words (including phrases) within a sentence are separated from the surrounding characters by a space before and after them, to avoid confusion when reading the text.

4. Uses of the digital codes of Chinese characters and words

4.1 The codes of Chinese characters and words can be used for lookup in Chinese dictionaries and electronic dictionaries. All characters in a dictionary are digitally coded by the method of section 2 above. They are then arranged in the order 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, · over the 11 code elements; placing "·" before 0 instead does not change the substance of the invention. A "single-character index table" is compiled in this order. The code under which a character is listed in the table is its code and has at most 3 digits. When this code is entered, the table shows the character's pinyin and its page in a specific dictionary, so its meaning and other attributes can be found on that page. The table also shows whether the character is in the left or the right column of that page, which speeds up lookup. A search table for a Chinese word dictionary or an electronic dictionary can be created by the same method.

For simplicity, a single-character index table made according to the invention can be designed with only three columns: code, character form and page number.
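Compiling the index table of section 4.1 amounts to sorting codes under a custom collation of the 11 code elements. This is a minimal sketch. It assumes the order 0 to 9 followed by "·" and entries held as (code, character) pairs; the field layout and sample entries are illustrative only. Python's `sorted` is stable, so characters sharing a code keep the order in which they were supplied, e.g. variant, traditional, simplified.

```python
# Sketch of the index-table collation in section 4.1 (illustrative names).
ORDER = "0123456789·"                  # put "·" first instead to get the variant order
RANK = {symbol: i for i, symbol in enumerate(ORDER)}


def sort_key(code):
    """Map a code to a list of ranks; lists compare element by element,
    so a shorter code sorts before any longer code it is a prefix of."""
    return [RANK[symbol] for symbol in code]


def build_index(entries):
    """Sort (code, character) entries by code. sorted() is stable, so entries
    sharing a code keep the order in which they were supplied."""
    return sorted(entries, key=lambda entry: sort_key(entry[0]))


def lookup(index, code):
    """All characters listed under one code, in table order."""
    return [char for entry_code, char in index if entry_code == code]


index = build_index([("·55", "stream"), ("599", "cold"), ("944", "da"), ("113", "pin")])
print([code for code, _ in index])   # ['113', '599', '944', '·55']
```

A real table would carry extra fields (page number, left or right column, pinyin); `entry[0]` as the sort field leaves room for them.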

4.2 Chinese input with digital codes. The digital codes of the invention can be used for Chinese input on a computer keyboard. The coding scheme is first programmed into software and installed on the computer. For input, the number keys on the keyboard are set, from left to right, as the code elements 1, 2, 3, 4, 5, 6, 7, 8, 9 and 0. The codes of single characters and the brevity codes of words are imported into the system, and typing the corresponding keys inputs Chinese characters and words. After one character has been input, the space key is pressed before the next. Pinyin can still be entered on the letter keys below. When the coding software is set up, some keys may be rearranged to keep the fingers nimble during input. The comma key, the period key and the punctuation keys on the top row may exchange positions, and the original lower punctuation keys may even be assigned to F keys; for example, F5 as comma and F6 as period.

When the coding method of section 2 is used, the number keys of the main keyboard and the numeric keypad on the right cannot both be used for coding at the same time. The code elements 1, 2, 3, 4, 5, 6, 7, 8, 9 and 0 can be set on the number row at the top of the keyboard while the numeric keypad keeps its original number function, or the other way round. A mode-switch function is assigned to the decimal-point key of the numeric keypad and toggled with the "NumLock" key at the keypad's upper-left corner. In one state the keys have their original number function; in the other, the digital coding function.

A mouse with numeric keys that can input numbers in place of the numeric keypad may also be used.

4.3 Chinese digital-code input on mobile phones. The method of section 2 can also be used for Chinese input on mobile phones, which is a further characteristic of the invention. The coding is programmed into the phone, and the 11 keys 0 to 9 and # (or the Δ key) are set as the 11 code-element keys of the invention. A character is input by pressing its code elements, at most three presses, and confirming to display the character. Where several characters share a code, pressing # and choosing the corresponding serial number selects the unique character. Words and phrases can be input in the same way.

5. Digital coding of Chinese and of the characters of other languages, and its uses

5.1 The invention can also digitally code the characters of languages other than Chinese, such as Russian, English, French, German and Korean, and is especially suited to Korean. It applies to any language whose syllables, characters or words can be decomposed into more basic symbols as basic elements, provided those basic elements number no more than 44. All of them can then be coded with the 11 symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and "·", as follows:
1. Arrange the basic elements of the language in order, divide them in that order into one or more groups of at most 11 each, and count the total.
2. Represent the elements of the first group by position with 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 and "·".
3. Represent the following groups in the same way, except that each symbol of the second group is followed by one "·", each symbol of the third group by two dots, and so on.
4. Code each basic symbol in writing order, and concatenate the codes of a syllable, character or word in writing order to form its code.

Korean syllables consist of an initial consonant, a vowel and a final consonant, so the same symbols 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 and "·" can represent consonants, vowels and finals alike. The syllable rule fixes the order: the initial consonant comes first, the vowel second and the final third. The same symbol may therefore stand for a vowel and also for a consonant, or even a final. A person or a computer can still tell from the position of a code element (first, second or third) whether it is a consonant, a vowel or a final. This position, together with whether dots have been added, shows what each code element represents.
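The grouping of section 5.1 and the position rule above can be sketched for Korean using the standard Unicode decomposition of precomposed Hangul syllables: code point = U+AC00 + (initial × 21 + vowel) × 28 + final. The patent does not specify the order of the basic elements. The Unicode jamo order (19 initials, 21 vowels, 27 finals) is therefore an assumption, as are the function names. The result is returned as a tuple so that the role of each element stays explicit.

```python
# Sketch of section 5.1 for Korean; ASSUMPTION: jamo are ordered as in Unicode.
SYMBOLS = "1234567890·"   # positional symbols of a group, in the order given in 5.1


def element_code(index):
    """Code of the index-th (0-based) basic element of an ordered inventory:
    groups of 11 symbols, with group k marked by k trailing dots."""
    group, pos = divmod(index, len(SYMBOLS))
    if group > 3:
        raise ValueError("the scheme covers at most 44 basic elements")
    return SYMBOLS[pos] + "·" * group


def hangul_syllable_code(syllable):
    """(initial, vowel[, final]) codes of one precomposed Hangul syllable."""
    s = ord(syllable) - 0xAC00
    if not 0 <= s < 19 * 21 * 28:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(s, 21 * 28)
    vowel, final = divmod(rest, 28)
    codes = [element_code(initial), element_code(vowel)]
    if final:                       # final index 0 means "no final consonant"
        codes.append(element_code(final - 1))
    return tuple(codes)


print(hangul_syllable_code("한"))   # ('8·', '1', '4')
```

Here "한" decomposes into its 19th initial ㅎ (second group, hence one dot), the 1st vowel ㅏ and the 4th final ㄴ.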

5.2 The coding method of 5.1 can be used for foreign-language input on computers. When digital symbols are typed on a computer keyboard, a different leader can be typed before them to distinguish the different code elements. On a mobile phone, a code element without dots is input by pressing the corresponding number key once. A code element with one dot is input by pressing that key twice, and one with two dots by pressing it three times.
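The mobile-phone rule of section 5.2 is a multi-tap scheme. This is a minimal sketch. It assumes that the "·" symbol itself sits on the # key, as section 4.3 suggests, and the function name is hypothetical. Elements with more than two dots are rejected because the text specifies at most three presses.

```python
# Sketch of the multi-tap rule in section 5.2.
DOT = "·"
DOT_KEY = "#"   # assumption: the "·" code element sits on the # key (section 4.3)


def key_presses(element):
    """Keys to press for one code element: its base key once, plus one more
    press of the same key for each added dot (at most two dots)."""
    stripped = element.rstrip(DOT)
    if stripped:
        base, dots = stripped, len(element) - len(stripped)
    else:
        # The element is the "·" symbol itself, possibly with added dots.
        base, dots = DOT, len(element) - 1
    if len(base) != 1 or not 0 <= dots <= 2:
        raise ValueError(f"not a code element with at most two dots: {element!r}")
    key = DOT_KEY if base == DOT else base
    return key * (dots + 1)


print([key_presses(e) for e in ("8·", "1", "4")])   # ['88', '1', '4']
```

A real phone also needs a pause or confirm step between elements so that "88" (one dotted element) is not read as two presses of 8; that timing is outside this sketch.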

5.3 When the characters, syllables and words of a foreign language are coded as described in 5.1, all Korean syllables can be sorted in the order 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, · and compiled into a character search table. They may also be sorted in another order, e.g. 0, ·, 1, 2, 3, 4, 5, 6, 7, 8, 9; the inventive principle is the same. Similarly, digital search tables can be compiled for the words of other foreign languages such as Russian, English and French.

6. Beneficial effects of the invention

Earlier digital codings of Chinese characters used only 10 digits, which led to a high rate of duplicate codes and was inconvenient. The invention uses 11 symbols, including "·", which is convenient and lowers the duplicate-code rate.

Existing codes define a whole character as a single-body character, but this concept is hard to grasp, and so is the division of compound characters. For example, the five-stroke digital code divides a character into a head part and a remainder by components, yet in many characters the head and remainder are not separated and are hard to delimit. The invention splits left-right and top-bottom characters at the separating gap with the "one-cut method", which is easier and more intuitive.

All Chinese characters in the Xinhua Dictionary can be coded with no more than 3 codes. The largest sets of characters sharing one code contain twenty-odd characters, and fewer than 10 codes are shared this heavily. The method fills the gap left by the absence of a Chinese character retrieval table based on 3-code numbers. From a character's code, one can retrieve its position (page number and column) in a given edition of a dictionary, together with its pinyin and other attributes. It can replace the four-corner number method and the "radical index table" of dictionaries as a preferable method for looking up character attributes, and it can also be used in electronic dictionaries. In the digital index, characters sharing a code are ordered according to whether they are variant, traditional or simplified forms; simplified characters may even be placed before the original forms that have more strokes.
This arrangement is practical and fast: when a character is looked up, its page number is displayed together with whether it lies in the left or right column of that page, so the character is found in a single step. The invention can also be used to enter and output the more than 7,000 commonly used Chinese characters by number on a computer or mobile phone; during input, a duplicate-code character is chosen quickly by its sequence number without paging. Korean can likewise be entered on a mobile phone, so the method is convenient and quick, and has the characteristics of novelty, creativity and practicability.

Description of the drawings: Figure 1 is a schematic view of the present invention.

Embodiments 1 and 2 are now described with reference to Fig. 1.

Using the method of Example 1, the ten thousand characters in the Xinhua Dictionary can be given digital codes of no more than three digits, and these codes are then sorted in the order 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ·. Fig. 1 is a screenshot of the resulting "search table" after the characters have been coded and sorted. The index table has 6 columns. In Fig. 1, column 1 shows the code, column 2 the corresponding character, and column 3 the page number of the character in the 2011 edition of the Xinhua Dictionary, where ")" indicates that the character is in the left column of that page and "(" that it is in the right column; column 4 shows the pinyin and column 5 the standard stroke order of the character.
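The ordering used for the search table can be expressed as a sort key. The following is a minimal Python sketch, assuming each code is a string over the 11 symbols and that a shorter code sorts ahead of any longer code that begins with it; the text does not state that tie-breaking rule, and the names `SYMBOL_ORDER`, `code_sort_key` and `sort_codes` are illustrative only.

```python
# Symbol order used for sorting, as stated in the text: 0-9, then "·" last.
SYMBOL_ORDER = "0123456789·"


def code_sort_key(code: str) -> tuple[int, ...]:
    """Turn a code such as "59·" into a tuple of symbol ranks.

    Tuples compare element by element, so "5" < "59" < "599"; placing a
    shorter code first is an assumption of this sketch.
    """
    return tuple(SYMBOL_ORDER.index(symbol) for symbol in code)


def sort_codes(codes: list[str]) -> list[str]:
    """Return the codes in search-table order."""
    return sorted(codes, key=code_sort_key)


if __name__ == "__main__":
    # "599", "62" and "944" are codes worked out in the text; the rest are arbitrary.
    print(sort_codes(["599", "·1", "62", "5", "944", "0·"]))
    # ['0·', '5', '599', '62', '944', '·1']
```

Ranking "·" as 10 is what places it after 9 in every position of the code.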

Examples

1, the scheme for coding Chinese single characters divides single characters into composite characters and whole characters. The invention is characterized in that composite characters include the left-right type (including left-middle-right), such as the characters glossed "old", "outer", "chi", "eel" and "lake"; the upper-lower type (including upper-middle-lower), such as the characters glossed "most" and "sincere"; and the enclosing type, in which some strokes of the character enclose the others. Enclosing characters include full enclosures such as "nation" and "enclosure", upper enclosures such as "close" and "same", upper-left enclosures such as "sick" and "watching", upper-left-lower enclosures such as "area", upper-right enclosures such as "oxygen", "division" and "planting", lower-left enclosures such as "sending", "crossing" and "extending", and lower-left-right enclosures such as "letter" and "clicking".

1.1 Coding of whole characters. Apart from the upper-lower, left-right and enclosing types above, all remaining single characters are coded as whole characters. To code a whole character, first take, in standard stroke order, the code element of the first group of strokes as the head code. The first group is the combination of the first and second strokes; where the first strokes form a radical, that radical is taken instead (for example, the head code of "foot" (足) is the radical 口), and where the strokes following the first stroke form a radical, only the first stroke is taken as a single-stroke group (as in "Yu" and "He"). If strokes remain, the second group is taken by the same method; this is the middle code. If the character ends in a radical, the radical's code element is the tail code; otherwise the penultimate and last strokes are combined into the tail code. In three cases only the last single stroke is taken as the tail code: when the penultimate stroke combines with the strokes before it into an etymon, as in "denier"; when the last stroke is the only stroke left, as in "pill"; and when the last stroke is detached, that is, it does not intersect the other strokes, as in "armor" or "dog". The head, middle and tail codes are then joined to form the whole-character code, which has at most three digits. To make coding more intuitive, clear and concrete: the front and rear parts of a composite character must be separable, and characters or components whose strokes touch without a gap are not separated but coded as a whole, for example "Yao", "jacket", "foot", "horn", "red", "fish", "bird", "yellow" and "leather". Characters with the double-stroke structure are likewise regarded as inseparable (e.g. "horse", "zhu", "yan", "qi", "ghost", "black"). These are well-known, commonly used single-body characters or single-body radicals; being of integral structure, they are coded as whole characters or used as a whole as the front part of a composite character.
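The stroke-pair arithmetic and the head/middle/tail selection can be sketched as follows. This minimal Python sketch assumes the stroke numbers of claim 2 (horizontal 1, vertical 2, dot 3, left-falling 4, folding 5), counts a dot as 4 when it is combined with another stroke (as in the "cold" example, 4 + 5 = 9), and maps folding plus folding to 1 as claim 2 specifies. It does not recognise etymons or radicals, takes the detached-last-stroke test as a flag from the caller, and refuses the dot-plus-dot case, whose rule is unclear in the text. `Stroke`, `group_code` and `whole_char_code` are hypothetical names.

```python
from enum import Enum


class Stroke(Enum):
    """Single strokes with the numbers given in claim 2."""
    HORIZONTAL = 1
    VERTICAL = 2
    DOT = 3
    LEFT_FALLING = 4
    FOLDING = 5


def group_code(group: list[Stroke]) -> str:
    """Code element for a group of one or two strokes."""
    if len(group) == 1:
        return str(group[0].value)  # a lone stroke keeps its own number
    if len(group) != 2:
        raise ValueError("a group has one or two strokes")
    if group[0] is Stroke.DOT and group[1] is Stroke.DOT:
        # The text gives a separate rule for combined dots; not modelled here.
        raise NotImplementedError("dot + dot is not modelled")
    # A dot combined with another stroke counts as 4.
    total = sum(4 if s is Stroke.DOT else s.value for s in group)
    # Folding + folding (10) is the only sum above 9; it is specified as 1.
    return "1" if total == 10 else str(total)


def whole_char_code(strokes: list[Stroke], last_stroke_detached: bool = False) -> str:
    """Head, middle and tail codes of a whole character (at most 3 digits).

    Simplified: etymons (codes 1, 0, 9) and radicals are not recognised, so
    every group is built from raw strokes in stroke order.
    """
    if not strokes:
        raise ValueError("a character has at least one stroke")
    codes = [group_code(strokes[:2])]  # head code: first and second strokes
    rest = strokes[2:]
    if len(rest) > 2:
        codes.append(group_code(rest[:2]))  # middle code: the next two strokes
        rest = rest[2:]
    if rest:
        # Tail code: the last stroke alone if it is the only one left or is
        # detached; otherwise the penultimate and last strokes together.
        if len(rest) == 1 or last_stroke_detached:
            codes.append(group_code(rest[-1:]))
        else:
            codes.append(group_code(rest[-2:]))
    return "".join(codes)


if __name__ == "__main__":
    S = Stroke
    print(group_code([S.FOLDING, S.DOT]))  # 9, the tail group of "cold"
    print(whole_char_code([S.LEFT_FALLING, S.FOLDING, S.DOT]))  # 93
```

Strokes between the middle group and the tail group are dropped, which is what keeps every code at three digits or fewer.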

1.2 To code a composite character, divide it into a front part and a rear part. For the left-right type (including left-middle-right), the leftmost part is the front part and the remaining strokes are the rear part. The cut runs from top to bottom through the gap between the front and rear parts, with the gap as the boundary; the left part produced by the cut is the front part of the code and the rest is the rear part. For example, the front part of "Li" is "standing grain" (禾) and the rest of the character is the rear part.

For the upper-lower type (including upper-middle-lower), the character is cut from left to right along the horizontal gap that runs through below its uppermost part; all strokes above the cut are the front part and the rest are the rear part. If the upper row consists of two components, both are taken together with a single cut. For example, cutting "xiao" (笑) gives the bamboo head ⺮ as the front part and 夭 as the rear part, and the front part of "forbidden" (禁) is "forest" (林). For a character whose upper row is "king" (王) beside "white" (白) above "stone" (石), a single cut makes 王 and 白 together the front part and 石 the rear part.

An enclosing character is coded with the enclosing component as the front part and the enclosed strokes as the rear part. For example, the front part of "disease" (病) is 疒, of "zone" (区) is 匚, of "cause" (因) is 囗, of "send" (送) is 辶, of "build" (建) is 廴, and of "letter" (函) is 凵. The stroke order used in coding is as follows: the enclosing part must be treated as the front, so for characters of this type the enclosing strokes are written first and the enclosed strokes afterwards. Even after a whole character has been divided into front and rear parts, if the rear part is itself an enclosing structure, its enclosing strokes are taken first when the rear part is coded. For example, "banquet" consists of the bamboo head and "extend" (延); because the enclosure is coded first, "banquet" is coded 96, not 59. Similarly, "tent" is coded 393, not 389, which absolute stroke order would give.

Once upper-lower, left-right and enclosing characters have been divided into front and rear parts by structure, the composite character is coded: one code is taken from the front part by the whole-character method, and the remaining strokes of the front part are not used; the rear part is then coded by the whole-character method for the second code and the tail code. Finally, the codes of the front and rear parts are joined to form the code of the character, which never exceeds three digits; if there are too few strokes, the code is simply shorter, and when there is no third code, the second code is the tail code. For example, "quote" has only the two codes "62", and "roll", which has only one stroke left after its first code, is coded 65. Characters with three codes include "cold" (冷): the code of its front part 冫 is 4 + 1 = 5; the rear part "order" (令) is then coded, its first code being the radical "man" (人) and its last code the combination of the last and penultimate strokes, with code elements 4 + 5 = 9, so "cold" is coded 599. Next, "da": the enclosure 辶 is the front part, and the code element 9 of its first group of strokes (the dot combined with the following folding stroke) is the first code of the character; in the rear part, the first group of strokes combines into the second code 4, and the remaining stroke forms a group whose code element is 4, so "da" is coded 944.
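The final assembly step for a composite character can be sketched as below. This minimal Python sketch assumes the front part's first code and the rear part's second and tail codes have already been chosen by the rules above; it only checks that each is a single code symbol and that the result has two or three codes. `composite_code` is a hypothetical name.

```python
SYMBOLS = set("0123456789·")


def composite_code(front_first: str, rear_codes: list[str]) -> str:
    """Join the codes of a composite character.

    front_first: the single code taken from the front part (the other
        strokes of the front part are ignored).
    rear_codes: the second code and, if there is one, the tail code taken
        from the rear part by the whole-character method.
    """
    codes = [front_first, *rear_codes]
    if not rear_codes or len(codes) > 3:
        raise ValueError("a composite character has two or three codes")
    for code in codes:
        if len(code) != 1 or code not in SYMBOLS:
            raise ValueError(f"not a single code element: {code!r}")
    return "".join(codes)


if __name__ == "__main__":
    # Part codes from the worked examples in the text.
    print(composite_code("5", ["9", "9"]))  # "cold" -> 599
    print(composite_code("6", ["2"]))       # "quote" -> 62
```

With only one rear code, that code serves as both the second code and the tail code, matching the two-digit "quote" example.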

1.3 Tail code for a separated last stroke. For every single character, whole or composite: when the last stroke of the tail is separated from the strokes before it, the tail code, whether it is the second or the third code, is the number of that single stroke. For example, the front part of "Bin" is "wood" (木), so the first code is 1 and the middle code is also 1, but the last stroke of "Bin" is separated, so the tail code is 3 and "Bin" is coded 113. Likewise, in coding "right", the front part is "standing grain" (禾), whose code element is 3; the rear part supplies the second code, which is also the tail code, and since its last vertical stroke is separate, the tail can only be that single vertical stroke, with code element 5, so "right" is coded 35. Similarly, the front part of "stand grain" can give only "day" (日), code element 0, so it is coded 01; "stream" is coded 55.

2, the second embodiment is a method for indexing a Chinese dictionary using the digital codes of Chinese single characters.

The Chinese character index is characterized in that the codes of all compiled characters are sorted in the order 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, · to compile a digital index table of Chinese characters. Each row of the table shows the digital code, the glyph, the pinyin and the page number in a specific edition of a dictionary, together with information such as whether the character is in the left or right column of that page and its stroke order.
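A digital index of this kind can be modelled as rows sorted by code and grouped by shared code. The following is a minimal Python sketch; `Row` and `build_index` are hypothetical names, the row fields follow the columns described above, and the page numbers in the example are placeholders rather than data from any dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass

SYMBOL_ORDER = "0123456789·"


@dataclass(frozen=True)
class Row:
    """One row of the index table."""
    code: str     # digital code, at most 3 symbols
    char: str     # the character
    pinyin: str
    page: int     # page in one specific dictionary edition
    column: str   # "left" or "right" column of that page


def build_index(rows: list[Row]) -> dict[str, list[Row]]:
    """Sort rows by code (0-9, then ·) and group rows that share a code.

    sorted() is stable, so rows with the same code keep the order in which
    they are supplied, e.g. a simplified form listed before its traditional form.
    """
    ordered = sorted(rows, key=lambda r: tuple(SYMBOL_ORDER.index(s) for s in r.code))
    index: dict[str, list[Row]] = defaultdict(list)
    for row in ordered:
        index[row.code].append(row)
    return dict(index)


if __name__ == "__main__":
    # Codes from the text; page numbers are placeholders.
    rows = [Row("62", "引", "yǐn", 0, "left"), Row("599", "冷", "lěng", 0, "right")]
    index = build_index(rows)
    print(list(index))          # ['599', '62']
    print(index["62"][0].char)  # 引
```

Relying on a stable sort means the duplicate-code order is decided once, when the rows are compiled, rather than inside the lookup.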

The Chinese character coding is further characterized in that the brevity codes compiled for words can likewise be sorted in the order 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, · to compile a retrieval table for the same edition of a Chinese dictionary, as well as electronic character and word dictionaries.

The third embodiment is a Chinese input method for mobile phone using digital coding.

First, Chinese characters and words are given digital codes by the method of the first embodiment, and the codes are loaded into the input system. The mobile phone input method is characterized in that the keys 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and one further key on the phone keypad are set in sequence as the code elements 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and "·", the original function of the further key being moved to the # key; when that key is pressed, the screen displays "·", so the phone can enter all 11 code symbols including "·". After entering the digital input system, the code of a character is typed by the method of embodiment 1; after the confirm key is pressed, the candidate area of the screen shows the characters that share that code, and the unique character is then selected from them by its sequence number. Words are entered in the same way: type the word's brevity code, press the confirm key, then press the selection number shown on screen to obtain the unique word.
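The confirm-then-select step can be sketched as a lookup in a code-to-candidates table. In this minimal Python sketch, the table contents are only the two codes worked out in the text (a real table would list every duplicate-code character in its fixed order), and `is_valid_code`, `candidate_line` and `pick` are hypothetical names.

```python
SYMBOLS = "0123456789·"


def is_valid_code(code: str) -> bool:
    """A character code has 1-3 symbols, each one of the 11 code symbols."""
    return 1 <= len(code) <= 3 and all(s in SYMBOLS for s in code)


def candidate_line(table: dict[str, list[str]], code: str) -> str:
    """Numbered candidates shown after the confirm key is pressed."""
    return " ".join(f"{n}.{ch}" for n, ch in enumerate(table.get(code, []), start=1))


def pick(table: dict[str, list[str]], code: str, number: int) -> str:
    """Return the character chosen by its sequence number (1-based)."""
    if not is_valid_code(code):
        raise ValueError(f"invalid code {code!r}")
    candidates = table.get(code, [])
    if not 1 <= number <= len(candidates):
        raise ValueError(f"no candidate {number} for code {code!r}")
    return candidates[number - 1]


if __name__ == "__main__":
    table = {"599": ["冷"], "62": ["引"]}
    print(candidate_line(table, "62"))  # 1.引
    print(pick(table, "599", 1))        # 冷
```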

The fourth embodiment is a method for encoding Korean syllables and a method for creating a search table.

4.1 A method for coding Korean and for making a search table, characterized in that each Korean syllable is first divided into three types of symbol: initial consonants, vowels and finals (batchim). The first 10 consonant symbols are represented in sequence by the 10 numbers 1, 2, 3, 4, 5, 6, 7, 8, 9 and 0:

Then 1, 2, 3, 4, 5, 6, 7, 8, 9 represent in sequence another 9 compound consonants:

The 10 basic vowels are represented in sequence by the 10 symbols 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, and the compound vowels by the 11 symbols 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, · (dot):

making 11 compound vowels in total. In addition, the 10 symbols 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 represent in sequence the first 10 finals, namely:

Then 1, 2, 3, 4, 5, 6 represent in sequence another 6 finals, namely:

Since these 6 finals occupy positions 10 to 16 of the sequence, the following finals are represented in sequence by the 11 symbols 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, ·:

These last 11 consonant combinations make up the compound finals, and their symbols are written with two dots above.

4.2, a method for applying Korean digital coding to computer keyboard input: following the method of 4.1, the Korean syllable symbols are first divided into initial consonants, vowels and finals, and an input system is set up according to the above coding method for Korean syllables, in which either the number keys on the upper part of the keyboard or the numeric keypad on the right is given the code element function. When code elements are typed, a code element without a dot is entered directly; for one with one dot above, the Shift key can be pressed first as a prefix, and for one with two dots above, the Backspace key can be pressed first as a prefix.
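The prefix-key scheme can be sketched as a small decoder from key presses to (symbol, dots) code elements. This minimal Python sketch assumes the key names "Shift" and "Backspace" as plain strings and that "·" is available as a key; `decode_keys` is a hypothetical name.

```python
SYMBOLS = set("0123456789·")
PREFIX_DOTS = {"Shift": 1, "Backspace": 2}  # prefix key -> dots on the next symbol


def decode_keys(keys: list[str]) -> list[tuple[str, int]]:
    """Turn key presses into (symbol, dots) code elements."""
    elements: list[tuple[str, int]] = []
    pending = 0  # dots requested by a prefix key not yet consumed
    for key in keys:
        if key in PREFIX_DOTS:
            if pending:
                raise ValueError("two prefix keys in a row")
            pending = PREFIX_DOTS[key]
        elif key in SYMBOLS:
            elements.append((key, pending))
            pending = 0
        else:
            raise ValueError(f"not a code element key: {key!r}")
    if pending:
        raise ValueError("prefix key without a following symbol")
    return elements


if __name__ == "__main__":
    print(decode_keys(["3", "Shift", "3", "Backspace", "7"]))
    # [('3', 0), ('3', 1), ('7', 2)]
```

Rejecting a dangling prefix keeps a stray Shift or Backspace from silently changing the next element typed.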

The fifth embodiment is a method for applying Korean digital coding to keypad input on a mobile phone.

5.1 Following the method of 4.1 above, a Korean input system for mobile phones can be programmed in which the keys 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and the "·" key are designed in sequence as the Korean code element keys. When Korean is entered, a code element without a dot is entered by pressing the corresponding key directly, a code element with one dot by quickly pressing the corresponding number key twice, and a code element with two dots by quickly pressing it three times.
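The multi-tap rule (one press, two quick presses, three quick presses) can be sketched in two steps: group quick repeats of the same key, then count presses. This minimal Python sketch assumes timestamped key presses and a 400 ms limit for "quick"; the text gives no timing, and `group_presses` and `decode_runs` are hypothetical names.

```python
SYMBOLS = set("0123456789·")


def group_presses(presses: list[tuple[str, int]], max_gap_ms: int = 400) -> list[str]:
    """Merge quick repeated presses of the same key into runs.

    presses: (key, timestamp in milliseconds) in the order pressed.
    """
    runs: list[str] = []
    last_key, last_time = None, None
    for key, time_ms in presses:
        if runs and key == last_key and time_ms - last_time <= max_gap_ms:
            runs[-1] += key  # quick repeat of the same key
        else:
            runs.append(key)  # a new key, or a pause: start a new run
        last_key, last_time = key, time_ms
    return runs


def decode_runs(runs: list[str]) -> list[tuple[str, int]]:
    """One press = no dot, two presses = one dot, three presses = two dots."""
    elements = []
    for run in runs:
        symbol = run[0]
        if symbol not in SYMBOLS or len(run) > 3:
            raise ValueError(f"cannot decode {run!r}")
        elements.append((symbol, len(run) - 1))
    return elements


if __name__ == "__main__":
    presses = [("2", 0), ("2", 150), ("5", 600), ("5", 700), ("5", 800), ("2", 2000)]
    runs = group_presses(presses)
    print(runs)               # ['22', '555', '2']
    print(decode_runs(runs))  # [('2', 1), ('5', 2), ('2', 0)]
```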

5.2 A method for creating a Korean syllable and word retrieval table, in which the Korean numeric codes described in 4.1 are arranged in the order 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ·, with numbers carrying one dot arranged in a second round and those carrying two dots in a third round. The Korean digital search table is compiled in this order and is used to look up syllables, words and phrases in Korean.

The sixth embodiment is a method for making a Russian digital code and a digital search table for Russian dictionaries.

6.1, a Russian digital code, characterized in that the 33 Russian letters are divided in sequence into 3 groups of 11 letters each, and the letters in each group are represented in sequence by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ·, with one dot added above the code of each second-group letter and two dots above the code of each third-group letter; these serve as the code elements of the Russian letters. When a Russian word is entered, the code elements of its letters are joined in writing order to form the word's digital code. The invention is further characterized in that single consonants, the soft sign and the hard sign can stand for certain commonly used multi-letter Russian words as their brevity codes, or pairs of consonants, soft signs and hard signs can be arranged in different orders as brevity codes; the soft sign and the hard sign can be set as "etymons" during coding, and their code elements can represent multi-letter word endings in Russian.
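The grouping rule maps each letter's position to a symbol (position modulo 11) and a dot count (position divided by 11). The following is a minimal Python sketch; the text says only that the letters are taken "in sequence", so the standard 33-letter alphabetical order including Ё is an assumption, as are the names `letter_table`, `encode_word` and `render`. Dots are rendered with the Unicode combining dot above (U+0307) and combining diaeresis (U+0308). Brevity codes are not modelled.

```python
SYMBOLS = "0123456789·"
# Assumption: the standard 33-letter alphabet in its usual order (with Ё).
RUSSIAN = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
DOT_MARKS = {0: "", 1: "\u0307", 2: "\u0308"}  # combining dot above / diaeresis


def letter_table(alphabet: str) -> dict[str, tuple[str, int]]:
    """Groups of 11 letters map to 0-9, ·; the group index is the dot count."""
    return {ch: (SYMBOLS[i % 11], i // 11) for i, ch in enumerate(alphabet)}


RUSSIAN_TABLE = letter_table(RUSSIAN)


def encode_word(word: str) -> list[tuple[str, int]]:
    """Join the code elements of the letters in writing order."""
    return [RUSSIAN_TABLE[ch] for ch in word.upper()]


def render(code: list[tuple[str, int]]) -> str:
    """Show each element as its symbol with 0, 1 or 2 dots above."""
    return "".join(symbol + DOT_MARKS[dots] for symbol, dots in code)


if __name__ == "__main__":
    print(encode_word("мир"))  # [('2', 1), ('9', 0), ('6', 1)]
```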

6.2, a method for computer keyboard input of the Russian digital code according to 6.1, characterized in that, when the Russian digital code is used for computer keyboard input, the input system keeps the original letter input function of the keyboard and can additionally assign the Russian letter code element function to the number keys of the main keyboard on the left, while the numeric keypad on the right keeps its original function. When typing, a first-group letter is entered directly by pressing 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 or ·, and the corresponding first-group letter appears on screen; a second-group letter is entered with the Shift key pressed first as a prefix, and a third-group letter with the Backspace key pressed first as a prefix.

6.3, a method for mobile phone keypad input of the Russian digital code according to 6.1, characterized in that the input system is set so that, when Russian letters are entered, a first-group letter is entered directly, a second-group letter by quickly pressing the corresponding key twice, and a third-group letter by quickly pressing the corresponding key three times.

6.4, a method for preparing a digitally sorted search table for a Russian dictionary according to 6.1, characterized in that, on the basis of the Russian letter codes, all compiled Russian words are sorted numerically, with "·" placed last; for the same number, the code element of the first-group letter (no dot) comes first, then that of the second-group letter (one dot above), and finally that of the third-group letter (two dots above). A Russian word retrieval table is made in this order.
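Both ordering rules can be written as sort keys over (symbol, dots) elements. In this minimal Python sketch, the default key follows 6.4 and 7.4 (for the same number, no dot, then one dot, then two dots); the `by_round` option is one possible reading of the "second round"/"third round" wording of 5.2, in which all undotted symbols come first. `word_key` is a hypothetical name.

```python
SYMBOLS = "0123456789·"


def word_key(code: list[tuple[str, int]], by_round: bool = False) -> tuple:
    """Sort key for a word code made of (symbol, dots) elements.

    by_round=False: for the same symbol, no dot < one dot < two dots.
    by_round=True: every undotted symbol first, then one dot, then two dots.
    """
    if by_round:
        return tuple((dots, SYMBOLS.index(s)) for s, dots in code)
    return tuple((SYMBOLS.index(s), dots) for s, dots in code)


if __name__ == "__main__":
    words = [[("2", 1)], [("2", 0)], [("·", 0)], [("2", 2)], [("1", 2)]]
    print(sorted(words, key=word_key))
    # [[('1', 2)], [('2', 0)], [('2', 1)], [('2', 2)], [('·', 0)]]
```

The two keys differ only in which component of each element is compared first.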

The seventh embodiment is a digital code for English and a method for making a digital look-up table for an English dictionary.

7.1, a digital code for English, characterized in that the 26 English letters are divided in sequence into 3 groups: the first and second groups have 11 letters each, and the third group has the remaining 4 letters. Each letter in the first and second groups is represented in sequence by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ·, with one dot added above the code of each second-group letter; the third-group letters are represented in sequence by 1, 2, 3, 4, with two dots added above. These serve as the code elements of the English letters; when an English word is entered, the code elements of its letters are joined in writing order to form the word's digital code.
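The English table differs from the Russian one in its short third group, which starts at 1 rather than 0. A minimal Python sketch of the mapping described above; `english_table` and `encode_word` are hypothetical names.

```python
import string

SYMBOLS = "0123456789·"


def english_table() -> dict[str, tuple[str, int]]:
    """Letter -> (symbol, dots) following the grouping described in 7.1."""
    letters = string.ascii_lowercase
    table = {}
    for i, ch in enumerate(letters[:11]):    # a-k: 0-9, ·, no dot
        table[ch] = (SYMBOLS[i], 0)
    for i, ch in enumerate(letters[11:22]):  # l-v: 0-9, ·, one dot
        table[ch] = (SYMBOLS[i], 1)
    for i, ch in enumerate(letters[22:]):    # w-z: 1-4, two dots
        table[ch] = (str(i + 1), 2)
    return table


TABLE = english_table()


def encode_word(word: str) -> list[tuple[str, int]]:
    """Join the code elements of the letters in writing order."""
    return [TABLE[ch] for ch in word.lower()]


if __name__ == "__main__":
    print(encode_word("cat"))  # [('2', 0), ('0', 0), ('8', 1)]
```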

7.2, a method for entering the English digital code on a computer keyboard according to 7.1, characterized in that, when the English digital code is used for computer keyboard input, the input system can assign the English letter code element function to the number keys. When typing, a first-group letter is entered directly with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 or ·, and the corresponding first-group letter appears on screen; a second-group letter is entered with a prefix key pressed first, and a third-group letter with the Backspace key pressed first as a prefix.

7.3, a method for entering the English digital code on a mobile phone keypad according to 7.1, characterized in that the input system is set so that, when English letters are entered, a first-group letter is entered directly, a second-group letter by quickly pressing the corresponding number key twice, and a third-group letter by quickly pressing the corresponding key three times.

7.4, a method for preparing a digitally sorted search table for an English dictionary according to the English letter coding method of 7.1, characterized in that all compiled English words are sorted numerically on the basis of their digital codes; for the same number, the code element of the first-group letter (no dot) comes first, then that of the second-group letter (one dot above), and then that of the third-group letter (two dots above).

Claims

1. A digital coding method for Chinese and foreign languages, in which Chinese single characters are split into single strokes as basic elements and foreign-language syllables or words are split into single letters as basic elements, characterized in that, when Chinese and foreign languages are coded, no more than the 11 symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and "·" are used, and code elements are assigned to the corresponding basic elements according to rules; each letter of a foreign-language word forms one code, and the initial consonant, vowel and final of a Korean syllable each form one code; these codes are then arranged in sequence to form the code of the Korean syllable or of the other foreign-language word, wherein the code of a Chinese single character or of a single Korean syllable has no more than three codes.

2. A method for coding Chinese single characters according to the method of claim 1, in which the single character is split into single strokes, the number 1 representing the horizontal stroke, 2 the vertical stroke, 3 the dot, 4 the left-falling stroke and 5 the folding stroke; every two strokes are combined into one group in stroke order, and the numbers of the two strokes in the group are added to form the code element of the group, the combination of a folding stroke with a folding stroke being specified as 1; etymons and single strokes that cannot be combined are coded separately; characterized in that: when a character is split into single strokes, certain further stroke forms are also treated as independent single strokes; "·" is used as a code element, a specified combination of dots is still represented by "·", and "·" is also the code element for separated strokes; when a dot is combined with another stroke, the dot is counted as 4 and added to the number of the other stroke, the sum representing the code element of that group of strokes; and three etymons are set for Chinese character coding: the first etymon, represented by 1, is "wood" (木) or "small" (小), the full "wood" taking precedence; the second etymon, represented by 0, is the square structure in Chinese characters; and the third etymon, represented by 9, is the 人-like structure, which includes "man" (人), "eight" (八) and the corresponding component of "jin"; when strokes are combined, the full etymon takes precedence.

3. A method for coding Chinese single characters according to the methods of claims 1 and 2, in which all Chinese characters are divided into two kinds, whole characters and composite characters, characterized in that composite characters comprise single characters of the left-right, upper-lower and enclosing types; all other characters are called whole characters, and composite characters are divided into front parts and rear parts.

4. A method for coding Chinese single characters, the method according to claims 1, 2 and 3, characterized in that the coding methods of whole characters and combined characters are respectively as follows:

coding a whole character: first take the first and second strokes of the character as a group, add the numbers of the two strokes, and use the sum as the code element of the group; if the first strokes form a radical, the radical is used as the first code, and if the first stroke is followed by a radical, the first stroke alone forms the first code; then take a second code by the same method; if strokes remain after the second code, take a tail code from them: if the penultimate stroke forms an etymon with other strokes, the etymon's code element is the tail code, and otherwise the last and penultimate strokes are combined to form the tail code; if the last stroke is an isolated stroke, only that stroke is taken as the tail code;

coding a composite character: for a left-right character, the leftmost separated part is the front part and the rest is the rear part; for an upper-lower character, the uppermost separated part is the front part and the rest is the rear part, the cut running through along the gap between the front and rear parts even when the front part consists of several components; for an enclosing character, the enclosing part is the front part and the enclosed strokes are the rear part. To code the composite character, take the first code from the front part; the remaining strokes of the front part do not take part in coding; then take the second code and the tail code from the rear part by the whole-character method.

5. Use of the Chinese character digital coding method according to claims 1, 2, 3 and 4 for keyboard input on computers and mobile phones, characterized in that either the number keys of the main keyboard on the left or the numeric keypad on the right of a computer keyboard is set as the code element keys, using the 11 symbols ·, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, while pinyin input on the letter keys remains available; the method can also be used for Chinese input on a mobile phone, with the keys 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and the # key set as the 11 code element keys of the invention. After the code elements of a character have been entered by pressing the keys, the unique corresponding character is determined by selecting the serial number of the corresponding code word; Chinese words or phrases can be entered in the same way.

6. A method for looking up Chinese characters and words, characterized in that all single characters in a dictionary are digitally coded and then sorted in the order 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ·, and a single-character search table is made from this ordering; the index of the search table is the character's code, of no more than 3 digits, and each row of the table shows the digital code, glyph and pinyin of the character together with its page number in the dictionary and whether it is in the left or right column of that page, which makes quick lookup convenient.

7. The method according to claim 1, wherein, if a syllable or word of a language splits into more than 11 basic symbols, the basic symbols are divided into groups of no more than 11, the last group containing however many symbols remain; the 11 symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, · of claim 1 replace the basic symbols of each group in sequence as code elements, but for the second group one dot may be added above the digital symbols, and for the third group two dots.

8. A coding method for Korean according to the digital coding method of claims 1 and 7, wherein each Korean syllable is first divided into three categories, initial consonants, vowels and finals; the 10 consonant symbols are represented in sequence by the 10 numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, 0; then 1, 2, 3, 4, 5, 6, 7, 8, 9 represent another 9 compound consonants in sequence; then 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 represent the 10 basic vowels in sequence, and the 11 symbols 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, · (dot) represent the 11 compound vowels; in addition, the 10 symbols 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 represent in sequence the first 10 finals, namely:

1, 2, 3, 4, 5, 6 represent in sequence another 6 finals, with one dot added above the number, and the 11 symbols 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, · represent the remaining 11 finals in sequence.

9. A method for applying Korean digital coding to keyboard input on a computer or mobile phone, according to the methods of claims 1, 7 and 8: the Korean syllable symbols are first divided into initial consonants, vowels and finals, and an input system is set up according to the coding method for Korean syllables, in which either the number keys on the upper part of the keyboard or the numeric keypad on the right is given the code element function. When code elements are typed, a code element without a dot is entered directly; for one with one dot above, the Shift key can be pressed first as a prefix, and for one with two dots above, the Backspace key. For Korean input on a mobile phone keypad, 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 represent the code elements without dots; a code element with one dot is entered by pressing the corresponding number key twice, and one with two dots by pressing it three times. Further, the codes of Korean syllables can be sorted in the order 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, · to create a search table of syllables and words.

10. A Russian digital coding method and its use, according to the methods of claims 1, 2 and 8, wherein the 33 Russian letters are divided in sequence into 3 groups of 11 letters, and the letters in each group are represented in sequence by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ·, with one dot added above the code of each second-group letter and two dots above the code of each third-group letter, these serving as the code elements of the Russian letters; when a Russian word is entered, the code elements of its letters are joined in writing order to form the word's digital code. For computer input, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and · are assigned to the number keys and one further key; for mobile phone keypad input, "·" can be assigned to a "·" key. The digital codes of Russian words can be sorted in the order 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, · to make a digital retrieval table of Russian words and various Russian dictionaries.