WO2018228101A1

WO2018228101A1 - Chinese meaning based Chinese encoding method and system, and medium device

Info

Publication number: WO2018228101A1
Application number: PCT/CN2018/086500
Authority: WO
Inventors: 夏诠真
Original assignee: 佛山辞荟源信息科技有限公司
Priority date: 2017-06-14
Filing date: 2018-05-11
Publication date: 2018-12-20
Also published as: CN109086257A; TW201915775A

Abstract

The invention provides a Chinese encoding method and system based on Chinese meaning, and a medium device. The method comprises the following steps: encoding the morphemes of Chinese to obtain a morpheme code for each morpheme; encoding the words and phrases of Chinese to obtain word and phrase codes for the words and phrases (the word and phrase code is also known as the "concept code"); and constructing a Chinese encoding database comprising a morpheme table corresponding to the morphemes and a word and phrase table corresponding to the words and phrases, the morpheme table including the morpheme code of each morpheme and the word and phrase table including the word and phrase codes of the words and phrases. The method makes the processing of language information convenient, fine-grained and flexible, so that big language data can be searched, parsed and counted and a large relational database of the language can be traversed, which effectively enhances the value of the data.

Description

Chinese coding method, system and medium device based on Chinese meaning

Technical field

The invention relates to a computer data processing technology, in particular to a Chinese digital coding method and system and a media device encoded in Chinese.

Background technique

Generally, before Chinese can be processed digitally, for example by a computer, it must first be encoded. The code serves as the intermediary for input and digital processing and makes it possible to store and transmit information, so that the Chinese character system can take part in the information age represented by the personal computer, the World Wide Web and the smartphone.

The earliest Chinese coded data was the Chinese Business Telegraph Code, which was published in 1880. It was followed by the Wang Yunwu four-corner number in the early years of the Republic of China, the Taiwan Big Five code in the 1980s, the Chinese national standard code and, at the end of the 20th century, the international Unicode. With these codes, Chinese data closely followed the Latin language systems into the digital world.

However, the telegraph code, the four-corner number, the Big Five code, the national standard code and Unicode are all single-character code systems in which each code represents one Chinese character. The weakness of the Unicode used in the prior art is that it can only represent the glyph of a Chinese character. It neglects the meaning of characters and words and cannot directly capture or process the meaning of a text. As a result, the advantages of Chinese over Western languages are not fully exploited, and the combination of the basic shape (strokes), sound (pinyin) and meaning of Chinese cannot be effectively digitized.

Existing Chinese characters each have a written form, while characters of the same form can have multiple meanings. From ancient times to today, people have always used the written form of existing Chinese characters as the unit of word formation. All digital information systems, including computer processing, digital search, communication, translation and other applications, take the written form of existing Chinese characters as the basic unit of digital information processing.

An article in Chinese is a collection of words and phrases rather than a collection of characters. It is the "word" or the "phrase" that represents a complete concept; a single character cannot take on this task. The traditional glyph-based method of digital processing, information storage and electronic media described above therefore limits the spread of Chinese culture in the digital age. It cannot provide strong support for information search and information analysis, lacks room for expansion and needs further improvement.

Summary of the invention

The present invention provides a Chinese encoding method, system and medium device based on Chinese meaning in order to overcome the intrinsic defects of existing digital coding schemes for Chinese characters. It digitally encodes the basic constituent elements of Chinese meaning (morphemes), which differs from the existing practice of using glyph elements for Chinese digital encoding and processing. Using the element of Chinese meaning, that is, the morpheme, can solve problems that arise when computers process Chinese characters, such as one character having several meanings and one character having several pronunciations.

The Chinese digital encoding method, system and medium device based on Chinese meaning (morphemes) of the invention include morpheme codes based on Chinese meaning, that is, neutral codes; word codes (also known as "concept codes") based on Chinese words and phrases (meaningful sets of morphemes); and a Chinese database that relates these codes to the vast body of Chinese meaning.

A morpheme-based Chinese encoding method provided for implementing the present invention includes the following steps:

Encoding the morphemes of Chinese to obtain the morpheme code of each of the morphemes;

Encoding words and phrases in Chinese to obtain the word code of the words and phrases;

Constructing a Chinese code database, the database including a morpheme table corresponding to the morphemes and a vocabulary list corresponding to the words and phrases, the morpheme table including the morpheme code of each morpheme and the vocabulary list including the word codes of the words and phrases.

Preferably, the morpheme table includes a glyph summary table and a word summary table, and the glyph summary table and the word summary table are associated with each other by the morpheme code.

Preferably, the encoding processing method based on the Chinese meaning further includes the following steps:

Using electronic devices, Chinese is input using a morpheme-assisted input method based on the neutral code and the word code.

It includes the following steps:

Receiving input data information;

Providing a morpheme selection prompt according to the input data information;

Receiving morpheme selection information, determining a morpheme code corresponding to the selected morpheme;

Calling the word meaning summary table query and providing Chinese characters corresponding to the morpheme code;

Calling the glyph summary table according to the selected Chinese characters, querying and determining the Chinese to be entered;

Display and enter the confirmed Chinese.
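The input steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries stand in for the word meaning summary table and the glyph summary table, the toneless pinyin keys and the function names are hypothetical, and the sample codes are those of the character 行 used as an example in this description.

```python
# Toy stand-in for the word meaning summary table (layout is an assumption):
# morpheme code -> (toneless pinyin, meaning)
MEANING_TABLE = {
    "0483A": ("xing", "walk"),
    "0483B": ("hang", "row"),
    "0483C": ("hang", "business"),
}

# Toy stand-in for the glyph summary table: morpheme code -> glyphs
GLYPH_TABLE = {
    "0483A": ["行"],
    "0483B": ["行"],
    "0483C": ["行"],
}


def morpheme_prompts(typed):
    """Return the (morpheme code, meaning) candidates for the typed pinyin."""
    return [
        (code, meaning)
        for code, (sound, meaning) in sorted(MEANING_TABLE.items())
        if sound == typed
    ]


def enter_chinese(typed, choice_index, glyph_index=0):
    """Pick one prompted morpheme and resolve it to a glyph for display."""
    prompts = morpheme_prompts(typed)
    code, _meaning = prompts[choice_index]   # morpheme selection
    glyph = GLYPH_TABLE[code][glyph_index]   # glyph lookup for output
    return code, glyph


print(morpheme_prompts("hang"))
print(enter_chinese("hang", 1))
```

In this sketch the value that would be stored is the morpheme code, and the glyph is looked up only when the text is displayed.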

Preferably, the encoding of a Chinese morpheme includes encoding a single Chinese character;

Including the following steps:

Constructing a unique canonical code for each single Chinese character;

Determining the number of different meanings contained in the single Chinese character;

Determining a morpheme number for each meaning of the single Chinese character;

The canonical code is combined with the morpheme number to form a morpheme code of a different meaning morpheme of the single Chinese character.

Preferably, there are a plurality of word lists.

Preferably, the word code is represented by a 32-bit hexadecimal number.

Preferably, the storage of the word code in the Chinese code database is implemented by using an eight-dimensional matrix space.

Preferably, the word code is stored in an eight-dimensional matrix space, including:

Each word code is used as a point in the eight-dimensional matrix space, and the points are positioned by eight hexadecimal values of X, Y, Z, P, Q, R, S, and T, and the eight-dimensional matrix space structure is used as the storage address.

Preferably, the using the eight-dimensional matrix spatial structure as the storage address of the word code includes:

The word code consists of three parts: the sequential classification value, the sorting link value, and the protection code. The structure is as follows:

Wherein, the classification value includes three values, respectively representing X-axis, Y-axis and Z-axis coordinate values; the sorting link value includes four values, respectively representing P-axis, Q-axis, R-axis and S-axis coordinate values, The protection code is the last digit and represents the T-axis coordinate value.
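As a rough illustration of this layout, the following Python sketch splits a word code into its three parts and eight coordinates. It assumes that each of the eight axes takes exactly one hexadecimal digit (so the whole code fits in 32 bits); the function names and the sample code "1A2B3C4D" are hypothetical. The calculation of the protection code is not sketched because the code map is not given here.

```python
AXES = ("X", "Y", "Z", "P", "Q", "R", "S", "T")


def split_word_code(code):
    """Split an 8-hex-digit word code into its parts and axis coordinates."""
    if len(code) != 8:
        raise ValueError("expected 8 hexadecimal digits")
    digits = [int(ch, 16) for ch in code]  # ValueError on non-hex characters
    return {
        "classification": code[0:3],  # X, Y, Z
        "sorting_link": code[3:7],    # P, Q, R, S
        "protection": code[7],        # T
        "coordinates": dict(zip(AXES, digits)),
    }


def to_uint32(code):
    """Pack the eight hexadecimal digits into one 32-bit integer."""
    return int(code, 16)


parts = split_word_code("1A2B3C4D")  # hypothetical sample code
print(parts["classification"], parts["sorting_link"], parts["protection"])
print(parts["coordinates"])
```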

Preferably, the classification value represents a point in the three-dimensional space, and all the classification values are stored in a data layout diagram of the three-dimensional space, wherein the data layout diagram is a table;

The vocabulary list is divided into a plurality of types, and different categorical values correspond to different types of vocabulary tables.

Preferably, the protection code is a control value calculated from the classification value and the sorting link value by a code map, and the code map is a table structure composed of a plurality of vectors and a matrix.

Preferably, the word list includes: a dictionary type sentence list, a dictionary type word list, a poetry ancient sentence list, and a history list.

In order to achieve the object of the present invention, a storage medium for storing the computer program instructions of the encoding processing method based on Chinese meaning is also provided.

In order to achieve the object of the present invention, a coding processing software system based on Chinese meaning is further provided, comprising the storage medium, wherein computer program instructions in the storage medium are called to complete an encoding process based on Chinese meaning.

Further, an object of the present invention is to provide an encoding processing device based on Chinese meaning, including a central processing unit, and the storage medium connected to a central processing unit;

The central processor invokes computer program instructions in the storage medium to perform an encoding process based on Chinese meaning.

The Chinese encoding method and system and medium device based on Chinese meaning have the following advantages:

The invention is a breakthrough design that takes full advantage of the convenience and accuracy of using morphemes as the code system for Chinese characters. With the neutral code and the word code at its core, it addresses problems in Chinese digital processing such as homophones with different meanings and single characters with different meanings. At the same time, the morpheme table supports derived applications such as an intelligent prompting input method. The coding is powerful, flexible, accurate, rich and complete, enabling people to input Chinese more easily and accurately and to capture the semantics of Chinese. This coding system has the potential to improve the efficiency of electronic processing of Chinese in the era of computer digitization, to make Chinese better suited to the information processing requirements of the digital age, and to contribute to the promotion of Chinese culture in the digital age.

DRAWINGS

In order to illustrate the specific embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the description of the specific embodiments or of the prior art are briefly described below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative work.

FIG. 1 is a flowchart of a morpheme-based Chinese encoding processing method according to an embodiment of the present invention;

Figure 2 is an embodiment of the step S100 of Figure 1;

FIG. 3 is an implementation manner of step S200 in FIG. 1;

Figure 4 is an embodiment of the step S400 of Figure 1;

FIG. 5 is a morpheme-based Chinese encoding system according to an embodiment of the present invention.

Detailed description

To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments. Descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the inventive concept. These descriptions are merely exemplary and are not intended to limit the scope of the invention.

As the basic element of Chinese semantics, a morpheme meets the following requirements: (1) it has only one pronunciation and one precise basic meaning; (2) it has no glyph: it is neutral with respect to character form and does not distinguish between simplified and traditional characters, which facilitates the search, statistics and analysis of information.

In the embodiment of the present invention, the morpheme is the smallest language unit that carries Chinese meaning. The same character may correspond to multiple morphemes according to its meanings. The morpheme is the element from which Chinese words are formed and is a uniquely Chinese semantic unit; it depends on words and phrases and cannot exist alone. For example, the character rendered as "pass" corresponds to two morphemes (to send; biography), the character rendered as "calendar" corresponds to two morphemes (history; calendar), and the character rendered as "day" corresponds to three morphemes (sun; day; Japan). Each morpheme has a unique pronunciation and one meaning.

As an implementation manner, morphemes are encoded and the resulting code is called a neutral code; words and phrases are encoded and the resulting code is called a word code.

A Chinese encoding method based on Chinese meaning, which is shown in FIG. 1 , includes the following steps:

Step S100, encoding a morpheme of a Chinese language to obtain a morpheme code of each of the morphemes;

As shown in FIG. 2, in the embodiment of the present invention, multiple meaning attributes of Chinese are analyzed, morphemes are detected, and each morpheme is defined and encoded to obtain a morpheme code, which is a neutral code.

Existing Chinese characters are used to record language: articles are composed of sentences, sentences are composed of words and phrases, and words and phrases are composed of existing Chinese characters. Unlike the writing of Western languages, existing Chinese characters have three attributes: shape (strokes), sound (pinyin) and meaning. A single character form can have multiple meanings and pronunciations. This ambiguity (multiple meaning attributes) hinders the automatic processing of information and the analysis of big data, and makes searching, dissemination, translation, input and so on difficult. In view of this weakness of existing Chinese characters, the embodiment of the present invention analyzes the multiple meaning attributes of Chinese and encodes the morphemes to obtain neutral codes.

The differences between words, morphemes and existing Chinese characters are: (1) the word is the unit of sentence construction; (2) the morpheme is the unit of word formation; (3) the existing Chinese character is the writing unit that records words and morphemes. The first two belong to the linguistic symbol system and have meaning attributes; the last belongs to the writing symbol system, in which glyph attributes dominate and meaning attributes are vague. The most obvious difference between a morpheme and an existing Chinese character is that the morpheme is ideographic and neutral and can be displayed in several different glyphs, so its encoding can be called a morpheme code, that is, a neutral code.

The breakthrough of the embodiment of the present invention is to abandon this entrenched traditional method and to use morpheme coding as the unit of word formation. Information processing with the morpheme as the core structure is not possible for other language systems (including English and French), as shown in Table 1.

In addition, because the morpheme is the core of Chinese, conversion between simplified and traditional Chinese characters need not rely on context analysis. It relies instead on the morpheme table (the morpheme table is a collection of morphemes in which both the simplified and the traditional characters are defined). It is not necessary to identify whether a character is simplified or traditional, and retrieval accuracy can be essentially 100%.

Table 1:

Preferably, as an implementable manner, the code of each morpheme is constructed on the basis of the existing Chinese characters, and each morpheme corresponds to one neutral code.

As an exemplified manner, the morpheme coding method of the embodiment of the present invention combines shape, sound, and meaning, that is, each morpheme is encoded using a neutral code.

In the embodiment of the present invention, the information of Chinese characters is encoded using two master tables, the glyph summary table and the word meaning summary table. The glyph summary table registers only the "shape" attributes of a Chinese character (radical, stroke count, stroke order, phonetic component). The word meaning summary table registers only the "meaning" and "sound" attributes. Characters with the same sound and the same meaning (such as the simplified and traditional forms of "dust", "Chen" and "peak") share a single code, regardless of whether they are written in a traditional, simplified or variant form. As long as they are synonymous they are treated as one character, which is why the code of a morpheme in the word meaning summary table is also called the "neutral code".

In the embodiment of the present invention, a word meaning summary table and a glyph summary table are adopted to solve the problems caused by the ambiguity of Chinese characters that share one form. The word meaning summary table records the "meaning" of a morpheme and not its "shape"; the glyph summary table records the "shape" of a morpheme and not its "meaning". This data structure follows the third normal form for database integrity defined by Dr. Edgar Frank Codd, the inventor of the relational database. The purpose is to add the simplified and traditional forms of the Chinese characters to the summary tables so that the complex many-to-many relationship among the shape, sound and meaning of existing Chinese characters becomes simple "many-to-one" and "one-to-one" relationships.

Further, the two master tables change the procedure for digitizing Chinese character information: the input and storage of Chinese use the word meaning summary table, and the output of Chinese (display or printing of text) uses the glyph summary table. Processing input and output separately is a major innovation in information processing that changes people's working habits.
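A minimal sketch of such a pair of tables, using Python's built-in sqlite3 module, is shown below. The table and column names are assumptions made for illustration; the codes 2777A and 2777P are the animal and place-name morphemes of 马 used as an example in this description, and 馬 is the traditional form of 马.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meaning_summary (          -- meaning and sound only, no shape
    morpheme_code TEXT PRIMARY KEY,
    sound         TEXT NOT NULL,
    meaning       TEXT NOT NULL
);
CREATE TABLE glyph_summary (            -- shape only, linked by morpheme code
    glyph         TEXT NOT NULL,
    form          TEXT NOT NULL,        -- 'simplified' or 'traditional'
    morpheme_code TEXT NOT NULL REFERENCES meaning_summary(morpheme_code),
    PRIMARY KEY (glyph, morpheme_code)
);
""")
conn.executemany(
    "INSERT INTO meaning_summary VALUES (?, ?, ?)",
    [("2777A", "mǎ", "horse (animal)"), ("2777P", "mǎ", "place-name element")],
)
conn.executemany(
    "INSERT INTO glyph_summary VALUES (?, ?, ?)",
    [("马", "simplified", "2777A"), ("馬", "traditional", "2777A"),
     ("马", "simplified", "2777P"), ("馬", "traditional", "2777P")],
)


def glyphs_for(code):
    """Output side: every glyph that can display one morpheme code."""
    rows = conn.execute(
        "SELECT glyph FROM glyph_summary WHERE morpheme_code = ? ORDER BY form",
        (code,))
    return [row[0] for row in rows]


def codes_for(glyph):
    """Input side: every morpheme code a written glyph may stand for."""
    rows = conn.execute(
        "SELECT morpheme_code FROM glyph_summary WHERE glyph = ? "
        "ORDER BY morpheme_code", (glyph,))
    return [row[0] for row in rows]


print(glyphs_for("2777A"))
print(codes_for("馬"))
```

Because text would be stored as morpheme codes, the simplified and traditional glyphs resolve to the same stored values in this sketch, and the glyph is chosen only at output time.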

Studies have shown that one-third of Chinese characters are polysemous. In the embodiment of the present invention, a morpheme code, that is, a neutral code, is set for each different meaning of a normative Chinese character. The purpose of the morpheme is to define each meaning of a normative character precisely. Because of polysemy, one normative character (from the standard Chinese character set published by the State Council of China in 2013) can correspond to multiple morphemes. The code of a normative character consists of four Arabic numerals. The structure of the "neutral code" is "normative character code" + "morpheme serial number", as follows:

For example, the character "行" has several pronunciations and meanings. The code of the normative character "行" is "0483", and its meanings include (1) walk, (2) row and (3) trade (business), among others. The embodiment of the present invention therefore assigns to the normative character "行" the codes (1) 0483A (the "walk" morpheme), (2) 0483B (the "row" morpheme), (3) 0483C (the "business" morpheme), and so on. These morphemes clearly distinguish the different meanings of the normative character "行", as shown in Table 2.

Table 2: Morpheme table example

Normative character	Normative character code	Morpheme code	Morpheme pronunciation	Morpheme meaning	Words it can form
行	0483	0483A	xíng	走 (walk)	行走, 步行, 旅行, 行踪 (walking, going on foot, traveling, whereabouts)
行	0483	0483B	háng	排 (row)	单行, 双行, 雁飞成行 (single row, double row, geese flying in a row)
行	0483	0483C	háng	行业 (business)	外行, 同行如敌国 (layman; peers in the same trade are like enemy states)

The morpheme code is based on the code of the existing normative Chinese character (for example, "0483" is the code of the normative character "行"), plus an identifying letter (A, B, C, D, E, F, G...) that distinguishes the morphemes of different meanings. In the example above, 0483A is the code of the "walk" morpheme, 0483B is the code of the "row" morpheme, and 0483C is the code of the "business" morpheme.

As a more preferred embodiment, the number N of morphemes is appended to the code of the existing normative character, where N is an integer indicating that the character has a total of N morpheme codes.

For example, the code of the existing normative character "行" with its count appended is 04833, in which the last digit, 3, indicates that the normative character has 3 morphemes.
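The composition rule can be illustrated with a short Python sketch. The function names are hypothetical. The sketch covers only ordinary (language-class) meanings, which Table 3 letters A through L, and it assumes the appended count N is written as a single decimal digit, as in the 04833 example.

```python
import string

LANGUAGE_LETTERS = string.ascii_uppercase[:12]  # "A".."L", per Table 3


def morpheme_codes(canonical_code, meaning_count):
    """Combine a 4-digit normative character code with serial letters."""
    if not (len(canonical_code) == 4 and canonical_code.isascii()
            and canonical_code.isdigit()):
        raise ValueError("normative character code must be four Arabic numerals")
    if not 1 <= meaning_count <= len(LANGUAGE_LETTERS):
        raise ValueError("meaning count out of range for letters A..L")
    return [canonical_code + LANGUAGE_LETTERS[i] for i in range(meaning_count)]


def code_with_count(canonical_code, meaning_count):
    """Append the number of morphemes N to the normative character code."""
    return f"{canonical_code}{meaning_count}"


print(morpheme_codes("0483", 3))
print(code_with_count("0483", 3))
```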

Step S200, encoding words and phrases in Chinese, and obtaining word codes of the words and phrases;

As shown in FIG. 3, in the embodiment of the present invention, a morpheme code (neutral code) is used as a construction unit, and a word or a phrase is set to obtain a word code.

Words and phrases are the basic units of human thinking, reasoning and exchange of information, and in the embodiments of the present invention they are built from morphemes. By analogy with chemistry, characters are like atoms, and words and phrases are like molecules or genes: the analysis of the properties of a substance should stop at the molecule or gene, and the analysis of an article should be based on words and phrases. From the perspective of the embodiment of the present invention, a morpheme is an element constituting a word, and a word is a basic unit constituting a sentence. In the embodiment of the present invention, a Chinese word or phrase is treated as a whole. A "word" is any single-syllable morpheme, or combination of several morphemes, that can stand alone. Single-character words, multi-character words and phrases (idioms, conjunctions, proverbs, afterwords, maxims, famous sentences, names of people, place names, institution names, brand names, trade names, specialist terms...) should all be coded. Each is assigned a word code, and the word code is built from morpheme codes (neutral codes).

The biggest benefit of defining words and phrases with morphemes is that morphemes are unique. The function of writing is to record, and the clearer the record the better. The main purpose of coding information is to achieve uniqueness and to eliminate the ambiguities and inaccuracies of ordinary language.

A morpheme is a unit of words or phrases and should make it possible to pronounce words and phrases accurately. The codes that morphemes give to words and phrases are collectively referred to as word codes in the embodiment of the present invention. As shown in Table 3, from the perspective of word formation the morphemes (and word codes) are classified into the following eight categories: (1) language morphemes; (2) surname morphemes; (3) personal name morphemes; (4) place name morphemes; (5) science and technology morphemes; (6) ancient Chinese morphemes; (7) meaningless phonetic morphemes; (8) shape-denoting morphemes. The last two categories are not recognized as true morphemes (true word codes) in the prior art, but in the embodiments of the present invention they are also encoded, to meet the needs of accurate information retrieval and big data analysis, and are called "pseudo morphemes" ("pseudo word codes").

With the morpheme as the construction unit, words or phrases formed from meaningless morphemes (which carry no meaning and only express pronunciation) are assigned pseudo word codes.

As an implementable way, many characters that make up a word or phrase (especially in loanwords) only represent sounds and carry no meaning, such as "horse" and "da" in the word "motor", or "snow", "iron" and "dragon" in the word "Citroen". The Chinese characters used to transliterate foreign products, trademarks, personal names and place names are used only for their pronunciation. In such words, Ma, Da, Xue, Tie, Long and so on have nothing to do with their original meanings.

For the word codes in the embodiment of the present invention, all phonetic Chinese characters are collected in a "meaningless phonetic morpheme" table and each phonetic character is encoded, which greatly improves the accuracy of information retrieval and analysis.

Table 3: Morpheme coding rules

Morpheme category	Morpheme serial letter	Words that can be formed (examples)
Language morpheme (语文类语素)	A, B, C, D, E, F, G, H, I, J, K, L	蛇行, 苦行僧, 自行其是
Surname morpheme (姓氏类语素)	M	陈, 李, 张, 王, 何 (Chen, Li, Zhang, Wang, He)
Personal name morpheme (人名类语素)	N	慈禧太后, 李白, 朱邦復 (Empress Dowager Cixi, Li Bai, Zhu Bangfu)
Place name morpheme (地名类语素)	P	上海, 巴黎, 马陵道 (Shanghai, Paris, Maling Road)
Science and technology morpheme (科技类语素)	R	本特雷电报码, 有机发光材料 (Bentley telegraph code, organic luminescent material)
Ancient Chinese morpheme (古汉语语素)	T	夫未战而庙算胜者，得算多也
Meaningless phonetic morpheme (无义表音语素)	V	马歇尔, 三文治, 雷达 (Marshall, sandwich, radar)
Shape-denoting morpheme (表形语素)	Z	图书馆/圖書館 (library)

For example, the words "horse lane" and "Maling Road" both contain the characters "马" and "道" (dao). The normative character code of "马" is 2777, and the normative character code of "道" is 2745. The "马" morpheme in "horse lane" is 2777A, and the "马" morpheme in "Maling Road" is 2777P; the "道" morpheme in "horse lane" is 2745B, and the "道" morpheme in "Maling Road" is 2745P. The "马" and "道" morphemes differ between the two words because "horse lane" is a common word while "Maling Road" is a geographical term. If the two are not distinguished at the morpheme level, information searches cannot be accurate: the animal sense of "马" and the place-name sense of "马" are mixed together, and the results of data analysis are inaccurate.

Example :

The animal "horse" morpheme (2777A) can form ordinary words and phrases that refer to horses, etc.;

The place name "Ma" morpheme (2777P) can be composed of: Ma Lingdao, Ma Yipo...

The phonogram "Ma" morpheme (2777V) can be composed of: motor, Rome, Madrid, etc.;
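The serial letters of Table 3 make it possible to read the category of a morpheme directly from its code. The following Python sketch shows one way to do this; the function names and the English category labels are assumptions chosen for illustration.

```python
# Serial letter -> morpheme category, following Table 3
CATEGORY_BY_LETTER = {letter: "language" for letter in "ABCDEFGHIJKL"}
CATEGORY_BY_LETTER.update({
    "M": "surname",
    "N": "personal name",
    "P": "place name",
    "R": "science and technology",
    "T": "ancient Chinese",
    "V": "meaningless phonetic",   # 无义表音语素
    "Z": "shape-denoting",         # 表形语素
})

# The last two categories are the "pseudo" morphemes of this description
PSEUDO_CATEGORIES = {"meaningless phonetic", "shape-denoting"}


def parse_morpheme_code(code):
    """Split a morpheme code into (character code, letter, category)."""
    canonical, letter = code[:4], code[4:]
    if len(code) != 5 or not canonical.isdigit() or letter not in CATEGORY_BY_LETTER:
        raise ValueError(f"not a valid morpheme code: {code!r}")
    return canonical, letter, CATEGORY_BY_LETTER[letter]


def is_pseudo(code):
    """True for meaningless phonetic and shape-denoting morphemes."""
    return parse_morpheme_code(code)[2] in PSEUDO_CATEGORIES


for sample in ("2777A", "2777P", "2777V"):
    print(sample, parse_morpheme_code(sample)[2], is_pseudo(sample))
```

With such a lookup, a search for the animal sense of 马 can filter on 2777A and ignore the place-name (2777P) and phonetic (2777V) uses of the same character.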

With the morpheme as the word-forming unit, words or phrases formed from meaningless phonetic morphemes and shape-denoting morphemes are encoded to obtain pseudo word codes.

Table 4 below is a morpheme and word comparison table showing the relationship between morphemes and words.

Table 4: Morpheme and word comparison table

	Morpheme	Word/phrase
Code	Morpheme code (neutral code) = 4 Arabic numerals + morpheme serial letter	Word code (stored across multiple tables by category and purpose)
Function	The smallest information processing unit that makes up a word or phrase	The unit that makes up a compound word or sentence
Core tables	Morpheme Table plus auxiliary tables (radicals, phonetic components...)	Common words, idioms, personal names, place names, terms... dozens of tables
Table structure	The Normative Character Glyph Table and the Morpheme Table are sister tables	The number and content of the fields differ from table to table, and the tables are linked to one another
Basic uses	Glyph lookup, pronunciation lookup, meaning lookup, morpheme search...	Information retrieval, search and analysis; supports the intelligent prompting input method

Whether they are common words or phrases, they are defined by one or more morphemes ("word/phrase" = "morpheme 1" + "morpheme 2" + "morpheme 3"...). Take the idiom table as an example. Each idiom consists of four (or more) morphemes, as shown in Table 5 below:

Table 5:

Idiom	Pinyin	First character	Second character	Third character	Fourth character	Meaning of "道"	Serial letter	Morpheme code
安贫乐道	ān pín lè dào	安	贫	乐	道	法则，道德 (rule, morality)	A	2745A
班荆道故	bān jīng dào gù	班	荆	道	故	说，讲 (say, speak)	C	2745C
背道而驰	bèi dào ér chí	背	道	而	驰	路，途径 (road, way)	B	2745B
兵行诡道	bīng xíng guǐ dào	兵	行	诡	道	法则，道德 (rule, morality)	A	2745A
惨无人道	cǎn wú rén dào	惨	无	人	道	法则，道德 (rule, morality)	A	2745A
豺狼当道	chái láng dāng dào	豺	狼	当	道	路，途径 (road, way)	B	2745B
称孤道寡	chēng gū dào guǎ	称	孤	道	寡	说，讲 (say, speak)	C	2745C
称兄道弟	chēng xiōng dào dì	称	兄	道	弟	说，讲 (say, speak)	C	2745C
盗亦有道	dào yì yǒu dào	盗	亦	有	道	法则，道德 (rule, morality)	A	2745A
道傍之筑	dào bàng zhī zhù	道	傍	之	筑	路，途径 (road, way)	B	2745B
道边苦李	dào biān kǔ lǐ	道	边	苦	李	路，途径 (road, way)	B	2745B
道山学海	dào shān xué hǎi	道	山	学	海	法则，道德 (rule, morality)	A	2745A

In the embodiment of the present invention, for example, among the seven thousand idioms in the "Idiom Table", 75 contain the character "道" (dao), and that character corresponds to 6 or 7 morphemes. When the idiom table is coded, the embodiment of the present invention therefore indicates which morpheme (one of A, B, C, ...) forms part of each idiom.

As a further example, the first sentence of the first chapter of Laozi's Tao Te Ching is "道可道，非常道". The character "道" appears three times in this sentence with different meanings, so three different morphemes should be used to express and record them (this affects interpretation and translation). The first "道" is a noun, meaning the "Dao" of the Tao Te Ching; the second "道" is a verb, meaning "to speak"; the third "道" means "method". The sentence can be translated as: "The truth that can be put into words is not the eternal truth." The philosophy of the Tao Te Ching is profound, and the explanations of later generations may not reflect the views of Laozi himself. Without the concept of the morpheme, the meaning intended by the author of the Tao Te Ching cannot be accurately translated.

The embodiment of the present invention distributes all Chinese vocabulary (with a target of one million entries) across tens to hundreds of tables according to word class (common words, idioms, slang, proverbs, maxims, allusions, names of people, place names, names of schools and organizations, specialist terms...). Word coding means that words and phrases are defined by morphemes. In the example above, the idiom "安贫乐道" is split into four morphemes, 安, 贫, 乐 and 道, and the "neutral code" of its "道" morpheme is recorded as 2745A; the idiom "班荆道故" is split into four morphemes, 班, 荆, 道 and 故, and the "neutral code" of its "道" morpheme is recorded as 2745C. The four Arabic numerals 2745 represent the normative character "道", and A and C are the serial letters of the "道" morphemes.
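The benefit of recording the morpheme code of each character can be sketched in Python with a few rows copied from Table 5. The list layout and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# (idiom, morpheme code of 道 in that idiom), copied from Table 5
IDIOMS = [
    ("安贫乐道", "2745A"),
    ("班荆道故", "2745C"),
    ("背道而驰", "2745B"),
    ("兵行诡道", "2745A"),
    ("称兄道弟", "2745C"),
    ("道边苦李", "2745B"),
]


def index_by_morpheme(rows):
    """Group idioms by the morpheme code recorded for them."""
    index = defaultdict(list)
    for idiom, code in rows:
        index[code].append(idiom)
    return dict(index)


index = index_by_morpheme(IDIOMS)

# A plain string search cannot tell the senses of 道 apart
string_hits = [idiom for idiom, _ in IDIOMS if "道" in idiom]
print(len(string_hits))
# A morpheme search returns only the idioms that use the wanted sense
print(index["2745C"])
```

In this sketch a search on the character string matches every row, while a search on 2745C returns only the idioms in which 道 means "say, speak".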

In the embodiment of the present invention, the core method is as follows:

(1) Each word and each phrase (idiom, place name, specialist term, ...) is encoded. Each code represents a concept, not a Chinese character string. Words or phrases expressing the same concept (such as two different words for "mouse", or "astronaut" / "spaceman") are represented by the same code even though their strings differ. A word or phrase with N meanings is represented by N codes (for example, the word "fans" has two clearly different meanings, "food" and "FANS", so it is represented by two different codes).

(2) Each word code must be accurately defined. For the accuracy of the definition, in many cases the embodiment of the present invention adds corresponding English/French words (such as using "FANS" to accurately define "fans").

(3) Strings (words/phrases) are expressed in neutral codes (morphemes). Words/phrases expressed in neutral codes are more independent and accurate, so they are not affected by the differences among simplified, traditional, and variant glyphs.

(4) Depending on the nature of the word or phrase, the embodiment of the present invention uses tables with different structures to record its attributes (for example, the number and contents of the fields of the common vocabulary table, the idiom table, the place name table, etc. are completely different).
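The principle that a code stands for a concept rather than for a character string can be sketched as two small lookup tables. The eight-digit code values and the English stand-in strings below are placeholders chosen for illustration; they are not codes defined by the embodiment.

```python
# Hypothetical concept table: word code -> definition (one code, one concept).
CONCEPTS = {
    "00000001": "astronaut: a person who travels in space",
    "00000002": "fans, sense 1: a kind of food",
    "00000003": "fans, sense 2: FANS (English equivalent)",
}

# Hypothetical string index: a string may map to one code or to several.
STRING_TO_CODES = {
    "astronaut": ["00000001"],
    "spaceman": ["00000001"],            # different string, same concept
    "fans": ["00000002", "00000003"],    # one string, N = 2 meanings
}


def codes_for(string: str) -> list[str]:
    """Return every concept code registered for a string (empty if unknown)."""
    return STRING_TO_CODES.get(string, [])


def same_concept(a: str, b: str) -> bool:
    """True when two strings share at least one concept code."""
    return bool(set(codes_for(a)) & set(codes_for(b)))
```

Under these assumptions, `same_concept("astronaut", "spaceman")` holds, while `codes_for("fans")` returns two codes, one per meaning.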

Take the "Idiom Table" as an example. The way to encode "words/phrases" is:

In step A, the Chinese vocabulary is collected and stored in a plurality of tables in the relational database according to part of speech / word class;

As an implementation manner, the table includes, but is not limited to, a common vocabulary, an idiom list, a philanthropy list, an allusion table, a Chinese place name table, and the like.

Step B, the entries of each table are split into morphemes;

Each table is reviewed, supplemented, and corrected by Chinese language experts with the assistance of technicians. After a complete table has been generated, the vocabulary is split into morphemes by the coding program.

For example, the idiom "an pin le dao" (安贫乐道) is split into four canonical words: an, pin, le, and dao. Each canonical word is identified by four Arabic numerals (for example, the word "dao" in "an pin le dao" is represented by 2745).

In step C, each canonical word is replaced by the appropriate morpheme by appending a morpheme number (A, B, C, ...) to it; for example, the morpheme number of the word "dao" in "an pin le dao" is "A" (so the morpheme code is 2745A).
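Steps A to C can be sketched for a single idiom as follows. Only the canonical code 2745 and the morpheme number "A" for "dao" are taken from the description; the romanized syllables standing in for the characters, the other canonical codes, and the other morpheme numbers are placeholders.

```python
# Step A: one row of a hypothetical idiom table (syllables stand in for the characters).
IDIOM = ["an", "pin", "le", "dao"]

# Step B: canonical codes. Only 2745 ("dao") comes from the text; the rest are placeholders.
CANONICAL_CODES = {"an": "0101", "pin": "0102", "le": "0103", "dao": "2745"}

# Step C: morpheme numbers chosen for this idiom. Only "A" for "dao" comes from the text.
MORPHEME_NUMBERS = ["A", "A", "A", "A"]


def split_into_canonical_codes(idiom: list[str]) -> list[str]:
    """Step B: replace each canonical word with its four-digit code."""
    return [CANONICAL_CODES[word] for word in idiom]


def attach_morpheme_numbers(codes: list[str], numbers: list[str]) -> list[str]:
    """Step C: append the morpheme number to each canonical code."""
    if len(codes) != len(numbers):
        raise ValueError("one morpheme number is required per canonical word")
    return [code + number for code, number in zip(codes, numbers)]


morpheme_codes = attach_morpheme_numbers(
    split_into_canonical_codes(IDIOM), MORPHEME_NUMBERS
)
```

In this sketch the last element of `morpheme_codes` is "2745A", the morpheme code named in the example.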

Step S300, constructing a Chinese code database, the database includes a morpheme table corresponding to the morphemes and a vocabulary list corresponding to the words and phrases; the morpheme table includes the morpheme code of each morpheme, and the vocabulary list includes the word codes of the words and phrases.

The neutral codes are classified and summarized, and are combined with the word codes to form a morpheme database based on semantic coding.

As a preferred implementation, the morpheme table is integrated into an eight-dimensional matrix space and sorted and linked.

Each word code is represented, through a combination of a plurality of vectors and matrices, by a 32-bit number, that is, 8 hexadecimal digits with a length of 4 bytes.

Since the number of words and phrases is up to one million, it far exceeds the limit of 16 bits (current coding systems, including Unicode, are based on 16 bits, and only 65,536 codes can be produced with 16 bits). Therefore, the Chinese encoding method of the present invention encodes words and phrases with 32-bit numbers (i.e., 8 hexadecimal digits, 4 bytes in length), while the morpheme code itself is a 16-bit encoding.
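The capacity argument can be checked with a few lines of arithmetic: each hexadecimal digit carries 4 bits, so 4 digits (16 bits) give 16^4 = 65,536 values and 8 digits (32 bits) give 16^8 = 4,294,967,296 values. The helper names below are illustrative.

```python
TARGET_VOCABULARY = 1_000_000  # target number of words and phrases


def code_capacity(hex_digits: int) -> int:
    """Number of distinct codes expressible with the given count of hexadecimal digits."""
    return 16 ** hex_digits


def fits(entries: int, hex_digits: int) -> bool:
    """True when every entry can receive its own code."""
    return entries <= code_capacity(hex_digits)


SIXTEEN_BIT_CODES = code_capacity(4)      # 16 bits = 4 hexadecimal digits
THIRTY_TWO_BIT_CODES = code_capacity(8)   # 32 bits = 8 hexadecimal digits
```

A vocabulary of one million entries does not fit in 4 hexadecimal digits but fits comfortably in 8.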

Coding a vocabulary of a million entries with differing structures and attributes is a very complicated problem, since the uniqueness of the codes, their updating, their accessibility, and so on must all be considered. As an implementation manner, each code (including neutral codes and word codes) is regarded as a point in an eight-dimensional matrix space, and the point is located by eight values, X, Y, Z, P, Q, R, S, and T, each of which is a hexadecimal digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F). The embodiment of the present invention uses the eight-dimensional matrix space structure as the storage address of the information.
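Reading an eight-digit code as a point in the eight-dimensional space can be sketched as below. The sketch assumes that the digits map to the axes X, Y, Z, P, Q, R, S, T in written order; the function name `to_point` is illustrative.

```python
import string

AXES = ("X", "Y", "Z", "P", "Q", "R", "S", "T")


def to_point(code: str) -> dict[str, int]:
    """Interpret each hexadecimal digit of an 8-digit code as one coordinate (0..15)."""
    if len(code) != len(AXES) or not set(code) <= set(string.hexdigits):
        raise ValueError(f"expected 8 hexadecimal digits, got {code!r}")
    return {axis: int(digit, 16) for axis, digit in zip(AXES, code)}


point = to_point("ABCD1234")
```

For the sample word code "ABCD1234", the sketch yields X=10, Y=11, Z=12, P=13, Q=1, R=2, S=3, T=4.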

The coding is composed of three parts: classification value, sorting link value and protection code. Its structure is:

The classification value is a point in a three-dimensional matrix space and is stored in a table laid out in that three-dimensional space, which may be called the data allocation table. Its task is to record the attributes of the several hundred tables;

The sort link value is the sequence number of a record within its table, expressed in four hexadecimal digits;

As an implementation manner, the records may be sorted by pinyin letters, such as a, o, e, ...; or by stroke count, for example: 一, 乙, ....

As an implementation manner, links may connect entries with the same glyphs, for example the poems under the name "Li Bai"; or entries with the same meaning, for example all morphemes meaning "filial piety".
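Assigning sort link values after sorting a table can be sketched as follows. The sample records, the choice to start numbering at 0000, and the function name are assumptions made for illustration; the description only states that the value is a record sequence number written in four hexadecimal digits.

```python
# Hypothetical records of one table; pinyin and stroke counts are sample values.
RECORDS = [
    {"entry": "zhong", "pinyin": "zhong", "strokes": 4},
    {"entry": "ai", "pinyin": "ai", "strokes": 10},
    {"entry": "yi", "pinyin": "yi", "strokes": 1},
]

MAX_RECORDS = 16 ** 4  # four hexadecimal digits address at most 65,536 records


def assign_sort_link_values(records: list[dict], key: str) -> dict[str, str]:
    """Sort by the chosen field and number the records as 4-digit hexadecimal values."""
    ordered = sorted(records, key=lambda record: record[key])
    if len(ordered) > MAX_RECORDS:
        raise ValueError("table too large for a four-digit sort link value")
    return {record["entry"]: format(i, "04X") for i, record in enumerate(ordered)}


by_pinyin = assign_sort_link_values(RECORDS, "pinyin")
by_strokes = assign_sort_link_values(RECORDS, "strokes")
```

The same table receives different sequence numbers depending on whether it is ordered by pinyin or by stroke count, which is why the ordering must be fixed per table before codes are issued.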

The protection code is a control digit calculated from the classification value and the sort link value by means of a code map. The code map is a table structure composed of a number of vectors and matrices (arrays).
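The three-part layout (three digits of classification value, four digits of sort link value, one protection digit) can be sketched as below. The description does not disclose the contents of the code map, so the check digit used here (sum of the seven digits modulo 16) is a stand-in assumption and is not the code map of the embodiment; all function names are illustrative.

```python
def protection_digit(classification: str, sort_link: str) -> str:
    """Stand-in control digit: sum of the seven hexadecimal digits modulo 16.

    This is NOT the code map of the embodiment, only a placeholder rule.
    """
    total = sum(int(digit, 16) for digit in classification + sort_link)
    return format(total % 16, "X")


def build_word_code(classification: str, sort_link: str) -> str:
    """Concatenate classification (3 digits), sort link (4 digits), protection (1 digit)."""
    if len(classification) != 3 or len(sort_link) != 4:
        raise ValueError("expected 3 classification digits and 4 sort link digits")
    body = (classification + sort_link).upper()
    return body + protection_digit(classification, sort_link)


def is_valid(code: str) -> bool:
    """Recompute the control digit and compare it with the last digit of the code."""
    code = code.upper()
    return len(code) == 8 and code[7] == protection_digit(code[:3], code[3:7])


sample = build_word_code("ABC", "0012")
```

Whatever rule the code map actually implements, the purpose shown here is the same: a code whose last digit does not match the value recomputed from the first seven digits can be rejected as corrupted.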

As an implementation manner, the classification values cover, but are not limited to, the following classes of tables:

(1) Dictionary class, the following is the main table of the code table (ie, dictionary class) of the embodiment of the present invention (each table is an independent file, storing the same kind of information):

The word list, which collects 13,000 Chinese words, its main fields are simplified and traditional fonts, pinyin, short spells, radicals, strokes, notes, strokes, short interpretations, and detailed explanations.

A commonly used glyph table, which collects about 6,000 commonly used simplified, traditional and variant Chinese glyphs. Its main fields are: glyph, Unicode code, radical, stroke, stroke order, sound, basic sound, basic meaning (example: / 碍,袄/袄,鉴/鉴/鋻/鉴...... These words are synonymous but each different form uses one record to register information);

Commonly used word meaning table (i.e., commonly used morpheme table), which is based on 4,500 commonly used Chinese characters (estimated 8,000) and is the blueprint for making neutral codes in the future. Its main fields are: word meaning code (i.e., morpheme code), default glyph, simplified glyph, traditional glyph, variants, word definition, pinyin, short spelling, short interpretation, detailed explanation, examples, and English equivalents (example: 袄/袄, 鉴/鉴/鋻/鉴, ...: the glyphs within each group differ, but each group uses only one record to register information. Some polysemous characters are treated differently. For example, the character 藏 is pronounced cáng in the sense "to store" and zàng in "Tibet"; although the glyph is the same, the pronunciations and meanings are completely different and unrelated, so two records are used to register the information);

The radical table, which collects 260 simplified and traditional radicals (for example: 钅, 釒, and gold are three different radicals);

The phonetic component table, which collects 1,000 phonetic components; about 80% of Chinese characters are phono-semantic characters. In the embodiment of the present invention, the phonetic component is treated as the "sound side" of a character, and its usage is similar to that of the radical.

The keyword table, which collects 500 keywords composed of morphemes. As an implementable method, the 500 most commonly used morphemes are selected and called "keywords". Searching information by semantic keywords is a basic function. Unlike in other Chinese coding systems, the keywords are defined by word meaning and are therefore precise, so information can be searched at a very fine granularity, which other Chinese systems cannot do.

(2) Word dictionary class - The following is the table of the word dictionary class of the embodiment of the present invention:

A commonly used vocabulary, which collects about 60,000 commonly used words. It is a table made according to the principle of "one code, one meaning": each record has only one basic meaning. The main fields are: simplified characters, traditional Chinese characters, pinyin, even spells, definitions, example sentences, English words, French words, keywords, first words, tail words, words.

Phrase class - The following are the tables of the phrase class:

Idiom, which collects about 7,000 idioms, its fields are: simplified string, traditional string, pinyin, even spell, simplified interpretation, traditional interpretation, use example sentences, English translation, keywords, first words, tail words;

Chinese couplets, this table can collect 3,000 couplets (pairs of idioms / famous sayings), its fields are: couplet text, pinyin, annotation, short comment, source, category, keyword, first word;

Proverbs, which collect about two thousand common proverbs, whose fields are: proverbs, categories, explanations;

Xiehouyu (two-part allegorical sayings), which collects about 2,000 xiehouyu, and its fields are: saying, category, explanation;

Famous saying, it collects about 2,000 Chinese famous sayings, its fields are: famous words, categories, sources, authors;

The maxim, which collects about two thousand common adages, whose fields are: maxim, category, source, interpretation;

Fable, which collects about 2,000 Chinese fables, including but not limited to: fables, title, category, author, etc.;

Sayings, also known as idioms, collect about two thousand common idioms, and its fields are: idioms, categories, sources, and explanations.

(3) Poetry and ancient books - The following are the main tables of the poetry and ancient books class:

Three Hundred Tang Poems, which consists of two tables, poem content and poem author. Its fields include but are not limited to: poem, author, author introduction, poetry genre, original text, annotation, commentary, Chinese translation, English translation, French translation;

Three Hundred Song Ci, which consists of two tables, ci content and ci author. Its fields include but are not limited to: tune name (cipai), ci title, author, author introduction, original text, annotation, comment, Chinese translation, English translation, French translation;

Baixiang Cipu, which was compiled by Shu Menglan of Jing'an during the Jiaqing period of the Qing Dynasty. It selects a total of 100 ci from the Tang to the Qing Dynasty and is a valuable reference for writing ci. The fields of this table include but are not limited to: tune name (cipai), author, title, original text, test, practice;

Selected poems of the past dynasties, collecting about 2,000 poems from the Qin Dynasty to modern times;

Selected ci of the past dynasties, collecting about 2,000 ci from the Tang Dynasty to modern times;

Peiwen poetry rhymes, which collects 105 rhymes, its fields include but are not limited to: rhyme name, major category, rhyme number, attached rhyme characters;

Guwen Guanzhi, which is a collection of Chinese prose of the past dynasties, 218 articles in total. It is an anthology of classical Chinese texts selected by Wu Chucai and Wu Tiaohou during the Kangxi reign of the Qing Dynasty. The fields of this table are: author, author introduction, dynasty, title, article title, original text, comment, vernacular translation, short comment;

The Four Books, which is the collective name of "The Analects of Confucius", "Mencius", "The Great Learning", and "The Doctrine of the Mean". "The Analects of Confucius" records the words and deeds of Confucius, "Mencius" records the words and deeds of Mencius, and "The Doctrine of the Mean" and "The Great Learning" are two chapters extracted from the "Book of Rites" by the Southern Song Dynasty scholar Zhu Xi. The authors of the Four Books are Confucius, Zi Si, Mencius, Cheng Zi, Zhu Xi, etc., spanning 1,800 years. After the Song and Yuan Dynasties, the Four Books became official school textbooks and required reading for the imperial examinations;

The Tao Te Ching, which was written by Laozi (Li Er) in the Spring and Autumn Period of China. It consists of 81 chapters and has been translated into many languages. Its fields include, but are not limited to, chapters, original texts, vernacular translations, English translations, French translations, and reviews;

A selection of Chinese folk songs, which collects about 300 Chinese folk songs.

(4) History and Geography - The following are the main tables of the history and geography class:

Chinese dynasties, its fields include, but are not limited to: name of the dynasty, beginning and ending years (AD), founder, capital, present-day location, main figures, and notes;

Chronology of Chinese history, its fields include but are not limited to: year, dynasty, brief description of major events;

Chinese ancient celebrities, which collects information on ancient Chinese celebrities (politicians, philosophers, strategists, writers, artists...) into this table, its fields include but are not limited to: name, dynasty, category, introduction;

Chinese modern celebrities, which collects information on modern Chinese celebrities (politicians, philosophers, strategists, writers, artists, scientists, etc.) into this table, its fields include but are not limited to: name, category, introduction;

Chinese provinces, including but not limited to: provincial name (or district name), abbreviation, category, area, population, provincial capital (or capital), major towns;

China's big towns, Chinese geographical terms, China's famous attractions, its fields include but are not limited to: provincial name (or district name), abbreviation, major categories, fine categories, levels, short sentences, detailed introduction, pictures;

The national name capital table, its fields include but are not limited to: region, country name (Chinese + English), capital (Chinese + English), area, population, short introduction, remarks, national flag, national anthem.

Step S400, using an electronic device, inputting Chinese using a morpheme-assisted input method according to a neutral code and a word code.

As shown in FIG. 4, step S400 includes the following steps:

Step S410, receiving input data information;

Step S420, providing a morpheme selection prompt according to the input data information;

Step S430, receiving morpheme selection information, and determining morpheme coding corresponding to the selected morpheme;

Step S440, calling the word meaning summary table query and providing Chinese characters corresponding to the morpheme coding;

Step S450, calling the glyph summary table according to the selected Chinese characters, querying and determining the Chinese to be entered;

Step S460, displaying and inputting the determined Chinese.
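The flow of steps S410 to S460 can be sketched with two small in-memory tables. The neutral codes B3AF and B9D0 are the ones used in Example 2 of this description; the table layouts, the specific glyphs, and all function names are assumptions made for illustration.

```python
# Hypothetical input index: typed key -> candidate morpheme codes.
INPUT_INDEX = {"chen": ["B3AF", "B9D0"]}

# Hypothetical word meaning summary table: morpheme code -> gloss and glyph forms.
WORD_MEANING_TABLE = {
    "B3AF": {"gloss": "Chen (surname)", "glyphs": ["陈", "陳"]},
    "B9D0": {"gloss": "dust", "glyphs": ["尘", "塵"]},
}

# Hypothetical glyph summary table: glyph -> morpheme code it belongs to.
GLYPH_TABLE = {
    glyph: code
    for code, row in WORD_MEANING_TABLE.items()
    for glyph in row["glyphs"]
}


def morpheme_prompt(key: str) -> list[tuple[str, str]]:
    """S410-S420: receive the typed key and list (code, gloss) choices."""
    codes = INPUT_INDEX.get(key.lower(), [])
    return [(code, WORD_MEANING_TABLE[code]["gloss"]) for code in codes]


def characters_for(code: str) -> list[str]:
    """S440: query the word meaning summary table for the characters of a code."""
    return WORD_MEANING_TABLE[code]["glyphs"]


def enter_chinese(key: str, morpheme_choice: int, glyph_choice: int) -> tuple[str, str]:
    """S430-S460: pick a morpheme, pick a glyph, confirm it in the glyph table."""
    code, _gloss = morpheme_prompt(key)[morpheme_choice]  # S430
    glyph = characters_for(code)[glyph_choice]            # S440
    if GLYPH_TABLE.get(glyph) != code:                    # S450
        raise LookupError(f"glyph {glyph!r} is not registered for {code}")
    return code, glyph                                    # S460
```

In this sketch the stored value is the morpheme code, while the glyph is only the form chosen for display, so both glyph forms of one morpheme resolve to the same code.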

After the neutral codes and the word codes are assembled into the morpheme database, Chinese text can be stored and transmitted in three different formats: (1) Unicode, (2) neutral code, (3) word code. Take the string "Chinese Treasure Chest" (汉语百宝箱) as an example. Archived in Unicode, the inner code is "6C49 8BED 767E 5B9D 7BB1"; archived in neutral code, the inner code is "BA7E BB79 A6CA C45F BD63"; archived in word code, the inner code is "ABCD1234".
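The Unicode inner code quoted above can be reproduced directly, since the five values are the code points of the characters 汉语百宝箱. The helper names below are illustrative; the neutral code and word code values are specific to the embodiment and cannot be derived this way.

```python
def unicode_inner_code(text: str) -> str:
    """Render each character as its four-digit hexadecimal Unicode code point."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


def from_unicode_inner_code(inner_code: str) -> str:
    """Inverse operation: turn space-separated code points back into text."""
    return "".join(chr(int(value, 16)) for value in inner_code.split())


UNICODE_FORM = unicode_inner_code("汉语百宝箱")
```

The round trip shows that the Unicode form records glyphs only, whereas the other two formats record meanings.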

As an implementable manner, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F represent the hexadecimal digits. Because computers use the binary system (digits 0 and 1; the "digital" of the digital camera is a common name for the binary method), the Chinese codes are not written in decimal but in a notation based on binary. A hexadecimal digit consists of 4 binary digits (0, 1); two hexadecimal digits form a byte (the smallest unit of computer memory). The Unicode code and the neutral code are both two bytes long, and the word code is four bytes long, composed of eight hexadecimal digits. The matrix coding table of the embodiment of the present invention is conceived according to hexadecimal notation.
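The relations between hexadecimal digits, binary digits, and bytes can be verified with a short sketch; the function names are illustrative.

```python
def hex_digit_to_bits(digit: str) -> str:
    """Write one hexadecimal digit as its 4 binary digits."""
    return format(int(digit, 16), "04b")


def byte_length(hex_code: str) -> int:
    """Number of bytes occupied by a code written as hexadecimal digits."""
    return len(bytes.fromhex(hex_code))


NEUTRAL_CODE_BYTES = byte_length("BA7E")    # one neutral code from the example above
WORD_CODE_BYTES = byte_length("ABCD1234")   # the sample word code
```

A four-digit code therefore occupies two bytes and an eight-digit word code occupies four bytes.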

Since the inner code is different, the embodiment of the present invention requires a new input method to enter information. The input method may be based on handwriting, speech, pinyin, radicals, phonetic notation, strokes, foreign languages, and so on. Whichever manner is used (pinyin, handwriting, speech, etc.), the embodiment of the present invention builds on the prior art and then processes the input with the morpheme input method.

Based on the morpheme code, the user first enters the desired Chinese character with an existing Chinese character input method; the system then lists all the morphemes of that character, and the user selects the appropriate morpheme.

There are a variety of existing Chinese character input methods, such as the pinyin input method and the Wubi input method. Whatever method is used, the unit of input is a word or a phrase rather than a single character. Using the word as the unit of input reduces the duplicate-code rate. If a polysemous word is encountered, a dialog box is displayed asking the user to select the appropriate meaning.

Example 1: Enter "fans" meaning FANS:

11) The user inputs "fs" and presses the Enter key;

12) Display: (1) mode (2) launch (3) occurs (4) affiliated (5) fans...;

13) The user clicks (5) fans and presses the Enter key; the system asks which meaning is intended: (1) the food, or (2) the transliteration of the English FANS;

14) The user clicks (2) transliteration of the English FANS and presses the Enter key;

15) Select the word code that means English FANS, and then display a new input layout.

Steps 11) to 13) are the same as in the prior art; steps 14) and 15) belong to the embodiment of the present invention, in which the user selects either the food meaning or the FANS meaning and a suitable word code is obtained.
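Example 1 can be sketched as a two-stage selection: first a candidate word, then a meaning. The candidate list follows the example; the eight-digit word codes are placeholders, and the function name is illustrative.

```python
# Candidate words displayed for the typed key, as listed in Example 1.
CANDIDATES = {"fs": ["mode", "launch", "occurs", "affiliated", "fans"]}

# Hypothetical sense table: word -> [(meaning, word code)]. The codes are placeholders.
SENSES = {
    "fans": [
        ("food", "00000A01"),
        ("transliteration of the English FANS", "00000A02"),
    ],
}


def choose_word_code(key: str, candidate_no: int, sense_no: int) -> tuple[str, str, str]:
    """Steps 11) to 15): pick the numbered candidate, then the numbered meaning."""
    word = CANDIDATES[key][candidate_no - 1]   # the on-screen list is numbered from 1
    options = SENSES.get(word)
    if not options:
        raise LookupError(f"no meanings registered for {word!r}")
    meaning, code = options[sense_no - 1]
    return word, meaning, code


selection = choose_word_code("fs", 5, 2)
```

The second selection is what distinguishes this flow from a conventional input method: the value finally stored is the word code of the chosen meaning, not the character string.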

Example 2: When the user types "CHEN", the system displays: 1. Chen (simplified) 2. Chen (traditional) 3. dust (simplified) 4. dust (traditional) 5. morning 6. 趁 ......; if the user selects the first or second item, the internal code is B3AF; if the user selects the third or fourth item, the internal code is B9D0. Here B3AF is the neutral code of "Chen" (the surname) and B9D0 is the neutral code of "dust". At the input stage only the meaning of the word matters and the glyph is ignored: the simplified and traditional forms are both recorded with the same neutral code.

Example 3: When the user types "BAI", the system displays: 1. white (color) 2. white (speaking) 3. white (last name) 4. worship 5. pendulum 6. defeat ...; if the user wants to enter the word "white", the user chooses among the first, second, and third meanings, that is, the "white" of the color, the "white" of speech, or the "white" of the last name. The inner codes are A5D5, A5D6, and A5D7 respectively, which makes the meaning of the word "white" unambiguous.

If the duplicate-code rate is too high (as with single characters and unusual two-character words), the user can enter the complete pinyin (initial + final) of the word and then type the category (i.e., the classification value) to reduce the duplicate-code rate.

In order to achieve the object of the present invention, an embodiment of the present invention further provides a storage medium for storing computer program instructions according to the Chinese meaning encoding processing method according to the embodiment of the present invention.

In order to achieve the object of the present invention, an embodiment of the present invention further provides an encoding processing software system based on Chinese meaning, including the storage medium, where computer program instructions in the storage medium are called to complete encoding processing based on Chinese meaning.

As an implementation manner, as shown in FIG. 5, the software system includes a morpheme encoding module 10, a statement encoding module 20, a table module 30, and an input module 40. Wherein:

The morpheme encoding module 10 is configured to encode a morpheme of a Chinese language to obtain a morpheme code of each of the morphemes.

The sentence encoding module 20 is configured to encode words and phrases in Chinese to obtain word codes of the words and phrases.

The table module 30 is configured to construct a Chinese code database, where the database includes a morpheme table corresponding to the morphemes and a vocabulary list corresponding to the words and phrases; the morpheme table includes the morpheme code of each morpheme, and the vocabulary list includes the word codes of the words and phrases.

The input module 40 is configured to input Chinese by using a morpheme-assisted input method according to a neutral code and a word code using an electronic device.

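In order to achieve the object of the present invention, an encoding processing device based on Chinese meaning is further provided, including a central processing unit and the storage medium connected to the central processing unit; the central processing unit invokes the computer program instructions in the storage medium to perform the encoding processing based on Chinese meaning.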

The working process of the storage medium, the software system, and the processing device in the embodiment of the present invention is basically the same as the Chinese encoding method based on the Chinese meaning. Therefore, in the specific embodiment, the detailed description will not be repeated.

As a preferred embodiment of the present invention, in order to help overseas Chinese and foreigners who do not understand pinyin to input Chinese, the present embodiment also provides glyph-based input methods (handwriting, Cangjie, Wubi, radical strokes, phonetic notation, strokes). The logic is that, after a whole word (such as "daytime") or phrase (such as "Li Bai") has been input by a traditional method such as handwriting, Cangjie, Wubi, radical strokes, phonetic notation, or strokes, the system is called to identify it. The system thus knows that "daytime" and "Li Bai" are inseparable strings and looks up the word "daytime" or the phrase "Li Bai".

A morpheme is used to define a word or phrase; the unit of input is not a morpheme but a word or phrase. When the user enters "Li Bai", the code of the phrase "Li Bai" is found. After finding the code, the value of the code (the value of the XYZ matrix) is used to accurately determine that "Li Bai" is the name of the person.
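Determining the nature of an entry from the classification value of its code can be sketched as a lookup in the data allocation table. The classification values, the table names attached to them, and the word code assigned to "Li Bai" are placeholders for illustration.

```python
# Hypothetical data allocation table: classification value (X, Y, Z) -> table attributes.
DATA_ALLOCATION_TABLE = {
    "3B0": {"table": "Chinese ancient celebrities", "kind": "personal name"},
    "210": {"table": "Idiom", "kind": "idiom"},
}

# Hypothetical phrase index: entered string -> word code (8 hexadecimal digits).
PHRASE_INDEX = {"Li Bai": "3B00012F"}


def classification_of(word_code: str) -> str:
    """The first three hexadecimal digits of a word code are its X, Y, Z values."""
    return word_code[:3]


def kind_of(phrase: str) -> str:
    """Find the code of a phrase, then read its nature from the data allocation table."""
    word_code = PHRASE_INDEX[phrase]
    return DATA_ALLOCATION_TABLE[classification_of(word_code)]["kind"]
```

Under these assumptions, `kind_of("Li Bai")` returns "personal name" without any analysis of the string itself, because the information is carried by the code.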

The language processing method, system and medium device based on Chinese meaning in the embodiment of the present invention have the following advantages:

(1) Uniqueness - Based on a coding method in which each code corresponds to a unique meaning, morphemes correspond one-to-one with codes, that is, the codes are unique. Chinese words and phrases are analyzed, collected, and stored in hundreds of tables in a relational database (Access, Oracle, or others), and no two codes are the same;

(2) Accessibility – Since the word code is a data address, the way to obtain data is direct access, which is very fast;

(3) Navigability – Each morpheme has a hyperlink function. Without leaving the computer environment in which the present invention operates, the user can browse the entire knowledge base at will. For example, while reading Bai Juyi’s "The Song of Everlasting Sorrow", the user clicks on the morpheme "Yuyang drums" to show its interpretation; the user then clicks on the morpheme "An Lushan" in the explanatory text and enters the "Ancient Chinese Names List", which shows the life of An Lushan and a review of the "An Shi Rebellion"; after reading the explanation, the user can return to the verse of "The Song of Everlasting Sorrow".

(1) Existing language coding systems have been in use for many years; the Unicode system in particular has been in use for more than 20 years and has become outdated, unable to bear the new needs created by the rapid advance of technology in the information age. The embodiment of the present invention introduces the neutral code (morpheme coding) and the word code (word and phrase coding), and the vitality of this method and system promotes the continued development of language, culture, and technology.

(2) Language systems are diverse. Taking Chinese characters as an example, for historical reasons independent development has produced two scripts, simplified and traditional, which is not conducive to cultural and economic exchange. At the same time, in the new era, the use of new words, the translation of foreign words, and the coining of technical vocabulary are highly inconsistent and hinder linguistic and cultural interaction. The embodiment of the present invention serves people around the world through technical reform: it collects language material, tries to unify the processing of foreign words and new words, and provides neutral code (morpheme coding) logic in which multiple glyph forms coexist, so that users can conveniently select and use them.

(3) The embodiment of the present invention collects a large number of morpheme vocabulary, performs coding processing (adding foreign language corresponding words, etc.), and obtains 1) a linguistic knowledge base with morpheme as the core; 2) a neutral code and a word code as the backbone. Language processing system; 3) Intelligent prompt input method backed by language knowledge base. The three major modules of knowledge base, coding system and input method can be applied independently or combined.

In summary, the morpheme-based language processing method and system of the embodiment of the present invention process language information conveniently, finely, and flexibly, can perform search, analysis, and statistics on language big data, and have the powerful function of traversing a large relational database of the language, thereby effectively enhancing its value.

The specific embodiments of the present invention are described in detail in the detailed description of the embodiments of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and scope of the present invention are intended to be included within the scope of the present invention.

Claims

A coding processing method based on Chinese meaning, characterized in that it comprises the following steps:

Encoding the morphemes of Chinese to obtain the morpheme code of each of the morphemes;

Encoding words and phrases in Chinese to obtain the word code of the words and phrases;

Constructing a Chinese code database, the database includes a morpheme table corresponding to the morphemes and a vocabulary list corresponding to the words and phrases; the morpheme table includes the morpheme code of each morpheme, and the vocabulary list includes the word codes of the words and phrases.
The encoding processing method based on Chinese meaning according to claim 1, wherein the morpheme table comprises a glyph summary table and a word meaning summary table, and the glyph summary table and the word meaning summary table are related to each other through the morpheme codes.
The encoding processing method based on the Chinese meaning according to claim 2, further comprising the following steps:

Enter Chinese using electronic devices;

It includes the following steps:

Receiving input data information;

Providing a morpheme selection prompt according to the input data information;

Receiving morpheme selection information, determining a morpheme code corresponding to the selected morpheme;

Calling the word meaning summary table query and providing Chinese characters corresponding to the morpheme code;

Calling the glyph summary table according to the selected Chinese characters, querying and determining the Chinese to be entered;

Display and enter the confirmed Chinese.
The encoding processing method based on Chinese meaning according to claim 3, wherein the encoding of the Chinese morpheme comprises encoding a single Chinese character;

Including the following steps:

Constructing a unique canonical code for each of the individual Chinese words;

Determining the number of different meanings contained in the single Chinese character;

Determining a morpheme number for each meaning of the single Chinese character;

The canonical code is combined with the morpheme number to form a morpheme code of a different meaning morpheme of the single Chinese character.
The encoding processing method based on Chinese meaning according to claim 1, wherein there are a plurality of vocabulary lists.
The encoding processing method based on Chinese meaning according to claim 2, wherein the word code is represented by a 32-bit hexadecimal number.
The encoding processing method based on Chinese meaning according to claim 6, wherein the storage of the word code in the Chinese encoding database is implemented by using an eight-dimensional matrix space.
The encoding processing method based on Chinese meaning according to claim 7, wherein the word code is stored in an eight-dimensional matrix space, including:

Each word code is used as a point in the eight-dimensional matrix space, and the points are positioned by eight hexadecimal values of X, Y, Z, P, Q, R, S, and T, and the eight-dimensional matrix space structure is used as the storage address.
The encoding processing method based on the Chinese meaning according to claim 8, wherein the using the eight-dimensional matrix spatial structure as the storage address of the word-sentence code comprises:

The word code consists, in order, of three parts: the classification value, the sorting link value, and the protection code. The structure is as follows:

Wherein, the classification value includes three values, respectively representing X-axis, Y-axis and Z-axis coordinate values; the sorting link value includes four values, respectively representing P-axis, Q-axis, R-axis and S-axis coordinate values, The protection code is the last digit and represents the T-axis coordinate value.
The encoding processing method based on Chinese meaning according to claim 9, wherein the classification value represents a point in the three-dimensional space, and all the classification values are stored in a data layout diagram of the three-dimensional space, and the data layout diagram is a table; and

The vocabulary list is divided into a plurality of types, and different categorical values correspond to different types of vocabulary tables.
The encoding processing method based on the Chinese meaning according to claim 9, wherein the protection code is a control value calculated from the classification value and the sorting link value by a code map, and the code map is a table structure consisting of multiple vectors and matrices.
The encoding processing method based on Chinese meaning according to claim 5, wherein the word list includes: a dictionary type sentence list, a dictionary type word list, a poetry ancient sentence list, and a history list.
A storage medium, characterized in that it is used to store computer program instructions for encoding processing based on Chinese meaning according to any one of claims 1-12.
An encoding processing software system based on Chinese meaning, comprising the storage medium according to claim 13, wherein the computer program instructions in the storage medium are called to complete an encoding processing method based on Chinese meaning.
An encoding processing device based on Chinese meaning, comprising a central processing unit, characterized by further comprising a storage medium according to claim 13 connected to a central processing unit;

The central processor invokes computer program instructions in the storage medium to perform an encoding process based on Chinese meaning.