WO1998000773A1 - Computer input method of confined semantic unifying encoding for different natural languages and computer input system thereof - Google Patents

Info

Publication number
WO1998000773A1
Authority
WO
WIPO (PCT)
Prior art keywords
code
natural language
common
concept
concepts
Prior art date
Application number
PCT/CN1997/000069
Other languages
French (fr)
Chinese (zh)
Inventor
Sha Liu
Original Assignee
Sha Liu
Priority date
Filing date
Publication date
Application filed by Sha Liu
Priority to AU33336/97A (AU3333697A)
Publication of WO1998000773A1

Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00: Teaching, or communicating with, the blind, deaf or mute

Definitions

  • The present invention relates to a computer text-information processing system and, more particularly, to a computer input method and a computer input system using a semantically restricted unified encoding for machine translation between different natural languages.
  • The invention also relates to a semantically restricted unified encoding of different natural languages.
  • The object of the present invention is to provide a computer input method and a computer input system with semantically restricted unified encoding of different natural languages; that is, a restricted natural language is used to develop a machine translation system.
  • Recognizing that natural language in machine translation is bounded by the machine's capabilities, and in order to fit the computer's natural-language processing ability, a restrictive convention is first placed on the relationship between the symbolic forms of natural language and their semantic content. The restricted semantics are then uniformly encoded, which effectively improves machine translation quality and addresses the barriers to communication between different natural languages.
  • A computer input method with semantically restricted unified encoding of different natural languages, comprising unified semantic encoding at the lexical level and unified semantic encoding at the syntactic level, wherein:
  • The unified semantic encoding at the lexical level includes:
  • The unified semantic encoding at the syntactic level includes:
  • The system of syntactic concepts is converted into a spatially positioned syntactic frame shared by human and machine, and the position of a spatial area within the sentence-building frame expresses the syntactic concept;
  • The invention also provides a computer input system with semantically restricted unified encoding of different natural languages, and a semantically restricted unified encoding of different natural languages for use in computer translation.
  • The semantically restricted unified encoding of different natural languages of the present invention is an easily recognizable and operable common intermediary language for machine translation, which can be used by people with different mother tongues.
  • The technology of the present invention is further described below with reference to the embodiments and the accompanying drawings.
  • Figure 1. Schematic diagram of example designs of common-concept graphic codes.
  • Figure 2b. Example of the correspondence between time-class semantics, graphic codes, and numeric codes.
  • Figure 4. Alignment chart of natural-language concepts and common concepts for the basic word "beautiful".
  • Figure 5. Alignment chart of natural-language concepts and common concepts for the basic word "if".
  • Figure 6. Schematic diagram of the pictorial atlas of multilingual concrete nouns.
  • Figure 13. Logic block diagram of the restricted-semantics unified-encoding machine translation system.
  • The lexical-level common concept system of the present invention is built on statistics of synonyms across different natural languages. It is obtained by comparing and screening those synonyms against semantic-classification headwords and a primitive set of corresponding size.
  • The embodiment starts from the actual population distribution of natural-language speakers and from the practical requirement that common concepts have the characteristics of an isolating language.
  • English and Chinese synonyms, semantic-classification headwords and a primitive set of corresponding size are compared and screened to form a base system of common concepts.
  • The descriptive definition method is used for concepts with definite, naturally fixed features.
  • The interpretive definition method is used for concepts whose image features can be obtained from a dictionary definition, such as "lazy", defined in the dictionary as "unwilling to work".
  • When the image features of a concept cannot be obtained directly from the interpretive definition, the typical-example method can be used, as for "abstract", defined in the dictionary as "extracting essential features from various things".
  • For concepts with a relative meaning, the contrast definition method is used to obtain an image expression, as for "tall" and "short".
  • When the dictionary definition has poor imagery, the symbolic method, based on common symbolic images, can be used. For example, the dictionary definition of "peace" is "a state without war", and its common symbolic image is a dove of peace carrying an olive branch.
  • The deductive method derives a graphic code from the graphic codes of other concepts, such as "diplomacy" derived from "country" and "interaction".
  • The metaphorical method obtains a figurative expression from associations of the word's meaning, as for "noble", defined in the dictionary as "of a high moral standard ...".
  • The sign-language reference method is also called the mental-imagery method: some words that represent abstract concepts can borrow the expressions of sign language.
  • Graphic codes can be colored for easy recognition. Once the concept graphic codes exist, graphic codes whose word meanings share the same kind of function are arranged in a coordinate matrix to form a class diagram (see Figure 2a).
  • The row and column coordinates of each graphic code's position in the class diagram form its class code.
  • Graphic codes of different word meanings within the same class code are arranged page by page in coordinate matrices to form position diagrams (bitmaps).
  • The row and column coordinates of each graphic code's position on its page form its bit code (position code). The class code, page number, and bit code, in that order, make up a fixed-length 5-digit numeric code for any graphic code; the digits are entered with the number keys of the keyboard.
  • For example, concepts can first be classified under headings such as time, orientation, quantity, reference, association, grammar, raw materials (animals, plants, synthetic materials), physical movement, human movement, meteorology, physics, personal pronouns, daily life, social relations, psychology, thinking, transportation, communication, finance, trade, economy, tourism, food, entertainment, medical care, shopping, administration, culture and education, technology, politics, and military affairs.
  • Class diagrams and class codes are established first, and then position diagrams and bit codes are established page by page for each class.
  • Figure 2 shows some concept graphic codes of the time class and their numeric codes, including year, month, week, day, hour, minute, second, daytime, early morning, morning, noon, afternoon, dusk, night, time, the day before yesterday, yesterday, today, tomorrow, the day after tomorrow, past, present, future, season, spring, autumn, summer, winter, period, early stage, middle stage, late stage, epoch, dynasty, Common Era, century, decade, ancient times, early modern times, modern times, contemporary times, etc.
  • For example, the code for "daytime" is "01112", where the class code for time is "01", the page number is "1", and the bit code is "12".
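The 5-digit layout described above can be sketched in Python as follows. This is only an illustrative sketch: the names `ConceptCode`, `parse_concept_code` and `build_concept_code` are hypothetical and do not come from the patent. The only facts taken from the text are the field order (class code, page number, bit code), the field widths implied by the "01112" example (2 + 1 + 2 digits), and the example itself.

```python
from typing import NamedTuple


class ConceptCode(NamedTuple):
    """One 5-digit common-concept code split into its three fields."""
    class_code: str  # 2 digits: position of the class in the class diagram
    page: str        # 1 digit: page number within the class
    bit_code: str    # 2 digits: position of the graphic code on that page


def parse_concept_code(code: str) -> ConceptCode:
    """Split a 5-digit code such as "01112" into class code, page, bit code."""
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected exactly 5 digits, got {code!r}")
    return ConceptCode(class_code=code[:2], page=code[2], bit_code=code[3:])


def build_concept_code(class_code: str, page: str, bit_code: str) -> str:
    """Concatenate the three fields in the order given in the text."""
    code = class_code + page + bit_code
    parse_concept_code(code)  # reuse the validation above
    return code


# The "daytime" example from the text.
daytime = parse_concept_code("01112")
print(daytime.class_code, daytime.page, daytime.bit_code)
```

Keeping the fields as strings rather than integers preserves leading zeros, which matter here because the time class is written "01".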
  • In the present invention, the form code of a common concept can be a direct ideographic graphic symbol with a numeric code, a natural-language vocabulary prototype, or a combination of a direct ideographic graphic symbol and a natural-language vocabulary prototype.
  • Using direct ideographic graphic symbols as the common-concept form code is fast and better suited to frequent users of the unified-encoding machine translation system, but it takes users a fairly long time to master. Using natural-language vocabulary prototypes as the form code is easy to learn but suffers from polysemy. In that case the graphic symbol can be combined with the vocabulary prototype and used as one of the disambiguating symbols from which users choose when resolving polysemy.
  • The operator interface of the computer can present both the natural-language vocabulary prototype and the direct ideographic graphic symbol, combining the advantages of the two forms of code.
  • Colors can be used to build part-of-speech codes. For example, red indicates a verb, white a noun, pink an adjective, yellow an adverb, gray a technical term, silver-gray a function word, and so on. Because describing the semantics of common concepts in natural language cannot easily escape the arbitrariness of natural-language understanding, the semantic annotation of common-concept codes in the present invention uses definition by superposition of natural-language synonyms, marking common concepts with explanation-free definitions.
  • The explanation-free definition annotation also establishes alignment charts between the concepts of multiple natural languages and the common concepts. For the convenience of computer processing, the common concepts (including basic concepts and their near-synonym auxiliary codes) are also uniformly encoded and included in the alignment charts.
  • Through the above steps, a common concept system for semantic expression at the lexical level is obtained. In this system, each common concept is defined by the semantic superposition of synonyms from different natural languages.
  • These "self-evident" common concepts each have an ideographic graphic code, a unique numeric code, and a natural-language vocabulary prototype.
  • Figures 3, 4, and 5 show the alignment of natural-language concepts and common concepts for the Chinese basic words meaning "laugh", "beautiful", and "if" and for the English basic words "laugh", "beautiful", and "if", respectively.
  • The middle column gives the intermediary-language concept graphic codes, their numeric codes, and their auxiliary codes for degree decrease, degree increase, near-synonym, commendatory, derogatory, colloquial, written, slang, common-saying, and idiom variants.
  • B, ⁇ j) to form a semantic overlay definition.
  • “Hua” (51) is transferred to the computer interface for users to make semantic discrimination, so as to improve the speed of semantic discrimination.
  • c. For specialized vocabulary, a multilingual correspondence database of specialized terms can be established.
  • d. When establishing the alignment between the lexical-level concepts of a given natural language and the common concepts, if a common basic concept has no corresponding word, that natural language is used to define and describe the common basic concept. If a near-synonym auxiliary code has no corresponding word, that natural language can be used to define and describe it, or the basic concept can be used to fill the gap, as shown in Figures 3 to 5.
  • e. If methods a to d still fail to establish a relationship with the common concepts, the natural-language concepts (vocabulary senses) concerned are treated as redundant concepts and are not encoded.
  • In this way, the lexical-level concepts of multiple natural languages are given a restricted unified semantic encoding, and every encoded lexical sense has a clear semantic convention.
  • The unified semantic encoding at the syntactic level of the present invention is a visual expression of how lexical-level concepts are combined.
  • The implicit syntactic concepts of natural language must be raised to the surface level and then given a unified conceptual encoding.
  • The final result of syntactic analysis is usually expressed as a syntax tree.
  • The grammatical concepts involved in this syntax tree are the syntactic information obtained from the source language during machine translation.
  • The syntactic concepts expressed by the syntax tree are transformed into a spatially positioned syntactic frame.
  • Spatial position expresses the syntactic concept.
  • The syntactic frame (sentence-building frame) shown in Figure 7 comprises nine areas, (1) to (9), each consisting of nine cells numbered 1 to 9.
  • The solid-line boxes in the figure are areas and the dotted boxes are cells; the number of each cell is composed of its area number and its cell number. For example, cell 2 of area (1) is numbered 1-2.
  • Areas (1), (2), and (3) are the core subject, predicate, and object areas.
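The layout of the nine areas can be sketched as a small lookup, shown here in Python. This is a hedged illustration, not the patent's implementation: the function names `describe_area` and `cell_label` are hypothetical. The text states only that areas (1) to (3) are the core subject, predicate, and object areas, that modifiers of area (1) go in area (4), and that areas (7) to (9) are supplementary; the assumption made here is that areas (5), (6), (8), and (9) follow the same subject/predicate/object column order.

```python
from typing import Tuple

# Row order (assumed from the text): core, then modification, then supplementary.
LAYERS = ("core", "modification", "supplementary")
# Column order (assumed to repeat in every row): subject, predicate, object.
ROLES = ("subject", "predicate", "object")


def describe_area(area: int) -> Tuple[str, str]:
    """Return (layer, role) for an area number from 1 to 9."""
    if not 1 <= area <= 9:
        raise ValueError(f"area must be between 1 and 9, got {area}")
    layer_index, role_index = divmod(area - 1, 3)
    return LAYERS[layer_index], ROLES[role_index]


def cell_label(area: int, cell: int) -> str:
    """Build a cell number such as "1-2" from an area number and a cell number."""
    describe_area(area)  # validates the area number
    if not 1 <= cell <= 9:
        raise ValueError(f"cell must be between 1 and 9, got {cell}")
    return f"{area}-{cell}"


for area in range(1, 10):
    print(area, describe_area(area))
```

Under these assumptions, `describe_area(4)` yields the modification area for the subject, matching the rule that modifiers of area (1) are placed in area (4), and `cell_label(1, 2)` reproduces the "1-2" example.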
  • Areas (1) and (3) have the same usage rules, including:
  • the single subject and object (table) words serving as the core components of the sentence are placed in the first cell of the area;
  • the direct object placement method in the double object is the same as the a and f rules, and the indirect object placement (8) area.
  • the core predicate in the core predicate area is placed in the area 2 ⁇ ;
  • the willing predicate of the willing predicate predicate is placed in the area 1 ⁇ 1, and the core predicate is placed in the 2 area of the area;
  • e- represents the past tense and future of the special sleepy code of the core predicate tense, posture, and voice.
  • Time, progress, completion, negation, are dynamically placed in the 4th, 4th, 7th, 6th, 9th, 5th, and 8th cells in the area, expressing the tense with special figure codes.
  • The rule for using the modifier areas is that a modifier is placed in the modifier area that corresponds to the component it modifies.
  • The modifier of the core subject in area (1) must be placed in area (4); when there are parallel subjects with modifiers, each modifier must go with the subject it modifies.
  • the rules for using freely modified areas include:
  • the modifier corresponds to the first modifier.
  • Simple subject, predicate, and object structure modification can be placed in the corresponding area of a 4, 4, 5, 6, 1, 1, 2, 1, 3, 7, 7, 8, _9;
  • Modifications of the subject, predicate, and object structures that can be linked together and co-located predicates, can be linked together and co-located predicates side-by-side and placed in one, one, two, one, and three cells in the corresponding area;
  • the subject, predicate, and object with modified components are modified by the whole area.
  • the whole area enter it in the corresponding area-1 cell, and then the subject, predicate, and object are placed in the corresponding area-5, 1, 2, and 8;
  • the ingredients are placed in the corresponding zone 1, 4, 1, 1, and 7; the supplementary ingredients are placed in the corresponding zone 1, 6, 3, and -9.
  • the whole area is modified. Insert the area, that is, insert the caret * and area code in the frame to be inserted, insert the area code immediately after *, and then place the entire area in the insertion area.
  • Prepositional prepositions with subject, predicate, and object are in the same case as the subject or predicate;
  • Prepositions with multiple modifications are prepositioned before the first modifier, and placed in the same cell as the first modifier;
  • A, B, and C modification methods can be used side by side.
  • the side by side methods include:
  • transition is modified side by side.
  • the transition word is placed in the same case as the last modifier before the transition.
  • the supplementary zone (7)-(9) is basically the same as the rules in (4)-(6), the difference is that simple, side-by-side supplementary placement is in cells 4, 5, 5, and 6;
  • Predicate tense expression method ⁇ core area tense code is brought into the predicate; c single negation of verbs and modifiers and verbs and modifiers are placed together;
  • A cell in the syntactic frame can be opened as a window, and an area or an entire frame can be nested inside it.
  • The tense graphic codes of the core predicate area on the interface can be called up repeatedly for other positions that contain verbs.
  • Figure 8 shows the placement of the simple sentence "We always like some strong sports" in the sentence-building frame; its encoded numeric form is: 11 ⁇ 11512, 22 ⁇ 12214, 31 ⁇ 01617, 67 ⁇ 10316, 68 ⁇ 21313.
  • The symbol between the cell number and the code means "put in the cell".
  • The two digits before the symbol indicate the cell number.
  • The five digits after it are a direct ideographic code or a natural-language vocabulary code (a near-synonym auxiliary code should also carry a letter).
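The encoded form of the Figure 8 sentence can be read back mechanically, as in the Python sketch below. The sketch makes several assumptions that are not in the patent: the placement symbol is garbled in the source text, so the ASCII string `->` is used as a stand-in; the names `Placement` and `parse_placements` are hypothetical; the two-digit cell number is read as area digit followed by cell digit; and the optional near-synonym letter is not handled.

```python
from typing import List, NamedTuple

SEPARATOR = "->"  # assumed ASCII stand-in for the "put in the cell" symbol


class Placement(NamedTuple):
    area: int  # first digit of the two-digit cell number
    cell: int  # second digit of the two-digit cell number
    code: str  # 5-digit ideographic or vocabulary code


def _is_digits(text: str, length: int) -> bool:
    return len(text) == length and text.isascii() and text.isdigit()


def parse_placements(encoded: str) -> List[Placement]:
    """Parse a comma-separated list such as "11->11512, 22->12214"."""
    placements = []
    for item in encoded.split(","):
        cell_no, sep, code = (part.strip() for part in item.partition(SEPARATOR))
        if sep != SEPARATOR or not _is_digits(cell_no, 2) or "0" in cell_no:
            raise ValueError(f"bad cell number in {item!r}")
        if not _is_digits(code, 5):
            raise ValueError(f"bad 5-digit code in {item!r}")
        placements.append(Placement(int(cell_no[0]), int(cell_no[1]), code))
    return placements


# The Figure 8 example, with "->" substituted for the original symbol.
figure8 = "11->11512, 22->12214, 31->01617, 67->10316 , 68->21313"
for placement in parse_placements(figure8):
    print(placement)
```

Read this way, the first three entries fall in the core subject, predicate, and object areas (1) to (3), and the last two fall in area (6).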
  • Figures 9 to 12 show the placement results, obtained in the sentence-building frame according to the placement rules above, for the complex sentence "Even if a corpse is half-hearted and interested in its own development, any country can use its cheap and obedient labor to produce goods for this global market and achieve rapid domestic economic growth."
  • Among them, the result of the fourth frame (Figure 12) is inserted at area (4) * of the first frame (Figure 9) as the attributive of the subject. Its encoded numeric form is 48 ⁇ (7) followed by the encoding result of the fourth frame.
  • Figure 13 shows a logical block diagram of the machine translation system implementing the present invention. Because the unified semantic encoding scheme fixes, by convention, the semantics and forms of expression of natural language from the lexical level to the syntactic level, a user can enter the semantic input coding system directly in a given natural language. The system accepts only encoded natural-language vocabulary senses, which appear on the human-machine interface as graphic codes or natural-language vocabulary codes. Vocabulary can be moved into the interface's syntactic frame automatically or manually, and machine translation is carried out only when sentences are built according to the rules of the sentence-building frame, thereby realizing the semantic restriction of natural-language expression.
  • The semantic classification method can also be used to retrieve and call up encoded vocabulary. Because the semantic expression of natural language is comprehensively fixed by convention, some fineness is lost compared with expressing meaning directly in natural language, but clarity is greatly improved. The result of semantic expression under the encoding system can be converted into many natural languages, so the output can be converted back into the user's own mother tongue to verify the expression; if the user is not satisfied, it can be modified directly on the interface.
  • The system shown in Figure 13 and all natural-language machine translation systems developed on the semantic encoding system can interoperate, which can greatly shorten the development cycle of machine translation systems, reduce development costs, and increase their value.
  • The semantic encoding system of the present invention complements existing natural-language processing technology well, and combining the two yields a good multilingual translation technology. For example, users can express meaning directly through the interface of the semantic encoding scheme, and the automatically generated result can be embedded directly into an existing machine translation system. This exploits the speed of the existing system and substantially improves the quality of scientific and technical document translation and the practicality of the translation system.
  • The semantic encoding system of the present invention can take advantage of computer network technology. Encoding is performed only once, the same encoded form is used to transmit semantic information over the network, and each network terminal then decodes it into the natural language the user requires. This can help save network space, improve the efficiency of network information transmission, and make network information resources widely shared. Any ordinary user with a high-school education or above can, after brief training, use the semantic encoding system to overcome communication barriers between different natural languages. Although the result is not as vivid, delicate, beautiful, and natural as human translation, the adequacy of semantic expression and the clarity of the transmitted semantic information can be well guaranteed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a computer text-information processing system: a unified-encoding input method in which a machine translation system is developed on a restricted natural-language code. Unified semantic encoding of vocabulary includes establishing a common concept code system made up of basic concept codes and near-synonym auxiliary codes. Restricted-semantics unified encoding of different natural languages is carried out in the common concept codes. The restricted-semantics unified codes are translated into the corresponding natural language (or words) by means of a man-machine interface.

Description

Computer input method and computer input system with semantically restricted unified encoding of different natural languages

Technical field

The present invention relates to a computer text-information processing system and, more particularly, to a computer input method and a computer input system using a semantically restricted unified encoding for machine translation between different natural languages. The invention also relates to a semantically restricted unified encoding of different natural languages.

Background art

For a long time, a great deal of computer processing work has been done on the barriers to communication between the languages and scripts of different peoples. The traditional approach is to formalize natural language logically and then use the computer to carry out the translation.

Natural language contains a large body of rules, regularities, and probabilistic knowledge that a computer can process: all the parts of natural-language expression that a person can learn by rote, all exceptionless vocabulary, part-of-speech collocations and semantic matching relations, and all probabilistic linguistic knowledge obtained from statistics over a finite corpus. All of this information helps a machine translation system process natural language automatically.

However, natural language also contains much that is unsuited to computer processing. Because the relationship between natural-language symbols and their semantic content is in a continual state of creation, circulation, and acceptance, the emergence of new words, new senses of old words, and flexible uses of grammar and characters are always in flux; this is the conventional nature of natural language. In addition, the semantic content denoted by a semantic symbol in natural language is not determinate: when different experts explain the same article in the same natural language, different versions result; this is the fuzzy nature of natural language. Furthermore, the understanding of natural-language semantics is highly dependent on circumstances: the linguistic and non-linguistic context, different communicative groups, different knowledge backgrounds, and different ways of thinking can all lead to different, yet reasonable, semantic understandings of the same expression in the same natural language; this is the inherent randomness of natural language.

In summary, natural language is also a system that is sensitively dependent on context, has unlimited capacity for inventing forms of semantic expression, and involves a degree of unpredictability in language understanding, none of which a computer can "understand" or "compute". Since computers therefore face a capability boundary in natural-language translation, machine translation that continues to confront pure natural language can only ever serve as a reference or aid to human translation.

Summary of the invention
The object of the present invention is to provide a computer input method and a computer input system with semantically restricted unified encoding of different natural languages; that is, to develop a machine translation system using a restricted natural language. Recognizing that natural language in machine translation is bounded by the machine's capabilities, and in order to fit the computer's natural-language processing ability, a restrictive convention is first placed on the relationship between the symbolic forms of natural language and their semantic content. The restricted semantics are then uniformly encoded, which effectively improves machine translation quality and addresses the barriers to communication between different natural languages.

According to one aspect of the present invention, there is provided a computer input method with semantically restricted unified encoding of different natural languages, comprising unified semantic encoding at the lexical level and unified semantic encoding at the syntactic level, wherein:

The unified semantic encoding at the lexical level includes:
1) on the basis of statistics of multilingual natural-language synonyms, and with reference to semantic-classification headwords and a primitive set of the same size, establishing a common concept system composed of the basic concepts of prepositions, conjunctions, verbs, adverbs, adjectives, and abstract nouns;
2) extending each common basic concept into near-synonym auxiliary codes for degree increase, degree decrease, near-synonym, commendatory, derogatory, colloquial, written, slang, common-saying, and idiom variants;
3) establishing form codes for the common concepts;
4) annotating the common concepts with explanation-free definitions, using definition by semantic superposition of near-synonyms from each natural language;
5) uniformly encoding the common concepts composed of the basic concepts and their near-synonym auxiliary codes;
6) establishing the alignment between natural-language concepts and the common concepts and near-synonym auxiliary codes, with polysemous words receiving multiple encoding results; when establishing this alignment, if a common basic concept has no corresponding word, the natural language in question is used to define and describe the basic concept, and if a near-synonym auxiliary code has no corresponding word, the basic concept is substituted or the natural language in question is used to define and describe it;
7) establishing a correspondence database of specialized vocabulary across different natural languages;
8) leaving unencoded, as redundant concepts, any natural-language vocabulary that has not received a code through the above steps.

The unified semantic encoding at the syntactic level includes:
1) converting the system of syntactic concepts into a spatially positioned syntactic frame shared by human and machine, in which the position of a spatial area within the sentence-building frame expresses a syntactic concept;
2) forming nine syntactic areas by combining the vertical modification, core, and supplementary zones with the horizontal subject, predicate, and object zones; each syntactic area is divided equally into 3 × 3 cells, each cell being numbered by its area number and cell number; the common concepts in a sentence of any natural language are placed, according to their syntax and in the form of ideographic graphic codes and numeric codes, in the corresponding areas and cells of the sentence-building frame, yielding the syntactic-information encoding of the semantic expression.

The invention also provides a computer input system with semantically restricted unified encoding of different natural languages, and a semantically restricted unified encoding of different natural languages for use in computer translation.

The semantically restricted unified encoding of different natural languages of the present invention is an easily recognizable and operable common intermediary language for machine translation, which can be used by people with different mother tongues.

The technology of the present invention is further described below with reference to the embodiments and the accompanying drawings.

Overview of the drawings
Figure 1. Schematic diagram of example designs of common-concept graphic codes.
Figure 2a. Class diagram.
Figure 2b. Example of the correspondence between time-class semantics, graphic codes, and numeric codes.
Figure 3. Alignment chart of natural-language concepts and common concepts for the basic word "laugh".
Figure 4. Alignment chart of natural-language concepts and common concepts for the basic word "beautiful".
Figure 5. Alignment chart of natural-language concepts and common concepts for the basic word "if".
Figure 6. Schematic diagram of the pictorial atlas of multilingual concrete nouns.
Figure 7. Schematic diagram of the structure of the syntactic frame.
Figure 8. Example of using the syntactic frame for a simple sentence.
Figures 9 to 12. Example of using the syntactic frame for a complex sentence.
Figure 13. Logic block diagram of the restricted-semantics unified-encoding machine translation system.

Best mode of the invention
The lexical-level common concept system of the invention is built on statistics of synonyms across different natural languages, and is obtained by comparing and screening them against semantic-classification headwords and a primitive set of corresponding size. The embodiment starts from the actual population distribution of natural language users and from the practical requirement that common concepts must have the characteristics of an "isolating language". It therefore compares and screens English and Chinese synonyms and semantic-classification headwords against a primitive set of corresponding size to form a basic common concept system.
The common concept system obtained through preliminary experiments contains about 4,000 basic concepts, including more than one hundred prepositions and conjunctions, more than seven hundred verbs, and a group of adverbs and adjectives whose count is illegible in the source; the rest are mainly abstract nouns.

To ensure that the common concepts are sufficient, they must be adjusted as necessary on the basis of experiments that convert English and Chinese word senses into common concepts. It can be inferred, however, that the common concept system to which natural languages are mapped will tend toward a stable size. If the common concepts are too rich, the ability of each natural language's concept system to correspond to them inevitably declines, and every added "non-natural" concept (a concept that has no explicit symbolic form in some natural language) inevitably raises the user's learning cost. The mutual constraint between the principle of being easy to learn and practical and the principle of sufficiency therefore objectively limits the freedom with which common concepts can be selected and designed. For example, the Chinese word for "good" has 16 senses in dictionary annotation; after removing senses that are used too rarely or are non-standard, only 50% of the senses are selected for the concept correspondence library.

To reflect the richness of natural language expression as far as possible, near-synonym suffix codes can be established for the common basic concepts to mark degree, commendatory sense, derogatory sense, spoken language, written language, slang, colloquialism, idiom and so on, which strengthens the ability of the common concept system to encode natural language vocabulary.

The semantically restricted unified encoding of different natural languages proceeds as follows.
First, for words whose meaning is shared across languages, graphic codes are designed one by one through gestalt processing, using methods such as dictionary definition, symbolism, deduction, metaphor and borrowing from sign language. Figure 1 gives examples: the common-concept graphic codes for "walk" and "telephone" are designed by descriptive definition; "lazy" by explanatory definition; "abstract" by typical example; "tall" and "short" by contrast definition; "peace" by symbolism; "diplomacy", "country" and "interaction" by deduction; "noble" by metaphor; and "because" and "therefore" by borrowing from sign language.

Descriptive definition is used for concepts that have definite, naturally fixed visual features. Explanatory definition is used for concepts whose visual features can be obtained from a dictionary definition, for example "lazy", defined as "not wanting to work". When the visual features of a concept cannot be obtained directly from an explanatory definition, the typical example method can be used, for example "abstract", defined as "extracting the essential features from various things". Concepts with opposed meanings obtain a visual expression through contrast definition, for example "tall" and "short". When the dictionary definition offers little imagery, the symbolic method, which uses a common symbolic image, can be adopted; for example, "peace" is defined in the dictionary as "a state without war".
Its common symbolic image is a dove of peace carrying an olive branch. The deductive method derives a graphic code from the graphic codes of other concepts, for example "diplomacy" derived from "country" and "interaction". The metaphor method obtains a figurative expression from associations with the word's meaning, for example "noble", defined as "of a high moral level...". The sign-language borrowing method is also called the mental imagery method: some words that express abstract concepts can borrow the way sign language expresses them. Graphic codes may be colored to make them easier to recognize.

Once the concept graphic codes exist, graphic codes whose meanings serve the same kind of function are arranged as a coordinate matrix to form a class diagram (see Figure 2a). The row and column coordinates of a graphic code's position in the class diagram are its class code. Graphic codes with different meanings under the same class code are arranged page by page as coordinate matrices to form position charts, and the row and column coordinates of each graphic code's position on a page are its position code. The class code, page number and position code, in that order, form a fixed-length 5-digit number for any graphic code, and the digits are entered with the number keys of the keyboard.

Classification can be carried out first by, for example, time, orientation, quantity, reference, relation, grammar, raw materials (animals, plants, synthetic materials), physical motion, human motion, weather, physics, person, daily life, social interaction, psychology, thinking, transport, communication, finance, trade, economy, tourism, food and drink, entertainment, medical care, shopping, administration, culture and education, science and technology, politics and military affairs. The class diagram and class codes are established first, and then position charts and position codes are established page by page for each class.
Figure 2 shows some of the concept graphic codes of the time class and their numeric codes, including year, month, week, day, hour, minute, second, daytime, early morning, morning, noon, afternoon, dusk, night, time, the day before yesterday, yesterday, today, tomorrow, the day after tomorrow, past, present, future, season, spring, autumn, summer, winter, period, early stage, middle stage, late stage, era, dynasty, Common Era, century, decade, ancient times, recent times, modern times and contemporary times.

For example, the code for "daytime" is "01112", in which the class code for time is "01", the page number is "1" and the position code is "12".

The form code of a common concept may be a direct ideographic graphic symbol with its numeric code, a natural language word prototype, or a combination of the two.

Direct ideographic graphic symbols are fast to use and better suited to people who use the unified-encoding machine translation system often, but they take longer to master. Natural language word prototypes are easy to learn but suffer from polysemy. In that case a graphic symbol can be combined with the word prototype and offered to the user as one of the discriminating symbols during polysemy disambiguation.

The operators on the computer interface may therefore include both natural language word prototypes and direct ideographic graphic symbols, so that each form of code contributes its own advantage.

Color can also be used to establish part-of-speech codes: for example, red for verbs, white for nouns, pink for adjectives, yellow for adverbs, gray for technical terms and silver-gray for function words.
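The composition of the 5-digit code can be illustrated with a short Python sketch. It only mirrors the layout stated in the text (two-digit class code, one-digit page number, two-digit position code); the names `ConceptCode`, `parse_concept_code` and `build_concept_code` are illustrative assumptions and not part of the described system. The text does not say which digit of a coordinate pair is the row and which is the column, so the sketch keeps each pair as an opaque two-digit string.

```python
from typing import NamedTuple


class ConceptCode(NamedTuple):
    class_code: str     # two digits: coordinates of the class in the class diagram
    page: str           # one digit: page within that class
    position_code: str  # two digits: coordinates of the concept on that page


def _require_digits(value: str, width: int, label: str) -> None:
    """Raise ValueError unless `value` is exactly `width` ASCII digits."""
    if len(value) != width or not (value.isascii() and value.isdigit()):
        raise ValueError(f"{label} must be {width} digit(s), got {value!r}")


def parse_concept_code(code: str) -> ConceptCode:
    """Split a 5-digit concept code into class code, page number and position code."""
    _require_digits(code, 5, "concept code")
    return ConceptCode(code[:2], code[2], code[3:])


def build_concept_code(class_code: str, page: str, position_code: str) -> str:
    """Join the three parts in the order class code, page number, position code."""
    _require_digits(class_code, 2, "class code")
    _require_digits(page, 1, "page number")
    _require_digits(position_code, 2, "position code")
    return class_code + page + position_code


# The "daytime" example from the text: class "01" (time), page "1", position "12".
daytime = parse_concept_code("01112")
```

Keeping the parts as strings preserves leading zeros such as the "01" of the time class.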
Because a semantic description of the common concepts written in natural language cannot escape the arbitrariness of natural language understanding, the invention annotates the meaning of the common concept codes by superimposing the meanings of near-synonyms from natural languages, which gives each common concept a definition without explanation. This annotation method at the same time establishes alignment charts between the concepts of several natural languages and the common concepts. To make computer processing easier, the common concepts (the basic concepts and their near-synonym suffix codes) are also given unified numeric codes, which are listed in the alignment charts.

These steps yield a common concept system for semantic expression at the lexical level. In this system the semantic content of every common concept is made "self-evident" by the superimposed meanings of near-synonyms from different natural languages, and every such common concept has one ideographic graphic code, one unique numeric code and one natural language word prototype.

Figures 3, 4 and 5 show the alignment charts of natural language concepts and common concepts for the basic words "laugh", "beautiful" and "if" in Chinese and in English. The middle column gives the graphic code and numeric code of the intermediary-language concept together with its near-synonym suffix codes (the numeric code followed by a, b, ... j), which are formed according to degree-minus, degree-plus, near-synonym, idiom, written language, spoken language, slang, colloquialism, commendatory sense and derogatory sense and together make up the semantic-superposition definition. In Figure 3 the Chinese near-synonym group lists the words for "smile" (degree-minus), "laugh heartily" (degree-plus), "beam with joy" (idiom) and "sneer" (derogatory); the English near-synonym group lists "smile" (degree-minus), "grin" and "cackle" (near-synonym) and "sneer" (derogatory). Near-synonym groups for Russian, Japanese and other languages can be listed on the same principle.
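A row of such an alignment chart can be modeled as a small lookup, sketched here in Python. Two things are assumptions rather than statements of the source: that the letters a to j follow the order in which the ten categories are listed in the text, and the base code "00000", which is a placeholder because the real numeric code for "laugh" is not given. The English words are the ones listed for Figure 3.

```python
from typing import Dict, List, Optional

# Assumed order: the ten categories as listed in the text, mapped to a..j.
SUFFIX_KINDS = [
    "degree-minus", "degree-plus", "near-synonym", "idiom", "written",
    "spoken", "slang", "colloquial", "commendatory", "derogatory",
]
SUFFIX_LETTER = {kind: "abcdefghij"[i] for i, kind in enumerate(SUFFIX_KINDS)}


def code_for_word(base_code: str, group: Dict[str, List[str]], word: str) -> Optional[str]:
    """Return the code of `word` within one language's near-synonym group.

    `group` maps "base" or one of SUFFIX_KINDS to the words in that row.
    The basic word gets the bare base code; other words get base code + letter.
    Returns None when the word is not in the group (it has no code).
    """
    for kind, words in group.items():
        if word in words:
            return base_code if kind == "base" else base_code + SUFFIX_LETTER[kind]
    return None


LAUGH_BASE_CODE = "00000"  # placeholder, not the real code from Figure 3
ENGLISH_LAUGH_GROUP = {
    "base": ["laugh"],
    "degree-minus": ["smile"],
    "near-synonym": ["grin", "cackle"],
    "derogatory": ["sneer"],
}

smile_code = code_for_word(LAUGH_BASE_CODE, ENGLISH_LAUGH_GROUP, "smile")
```

Under these assumptions "smile" resolves to the base code plus "a" and "sneer" to the base code plus "j", while a word outside the group has no code at all.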
In Figure 4 the Chinese near-synonym group lists the words for "pretty" (degree-minus), "comely" and "graceful" (near-synonym), "too beautiful to take in" (idiom), "good-looking" (spoken) and "bewitching" (derogatory); the English near-synonym group lists "pretty" (degree-minus), "handsome" and "dainty" (near-synonym), "personable" (spoken) and "tawdry" (derogatory). In Figure 5 the Chinese near-synonym group lists six words for "if", marked as near-synonym, written language (two words), spoken language (two words) and colloquialism; the English near-synonym group lists "in case" (near-synonym) and "in the event of" (written language).

Unified semantic encoding of the lexical-level concept systems of different natural languages can be carried out as follows:

a. The senses of the words of each natural language are aligned with the common concepts, so that a polysemous word receives several codes. For example, the senses of the Chinese word for "good" may include:
(1) the opposite of "bad"; its near-synonym group includes "excellent, fine, good, wonderful, outstanding, accomplished ...",
(2) friendly; its near-synonym group includes "amicable, friendly, harmonious, congenial ...",
(3) healthy; its near-synonym group includes "sturdy, strong, sound, hale ...",
(4) agreed; its near-synonym group includes "approve, assent, may ...",
(5) easy; its near-synonym group includes "easily, effortless, handy ...",
(6) convenient for; its near-synonym group includes "convenient, trouble-saving, light, quick ...",
(7) very; its near-synonym group includes "especially, exceptionally, particularly, extremely, most ...",
(8) to like; its near-synonym group includes "be fond of, have a liking for, love ...";

b. For concrete nouns, a multilingual correspondence database of concrete nouns can be built from physical-object atlases and used to resolve polysemy; see Figure 6.

When a natural language word is polysemous, the atlas can be called up so that the user can select the meaning. The Chinese word for "flower", for example, has different meanings as a noun (the sexual reproductive organ of a seed plant), as a verb (to spend, to use) and as an adjective (multicolored or intricately varied). When the machine translation system cannot tell which meaning the user has chosen, the "flower" (51) in the physical-object atlas can be brought to the computer interface for the user to decide, which speeds up the user's semantic discrimination.

c. For technical vocabulary, a multilingual correspondence database of technical terms can be built.

d. When the lexical-level concepts of a natural language are aligned with the common concepts, a common basic concept that has no corresponding word is defined and described in that natural language. A near-synonym suffix code that has no corresponding word can either be defined and described in that natural language or have its vacancy filled by the basic concept, as shown in Figures 3 to 5.

e. Natural language concepts (word senses) that still cannot be aligned with the common concept system after steps a to d are treated as redundant concepts and are not encoded.

Through steps a to e, the lexical-level concepts of several natural languages receive a limited unified semantic encoding, and every encoded word sense has a definite semantic convention.
The unified semantic encoding at the syntactic level is a visual expression of the combinatory relations among lexical-level concepts. To meet the requirements of machine translation and to suit the natural language processing capability of computer technology, the implicit syntactic concepts of natural language must be raised to the surface and then given unified concept codes.

In existing machine translation technology, the final result of syntactic analysis is usually expressed as a syntax tree. The grammatical concepts involved in such a tree are, in effect, the syntactic information that must be obtained from the source language during machine translation. To make the restricted natural language scheme of the invention easy to learn and practical, the syntactic concept system expressed by the syntax tree is converted into a spatially positioned syntactic frame, in which spatial location is the visual expression of a syntactic concept.

The syntactic frame (sentence-composition frame) shown in Figure 7 comprises nine regions, (1) to (9), each consisting of nine cells, -1 to -9. In the figure the solid-line boxes are regions and the dashed-line boxes are cells. The number of a cell is the region number followed by the cell number; for example, cell -2 of region (1) is numbered 1-2. Regions (1), (2) and (3) are the core subject, predicate and object regions; regions (4), (5) and (6) are the modifier regions of regions (1), (2) and (3); and regions (7), (8) and (9) are the supplement regions of regions (1), (2) and (3).

Embodiments of the rules for using the sentence-composition frame are described one by one below with reference to Figure 7.
1. The cells of regions (1) and (3) follow the same rules:
 a. A single subject word or object (predicative) word that serves as a core constituent of the sentence is placed in cell -1 of its region;
 b. Appositive subject words and object words are placed horizontally side by side in cell -1 of the region;
 c. Coordinate subject words and coordinate object words are placed in order in cells -4, -5, -6, -3 and -9 of the region; five or more coordinate subject or object words are placed side by side in order in cell -9;
 d. Coordinate appositive subject words and object words are placed as in c;
 e. The subject, predicate (copula) and object (predicative) of a subject clause or object clause are placed in cells -1, -2 and -3 of the region;
 f. When a modal-plus-verb predicate or coordinate predicates occur within the subject or object, they are placed side by side in cell -2 of the region;
 g. In a double-object construction, the direct object is placed according to rules a and f, and the indirect object is placed in region (8).
2. The rules for region (2) are:
 a. The core predicate of the core predicate region is placed in cell -2 of the region;
 b. In a modal-plus-verb predicate, the modal predicate is placed in cell -1 and the core predicate in cell -2 of the region;
 c. Coordinate predicates are placed in cells -2 and -3 of the region;
 d. More than two coordinate predicates are placed horizontally side by side in cell -3 of the region;
 e. The dedicated graphic codes that express the tense, aspect and voice of the core predicate, namely past tense, future tense, progressive aspect, perfect aspect, negative form and passive voice, are placed in cells -4, -7, -6, -9, -5 and -8 of the region; tense is expressed with these dedicated graphic codes.
3. The rule for regions (4) to (6) is that a modifier is placed in the modifier region aligned with the constituent it modifies; for example, a modifier of the core subject region (1) must be placed in region (4). When coordinate subject words each have their own modifier, each modifier must be aligned with the word it modifies.
4. The rules for the freely selectable modifier regions are:
A. Simple modification
 a. Single modifiers and coordinate modifiers are placed in cells -7, -8 and -9 of region (4);
 b. For a degree or quantity word plus a single modifier, the degree or quantity word comes first and the single modifier second, in the same cell;
 c. When a single modifier applies to several modified constituents, the modifier is aligned with the first modified word.
B. Subject-predicate-object modification
 a. A simple subject-predicate-object structure used as a modifier can be placed in cells -4, -5, -6, -1, -2, -3, -7, -8, -9 of the corresponding region;
 b. In a subject-predicate-object modifier that contains a modal-plus-verb predicate or coordinate predicates, those predicates are placed side by side in the same cell, within cells -1, -2 and -3 of the corresponding region;
 c. A subject-predicate-object modifier that itself contains modifiers is a whole-region modification. For a whole-region modification, an entry is made in cell -1 of the corresponding region (the symbol to be entered is missing from the source); the subject, predicate and object are then placed in cells -5, -2 and -8 of the corresponding region, the modifiers in cells -4, -1 and -7, and the supplements in cells -6, -3 and -9. If the modifier region already holds other information and a whole-region modification is still needed, the whole region is inserted: the insertion mark * and a region number are placed in the cell where the insertion occurs, with the region number immediately after the *, and the content is then placed in the inserted region according to the whole-region modification rule.
C. Multiple modification
 a. Simple multiple modifiers are placed in cells -4, -1, -7, -5, -2, -8, -6, -3 and -9 of the corresponding region;
 b. In a multiple modification that contains a subject-predicate-object structure, the subject, predicate and object part is handled by insertion.
D. Preposition-object modification
 a. In a simple preposition-object structure, the preposition and its object share a cell;
 b. In a preposition-object structure that contains a subject, predicate and object, the preposition shares a cell with the subject word or the predicate;
 c. In a preposition-object structure that contains multiple modifiers, the preposition is placed before the first modifier and in the same cell as the first modifier.
E. Coordinate modification. The modification methods of types A, B and C can be used in coordination, in the following ways:
 a. one-to-one correspondence;
 b. one-to-many correspondence;
 c. many-to-one correspondence;
 d. adversative coordinate modification, in which the adversative word is placed in the same cell as the last modifier before the turn.
5. Other rules:
 a. The placement rules for the supplement regions (7) to (9) are basically the same as those for regions (4) to (6), except that simple and coordinate supplements are placed in cells -4, -5 and -6;
 b. The tense of a predicate is expressed by bringing the tense code of the core region into the predicate cell;
 c. A simple negation of a verb or modifier is placed in the same cell as that verb or modifier;
 d. A numeral-classifier phrase is placed in one cell, numeral first and classifier second;
 e. Closely bound phrases are placed in the same cell;
 f. Subject, predicate and object words that are omitted in natural language must be filled into the corresponding cells when the sentence is composed;
 g. Conjunctions between basic sentences are placed in a dedicated cell;
 h. The dedicated symbols for interrogative, declarative, exclamatory and imperative sentences (given in the source as "?,!。") are placed in the punctuation cell;
 i. Personal names, place names and onomatopoeic words are written in pinyin and enclosed in quotation marks "".
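The region layout of the frame and rule 2.e lend themselves to a small lookup model, sketched here in Python. The function and table names are illustrative assumptions. The sketch also assumes that the six markers of rule 2.e map to the six cells in the order in which the text lists them, and it only models the region roles and cell numbering, not the full set of placement rules.

```python
# Column of the frame that each core region belongs to, as stated in the text.
CORE_REGIONS = {1: "subject", 2: "predicate", 3: "object"}
LAYERS = ("core", "modifier", "supplement")  # regions 1-3, 4-6, 7-9


def region_role(region: int) -> str:
    """Describe a region, e.g. region 4 is the modifier region of the subject."""
    if not 1 <= region <= 9:
        raise ValueError(f"region must be 1..9, got {region}")
    layer = LAYERS[(region - 1) // 3]
    column = CORE_REGIONS[(region - 1) % 3 + 1]
    return f"{layer} {column}"


def cell_id(region: int, cell: int) -> str:
    """Number a cell as region number plus cell number, e.g. '1-2'."""
    if not (1 <= region <= 9 and 1 <= cell <= 9):
        raise ValueError("region and cell must both be 1..9")
    return f"{region}-{cell}"


# Rule 2.e, read in listed order (assumption): marker -> cell of region (2).
PREDICATE_MARKER_CELLS = {
    "past": 4, "future": 7, "progressive": 6,
    "perfect": 9, "negative": 5, "passive": 8,
}

past_cell = cell_id(2, PREDICATE_MARKER_CELLS["past"])
```

With this model the modifier region of any core region r is r + 3 and its supplement region is r + 6, which matches the pairing of regions (4) to (6) and (7) to (9) with regions (1) to (3).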
A cell in the syntactic frame can be opened as a window, and a region or the whole frame can be nested inside the opened window. The tense graphic codes of the core predicate region on the interface can be called repeatedly into any other location that holds a verb.

Figure 8 shows how the simple sentence "We always like some strong athletes" is placed in the sentence-composition frame. Its encoding in numeric form is: 11 § 11512, 22 § 12214, 31 § 01617, 67 § 10316, 68 § 21313. Here § means "placed in the cell", and the two digits before § are the cell number. The five digits after § are the direct ideographic graphic code or the natural language word code (one letter for the near-synonym suffix code should follow as well).

Figures 9 to 12 show the result of placing the complex sentence "Any country that cares about its own development, even if only half-heartedly, can use its cheap and obedient labor force to produce goods for this global market and achieve rapid domestic economic growth" in the sentence-composition frame according to the placement rules above. The composed result of the fourth frame (Figure 12) is inserted as a whole at the * mark in region (4) of the first frame (Figure 9), where it serves as the attributive of the subject. In the numeric form of the syntactic encoding this is expressed as 48 § (7) followed by the encoding result of the fourth frame.
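The numeric form given for Figure 8 can be read mechanically, as the following Python sketch shows. The names `Placement` and `parse_sentence_code` are illustrative assumptions. The sketch handles only plain items of the form "cell § code" with an optional suffix letter a to j; the whole-frame insertion notation used for Figures 9 to 12 is outside its scope, and it makes no claim about which word each code stands for.

```python
import re
from typing import List, NamedTuple


class Placement(NamedTuple):
    region: int        # first digit before the section sign
    cell: int          # second digit before the section sign
    concept_code: str  # five digits after the section sign
    suffix: str        # optional near-synonym letter a..j, "" if absent


# One item: two cell digits, the section sign, five code digits, optional letter.
_ITEM = re.compile(r"([1-9])([1-9])\s*§\s*([0-9]{5})([a-j]?)")


def parse_sentence_code(text: str) -> List[Placement]:
    """Parse a comma-separated sentence encoding into cell placements."""
    placements = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        match = _ITEM.fullmatch(item)
        if match is None:
            raise ValueError(f"cannot parse item {item!r}")
        region, cell, code, suffix = match.groups()
        placements.append(Placement(int(region), int(cell), code, suffix))
    return placements


FIGURE_8 = "11 § 11512, 22 § 12214, 31 § 01617, 67 § 10316, 68 § 21313"
figure_8_placements = parse_sentence_code(FIGURE_8)
```

Parsing the Figure 8 string gives five placements: one each in regions 1, 2 and 3 and two in region 6, the modifier region of the object.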
图 13示出实施本发明时的机译系统的逻辑框图,由于本发明 的不同自然语言的统一语义编码方案,从词汇到句法层面对自然语 言的语义及表达形式均作了全面约定,因此用户可直接采用某种自 然语言迹入语义输入编码糸统,糸统将采用只有已被编码的自然语 言词汇义项可在人机界面上出现图形代码或自然语言词汇代码;只 有得到唯一数字代表码的词汇可自动或被人工调入界面句法框桨; 只有按照組句框架使用規则进行組句才給予机器翻译的方法,实现 自然语言表达的语义约束场。如果用户用自然语言输入的词汇未被 编码,也可采用语义分类法进行已被编码词汇的检索和调用。 由于对自然语言的语义表达进行了全面约定,因此与直接采用 自然语言进行语义表达相比,虽然损失了一部分細膩度,但却大幅 度提高了明确性,并且按语义编码系统进行语义表达的结果是可向 多种自然语言进行转换,因此信息输出方可 組句结果转换为自己 的母语进行表达结果验证,若不满意还可在界面上直接进行修改, 同时,信息接收方用户只要懂得本組句规则,当发现译文语义不明 时, 可调用信息输出方在界面上的組句结果,利用不同母语直接 进行语义查询,显然,这为机^翻译糸统在语义信息传递的可靠性 方面提供 7有力的保 。 本发明的统一语义编码方案解决了机器翻译糸统中最困难的 自然语言源语分析阶段的开发问题,其语义表达结果的中间文件形 式可直接与多种自然语言的生成技术接口,因此在本发明语义编码 糸统基础上开发机 翻译糸统,就可以只建立自然语言词汇层面概 念体糸与统一语义编码概念体系的对应关系数据库(包括实物图镨 数据库和专业词汇数据库)和开发向自然语言转换生成的技术(图FIG. 13 shows a logical block diagram of the machine translation system when implementing the present invention. Due to the unified semantic coding scheme of different natural languages of the present invention, the semantics and expression forms of natural language are fully agreed from the vocabulary to the syntactic level. A certain natural language trace can be directly used to enter the semantic input coding system. The system will use only the coded natural language vocabulary meaning items that can appear graphical codes or natural language vocabulary codes on the human-machine interface; Vocabulary can be automatically or manually transferred to the interface syntactic frame; only when the rules are used for group sentences according to the group sentence framework can machine translation be implemented to realize the semantic constraint field of natural language expression. If the vocabulary entered by the user in natural language is not encoded, the semantic classification method can also be used to retrieve and call the encoded vocabulary. 
Because the scheme fixes a comprehensive convention for expressing natural-language semantics, some nuance is lost compared with expressing meaning directly in natural language, but clarity is greatly improved. A result expressed under the semantic coding system can be converted into many natural languages, so the sender can convert the output into his or her own mother tongue to verify what was expressed and, if not satisfied, modify it directly on the interface. Likewise, a receiver who understands the sentence-composition rules and finds the semantics of a translation unclear can call up the sentence-composition result from the sender's interface and query its meaning directly in his or her own mother tongue. This gives the machine translation system a strong guarantee of reliable semantic information transfer.

The unified semantic coding scheme of the present invention solves the development problem of source-language analysis, the most difficult stage of a machine translation system. The intermediate file form of the semantic expression result can interface directly with multiple natural language generation technologies. To develop a machine translation system with the semantic coding system of the invention, one establishes a correspondence database (including a physical-object atlas database and a professional vocabulary database) between the vocabulary-level concept system of a natural language and the concept system of the unified semantic coding, and develops the natural-language conversion and generation technology (the system shown in Figure 13). All natural-language machine translation systems developed on the semantic coding system can interconnect with each other, which can greatly shorten the development cycle of a machine translation system, reduce development costs, and raise the application value of the system.

The semantic coding system of the present invention is complementary to existing natural language processing technology. Combining the two yields a multilingual mutual-translation technology with good performance. For example, a user can express meaning directly through the interface of the semantic coding scheme, and the automatically generated result can then be embedded directly into the translation produced by an existing machine translation system. This keeps the speed advantage of existing machine translation systems while substantially improving the quality of scientific and technical translation and the practicality of the translation system.

The semantic coding system of the present invention also has advantages on computer networks. A message is encoded only once, semantic information is transmitted over the network in a single coding form, and each network terminal decodes it into whichever natural language its user needs. This helps save network space, improves the efficiency of network information transfer, and makes network information resources available for broad international sharing.

Any ordinary user with a high school education or above can, after short training, freely use the semantic coding system of the present invention to overcome communication barriers between different natural languages. Although the output is not as vivid, delicate, graceful and natural as human translation, the adequacy of its expressive power and the clarity of the semantic information it transmits can be well guaranteed.
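The encode-once, decode-per-terminal idea described for network use can be sketched as follows. This is a minimal illustration only: the concept codes, the vocabulary, and the names `DECODERS` and `decode` are hypothetical and are not taken from the patent.

```python
# Hypothetical decoding tables: concept code -> word, one table per language.
# The codes and vocabulary are invented for illustration only.
DECODERS = {
    "en": {"10101": "water", "10102": "drink"},
    "zh": {"10101": "水", "10102": "喝"},
}


def decode(message, language):
    """Render a list of language-neutral concept codes in one language."""
    table = DECODERS[language]
    # Codes without an entry stay visible instead of being silently dropped.
    return [table.get(code, f"<{code}>") for code in message]


message = ["10102", "10101"]  # encoded once by the sender
print(decode(message, "en"))  # each terminal decodes for its own user
print(decode(message, "zh"))
```

Only the code sequence travels over the network in this sketch; the per-language tables live at the terminals.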

Claims

1. A computer input method of semantically restricted unified encoding for different natural languages, in which a unified semantic code at the lexical level and a unified semantic code at the syntactic level are input through an input device on a computer, characterized in that:
 the unified semantic encoding at the lexical level comprises:
 1) on the basis of statistics on multilingual synonyms of natural languages, and with reference to the head words of semantic classifications and to primitive sets of the same size, establishing a shared concept system composed of the basic concepts of prepositions, conjunctions, verbs, adverbs, adjectives and abstract nouns;
 2) extending each shared basic concept by increased degree, decreased degree, near-synonym, commendatory sense, derogatory sense, colloquial, written, slang, proverb and idiom variants to build near-synonym suffix codes;
 3) establishing form codes for the shared concepts;
 4) annotating the shared concepts with explanation-free definitions by superimposing the meanings of near-synonyms from each natural language;
 5) uniformly encoding the shared concepts, each consisting of a basic concept and its near-synonym suffix codes;
 6) establishing the correspondence between natural-language concepts and the shared concepts and their near-synonym suffix codes, a polysemous word receiving multiple encoding results; when establishing this correspondence, if a basic concept among the shared concepts has no corresponding word in a natural language, the basic concept is defined and described in that natural language; if a near-synonym suffix code among the shared concepts has no corresponding word, the basic concept is substituted or a definition is given in that natural language;
 7) establishing a correspondence database of the professional vocabulary of the different natural languages;
 8) treating natural-language words that obtain no code through the above steps as redundant concepts, which are not encoded;
 the unified semantic encoding at the syntactic level comprises:
 1) converting the syntactic concept system into a spatially positioned syntactic frame agreed between human and machine, the position of a spatial location within the sentence-composition frame expressing a syntactic concept;
 2) forming nine syntactic regions by combining the vertical modifier zone, core zone and supplement zone with the horizontal subject zone, predicate zone and object zone, each syntactic region being equally divided into 3 × 3 cells and each cell being numbered by its region number and cell number; the shared concepts in a sentence of each natural language are placed, according to their syntax, in the corresponding regions and cells of the sentence-composition frame in the form of ideographic graphic codes and numeric codes, yielding the syntactic information code of the semantic expression.
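The cell addressing of the syntactic frame (three vertical zones combined with three horizontal zones, each resulting region split into 3 × 3 cells) can be sketched as below. This is an illustration under stated assumptions: the claim only says that each cell is numbered by region number and cell number, so the row-major numbering from 1 to 9, the zone names in English, and the function names are hypothetical.

```python
VERTICAL = ("modifier", "core", "supplement")   # vertical zones
HORIZONTAL = ("subject", "predicate", "object")  # horizontal zones


def region_number(vertical, horizontal):
    """Region number 1..9, assuming row-major numbering (an assumption)."""
    return VERTICAL.index(vertical) * 3 + HORIZONTAL.index(horizontal) + 1


def cell_address(vertical, horizontal, row, col):
    """Return (region number, cell number); row and col are 0..2."""
    if not (0 <= row < 3 and 0 <= col < 3):
        raise ValueError("row and col must be in 0..2")
    return region_number(vertical, horizontal), row * 3 + col + 1


# A composed sentence is then a mapping from cell addresses to concept codes.
# The concept codes here are invented placeholders.
frame = {
    cell_address("core", "subject", 0, 0): "20311",
    cell_address("core", "predicate", 0, 0): "10102",
}
```

In this sketch the syntactic role of a concept is carried entirely by the address of the cell it occupies, which is the property the frame is meant to provide.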
2. The computer input method according to claim 1, characterized in that the unified semantic encoding at the lexical level further comprises: using color to establish part-of-speech codes representing the part of speech of the shared concepts.
3. The computer input method according to claim 1 or 2, characterized in that the form codes comprise direct ideographic graphic symbols, natural-language word prototypes, numeric codes, or a combination thereof.
4. The computer input method according to claim 3, characterized in that the direct ideographic graphic symbols put image codes in one-to-one correspondence with word meanings common to any language, the image codes being generated by stylized rendering based on dictionary definition, symbolism, deduction, metaphor, borrowing from sign language, contrastive definition, definition by typical example, or descriptive definition.
5. The computer input method according to claim 4, characterized in that the numeric code is formed as follows: image codes whose word meanings have the same kind of function are arranged in a coordinate matrix to form a class chart, the vertical and horizontal coordinate numbers of an image code's position in the class chart being its class code; image codes of different word meanings within the same class code are arranged page by page in coordinate matrices to form position charts, the vertical and horizontal coordinate numbers of an image code's position on a page being its position code; the class code, page code and position code, in that order, constitute a five-digit fixed-length number for any image code, the digits being assigned to the number keys of the computer keyboard.
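The composition of the five-digit code from class code, page code and position code can be sketched as follows. The split into 2 + 1 + 2 digits is an assumption: it follows from reading each coordinate pair as two single digits, which the claim implies but does not state outright. The function names are hypothetical.

```python
def make_code(class_row, class_col, page, pos_row, pos_col):
    """Join class code, page code and position code into a 5-digit string."""
    digits = (class_row, class_col, page, pos_row, pos_col)
    if any(not (0 <= d <= 9) for d in digits):
        raise ValueError("every component must be a single digit 0..9")
    return "".join(str(d) for d in digits)


def split_code(code):
    """Split a 5-digit code back into its class, page and position parts."""
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        raise ValueError("code must consist of exactly five ASCII digits")
    d = [int(c) for c in code]
    return {"class": (d[0], d[1]), "page": d[2], "position": (d[3], d[4])}
```

Because every code has the same length and uses digits only, it can be typed on the number keys without a delimiter between codes.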
6. The computer input method according to claim 5, characterized in that the unified code of a shared concept consisting of a basic concept and its near-synonym suffix codes is composed of the numeric form code and the near-synonym suffix code.
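A unified code made of a numeric form code plus a near-synonym suffix code, together with the substitution rule from lexical step 6) (use the basic concept when a suffix code has no corresponding word), can be sketched as below. The suffix symbols, the sample vocabulary, and the names `LEXICON_EN` and `lookup` are hypothetical; the patent does not specify how a suffix code is written.

```python
# Hypothetical lexicon for one language: (base code, suffix) -> word.
# "" stands for the plain basic concept; "+" for an increased degree.
LEXICON_EN = {
    ("30214", ""): "like",
    ("30214", "+"): "love",
}


def lookup(base, suffix, lexicon):
    """Find the word for a basic concept plus near-synonym suffix code."""
    word = lexicon.get((base, suffix))
    if word is None:
        # No counterpart for this suffix: substitute the basic concept.
        word = lexicon.get((base, ""))
    return word
```

A lookup that still returns `None` corresponds to the case where the basic concept itself has no word in that language and would need a definition written in that language instead.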
7. The computer input method according to claim 3, characterized by further comprising establishing an atlas of physical-object nouns and corresponding concept numeric codes, used for resolving the meaning of polysemous words.
8. The computer input method according to claim 1, characterized in that a cell in the syntactic frame can be opened as a window, any region or the entire frame can be nested inside the opened window, and the tense graphic code of the core predicate region on the interface can be invoked multiple times into other locations where verbs are placed.
9. A computer input system of semantically restricted unified encoding for different natural languages, comprising a microcomputer, a display and an input device, a unified semantic code at the lexical level and a unified semantic code at the syntactic level being input through the input device, characterized in that:
 the unified semantic encoding at the lexical level comprises:
 1) on the basis of statistics on multilingual synonyms of natural languages, and with reference to the head words of semantic classifications and to primitive sets of the same size, establishing a shared concept system composed of the basic concepts of prepositions, conjunctions, verbs, adverbs, adjectives and abstract nouns;
 2) extending each shared basic concept by increased degree, decreased degree, near-synonym, commendatory sense, derogatory sense, colloquial, written, slang, proverb and idiom variants to build near-synonym suffix codes;
 3) establishing form codes for the shared concepts;
 4) annotating the shared concepts with explanation-free definitions by superimposing the meanings of near-synonyms from each natural language;
 5) uniformly encoding the shared concepts, each consisting of a basic concept and its near-synonym suffix codes;
 6) establishing the correspondence between natural-language concepts and the shared concepts and their near-synonym suffix codes, a polysemous word receiving multiple encoding results; when establishing this correspondence, if a basic concept among the shared concepts has no corresponding word in a natural language, the basic concept is defined and described in that natural language; if a near-synonym suffix code among the shared concepts has no corresponding word, the basic concept is substituted or a definition is given in that natural language;
 7) establishing a correspondence database of the professional vocabulary of the different natural languages;
 8) treating natural-language words that obtain no code through the above steps as redundant concepts, which are not encoded;
 the unified semantic encoding at the syntactic level comprises:
 1) converting the syntactic concept system into a spatially positioned syntactic frame agreed between human and machine, the position of a spatial location within the sentence-composition frame expressing a syntactic concept;
 2) forming nine syntactic regions by combining the vertical modifier zone, core zone and supplement zone with the horizontal subject zone, predicate zone and object zone, each syntactic region being equally divided into 3 × 3 cells and each cell being numbered by its region number and cell number; the shared concepts in a sentence of each natural language are placed, according to their syntax, in the corresponding regions and cells of the sentence-composition frame in the form of ideographic graphic codes and numeric codes, yielding the syntactic information code of the semantic expression.
10. The computer input system according to claim 9, characterized in that the unified semantic encoding at the lexical level further comprises: using color to establish part-of-speech codes representing the part of speech of the shared concepts.
11. The computer input system according to claim 9 or 10, characterized in that the form codes comprise direct ideographic graphic symbols, natural-language word prototypes, numeric codes, or a combination thereof.
12. The computer input system according to claim 11, characterized in that the direct ideographic graphic symbols put image codes in one-to-one correspondence with word meanings common to any language, the image codes being generated by stylized rendering based on dictionary definition, symbolism, deduction, metaphor, borrowing from sign language, contrastive definition, definition by typical example, or descriptive definition.
13. The computer input system according to claim 12, characterized in that the numeric code is formed as follows: image codes whose word meanings have the same kind of function are arranged in a coordinate matrix to form a class chart, the vertical and horizontal coordinate numbers of an image code's position in the class chart being its class code; image codes of different word meanings within the same class code are arranged page by page in coordinate matrices to form position charts, the vertical and horizontal coordinate numbers of an image code's position on a page being its position code; the class code, page code and position code, in that order, constitute a five-digit fixed-length number for any image code, the digits being assigned to the number keys of the computer keyboard.
14. The computer input system according to claim 13, characterized in that the unified code of a shared concept consisting of a basic concept and its near-synonym suffix codes is composed of the numeric form code and the near-synonym suffix code.
15. The computer input system according to claim 11, characterized by further comprising establishing an atlas of physical-object nouns and corresponding concept numeric codes, used for resolving the meaning of polysemous words.
16. The computer input system according to claim 9, characterized in that a cell in the syntactic frame can be opened as a window, any region or the entire frame can be nested inside the opened window, and the tense graphic code of the core predicate region on the interface can be invoked multiple times into other locations where verbs are placed.
17. A semantically restricted unified encoding of different natural languages, comprising a unified semantic encoding at the lexical level and a unified semantic encoding at the syntactic level, characterized in that:
 the unified semantic encoding at the lexical level comprises:
 1) on the basis of statistics on multilingual synonyms of natural languages, and with reference to the head words of semantic classifications and to primitive sets of the same size, establishing a shared concept system composed of the basic concepts of prepositions, conjunctions, verbs, adverbs, adjectives and abstract nouns;
 2) extending each shared basic concept by increased degree, decreased degree, near-synonym, commendatory sense, derogatory sense, colloquial, written, slang, proverb and idiom variants to build near-synonym suffix codes;
 3) establishing form codes for the shared concepts;
 4) annotating the shared concepts with explanation-free definitions by superimposing the meanings of near-synonyms from each natural language;
 5) uniformly encoding the shared concepts, each consisting of a basic concept and its near-synonym suffix codes;
 6) establishing the correspondence between natural-language concepts and the shared concepts and their near-synonym suffix codes, a polysemous word receiving multiple encoding results; when establishing this correspondence, if a basic concept among the shared concepts has no corresponding word in a natural language, the basic concept is defined and described in that natural language; if a near-synonym suffix code among the shared concepts has no corresponding word, the basic concept is substituted or a definition is given in that natural language;
 7) establishing a correspondence database of the professional vocabulary of the different natural languages;
 8) treating natural-language words that obtain no code through the above steps as redundant concepts, which are not encoded;
 the unified semantic encoding at the syntactic level comprises:
 1) converting the syntactic concept system into a spatially positioned syntactic frame agreed between human and machine, the position of a spatial location within the sentence-composition frame expressing a syntactic concept;
 2) forming nine syntactic regions by combining the vertical modifier zone, core zone and supplement zone with the horizontal subject zone, predicate zone and object zone, each syntactic region being equally divided into 3 × 3 cells and each cell being numbered by its region number and cell number; the shared concepts in a sentence of each natural language are placed, according to their syntax, in the corresponding regions and cells of the sentence-composition frame in the form of ideographic graphic codes and numeric codes, yielding the syntactic information code of the semantic expression.
18. The encoding according to claim 1, characterized in that the unified semantic encoding at the lexical level further comprises: using color to establish part-of-speech codes representing the part of speech of the shared concepts.
19. The encoding according to claim 1 or 2, characterized in that the form codes comprise direct ideographic graphic symbols, natural-language word prototypes, numeric codes, or a combination thereof.
20. The encoding according to claim 3, characterized in that the direct ideographic graphic symbols put image codes in one-to-one correspondence with word meanings common to any language, the image codes being generated by stylized rendering based on dictionary definition, symbolism, deduction, metaphor, borrowing from sign language, contrastive definition, definition by typical example, or descriptive definition.
21. The encoding according to claim 4, characterized in that the numeric code is formed as follows: image codes whose word meanings have the same kind of function are arranged in a coordinate matrix to form a class chart, the vertical and horizontal coordinate numbers of an image code's position in the class chart being its class code; image codes of different word meanings within the same class code are arranged page by page in coordinate matrices to form position charts, the vertical and horizontal coordinate numbers of an image code's position on a page being its position code; the class code, page code and position code, in that order, constitute a five-digit fixed-length number for any image code, the digits being assigned to the number keys of the computer keyboard.
22. The encoding according to claim 5, characterized in that the unified code of a shared concept consisting of a basic concept and its near-synonym suffix codes is composed of the numeric form code and the near-synonym suffix code.
23. The encoding according to claim 3, characterized by further comprising establishing an atlas of physical-object nouns and corresponding concept numeric codes, used for resolving the meaning of polysemous words.
24. The encoding according to claim 1, characterized in that a cell in the syntactic frame can be opened as a window, any region or the entire frame can be nested inside the opened window, and the tense graphic code of the core predicate region on the interface can be invoked multiple times into other locations where verbs are placed.
PCT/CN1997/000069 1996-07-02 1997-07-02 Computer input method of confined semantic unifying encoding for different natural languages and computer input system thereof WO1998000773A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU33336/97A AU3333697A (en) 1996-07-02 1997-07-02 Computer input method of confined semantic unifying encoding for different natural languages and computer input system thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN 96107009 CN1169555A (en) 1996-07-02 1996-07-02 Computor input method of limited-semateme encoding of different natural language
CN96107009.9 1996-07-02

Publications (1)

Publication Number Publication Date
WO1998000773A1 true WO1998000773A1 (en) 1998-01-08

Family

ID=5119457

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN1997/000069 WO1998000773A1 (en) 1996-07-02 1997-07-02 Computer input method of confined semantic unifying encoding for different natural languages and computer input system thereof

Country Status (3)

Country Link
CN (1) CN1169555A (en)
AU (1) AU3333697A (en)
WO (1) WO1998000773A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111326139A (en) * 2020-03-10 2020-06-23 科大讯飞股份有限公司 Language identification method, device, equipment and storage medium
CN112507705A (en) * 2020-12-21 2021-03-16 北京百度网讯科技有限公司 Position code generation method and device and electronic equipment

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011158066A1 (en) * 2010-06-16 2011-12-22 Sony Ericsson Mobile Communications Ab User-based semantic metadata for text messages
JP2013069158A (en) * 2011-09-22 2013-04-18 Toshiba Corp Machine translation device, machine translation method and machine translation program
CN112115722A (en) * 2020-09-10 2020-12-22 文化传信科技(澳门)有限公司 Human brain-simulated Chinese analysis method and intelligent interaction system
CN112214649B (en) * 2020-10-21 2022-02-15 北京航空航天大学 Distributed transaction solution system of temporal graph database

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1076535A (en) * 1993-02-24 1993-09-22 刘莎 Special purpose pocket calculator for social intercourse
CN1079717A (en) * 1992-06-02 1993-12-22 住友化学工业株式会社 Preparation method for a-alumine
CN1121597A (en) * 1994-10-24 1996-05-01 中国物资贸易发展总公司 pattern-code self-definition phonetic input method and electronic self-calling device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111326139A (en) * 2020-03-10 2020-06-23 科大讯飞股份有限公司 Language identification method, device, equipment and storage medium
CN111326139B (en) * 2020-03-10 2024-02-13 科大讯飞股份有限公司 Language identification method, device, equipment and storage medium
CN112507705A (en) * 2020-12-21 2021-03-16 北京百度网讯科技有限公司 Position code generation method and device and electronic equipment
CN112507705B (en) * 2020-12-21 2023-11-14 北京百度网讯科技有限公司 Position code generation method and device and electronic equipment

Also Published As

Publication number Publication date
CN1169555A (en) 1998-01-07
AU3333697A (en) 1998-01-21

Similar Documents

Publication Publication Date Title
Androutsopoulos et al. Generating natural language descriptions from OWL ontologies: the NaturalOWL system
Silberztein Formalizing natural languages: The NooJ approach
US8478581B2 (en) Interlingua, interlingua engine, and interlingua machine translation system
CN102272755A (en) Method for semantic processing of natural language using graphical interlingua
Kang Spoken language to sign language translation system based on HamNoSys
WO1998000773A1 (en) Computer input method of confined semantic unifying encoding for different natural languages and computer input system thereof
Azarowa RussNet as a computer lexicon for Russian
Wu et al. Lexical Ontological Semantics
Alsulaiman et al. Handbook of Terminology
Maraldo Translating Nishida
Bertacco et al. An Interview with Emily Apter
Zhichang Chinese English: a future power?
Strong 1973 student paper award. An alogorithm for generating structural surrogates of english text
Kroskrity Aspects of Arizona Tewa language structure and language use.
Liu The chemistry of Chinese language
Song Sentence-final particle vs. sentence-final emoji: The syntax-pragmatics interface in the era of CMC
Tao et al. Foreignization of Tao Te Ching Translation in the Western World.
Baghini et al. Persian FrameNet: a Novel Approach to build FrameNet in the Persian Language Applicable to Islamic Context
Marini et al. CROATPAS: A Lexicographic Resource for Croatian Verbs and its Potential for Croatian Language Teaching
Elliott A human language corpus for interstellar message construction
Geller Experiencing Wissenstransfer in the First Episteme: Mesopotamia
Yang et al. On the Processing of Interrogative Sentence and Sentence Tense in Chinese-English Machine Translation
Yu et al. Text Processing Method based on Natural Language Processing:--Taking Two Model as Examples
Faber et al. Pronominal Feature Re-assembly: L1 and L2 Pronoun Resolution of Spanish Epicene and Common Gender Antecedents
Wu Analysis of the Cultural Characteristics of Loanwords in International Chinese Textbooks

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG US UZ VN AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 98503704

Format of ref document f/p: F

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA

122 Ep: pct application non-entry in european phase