WO1998000773A1 - Computer input method of confined semantic unifying encoding for different natural languages and computer input system thereof

Info

Publication number
WO1998000773A1
WO1998000773A1 PCT/CN1997/000069 CN9700069W WO9800773A1 WO 1998000773 A1 WO1998000773 A1 WO 1998000773A1 CN 9700069 W CN9700069 W CN 9700069W WO 9800773 A1 WO9800773 A1 WO 9800773A1
Authority
WO
WIPO (PCT)
Prior art keywords
code
natural language
common
concept
concepts
Prior art date
Application number
PCT/CN1997/000069
Other languages
English (en)
Chinese (zh)
Inventor
Sha Liu
Original Assignee
Sha Liu
Priority date
Filing date
Publication date
Application filed by Sha Liu
Priority to AU33336/97A
Publication of WO1998000773A1

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 Teaching, or communicating with, the blind, deaf or mute

Definitions

  • The present invention relates to a computer text information processing system, and more particularly to a computer input method and a computer input system that apply semantically restricted unified coding to different natural languages for machine translation.
  • The invention also relates to a semantically restricted unified coding of different natural languages.

Background Art

  • The object of the present invention is to provide a computer input method and a computer input system with semantically restricted unified coding of different natural languages. That is, a restricted natural language is used to develop a machine translation system.
  • Starting from a recognition of the limits of the computer's natural language processing capabilities, a restrictive agreement is first made on the relationship between the symbolic form of natural language and its semantic content, and the restricted semantics are then uniformly encoded. This effectively improves the quality of machine translation and addresses the communication barriers between different natural languages.
  • A computer input method with semantically restricted unified coding of different natural languages, including a unified semantic encoding at the lexical (word) level and a unified semantic encoding at the syntactic level, wherein:
  • the unified semantic coding at the lexical level includes:
  • the unified semantic coding at the syntactic level includes:
  • The syntactic frame is transformed into a spatially positioned syntactic frame agreed between human and machine, and the spatial position within the group sentence frame is used to express the syntactic concept;
  • The invention also provides a computer input system with semantically restricted unified coding of different natural languages, and a semantically restricted unified coding of different natural languages for computer translation.
  • The semantically restricted unified coding of different natural languages of the present invention is an easily identifiable and operable common intermediary (media) language for machine translation, which can be used by people with different mother tongues.
  • The technology of the present invention is further described below with reference to the embodiments and the accompanying drawings.

Overview of the Drawings

  • Figure 1: Schematic diagram of a common concept graphic code design example.
  • Figure 2b: Schematic illustration of the correspondence between temporal semantics, graphic codes, and numbers.
  • Figure 4: Alignment chart of natural language concepts and common concepts for the basic word "beautiful".
  • Figure 5: Alignment chart of natural language concepts and common concepts for the basic word "if".
  • Figure 6: Schematic diagram of the physical map of multilingual concrete nouns.
  • Figure 13: Logic block diagram of the restricted semantic unified coding machine translation system.

Best Mode of the Invention

  • The vocabulary-level common concept system established by the present invention is based on statistics of synonyms across different natural languages. It is obtained by comparing and sifting the semantic classification headwords of natural languages against the corresponding sets of primitives.
  • Taking into account the population distribution of natural language use and the requirement that common concepts have the characteristics of an "isolating language", the embodiment selects Chinese synonyms, semantic classification headwords and the corresponding sets of primitives to form the common concept base.
  • The descriptive definition method is used for concepts with a definite natural form. Concepts whose definition can be obtained through dictionary interpretation are defined by interpretation, such as "lazy", whose dictionary meaning is "do not want to work". Typical examples drawn from the explanation can also be used, such as "abstract", whose dictionary definition is "extracting essential features from various things". For concepts with a relative meaning, a contrast definition is used to obtain the image expression, such as "high" and "short".
  • When the iconicity of the dictionary definition is poor, commonly accepted symbolic images can be used. For example, the dictionary meaning of "peace" is "state without war", and its symbolic image is a dove of peace with an olive branch;
  • The deductive method derives a graphic code from the graphic codes of other concepts, such as "diplomacy", derived from "country" and "communication";
  • The metaphorical method obtains the image from a metaphorical expression; for example, "noble" is defined as "high moral level" ...
  • The sign language reference method is also called psychological imagery. Some words that represent abstract concepts can borrow from sign language expression methods.
  • Graphic codes can be colored for easy identification. Once the concept graphic codes exist, the graphic codes of similar word meanings are arranged in the form of a coordinate matrix to form a class diagram (see Figure 2a).
  • The vertical and horizontal coordinate numbers of the position of each graphic code in the class diagram form its class code. The graphic codes of different word meanings under the same class code are arranged page by page in the form of a coordinate matrix to form a bitmap.
  • The vertical and horizontal coordinate numbers of the position of a graphic code on its page form its bit code. The class code, page number, and bit code, in that order, make up a fixed-length 5-digit number for any graphic code, and the digits are assigned to the keyboard number keys.
  • The concepts are first classified, for example into time, orientation, quantity, reference, association, grammar, raw materials (animals, plants, artificial composites), physical movement, human movement, meteorology, physics, personal, life, social relations, psychology, thinking, transportation, communication, finance, trade, economy, tourism, food, entertainment, medical, shopping, administration, culture, education, technology, and so on. Class diagrams and class codes are established first, and then bitmaps and bit codes are established for each category on a page-by-page basis.
  • Figure 2 shows some concept graphic codes of the time class and their numbers, including year, month, day, day, hour, minute, second, day, morning, morning, noon, afternoon, evening, night, time, day before yesterday, yesterday, today, tomorrow, day after tomorrow, past, present, future, season, spring, autumn, summer, winter, period, early, middle, late, epoch, dynasty, gongwu, century, era, ancient, modern, modern, contemporary, etc.
  • The code for "daytime" is "01112", where the class code for time is "01", the page number is "1", and the bit code is "12".
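The 5-digit layout described above (a two-digit class code, a one-digit page number, and a two-digit bit code) can be illustrated with a minimal Python sketch. The names `ConceptCode`, `encode`, and `decode` are hypothetical and do not come from the patent; only the "01112" example for "daytime" is taken from the text.

```python
from typing import NamedTuple


class ConceptCode(NamedTuple):
    """Parts of a 5-digit common concept code (hypothetical helper type)."""
    class_code: str  # two digits: position in the class diagram
    page: str        # one digit: page within the class
    bit_code: str    # two digits: position on that page


def encode(class_code: str, page: str, bit_code: str) -> str:
    """Join class code, page number, and bit code, in that order."""
    code = f"{class_code}{page}{bit_code}"
    widths_ok = (len(class_code), len(page), len(bit_code)) == (2, 1, 2)
    if not (widths_ok and code.isascii() and code.isdigit()):
        raise ValueError("expected 2 + 1 + 2 decimal digits")
    return code


def decode(code: str) -> ConceptCode:
    """Split a 5-digit code back into its three parts."""
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected exactly 5 decimal digits")
    return ConceptCode(code[:2], code[2], code[3:])


# Example from the text: "daytime" in the time class.
daytime = decode("01112")
print(daytime.class_code, daytime.page, daytime.bit_code)
```

Keeping the parts as strings preserves leading zeros such as the "01" class code for time.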
  • The common concept form code of the present invention can be a direct ideographic graphic symbol with a number code, a natural language vocabulary prototype, or a direct ideographic symbol combined with a natural language vocabulary prototype.
  • Adopting direct ideographic symbols as the common concept form code is fast and is better suited to users who use the unified coding machine translation system often, but it takes the user a long time to master. Adopting natural language vocabulary prototypes as the common concept form code is easy to learn, but it has the disadvantage of polysemy. In that case the graphic symbol can be combined with the vocabulary prototype and offered as one of the discrimination symbols for users to choose from when resolving polysemy.
  • The computer interface presented to the operator can include both the natural language vocabulary prototype and the direct ideographic graphic symbol, taking advantage of both forms of code.
  • Colors can be used to build part-of-speech codes. For example, red indicates a verb, white indicates a noun, pink indicates an adjective, yellow indicates an adverb, gray indicates a technical term, silver gray indicates a function word, and so on. Because a semantic description of common concepts in natural language can hardly escape the arbitrariness of natural language understanding, the semantic annotation of common concept codes in the present invention uses the natural language synonym superposition definition method, which marks common concepts without explanatory definitions.
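As a minimal sketch, the part-of-speech colors listed above can be kept in a lookup table. The names `POS_COLORS` and `part_of_speech_for` are hypothetical; the text ends its list with "and so on", so the table is not exhaustive.

```python
# Color -> part of speech, exactly as listed in the text (not exhaustive).
POS_COLORS = {
    "red": "verb",
    "white": "noun",
    "pink": "adjective",
    "yellow": "adverb",
    "gray": "technical term",
    "silver gray": "function word",
}


def part_of_speech_for(color: str) -> str:
    """Return the part of speech for a color, or 'unknown' if it is not listed."""
    return POS_COLORS.get(color.strip().lower(), "unknown")


print(part_of_speech_for("red"))
```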
  • The non-explanatory definition notation also establishes alignment charts between the concepts of multiple natural languages and the common concepts. For the convenience of computer processing, it also uniformly encodes the common concepts (including basic concepts and their synonymous codes) and includes them in the alignment chart.
  • Through the above steps, a common concept system of semantic expression at the lexical level is obtained. In this system, each common concept is defined by the semantic superposition of synonyms from different natural languages.
  • Each of these "self-evident" common concepts has an ideographic graphic code, a unique numeric code, and a natural language vocabulary prototype.
  • Figures 3, 4, and 5 show the alignment of the natural language concepts and common concepts of the basic words "laugh", "beauty", and "if" (Chinese) and "laugh", "beautiful", and "if" (English), respectively.
  • the middle column gives the media language concept graphic code, numbers and their subtraction by degree, plus, synonym, idiom, written language, colloquialism, slang, idiomatic, slang, and derogatory.
  • B, ⁇ j) to form a semantic overlay definition.
  • “Hua” (51) is transferred to the computer interface for users to make semantic discrimination, so as to improve the speed of semantic discrimination.
  • c. For professional vocabulary, a multilingual professional vocabulary correspondence database can be established.
  • d. When aligning the vocabulary-level concepts of a given natural language with the common concepts, if a common basic concept has no corresponding word in that language, the language can be used to define and describe the common basic concept, and basic concepts can also be used to fill the gaps, as shown in Figures 3 to 5;
  • e. If methods a to d still fail to establish a relationship with the common concepts, the natural language concepts concerned (vocabulary entries) are treated as redundant concepts and are not coded.
  • In this way the vocabulary-level concepts of multiple natural languages are given a restricted unified semantic encoding, and every encoded lexical meaning entry has a clear semantic agreement.
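The alignment between natural language words and common concept codes can be pictured as a two-way lookup. The following is only a sketch under stated assumptions: the table name `ALIGNMENT`, the helper functions, the language tags, and the Chinese rendering "白天" are illustrative additions, and only the code "01112" for "daytime" comes from the text. Words with no aligned common concept return `None`, mirroring the rule that redundant concepts are not coded.

```python
from typing import Dict, Optional, Tuple

# Hypothetical alignment chart: common concept code -> word in each language.
ALIGNMENT: Dict[str, Dict[str, str]] = {
    "01112": {"en": "daytime", "zh": "白天"},  # "zh" entry is an assumption
}


def build_index(alignment: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], str]:
    """Invert the chart so that (language, word) maps to a concept code."""
    index: Dict[Tuple[str, str], str] = {}
    for code, words in alignment.items():
        for lang, word in words.items():
            index[(lang, word)] = code
    return index


def to_code(index: Dict[Tuple[str, str], str], lang: str, word: str) -> Optional[str]:
    """Return the concept code for a word, or None if the word is not coded."""
    return index.get((lang, word))


def from_code(alignment: Dict[str, Dict[str, str]], code: str, lang: str) -> Optional[str]:
    """Return the word for a concept code in the requested language, if any."""
    return alignment.get(code, {}).get(lang)


index = build_index(ALIGNMENT)
code = to_code(index, "en", "daytime")
if code is not None:
    print(code, from_code(ALIGNMENT, code, "zh"))
```

Because every language maps to the same code, a sender and a receiver only need their own column of the chart to communicate.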
  • The unified semantic coding at the syntactic level of the present invention is a graphical expression of the combination of lexical-level concepts.
  • The implicit syntactic concepts of natural language must be raised to the surface level and then given a unified conceptual coding.
  • The final result of syntactic analysis is usually expressed as a syntax tree.
  • The grammatical concepts in this syntax tree can be regarded as the syntactic information obtained from the source language when performing machine translation.
  • The syntactic concepts expressed by the syntax tree are transformed into a spatially positioned syntactic frame, in which spatial position carries the syntactic meaning.
  • The syntactic frame (group sentence frame) shown in the figure includes nine areas, (1) to (9), each of which consists of one to nineteen cells.
  • The solid-line frames in the figure are the areas and the dotted frames are the cells. The number of each cell is composed of the area code and the cell code; for example, the code of cell 2 in area 1 is 1-2.
  • (1), (2), and (3) are the core subject, predicate, and object areas.
  • Areas (1) and (3) have the same usage rules, including:
  • A single subject or object (predicative) word serving as a core component of the sentence is placed in the first cell of the area;
  • In a double-object construction, the direct object is placed according to the same rules a and f, and the indirect object is placed in area (8).
  • The core predicate is placed in cell 2 of the core predicate area;
  • In a modal ("willing") predicate construction, the modal predicate is placed in cell 1 of the area and the core predicate in cell 2 of the area;
  • e. The special graphic codes for the tense, aspect, and voice of the core predicate (past tense, future tense, progressive, completion, negation, and so on) are placed in the 4th, 4th, 7th, 6th, 9th, 5th, and 8th cells of the area, so that tense is expressed with special graphic codes.
  • The rule for the modifier areas is to place each modifying component in the modifier area that belongs to the component it modifies.
  • The modifying component of the core subject in area (1) must be placed in area (4); when parallel subjects have modifiers, each subject must stay with its modifier.
  • the rules for using freely modified areas include:
  • the modifier corresponds to the first modifier.
  • Simple subject, predicate, and object structure modification can be placed in the corresponding area of a 4, 4, 5, 6, 1, 1, 2, 1, 3, 7, 7, 8, _9;
  • Modifications of the subject, predicate, and object structures that can be linked together and co-located predicates, can be linked together and co-located predicates side-by-side and placed in one, one, two, one, and three cells in the corresponding area;
  • the subject, predicate, and object with modified components are modified by the whole area.
  • the whole area enter it in the corresponding area-1 cell, and then the subject, predicate, and object are placed in the corresponding area-5, 1, 2, and 8;
  • the ingredients are placed in the corresponding zone 1, 4, 1, 1, and 7; the supplementary ingredients are placed in the corresponding zone 1, 6, 3, and -9.
  • the whole area is modified. Insert the area, that is, insert the caret * and area code in the frame to be inserted, insert the area code immediately after *, and then place the entire area in the insertion area.
  • Prepositional prepositions with subject, predicate, and object are in the same case as the subject or predicate;
  • Prepositions with multiple modifications are prepositioned before the first modifier, and placed in the same cell as the first modifier;
  • A, B, and C modification methods can be used side by side.
  • the side by side methods include:
  • transition is modified side by side.
  • the transition word is placed in the same case as the last modifier before the transition.
  • the supplementary zone (7)-(9) is basically the same as the rules in (4)-(6), the difference is that simple, side-by-side supplementary placement is in cells 4, 5, 5, and 6;
  • Predicate tense expression method ⁇ core area tense code is brought into the predicate; c single negation of verbs and modifiers and verbs and modifiers are placed together;
  • the grid in the syntax frame can be opened as a window, and the area and the entire frame can be nested into it.
  • the temporal graphic code of the core predicate area on the interface can be called multiple times into other locations with verbs.
  • Figure 8 shows the placement of the simple sentence "We always like some strong sports" in the group sentence frame. Its coded digital expression is: 11→11512, 22→12214, 31→01617, 67→10316, 68→21313.
  • Here "→" means "put in the cell". The two digits before "→" indicate the cell number, and the five digits after it are a direct ideographic code or a natural language vocabulary code (with a letter appended where a synonym is involved).
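The coded expression of the Figure 8 sentence can be read mechanically. The sketch below assumes that the two-digit cell number is an area digit followed by a cell digit (consistent with the "1-2" cell numbering and with areas (1) to (3) being subject, predicate, and object), and it accepts any non-digit separator because the exact placement symbol is an assumption. `Placement`, `AREA_ROLES`, and `parse_sentence` are hypothetical names, and the optional synonym letter is not handled.

```python
import re
from typing import List, NamedTuple


class Placement(NamedTuple):
    area: int          # area of the group sentence frame, 1 to 9
    cell: int          # cell within that area
    concept_code: str  # 5-digit common concept code


# Core areas named in the text; the other areas are left out of this sketch.
AREA_ROLES = {1: "subject", 2: "predicate", 3: "object"}

# area digit, cell digit, any non-digit separator, then five digits
_ENTRY = re.compile(r"(\d)(\d)\s*[^\d\s]+\s*(\d{5})")


def parse_sentence(text: str) -> List[Placement]:
    """Parse entries such as '11→11512, 22→12214' into placements."""
    placements = []
    for entry in text.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = _ENTRY.fullmatch(entry)
        if match is None:
            raise ValueError(f"malformed entry: {entry!r}")
        area, cell, code = match.groups()
        placements.append(Placement(int(area), int(cell), code))
    return placements


figure8 = "11→11512, 22→12214, 31→01617, 67→10316, 68→21313"
for p in parse_sentence(figure8):
    role = AREA_ROLES.get(p.area, "other area")
    print(f"area {p.area}, cell {p.cell} ({role}): {p.concept_code}")
```

Under this reading the first three entries fill the subject, predicate, and object areas, and the last two fall in area (6).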
  • Figures 9 to 12 show the complex sentence "Even if a corpse is half-hearted and interested in its own development, any country can use its cheap and obedient labor to produce goods for this global market and achieve rapid domestic economic growth.”
  • the placement results implemented in the group sentence framework according to the above group sentence placement rules. Among them, the 4th (Fig. 12) group sentence result is inserted into the 1st (Fig. 9) area (4) * as the subject attributive. Its grammatical coded digital expression is in 48 ⁇ (7) followed by 4 Frame encoding results.
  • Figure 13 shows a logical block diagram of the machine translation system implementing the present invention. Because the unified semantic coding scheme of the present invention comprehensively fixes the semantics and expression forms of natural language from the lexical to the syntactic level, a given natural language can be used directly to enter the semantic input coding system. The system accepts only the coded natural language vocabulary meaning entries, which appear as graphic codes or natural language vocabulary codes on the human-machine interface; vocabulary can be transferred automatically or manually into the syntactic frame on the interface; and machine translation is carried out only when sentences are composed according to the rules of the group sentence frame, thereby realizing the semantically restricted domain of natural language expression.
  • The semantic classification method can also be used to retrieve and call up the encoded vocabulary. Because of the comprehensive agreement on the semantic expression of natural language, some fineness is lost compared with expressing meaning directly in natural language, but clarity is greatly improved. The result of semantic expression under the semantic encoding system can be converted into many natural languages, so the output can be converted back into the user's own mother tongue to verify the expression. If the result is unsatisfactory, it can be modified directly on the interface.
  • The system shown in Figure 13 and all natural language machine translation systems developed on the semantic coding system can communicate with each other, which can greatly shorten the development cycle of machine translation systems, reduce development costs, and increase their value.
  • The semantic coding system of the present invention complements existing natural language processing technology well, and combining their respective advantages yields a good multilingual translation technology. For example, users can express meaning directly through the interface format of the semantic coding scheme, and the automatically generated result can be embedded directly in an existing machine translation system. This takes advantage of the speed of the existing machine translation system and substantially improves the quality of scientific and technical document translation and the practicality of the translation system.
  • The semantic encoding system of the present invention can take advantage of computer network technology. Encoding is performed only once, on one machine, and the semantic information is transmitted over the network in the same encoded form; each network terminal then decodes it into the natural language the user needs. This helps save network space, improve the efficiency of network information transmission, and realize the sharing of network information resources. After a short training, any ordinary user with a high school education or higher can freely use the semantic coding system of the present invention to overcome communication barriers between different natural languages. Although the result is not as vivid, delicate, beautiful, and natural as human translation, the adequacy of semantic expression and the clarity of semantic information transmission can be well guaranteed.

Abstract

The invention relates to a computer system for processing character information and to a computer method of unified coding for developing a machine translation system based on a confined natural language code. Unified semantic coding of vocabulary consists of establishing a common concept code system made up of basic concept form codes and auxiliary similar codes. A confined semantic unification coding is applied to different natural languages using the common concept codes. The confined semantic unification codes are translated into the corresponding natural language (or words) by means of a human-machine interface.
PCT/CN1997/000069 1996-07-02 1997-07-02 Procede et systeme informatique pour le codage semantique unifie et confine de differents langages naturels WO1998000773A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU33336/97A AU3333697A (en) 1996-07-02 1997-07-02 Computer input method of confined semantic unifying encoding for different natural languages and computer input system thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN 96107009 CN1169555A (zh) 1996-07-02 1996-07-02 不同自然语言语义受限统一编码的计算机输入法
CN96107009.9 1996-07-02

Publications (1)

Publication Number Publication Date
WO1998000773A1 true WO1998000773A1 (fr) 1998-01-08

Family

ID=5119457

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN1997/000069 WO1998000773A1 (fr) 1996-07-02 1997-07-02 Procede et systeme informatique pour le codage semantique unifie et confine de differents langages naturels

Country Status (3)

Country Link
CN (1) CN1169555A (fr)
AU (1) AU3333697A (fr)
WO (1) WO1998000773A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111326139A (zh) * 2020-03-10 2020-06-23 科大讯飞股份有限公司 一种语种识别方法、装置、设备及存储介质
CN112507705A (zh) * 2020-12-21 2021-03-16 北京百度网讯科技有限公司 一种位置编码的生成方法、装置及电子设备

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8849930B2 (en) * 2010-06-16 2014-09-30 Sony Corporation User-based semantic metadata for text messages
JP2013069158A (ja) * 2011-09-22 2013-04-18 Toshiba Corp 機械翻訳装置、機械翻訳方法および機械翻訳プログラム
CN112115722A (zh) * 2020-09-10 2020-12-22 文化传信科技(澳门)有限公司 一种仿人脑中文解析方法及智能交互系统
CN112214649B (zh) * 2020-10-21 2022-02-15 北京航空航天大学 一种时态图数据库分布式事务解决系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1076535A (zh) * 1993-02-24 1993-09-22 刘莎 场景交际专用袖珍计算器
CN1079717A (zh) * 1992-06-02 1993-12-22 住友化学工业株式会社 α-氧化铝制造方法
CN1121597A (zh) * 1994-10-24 1996-05-01 中国物资贸易发展总公司 图码键自定义语音输入法及电子自呼机

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111326139A (zh) * 2020-03-10 2020-06-23 科大讯飞股份有限公司 一种语种识别方法、装置、设备及存储介质
CN111326139B (zh) * 2020-03-10 2024-02-13 科大讯飞股份有限公司 一种语种识别方法、装置、设备及存储介质
CN112507705A (zh) * 2020-12-21 2021-03-16 北京百度网讯科技有限公司 一种位置编码的生成方法、装置及电子设备
CN112507705B (zh) * 2020-12-21 2023-11-14 北京百度网讯科技有限公司 一种位置编码的生成方法、装置及电子设备

Also Published As

Publication number Publication date
AU3333697A (en) 1998-01-21
CN1169555A (zh) 1998-01-07

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG US UZ VN AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 98503704

Format of ref document f/p: F

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA

122 Ep: pct application non-entry in european phase