WO1999052041A1 - Opening and holographic template type of language translation method having man-machine dialogue function and holographic semanteme marking system - Google Patents


Info

Publication number
WO1999052041A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN1999/000046
Other languages
French (fr)
Chinese (zh)
Inventor
Sha Liu
Original Assignee
Sha Liu
Priority claimed from CN 98101156 (patent CN1231453A)
Priority claimed from CN 98125015 (patent CN1254895A)
Application filed by Sha Liu
Priority to AU33249/99A (patent AU3324999A)
Publication of WO1999052041A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G06F40/56 Natural language generation

Definitions

  • The present invention relates to a computer translation method, and more particularly to a machine translation method that allows network terminals in a computer network to transfer and exchange information in different natural languages.
  • Statistical analysis of large-scale real texts samples information from multiple angles, such as symbols, sentence patterns, parts of speech, and semantics, so as to provide multiple matching modes for symbol strings in any natural language.
  • The result is a language information processor based on empirical rules. Methodologically, this approach can superimpose multiple matching analysis results of the source language and map them onto multiple matching analysis results of the target language, so as to perform natural language translation directly and automatically.
  • In reality, however, natural language systems are random and open. Any statistical method can provide only probabilistic knowledge: it cannot restrict the vocabulary of a natural language or its concept definitions, cannot determine the various omitted expressions, and cannot resolve new ambiguities that arise after the target language is generated. Therefore, although statistical analysis of large-scale real texts is meaningful groundwork for computer-based natural language processing, in machine translation this technique must be combined with a comprehensive and effective processing method before its application value can be fully realized.
  • Human-machine dialogue and restricted-natural-language machine translation methods require the user to adjust the machine translation dictionary and the source-language expression at the input end, and to adjust the translation output. Although such methods can achieve better translation quality, they require users to be proficient in both the source and target languages, and the cost of learning and operating the human-machine dialogue is comparable to that of human translation.
Purpose of the invention
  • The object of the present invention is to design an open, holographic-template, human-machine dialogue machine translation method that comprehensively removes the obstacles to multilingual information transfer and communication in computer networks, and to attempt a substantial breakthrough in machine translation technology.
  • This breakthrough must meet the following requirements:
  • Another object of the present invention is to propose a holographic semantic annotation system that can annotate a text holographically and store the annotation information together with the text, so that the annotations can be recalled along with the text when needed.
  • An open holographic template-type human-machine dialogue language translation method comprises the following steps:
  • the human-machine dialogue template provides all candidate semantic information items, restricted by the system's conventions, that correspond to the source-language symbols, together with blank information items that the user can fill in to extend them;
  • the translation system's computer first automatically ranks and preselects among all the convention-restricted candidate semantic information, and the source-text user then manually adjusts and confirms the preferred results on the human-machine dialogue template;
  • the translation system generates a translation from the semantic information items determined jointly by human and machine, converts those items into symbols attached to the translation, and makes them available for the target-text user to query.
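The three steps above (machine preselection, user confirmation, resolved items attached to the translation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Slot` structure, the function names, the corpus scores, and the 0.8 preselection threshold are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One information item on the dialogue template (hypothetical structure)."""
    symbol: str                 # source-language symbol, e.g. "saw"
    kind: str                   # e.g. "concept" or "syntax"
    candidates: list[str]       # convention-restricted candidate values
    scores: dict[str, float] = field(default_factory=dict)  # assumed corpus statistics
    chosen: list[str] = field(default_factory=list)


def machine_preselect(slot: Slot, threshold: float = 0.8) -> None:
    """Machine half: rank candidates by score and preselect only a dominant one."""
    slot.candidates.sort(key=lambda c: slot.scores.get(c, 0.0), reverse=True)
    if slot.candidates and slot.scores.get(slot.candidates[0], 0.0) >= threshold:
        slot.chosen = [slot.candidates[0]]


def user_confirm(slot: Slot, picks: list[str] | None = None) -> None:
    """Human half: accept the preselection, override it, or fill a blank item."""
    if picks:
        for p in picks:
            if p not in slot.candidates:   # user extends the template via a blank item
                slot.candidates.append(p)
        slot.chosen = list(picks)
    if not slot.chosen:
        raise ValueError(f"{slot.symbol!r} ({slot.kind}) is still unresolved")


def attach_to_translation(slots: list[Slot]) -> list[tuple[str, str, tuple[str, ...]]]:
    """Resolved items travel with the translation so the target user can query them."""
    return [(s.symbol, s.kind, tuple(s.chosen)) for s in slots]


saw = Slot("saw", "concept", ["understand", "see"], {"see": 0.92, "understand": 0.05})
pp = Slot("with a telescope", "syntax", ["object modifier", "predicate modifier"],
          {"predicate modifier": 0.55, "object modifier": 0.45})
for s in (saw, pp):
    machine_preselect(s)
user_confirm(saw)                          # machine choice accepted as is
user_confirm(pp, ["predicate modifier"])   # user resolves the real ambiguity
print(attach_to_translation([saw, pp]))
# [('saw', 'concept', ('see',)), ('with a telescope', 'syntax', ('predicate modifier',))]
```

The threshold keeps the machine from committing when its statistics are close, which is exactly where the method hands the decision to the user.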
  • A holographic semantic annotation system, which includes: a necessary semantic information library that stores basic vocabulary, its concept definitions, and syntactic information items;
  • a text input device for inputting the text to be semantically annotated;
  • a text storage device for storing text entered through the text input device;
  • a text display device for displaying a text stored in the text storage device;
  • a sentence selection device for selecting a sentence in the text displayed by the text display device;
  • an automatic sentence structure analysis device for automatically analyzing the structure of the selected sentence based on statistical experience;
  • a semantic annotation template display device for displaying a semantic annotation template. When a sentence is selected with the sentence selection device, the template is displayed for that sentence and contains, for each word in it, a lexical information element item and a syntactic information element item. The lexical information element item displays the word's concept definition and all of its synonyms contained in the necessary semantic information library; the syntactic information element item displays, according to the results of the automatic sentence structure analysis device, all possible syntactic information items of the word, each of which is stored in the necessary semantic information library;
  • a semantic annotation device for selecting, in the semantic annotation template, the concept definitions and synonyms in each lexical information element item and the syntactic information items;
  • an annotated-text storage device for storing the text together with its annotation information;
  • an annotation instruction device for instructing a sentence in the text displayed by the text display device to display its annotations; and
  • an annotation display device for displaying, in the form of the annotation template, the annotation information stored in the annotated-text storage device for the instructed sentence.
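The core of the annotation system is that the text and its per-sentence annotations are stored together and recalled together. A minimal sketch, assuming an in-memory store; the class names, fields, and display format are hypothetical, and the devices above are reduced to two methods:

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Holographic annotation of one word (fields are illustrative)."""
    word: str
    concept: str            # concept definition chosen from the semantic library
    synonyms: list[str]     # synonyms the library lists for that concept
    syntax: str             # syntactic information item chosen on the template


@dataclass
class AnnotatedText:
    """The text and its annotations kept together (annotated-text storage)."""
    sentences: list[str]
    notes: dict[int, list[Annotation]] = field(default_factory=dict)

    def annotate(self, idx: int, items: list[Annotation]) -> None:
        """Semantic annotation device: store the user's selections for a sentence."""
        self.notes[idx] = items

    def show(self, idx: int) -> str:
        """Annotation display device: recall a sentence with its annotation template."""
        rows = [f"{a.word}: {a.concept} | synonyms: {', '.join(a.synonyms) or '-'} | {a.syntax}"
                for a in self.notes.get(idx, [])]
        return "\n".join([self.sentences[idx], *rows])


doc = AnnotatedText(["The lessee shall pay the rent monthly."])
doc.annotate(0, [Annotation("lessee", "person who holds a lease", ["tenant"], "subject")])
print(doc.show(0))
```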
  • The technical features of the open holographic template human-machine dialogue machine translation method of the present invention are as follows.
  • The basic point of the human-machine dialogue is that the user selects information directly on the template, so the user needs to master only his or her mother tongue and there is essentially no learning cost. The method is designed with full regard to the practical limits of computer information processing and takes accurate transfer of semantic information as its central task and practical goal.
  • The method makes full use of the complementary strengths of human and machine, and the translated content is not constrained by language environment or application domain.
  • By establishing unified restriction standards and full-selection human-machine dialogue, the method offers a comprehensive, systematic solution to the basic technical obstacles of machine translation and a technical guarantee for fundamentally improving translation quality.
  • The method can make full use of the results of large-scale corpus construction; its natural language processing approach is concise, practical, and readily implementable. Although a user cannot conduct human-machine dialogue in a language he or she does not understand during the source-language resolution stage, input in a single language can yield translations into multiple languages while translation quality is preserved.
  • The open holographic template human-machine dialogue language translation method of the present invention has general application value in network information exchange and a broad international market in online machine translation services.
  • The holographic semantic annotation system of the present invention stores the lexical interpretation and grammatical structure information of a text together with the text and displays this annotation information when needed. The system can be widely used in interpreting legal documents and in language teaching.
Brief description of the drawings
  • Figure 1 is a schematic diagram of the structure of a natural language holographic dialogue template for a sentence;
  • Figure 2 shows the content of a holographic dialogue template for an English sentence;
  • Figure 3 is a schematic diagram of the restricted lexical information communication structure between different natural languages;
  • Figures 4a and 4b are schematic diagrams of two ways of displaying dialogue information during human-machine dialogue;
  • Figure 5 is a schematic diagram of the spatial positioning structure of syntactic component information;
  • Figure 6 shows the flow of human-machine interaction information when an English sentence is translated according to the method of the present invention;
  • Figure 7 is a schematic diagram of the target-text user querying the syntactic information item actually carried by the natural language symbol "with a telescope".
A preferred embodiment of the present invention
  • The principle and implementation of the open holographic template human-machine dialogue language translation method of the present invention are explained below using the example of translating an English sentence into Chinese.
  • The example sentence used is: "I saw a boy with a telescope near the bank."
  • The example sentence contains multiple language symbols.
  • The language symbols mentioned here can be either words or phrases.
  • Each language symbol carries a certain amount of semantic information, including its concept definition, tense, voice, and syntactic component in the sentence.
  • For example, the concept definition of the word "saw" is "see", its tense is past, its voice is active, and its component in the sentence is the predicate.
  • A language symbol may carry more than one kind of semantic information.
  • For example, the word "saw" can mean "see" or "understand", and the phrase "with a telescope" can act syntactically either as a predicate modifier or as an object modifier.
  • The method adopted by the present invention is for the source-text user to resolve all semantic information items of the source text through human-machine interaction, to generate a translation from the result, and to convert the result into symbols attached to the translation that target-text users can query. Translation is thus completed jointly by source-text and target-text users, improving the quality of semantic information transfer.
  • To this end, the present invention establishes a natural language holographic dialogue template for a sentence, as shown in Figure 1.
  • "Holographic" here means that the template includes all the information elements of natural language symbols.
  • The necessary semantic information elements include the concept definition, tense, and voice items, which are lexical information elements, and the syntactic component item, which is a syntactic information element.
  • The dialogue template provides the source-text user with candidate semantic information items corresponding to the source-language symbols for selection through human-machine interaction. As explained later, the content of these dialogue information items must be restricted by the system.
  • The dialogue template also includes some optional information items, such as semantic attributes, grammatical attributes, and higher-level semantics (case). The user need not select these items; the computer resolves them automatically and probabilistically to supply relevant information for automatic translation generation.
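The split between required items (offered to the user) and optional items (resolved by the computer alone) can be sketched for one column of the template. This is an assumed layout, not the patent's data format: the item names, the `TemplateColumn` class, and the probabilities are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass

REQUIRED = ("concept", "tense", "voice", "component")   # chosen by the user
OPTIONAL = ("semantic_attr", "grammar_attr", "case")     # resolved by the computer


@dataclass
class TemplateColumn:
    """Template entries for one language symbol: item -> {candidate: probability}."""
    symbol: str
    items: dict[str, dict[str, float]]

    def for_user(self) -> dict[str, list[str]]:
        """Required items, candidates listed most probable first."""
        return {k: sorted(v, key=v.get, reverse=True)
                for k, v in self.items.items() if k in REQUIRED}

    def auto_resolved(self) -> dict[str, str]:
        """Optional items: the computer simply takes the most probable value."""
        return {k: max(v, key=v.get)
                for k, v in self.items.items() if k in OPTIONAL}


saw = TemplateColumn("saw", {
    "concept": {"see": 0.90, "understand": 0.10},
    "tense": {"past": 1.0},
    "voice": {"active": 1.0},
    "component": {"predicate": 1.0},
    "semantic_attr": {"perception": 0.85, "cutting": 0.15},
})
print(saw.for_user())
print(saw.auto_resolved())   # {'semantic_attr': 'perception'}
```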
  • By establishing a system based on the principle of convention-based restriction, the present invention handles the differences between natural languages in a unified, integrated way.
  • This restriction principle covers both syntactic information and lexical information.
  • The syntactic restriction principles designed by the present invention include: merging syntactic information that has the same function but different objects into a single item, and deleting, as far as possible, syntactic concepts that are not indispensable for analyzing semantic relations, such as the distinction between direct and indirect objects in English grammar.
  • The dialogue template therefore offers only these simplified syntactic concepts, as standard syntactic information items shared by different natural languages, for users to choose from.
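The merging rule can be illustrated with a lookup table that maps language-specific grammar labels onto one shared item set. The tables below are hypothetical examples (including the choice of "attribute" and "adverbial" as shared names for object and predicate modifiers), not the invention's actual inventory:

```python
# Hypothetical tables: language-specific labels mapped onto one shared item set.
STANDARD = ("subject", "predicate", "object", "attribute", "adverbial")

MERGE = {
    "en": {"subject": "subject", "predicate": "predicate",
           "direct object": "object", "indirect object": "object",   # merged
           "object modifier": "attribute", "predicate modifier": "adverbial"},
    "fr": {"sujet": "subject", "verbe": "predicate",
           "complément d'objet direct": "object",
           "complément d'objet indirect": "object"},                 # merged
}


def to_standard(lang: str, label: str) -> str:
    """Map a grammar label to the standard item shown on the dialogue template."""
    try:
        return MERGE[lang][label]
    except KeyError:
        raise KeyError(f"no standard item for {lang!r} label {label!r}") from None


def template_choices(lang: str) -> list[str]:
    """The (smaller) set of options the user actually sees for this language."""
    used = set(MERGE[lang].values())
    return [item for item in STANDARD if item in used]


print(to_standard("en", "indirect object"))   # object
print(template_choices("fr"))                 # ['subject', 'predicate', 'object']
```

Because both the English and French object distinctions collapse to one item, a user of either language makes the same single choice.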
  • The lexical restriction principle designed by the present invention is shown in Figure 3: a basic concept set is determined through statistical analysis of vocabulary usage frequencies in the major languages and the merging of synonyms.
  • For example, if the verb sense of the English word "orphan" is defined as a basic concept, then in Chinese, which has no corresponding single word, it is expressed by a descriptive phrase.
  • Words in the various natural languages that express a basic concept are treated as its synonyms. However, not every concept of a word in one natural language has a corresponding concept in other natural languages.
  • Lexical usage probability therefore serves as the redundancy criterion for lexical concepts: priority goes first to concepts used in multiple languages and then to concepts with a high usage probability in a single natural language. Concepts that meet neither condition are treated as redundant, and blank information items are provided for them in the holographic dialogue template.
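The redundancy criterion is a two-tier rule, which the sketch below expresses directly. The thresholds (`min_langs`, `min_prob`) and the usage figures are illustrative assumptions; the text only says "multiple languages" and "high" probability.

```python
from __future__ import annotations


def concept_priority(usage: dict[str, dict[str, float]],
                     min_langs: int = 2, min_prob: float = 1e-4) -> dict[str, int | None]:
    """Rank concepts for the template; thresholds are illustrative assumptions.

    usage maps a concept to {language: usage probability}. Priority 1 = used in
    several languages, 2 = frequent in one language, None = redundant (the
    template offers a blank item instead).
    """
    result: dict[str, int | None] = {}
    for concept, per_lang in usage.items():
        langs = [lang for lang, p in per_lang.items() if p > 0]
        if len(langs) >= min_langs:
            result[concept] = 1
        elif max(per_lang.values(), default=0.0) >= min_prob:
            result[concept] = 2
        else:
            result[concept] = None
    return result


usage = {
    "see": {"en": 3e-3, "fr": 1e-3, "zh": 2e-3},
    "orphan (verb)": {"en": 2e-4},
    "rare sense": {"en": 1e-6},
}
print(concept_priority(usage))
# {'see': 1, 'orphan (verb)': 2, 'rare sense': None}
```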
  • The concept definitions processed under these restrictions are provided as vocabulary candidates in the holographic template for users of different natural languages to choose from, ensuring that lexical concept information can be exchanged equivalently between natural languages.
  • The invention also assigns a unified code to corresponding lexical concepts in different natural languages to facilitate information transfer over the network.
  • Based on the principle of convention-based restriction, the dialogue template is designed to be open: when a source natural language symbol is not yet included in the machine translation system, the source-text user can describe its meaning using natural language symbols for restricted information items that the system already includes.
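Unified concept codes and the open-template fallback fit together naturally: every language's word points at a shared numeric code, and an unknown word is described by codes that already exist. A minimal sketch; the `ConceptLexicon` class, sequential integer codes, and the "first listed sense" choice in `describe` are assumptions made for illustration.

```python
from __future__ import annotations
import itertools


class ConceptLexicon:
    """Shared numeric concept codes with per-language symbols (illustrative design)."""

    def __init__(self) -> None:
        self._next = itertools.count(1)
        self.definitions: dict[int, str] = {}
        self.symbols: dict[tuple[str, str], list[int]] = {}      # (lang, word) -> codes
        self.provisional: dict[tuple[str, str], list[int]] = {}  # user-described words

    def add(self, definition: str, words: dict[str, str]) -> int:
        """Register one concept under a single code for every language given."""
        code = next(self._next)
        self.definitions[code] = definition
        for lang, word in words.items():
            self.symbols.setdefault((lang, word), []).append(code)
        return code

    def lookup(self, lang: str, word: str) -> list[int]:
        return self.symbols.get((lang, word), [])

    def describe(self, lang: str, word: str, gloss: list[str]) -> list[int]:
        """Open template: describe an unknown word with words the system already has."""
        codes = []
        for g in gloss:
            found = self.lookup(lang, g)
            if not found:
                raise KeyError(f"{g!r} is not in the {lang} lexicon either")
            codes.append(found[0])   # assumes the user picked the first listed sense
        self.provisional[(lang, word)] = codes
        return codes


lex = ConceptLexicon()
see = lex.add("perceive with the eyes", {"en": "see", "fr": "voir"})
inst = lex.add("organization", {"en": "institution", "fr": "institution"})
money = lex.add("medium of exchange", {"en": "money", "fr": "argent"})
print(lex.describe("en", "bank", ["institution", "money"]))   # [2, 3]
```

Only the codes need to cross the network; each receiving client renders them with its own language's symbols.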
  • The present invention's method of compulsorily restricting the concept systems of multiple natural languages differs essentially from the traditional interlingua method.
  • Traditional interlingua technology faces completely unrestricted natural language systems and achieves multilingual translation by building an intermediate concept system among them, but the openness of each natural language's concept system prevents the interlingua from remaining consistent. The convention-based restriction method instead uses human-machine dialogue to impose the necessary restrictions on, and links between, words and their senses, placing reasonable limits on the differences and openness among natural language concept systems so that the lexical and syntactic concepts of multiple natural languages can be interchanged successfully.
  • Figure 2 shows the restricted candidate semantic information items that the human-machine dialogue template offers the source-text user for each language symbol of the source text.
  • Resolving the semantic information of the source text is the process of selecting, confirming, and supplementing these candidate items in the dialogue template.
  • Lexical information items: the selection of lexical information items must make full use of the complementary strengths of human and machine.
  • The basic principles of the computer's automatic preselection are: ranking the senses of polysemous words by frequency, obtained through large-scale statistical analysis of real text, to reduce the range of options the user must search; and, through the same analysis, preselecting lexical information items according to the correlations between syntactic and lexical information items, to narrow the choice further.
  • For example, words are preselected in their noun senses, such as "" and "telescope" in Figure 2.
  • As a further example, in a Chinese phrase meaning "such a beautiful flower", the word 好 ("good") is polysemous, but when it precedes the adjective 美 ("beautiful") its most likely sense is the degree adverb "very".
  • For a text symbol that expresses part-of-speech information explicitly, the lexical information items can be inferred from its form, narrowing the choice further. For example, although the English root "spring" is ambiguous, the past-tense verb form "sprang" clearly restricts the possible senses.
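The three narrowing principles (frequency order, syntactic correlation, explicit morphology) can be combined in one ranking function. A minimal sketch over toy data: the sense frequencies, the part-of-speech table, the inflection table, and the boost factor of 3.0 are all illustrative assumptions, not corpus figures.

```python
from __future__ import annotations

# Illustrative statistics (not real corpus figures).
SENSE_FREQ = {"spring": {"season": 0.45, "coil": 0.25, "jump": 0.20, "water source": 0.10}}
SENSE_POS = {"season": "noun", "coil": "noun", "water source": "noun", "jump": "verb"}
FORMS = {"sprang": ("spring", "verb")}   # explicit inflection -> (root, part of speech)


def rank_senses(token: str, expected_pos: str | None = None,
                boost: float = 3.0) -> list[str]:
    """Order a word's senses for the template, most likely first."""
    root, form_pos = FORMS.get(token, (token, None))
    freq = SENSE_FREQ[root]
    # 1. Explicit morphology removes incompatible senses outright ("sprang" is a verb).
    candidates = [s for s in freq if form_pos is None or SENSE_POS[s] == form_pos]

    # 2. Frequency order, boosted where the sense fits the syntactic expectation.
    def score(sense: str) -> float:
        fits = expected_pos is not None and SENSE_POS[sense] == expected_pos
        return freq[sense] * (boost if fits else 1.0)

    return sorted(candidates, key=score, reverse=True)


print(rank_senses("spring"))                       # ['season', 'coil', 'jump', 'water source']
print(rank_senses("spring", expected_pos="verb"))  # ['jump', 'season', 'coil', 'water source']
print(rank_senses("sprang"))                       # ['jump']
```

The user still sees every remaining candidate; ranking only shortens the search, while morphology actually removes options.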
  • Syntactic information generally includes part-of-speech information, syntactic component information, and higher-level semantic (case) information. Of these, syntactic component information is the only syntactic system with complete organizing capability and universal commonality, so once the syntactic component items are determined, the semantic relations of a natural language symbol string are in fact determined. The selection of syntactic information items must also exploit the complementary strengths of human and machine. The basic principle is to select syntactic component items automatically using word order, part of speech, and higher-level semantic (case) information obtained through large-scale statistical analysis of real text.
  • For example, if a word is first in the word order, is a noun, and has the higher-level semantic role of agent, it can be determined to be the subject; the user finally confirms the syntactic component item by selecting an option.
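The subject rule above can be written as a predicate that either preselects a component or leaves the item for the user. A minimal sketch: the `Token` fields and function names are hypothetical, and only the single rule stated in the text is implemented.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    position: int          # 1-based word order
    pos: str               # part of speech
    case: str | None       # higher-level semantic role, if known


def propose_component(tok: Token) -> str | None:
    """Preselect a syntactic component, or return None to leave it to the user."""
    if tok.position == 1 and tok.pos == "noun" and tok.case == "agent":
        return "subject"   # the rule given in the text
    return None            # no confident rule fired


def preselect(tokens: list[Token]) -> tuple[dict[str, str], list[str]]:
    """Split tokens into machine-preselected items and items the user must choose."""
    done, pending = {}, []
    for t in tokens:
        comp = propose_component(t)
        if comp is None:
            pending.append(t.text)
        else:
            done[t.text] = comp
    return done, pending


done, pending = preselect([Token("Mary", 1, "noun", "agent"),
                           Token("telescope", 5, "noun", None)])
print(done, pending)   # {'Mary': 'subject'} ['telescope']
```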
  • The actual semantic information of the source text is resolved by selecting lexical and syntactic information items on the template through human-machine dialogue.
  • The simplest form of human-machine dialogue is for the user to select, directly on the holographic dialogue template, the lexical and syntactic information items actually carried by each natural language symbol string.
  • Specifically, the confirmed items can be shown in bold, as in Figure 1.
  • In actual operation, the linearly arranged syntactic component information items can be transformed into the spatial positioning representation shown in Figure 5 to assist human-machine selection of syntactic component items.
  • For example, a syntactic information dialogue box is created in which the user selects the word that "with a telescope" modifies.
  • Partial template display and virtual template methods can also be used.
  • With these methods, the syntactic information can be displayed in full (a "?" in the figure marks an item the user can still select), as shown for the sentence "I saw a boy with a telescope near the bank" in Figure 4b.
  • Those skilled in the art will understand that there are many ways to display dialogue information during human-machine dialogue, and they are not limited to the examples in this specification.
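One of many possible displays can be sketched as a plain-text rendering: confirmed items are emphasised (standing in for bold) and unresolved items carry a "?", with a partial mode that shows only what still needs a decision. The `render` function and its output format are hypothetical, not the patented interface.

```python
from __future__ import annotations


def render(template: dict[str, dict[str, list[str]]],
           chosen: dict[tuple[str, str], str], partial: bool = False) -> str:
    """Plain-text view of the dialogue template (a sketch, not the patented UI).

    Confirmed items are wrapped in *...* as a stand-in for bold; unresolved items
    are marked '?'. With partial=True only the unresolved items are shown.
    """
    rows = []
    for symbol, items in template.items():
        cells = []
        for item, cands in items.items():
            pick = chosen.get((symbol, item))
            if pick is None:
                cells.append(f"{item}: ? {' / '.join(cands)}")
            elif not partial:
                cells.append(f"{item}: *{pick}*")
        if cells:
            rows.append(f"{symbol:<18}" + "; ".join(cells))
    return "\n".join(rows)


template = {
    "saw": {"concept": ["see", "understand"]},
    "with a telescope": {"component": ["predicate modifier", "object modifier"]},
}
chosen = {("saw", "concept"): "see"}
print(render(template, chosen))
print(render(template, chosen, partial=True))
```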
  • By systematically restricting grammatical concepts and shared lexical concepts, and by performing full complementary human-machine selection within the range of restricted information items, the method of the present invention obtains the information needed for automatic conversion into various natural language forms.
  • Syntactic components omitted by the user are handled as follows.
  • Logically, once all information items of the existing text symbols are determined, most omitted parts (such as an omitted subject or object) can be supplied by the reader from context.
  • Sentence components that are not omitted must, however, be reinforced through the holographic dialogue template to ensure translation quality: for example, if the subject and object have been selected among a sentence's candidate information items, they cannot be omitted.
  • The holographic intermediate translation results are provided with the translation for the target-language user to query directly, which allows new ambiguities in the target language to be fully resolved. If the source-text user deliberately wants to keep an expression ambiguous or vague, he or she can select several information items at once.
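The target-side query and the deliberate multiple selection can both be read off the same resolved-item table. A minimal sketch, assuming the resolved items arrive as a mapping from (symbol, item) to the tuple of values the source-text user confirmed; the type alias and function names are hypothetical.

```python
from __future__ import annotations

Resolved = dict[tuple[str, str], tuple[str, ...]]


def query(resolved: Resolved, symbol: str) -> dict[str, list[str]]:
    """What the source user confirmed for one symbol of the source text."""
    return {item: list(values) for (sym, item), values in resolved.items() if sym == symbol}


def deliberate_ambiguities(resolved: Resolved) -> list[str]:
    """Symbols for which the source user kept several values on purpose."""
    return sorted({sym for (sym, _), values in resolved.items() if len(values) > 1})


resolved = {
    ("with a telescope", "component"): ("predicate modifier",),
    ("saw", "concept"): ("see",),
    ("bank", "concept"): ("river side", "financial institution"),
}
print(query(resolved, "with a telescope"))   # {'component': ['predicate modifier']}
print(deliberate_ambiguities(resolved))      # ['bank']
```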
  • The flowchart in Figure 6 illustrates the basic process of human-machine interaction information processing in the open holographic template-type human-machine dialogue language translation method of the present invention.
  • The middle column, steps 11 to 17, is the main flow of the translation system's computer.
  • Steps 21 to 26 show the user's participation, and items 31 to 35 on the right show the internal databases and rule bases used during human-machine interaction.
  • One-way arrows indicate the direction of human-machine interaction, and two-way arrows indicate language translation.
  • A mark of N indicates that the step requires human-machine interaction;
  • a mark of Y indicates that the system proceeds automatically to the next step of the flow.
  • The row of # symbols marks the interface between this translation system and the Internet system: above it is the source-text client, below it the target-text client.
  • When the process starts, step 11 is executed: the source-text user enters the natural language symbols to be translated in sequence.
  • In step 12, the system's main program searches the extensible multilingual lexical information item symbol library 31 for candidate lexical information items for each natural language symbol. If none is found, the source-text user can, in step 21, describe the meaning of the symbol on the template using semantic symbols the system already contains.
  • This process finally generates candidate lexical information items consisting of concept definition items, semantic attribute items, tense items, voice items, and so on. If the concept definition item under a natural language symbol is blank, as shown by the "?" under the symbol "bank", the source-text user can describe it semantically using lexical symbols the system already provides, i.e. the concept definition item "institution for keeping or lending money" in the template. In step 13, following the rules in the lexical information item probabilistic selection rule base 32, the computer automatically preselects among the multiple candidate lexical information items of each natural language symbol in the template, as shown in bold in the template;
  • in step 22, the source-text user selects and confirms the semantic information items for which no unique preferred result was determined.
  • In step 14, the main program calls the syntactic component information item automatic labeling rule base 33 to label the syntactic information items of the natural language symbols automatically; this process finally generates the syntactic component items, part-of-speech items, and higher-level case items in the template.
  • In step 15, the main program calls the syntactic component information item automatic selection rule base 34 to preselect the syntactic component information items of each natural language symbol automatically.
  • In step 24, the three-dimensional structure model library 23 of syntactic information items can be called, and the source-text user selects and confirms on the template the syntactic information items that have not obtained a unique preferred result, such as the items shown in bold in the template. The main program can then transmit the confirmed information items over the network in a self-defined encoding.
  • The dialogue template includes all the information items that natural language symbols can carry. Its candidate items include not only concept definitions and tense, voice, syntactic, higher-level case, and part-of-speech information, but also singular/plural and gender information; any other information that can be manually designed and labeled can be added under the open template.
  • The system program also automatically counts how often each user-supplied description is used. When the frequency reaches a certain level, the system simultaneously adds the new natural language symbol or new information item to the symbol libraries of all languages in the translation system. For example, when the user-supplied definition of "bank" has been used often enough, the system adds a new symbol to the French natural language symbol library.
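The frequency-triggered promotion can be sketched with a counter. The text says only "a certain level", so the threshold below is a hypothetical parameter, and the per-language libraries are modelled as plain sets of strings for illustration.

```python
from __future__ import annotations
from collections import Counter


class ExtensionTracker:
    """Count user-supplied descriptions; promote them once they are used often."""

    def __init__(self, languages: list[str], threshold: int = 100) -> None:
        self.threshold = threshold        # "a certain level" (hypothetical value)
        self.counts: Counter[tuple[str, str]] = Counter()
        self.libraries: dict[str, set[str]] = {lang: set() for lang in languages}

    def record(self, word: str, description: str) -> bool:
        """Count one use; return True on the use that triggers promotion."""
        key = (word, description)
        self.counts[key] += 1
        if self.counts[key] == self.threshold:
            for lib in self.libraries.values():   # add to every language at once
                lib.add(f"{word}: {description}")
            return True
        return False


tracker = ExtensionTracker(["en", "fr", "zh"], threshold=3)
uses = [tracker.record("bank", "institution for keeping or lending money") for _ in range(3)]
print(uses)   # [False, False, True]
```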
  • In step 16, the main program on the target-text client calls the translation automatic generation rule base 35 and, according to multilingual symbol and word-order conversion rules, automatically converts the resolved information items confirmed by the source-text user into a natural language translation in the language the target-text user requires.
  • The resulting translation is shown in Figure 7.
  • The generated Chinese translation corresponds to "I saw a boy with a telescope near the bank". In step 17, the main program asks the target-text user whether the translation is unambiguous; if it is ambiguous, the target-text user
  • can determine the query range of the relevant information items through human-machine interaction, during which the multilingual corresponding information item symbol library 25 can be called; for example, the user can query whether "with a telescope" modifies the predicate "saw" or the object "boy", as shown in Figure 7.
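The Y/N branching of the main flow can be sketched as a loop in which each system step either completes automatically ("Y") or hands control to its user step ("N"). Only steps 12/21 and 13/22 are modelled; the lexicon, the step functions, and the simulated user are hypothetical stand-ins for the libraries and rule bases numbered in the flowchart.

```python
from __future__ import annotations
from typing import Callable

# Hypothetical lexicon; "bank" is deliberately missing to trigger the user branch.
LEXICON = {"I": ["first person"], "saw": ["see", "understand"], "boy": ["male child"]}


def step12_lookup(state: dict) -> bool:
    """Step 12: fetch candidate lexical items; Y only if every symbol was found."""
    state["candidates"] = {w: list(LEXICON.get(w, [])) for w in state["words"]}
    return all(state["candidates"].values())


def step13_preselect(state: dict) -> bool:
    """Step 13: preselect unambiguous symbols; Y only if nothing is left open."""
    state["chosen"] = {w: c[0] for w, c in state["candidates"].items() if len(c) == 1}
    return len(state["chosen"]) == len(state["words"])


def ask_user(user_step: int, state: dict) -> None:
    """Stand-in for the user steps: 21 describes unknown words, 22 picks senses."""
    if user_step == 21:
        for c in state["candidates"].values():
            if not c:
                c.append("<described with existing symbols>")
    elif user_step == 22:
        for w, c in state["candidates"].items():
            state["chosen"].setdefault(w, c[0])   # pretend the user takes the first


def run(steps: list[tuple[int, Callable[[dict], bool], int]], state: dict) -> list[str]:
    """Main flow: a step that returns False ("N") hands control to its user step."""
    log = []
    for number, step, user_step in steps:
        if step(state):
            log.append(f"{number}:Y")
        else:
            log.append(f"{number}:N")
            ask_user(user_step, state)
            log.append(f"{user_step}:user")
    return log


state = {"words": ["I", "saw", "boy", "bank"]}
print(run([(12, step12_lookup, 21), (13, step13_preselect, 22)], state))
# ['12:N', '21:user', '13:N', '22:user']
```

Steps 14 to 17 would slot into the same list with their own rule-base functions and user steps.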
  • The quality of semantic information transfer is the fundamental obstacle preventing machine translation technology from winning the huge international market of the global networked information era.
  • Human-machine dialogue is therefore unavoidable.
  • A translation scheme that combines the complementary strengths of human and machine through dialogue can effectively improve translation quality and has practical value. Because the method transfers semantic information accurately, is not restricted by locale, is convenient for users to operate, can convert into and generate multiple target languages simultaneously, generalizes its dialogue scheme across languages, and uses simple and reliable technical means, it has general application value in network information exchange and a broad market in online machine translation services.
  • the present invention also provides a holographic semantic tagging system, which includes: Necessary semantic information base, which contains the basic vocabulary and its conceptual definitions and syntactic information items;
  • a text input device for inputting text to be semantically annotated
  • a text storage device for storing text input through a text input device
  • a text display device for displaying a certain text stored in the text storage device; a sentence selection device for selecting a certain sentence in the text displayed by the text display device;
  • Automatic sentence structure analysis device for automatically analyzing the structure of a selected sentence according to statistical experience
  • a semantic annotation template display device is used to display a semantic annotation template.
  • the semantic annotation template is displayed corresponding to the selected sentence when a sentence is selected by the sentence selection device, and includes a vocabulary corresponding to each word in the sentence.
  • the information element item and the syntactic information element item, the lexical information element item displays the corresponding vocabulary's concept definition and all synonyms included in the necessary semantic information base, and each syntactic information element item is analyzed by the automatic analysis device according to the sentence structure Results of displaying all possible syntactic information items of the corresponding vocabulary, where each syntactic information item is stored in the necessary semantic information base;
  • Semantic labeling device used for selecting concept definitions and synonyms and syntactic information item items in each lexical information element item in the semantic labeling template; labeling text storage device for storing the labels with labels The text of the message;
  • an annotation instruction device for instructing that the annotation of a sentence in the text displayed by the text display device be displayed;
  • an annotation display device for displaying, in the form of the annotation template, the annotation information that the annotated text storage device holds for the instructed sentence.
  • One application of the holographic semantic tagging system of the present invention is a same-language holographic semantic tagging system. Take the legal field as an example. There are many types of laws, each needing its own knowledge base, and expert systems built on them have a wide range of applications. A common application requirement is helping ordinary users understand and identify the meaning of legal provisions.
  • same-language semantic tagging technology is applicable to developing various expert-level knowledge systems. It also has universal practical value for improving the accuracy of semantic interpretation of laws, contract contents, and technical documentation.
  • Another application of the holographic semantic tagging system of the present invention is a holographic foreign-language teaching system.
  • Computer-assisted instruction has been widely used.
  • the application in the field of foreign language teaching mainly uses multimedia teaching methods (listening, speaking, reading, and writing in parallel) and examination question bank teaching.
  • the holographic language template provides a computer-assisted method for foreign language teaching. It systematically reflects both the concepts that different languages share and the individual symbolic features of each language.
  • the holographic template can call up all corresponding words in multiple languages.
  • holographic teaching can use the interface technology and internal conversion rules of the holographic translation system to show, step by step, how symbols are inflected and words reordered in any language.
  • the holographic template can either provide holographic semantic annotations in the foreign language or convert those annotations directly into the learner's mother tongue.

Abstract

An open holographic template-type language translation method with a human-machine dialogue function includes the following steps:
- creating a restricted natural-language dialogue template that contains all the necessary semantic information elements of all natural languages;
- determining the lexical and syntactic information items actually carried by the natural-language symbols, through full-selection human-machine dialogue on the template, which solves the source text;
- generating the translation from that solution;
- converting the solution into target-language symbols, so that the translation user can query its syntax.

The method performs syntactic analysis without depending on context and makes full use of the complementary strengths of human and machine. It can be used to overcome obstacles to information transfer in global network communication.

Description

Open Holographic Template-Type Human-Machine Dialogue Language Translation Method and Holographic Semantic Tagging System
Technical Field
The present invention relates to a computer translation method. More specifically, it is a machine translation method that lets network terminals in a computer network transfer and exchange information in different natural languages.
Background Art
Computer network technology reaches in all directions and everywhere, and it has rapidly opened up a global era of network information. However, barriers to transferring semantic information between natural languages have clearly limited how efficiently networks and network information are used. Machine translation could let each network end user send semantic information over the network in his or her own natural language alone. That would save network space, make network information transfer more efficient, and allow broad international sharing of network information resources, so it is of great practical significance and high commercial value.
At present, the machine translation methods that artificial intelligence textbooks present systematically are rarely used in actual product development. Conversely, the methods used in machine translation systems that have actually been built fail to reach their intended goals. These facts indicate three things: basic theoretical research lags seriously behind, the techniques in use share common defects, and the intended goals themselves are unrealistic. Since the 1990s, roughly two new types of machine translation method have emerged and gradually become the mainstream of natural language information processing. One builds corpora through statistical analysis of large-scale real texts. The other is machine translation with human-machine dialogue and restricted natural language.
Statistical analysis of large-scale real texts samples information from several angles, such as symbols, sentence patterns, parts of speech, and semantics. This yields multiple matching patterns for symbol strings in any natural language, so it is an experience-based method of language information processing.

In principle, the method could superimpose the multiple matching analyses of the source language and match them against those of the target language, completing natural language translation directly. In reality, natural language systems are random and open, and any statistical method can only provide probabilistic knowledge. Such a method cannot:
- restrict admission of natural language vocabulary and its concept definitions;
- determine the exact content of the various omitted parts of an expression;
- resolve new ambiguities that arise once the target language is generated.

Statistical analysis of large-scale real texts is meaningful groundwork for computer-based natural language processing of all kinds. For machine translation, however, this technique realizes its full value only when it is combined into a comprehensive and effective system for processing its object.
In machine translation with human-machine dialogue and restricted natural language, the user adjusts the machine translation dictionary and the source-language wording at the input end, and also adjusts the translation results. This can achieve good translation quality. However, the user must master both the source and the target language and bear high learning and operating costs for the dialogue, comparable to those of human translation.
Purpose of the Invention
The object of the present invention is an open holographic template-type human-machine dialogue machine translation method. It aims to remove the barriers to multilingual information exchange in computer networks comprehensively and to achieve a substantive breakthrough in machine translation technology. Such a breakthrough must meet the following requirements:
1. Effective admission restrictions on the common vocabulary of natural languages and its concept definitions;
2. Semantic analysis that does not depend on context;
3. Accurate transfer of semantic information through literal translation;
4. A solution to new ambiguities that arise after the target language is generated;
5. Users need only be proficient in their mother tongue;
6. Use of the means and results of large-scale statistical analysis of real texts, to fully realize the complementary strengths of human and machine;
7. Meeting the need to convert into multiple target languages.
Another object of the present invention is a holographic semantic tagging system. It can place holographic semantic annotations on a text and store them with the text, and the annotations can be recalled along with the text when needed.
Summary of the Invention
According to one aspect of the present invention, an open holographic template-type human-machine dialogue language translation method is provided, comprising the following steps:
a. Imposing commensuration restrictions on the various natural languages;
b. Establishing a sentence-level human-machine dialogue template that includes the necessary semantic information elements of the various natural languages;
c. The human-machine dialogue template provides all commensuration-restricted candidate semantic information items for the source-language symbols, plus blank information items that the user can fill in;
d. The translation system's computer first automatically preselects among all the commensuration-restricted candidate items. The source-text user then adjusts and confirms the preselection by hand on the human-machine dialogue template;
e. The translation system generates the translation from the semantic information items determined jointly by human and machine. It also converts those items into target-language symbols and supplies them with the translation, so the translation user can query them.
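Steps c to e can be sketched roughly as follows. This is a minimal illustration under assumed data structures, not the patent's implementation. `TemplateItem`, `preselect`, `confirm`, `package`, the slot names and the label table are all hypothetical. Candidates are assumed to arrive already ranked by the machine.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TemplateItem:
    symbol: str                   # source-language symbol: a word or a phrase
    slot: str                     # hypothetical slot name, e.g. "concept" or "syntactic_role"
    candidates: List[str]         # commensurated candidates, best machine guess first (step c)
    chosen: Optional[str] = None  # value finally settled for this item


def preselect(items: List[TemplateItem]) -> None:
    """Step d, machine half: propose the top-ranked candidate for every item."""
    for item in items:
        if item.candidates:
            item.chosen = item.candidates[0]


def confirm(item: TemplateItem, value: str) -> None:
    """Step d, human half: the source-text user sets the final value.

    The value need not be among the candidates; this models filling a blank item.
    """
    item.chosen = value


def package(translation: str, items: List[TemplateItem],
            labels: Dict[str, str]) -> Dict[str, object]:
    """Step e: attach the solved items, rendered as target-language labels, to the translation."""
    solved: Dict[str, Dict[str, str]] = {}
    for item in items:
        if item.chosen is None:
            raise ValueError(f"unresolved item: {item.symbol}/{item.slot}")
        # Values without a target-language label are passed through unchanged.
        solved.setdefault(item.symbol, {})[item.slot] = labels.get(item.chosen, item.chosen)
    return {"translation": translation, "solved": solved}


items = [
    TemplateItem("saw", "concept", ["see", "understand"]),
    TemplateItem("with a telescope", "syntactic_role", ["predicate modifier", "object modifier"]),
]
preselect(items)
confirm(items[1], "object modifier")  # the user corrects the machine's first choice
result = package("我在银行附近看见一个带望远镜的男孩。", items,
                 {"see": "看见", "object modifier": "宾语修饰语"})
```

In this sketch the machine proposes and the human only overrides where needed. That mirrors the division of labour in step d.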
According to another aspect of the present invention, a holographic semantic tagging system is provided, comprising:
a necessary semantic information base, which stores the basic vocabulary, its concept definitions, and syntactic information items;
a text input device for inputting text to be semantically annotated;
a text storage device for storing text input through the text input device;
a text display device for displaying a text stored in the text storage device;
a sentence selection device for selecting a sentence in the text displayed by the text display device;
an automatic sentence structure analysis device for automatically analyzing the structure of the selected sentence according to statistical experience;
a semantic annotation template display device for displaying a semantic annotation template. The template is displayed for the selected sentence when a sentence is selected through the sentence selection device, and it includes a lexical information element item and a syntactic information element item for each word in the sentence. Each lexical information element item displays the concept definitions and all synonyms of the corresponding word contained in the necessary semantic information base. Each syntactic information element item displays all possible syntactic information items of the corresponding word, according to the analysis result of the automatic sentence structure analysis device; these items are stored in the necessary semantic information base;
a semantic annotation device with which a person selects, in the semantic annotation template, the concept definitions and synonyms in each lexical information element item and the syntactic information items in each syntactic information element item;
an annotated text storage device for storing text together with its annotation information;
an annotation instruction device for instructing that the annotation of a sentence in the text displayed by the text display device be displayed;
an annotation display device for displaying, in the form of the annotation template, the annotation information that the annotated text storage device holds for the instructed sentence.
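The storage and recall behaviour of the tagging system can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the claimed devices. `SEMANTIC_BASE`, `SYNTACTIC_ITEMS` and `AnnotatedText` are hypothetical. For simplicity, every word is offered the same fixed syntactic items instead of those proposed by an automatic sentence structure analyser.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-in for the necessary semantic information base:
# word -> list of (concept definition, synonyms).
SEMANTIC_BASE: Dict[str, List[Tuple[str, List[str]]]] = {
    "saw": [("perceive with the eyes", ["see"]), ("understand", ["grasp"])],
    "bank": [("financial institution", ["lender"]), ("edge of a river", ["shore"])],
}
# Simplification: a fixed set of syntactic items for every word, instead of
# the items an automatic sentence-structure analyser would propose.
SYNTACTIC_ITEMS = ["subject", "predicate", "object", "modifier"]


@dataclass
class AnnotatedText:
    sentences: List[str]
    # sentence index -> word -> chosen annotation
    annotations: Dict[int, Dict[str, Dict[str, object]]] = field(default_factory=dict)

    def template(self, index: int) -> Dict[str, Dict[str, object]]:
        """Build the annotation template for the selected sentence."""
        words = re.findall(r"[A-Za-z']+", self.sentences[index])
        return {w: {"senses": SEMANTIC_BASE.get(w, []), "roles": SYNTACTIC_ITEMS}
                for w in words}

    def annotate(self, index: int, word: str, sense: int, role: str) -> None:
        """Store the person's choice of sense and syntactic item for one word."""
        options = self.template(index)[word]  # KeyError if the word is not in the sentence
        if role not in options["roles"]:
            raise ValueError(f"unknown syntactic item: {role}")
        definition, synonyms = options["senses"][sense]
        self.annotations.setdefault(index, {})[word] = {
            "definition": definition, "synonyms": synonyms, "role": role}

    def show(self, index: int) -> Dict[str, Dict[str, object]]:
        """Recall the stored annotation of a sentence in template form."""
        return self.annotations.get(index, {})


text = AnnotatedText(["I saw a boy with a telescope near the bank."])
text.annotate(0, "saw", 0, "predicate")
text.annotate(0, "bank", 0, "object")
```

Annotations are keyed by sentence index. The text and its annotations therefore stay together, and any sentence's annotation can be recalled on demand.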
Industrial Applicability
The open holographic template-type human-machine dialogue machine translation method of the present invention has the following technical features:
- The core of the dialogue is that the user selects template information directly. The user therefore needs to master only his or her mother tongue, and there is essentially no learning cost.
- The method is designed around the actual limits of computer information processing, with accurate transfer of semantic information as its central task and practical goal.
- It makes full use of the complementary strengths of human and machine, so the content translated is not restricted by language environment or application domain.
- Unified restriction standards and a holographic, full-selection dialogue covering the whole process together form a package solution to the basic technical obstacles of machine translation, giving comprehensive technical assurance for fundamentally improving translation quality.
- It can fully exploit the results of large-scale corpus construction, and its handling of natural language is concise, practical, and readily implementable.
- During the source-language solving stage, a user cannot hold the dialogue in a language he or she does not understand. Even so, a single source language input can yield translations into several languages while translation quality is assured.
The open holographic template-type human-machine dialogue language translation method of the present invention has universal application value in network information exchange and a broad international market in online machine translation services.
The holographic semantic tagging system of the present invention can store a text's lexical interpretations and grammatical structure information together with the text, and display this annotation information when needed. The system can be widely used for interpreting legal documents, for language teaching, and in similar areas.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the structure of a sentence-level natural language holographic dialogue template;
Fig. 2 shows the content of a holographic dialogue template for an English sentence;
Fig. 3 is a schematic diagram of the structure of lexical information commensuration restrictions between different natural languages;
Figs. 4a and 4b are schematic diagrams of two ways of displaying dialogue information during human-machine dialogue;
Fig. 5 is a schematic diagram of the spatial positioning structure of syntactic component information;
Fig. 6 shows the human-machine interactive information processing when an English sentence is translated according to the method of the present invention;
Fig. 7 is a schematic diagram of a translation user querying the syntactic information item actually carried by the natural language symbol "with a telescope".
Best Mode for Carrying Out the Invention
The principle and implementation of the open holographic template-type human-machine dialogue language translation method of the present invention are explained below through an example of translating an English sentence into Chinese. The example sentence is:
"I saw a boy with a telescope near the bank." (Chinese translation: "我在银行附近看见一个带望远镜的男孩。")
The example sentence contains several language symbols; a language symbol here may be either a word or a phrase. Each symbol carries several categories of semantic information: its concept definition, its tense, its voice, and its syntactic component in the sentence. For example, the concept definition of the word "saw" is "see", its tense is past, its voice is active, and its component in the sentence is the predicate. Because natural language is complex and diverse, however, a symbol may carry more than one item of the same category. The concept of "saw" can be defined as "see" but also as "understand and recognize". Likewise, the syntactic component of the phrase "with a telescope" can be either a predicate modifier or an object modifier.
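The ambiguity described above can be made concrete by listing, per symbol, the candidates in each information category and flagging the categories with more than one candidate. The sketch below is hypothetical. The dictionary layout and the name `ambiguous_categories` are assumptions, and only the two symbols discussed in the text are encoded.

```python
from typing import Dict, List

# Candidate semantic information per category for two symbols of the example sentence.
SYMBOLS: Dict[str, Dict[str, List[str]]] = {
    "saw": {
        "concept": ["see", "understand and recognize"],
        "tense": ["past"],
        "voice": ["active"],
        "syntactic_component": ["predicate"],
    },
    "with a telescope": {
        "syntactic_component": ["predicate modifier", "object modifier"],
    },
}


def ambiguous_categories(symbol: str) -> List[str]:
    """Categories in which the symbol carries more than one candidate, i.e. what must be solved."""
    return [category for category, values in SYMBOLS[symbol].items() if len(values) > 1]


print(ambiguous_categories("saw"))               # ['concept']
print(ambiguous_categories("with a telescope"))  # ['syntactic_component']
```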
The inventor holds that the fundamental task of natural language translation is to pass the actual semantic information carried by the source-language symbols accurately to users of another language. To this end, the present invention solves all semantic information items of the source text on the source-text user's side, through human-machine interaction. It generates the translation from that solution and converts the solution into target-language symbols, which accompany the translation so the translation user can query them. Source-text users and translation users thus both take part in the whole translation, which improves the quality of semantic information transfer.
To solve the semantic information of the source text, the present invention establishes a sentence-level natural language holographic dialogue template, shown in Fig. 1. "Holographic" means that the template includes every necessary semantic information element of the writing systems of all natural languages:
- lexical information elements: concept definition items, tense information items, and voice information items;
- syntactic information elements: syntactic component items.

The template offers the source-text user candidate semantic information items for each source-language symbol, to be selected through human-machine interaction. As explained below, the system must restrict the content of these items. The template also contains items the user is not required to select, such as semantic attributes, grammatical attributes, and higher-level semantics (case). The computer solves these automatically and probabilistically, supplying information needed to generate the translation automatically.
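The split between user-confirmed and machine-only items in one template row can be sketched as below. The field names are hypothetical labels for the categories listed above. The sketch also assumes that machine-only candidates are already ordered by probability, so the computer simply takes the first one.

```python
from typing import Dict, List, Tuple

# Hypothetical field names for one row of the holographic dialogue template.
USER_ITEMS = ("concept", "tense", "voice", "syntactic_component")        # confirmed by the user
MACHINE_ITEMS = ("semantic_attribute", "grammatical_attribute", "case")  # solved by the computer


def split_row(row: Dict[str, List[str]]) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Split a template row into the items shown to the user and those solved automatically."""
    to_user = {k: v for k, v in row.items() if k in USER_ITEMS}
    # Assumption: machine-only candidates are ordered by probability, so the
    # computer takes the first one without asking the user.
    automatic = {k: v[0] for k, v in row.items() if k in MACHINE_ITEMS and v}
    return to_user, automatic


row = {
    "concept": ["boy (male child)"],
    "syntactic_component": ["object"],
    "grammatical_attribute": ["noun"],
    "case": ["patient", "agent"],
}
to_user, automatic = split_row(row)
print(sorted(to_user))  # ['concept', 'syntactic_component']
print(automatic)        # {'grammatical_attribute': 'noun', 'case': 'patient'}
```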
Literal translation is the preferred way to transfer semantic information accurately between languages, because a machine translation system cannot freely adjust the vocabulary and sentence patterns of the target sentence. The conceptual and syntactic systems of natural languages differ, however. A quality literal translation therefore requires that lexical and syntactic information items can be exchanged equivalently between the source and target languages. The present invention integrates the differences between natural languages in a unified way, through a systematic principle of commensuration restriction. This principle covers both syntactic and lexical information.
The syntactic information commensuration principles designed by the present invention are:
- merge syntactic information that has the same function but applies to different objects;
- delete, as far as possible, syntactic concepts that are not indispensable for analyzing semantic aggregation relations, such as the direct/indirect object distinction in English grammar.

The dialogue template offers only the simplified, commensurated syntactic concepts, as standard syntactic information items for users of different natural languages to choose from.
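The deletion principle can be pictured as a mapping from language-specific labels onto a smaller standard set, with duplicates merged. The mapping below is hypothetical and covers only the direct/indirect object case named in the text. A label with no standard counterpart raises an error rather than being guessed.

```python
from typing import Iterable, List

# Hypothetical mapping from language-specific syntactic labels to the standard items
# offered on the dialogue template.
STANDARD_ITEM = {
    "subject": "subject",
    "predicate": "predicate",
    "object": "object",
    "direct object": "object",    # distinction dropped: not needed for semantic aggregation
    "indirect object": "object",
}


def commensurate(labels: Iterable[str]) -> List[str]:
    """Map labels to standard items, merging duplicates and keeping first-seen order.

    A label with no standard counterpart raises KeyError.
    """
    merged: List[str] = []
    for label in labels:
        item = STANDARD_ITEM[label]
        if item not in merged:
            merged.append(item)
    return merged


print(commensurate(["subject", "predicate", "indirect object", "direct object"]))
# ['subject', 'predicate', 'object']
```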
The lexical information commensuration principle designed by the present invention is shown in Fig. 3. A basic concept set is determined through statistical analysis of word-usage frequency in the major languages, with synonyms merged. In practice, the basic concepts do not correspond completely in every natural language. Where a language lacks a counterpart, the concept is described with other common words of that language, so that the basic concepts of all languages are forcibly aligned. For example, the verb sense of the English word "orphan" is a basic concept, but Chinese has no corresponding word. It is therefore described in Chinese as "使成为孤儿" ("to make an orphan of").

In addition, near-synonyms of the basic concepts in each natural language serve as near-synonym attached codes. Not every near-synonym of a word in one language has a counterpart in every other language. When a near-synonymous concept has no counterpart in some language, the basic concept word is substituted (such substitution is also unavoidable in human translation). Anything these two steps cannot handle is treated as redundant information, and the holographic dialogue template provides blank information items for it.

To determine the concept definitions of words in different natural languages, the present invention uses three kinds of commensuration:
- fuzzy commensuration centered on connotation, for example Chinese "学校" and English "school";
- unified commensuration of concepts that disregards part-of-speech differences, for example ignoring all tense inflections of the English word "become";
- probabilistic commensuration that gives priority to concepts used in many languages.

Every language needs near-synonyms for the same concept to enrich its expressiveness, so word usage probability serves as the redundancy criterion for lexical concepts. Words used in many languages are preferred first, then words with a high usage probability in one language. Words meeting neither condition are treated as redundant concepts, and the holographic dialogue template provides blank information items for them. Only concept definitions that have passed commensuration restriction are offered as lexical candidates in the template. This ensures that lexical concept information can be interchanged equivalently across natural languages. The invention also assigns a unified code to corresponding lexical concepts in different natural languages, to facilitate information transfer over the network.
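The fallback order described above can be sketched as a lookup keyed by a unified concept code, with an optional near-synonym code:
1. Use the near-synonym if the language has one.
2. Otherwise substitute the base-concept word.
3. If the language has no such word, use the explanatory description.

The codes ("C0001", "N1"), the `Realization` record and the sample near-synonym "behold" are hypothetical. Only the "orphan" gap and its Chinese description come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Realization:
    base: Optional[str]                                 # word for the base concept, if one exists
    gloss: Optional[str] = None                         # explanatory description filling a gap
    near: Dict[str, str] = field(default_factory=dict)  # near-synonym attached code -> word


# Hypothetical unified concept codes and data.
CONCEPTS: Dict[str, Dict[str, Realization]] = {
    "C0001": {  # verb sense of "orphan"
        "en": Realization(base="orphan"),
        "zh": Realization(base=None, gloss="使成为孤儿"),
    },
    "C0002": {  # "see"
        "en": Realization(base="see", near={"N1": "behold"}),
        "zh": Realization(base="看见"),
    },
}


def render(code: str, lang: str, near_code: Optional[str] = None) -> str:
    """Express a unified concept code (optionally with a near-synonym code) in one language."""
    r = CONCEPTS[code][lang]
    if near_code is not None and near_code in r.near:
        return r.near[near_code]
    # No near-synonym counterpart in this language: substitute the base-concept word.
    if r.base is not None:
        return r.base
    # No word for the base concept at all: fall back to the explanatory description.
    if r.gloss is not None:
        return r.gloss
    # Left over after both steps: the template would show a blank item instead.
    raise LookupError(f"no realization of {code} in {lang}")


print(render("C0002", "en", "N1"))  # behold
print(render("C0002", "zh", "N1"))  # 看见
print(render("C0001", "zh"))        # 使成为孤儿
```

Because every language is addressed through the same code, only the code and the optional near-synonym code would need to travel over the network.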
The system must also handle natural language symbols it has not yet included, and human-machine interaction should be flexible. The dialogue template of the present invention is therefore designed to be open, within the basic principle of commensuration restriction. When a source-language symbol is not yet included in the machine translation system, the source-text user can describe its meaning using symbols that the system already includes and whose information items are already restricted.
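The open-template rule can be sketched as a lookup that falls back to a user-supplied description, provided the description uses only symbols the system already knows. The lexicon, the codes and the "puppy" example are hypothetical.

```python
from typing import Dict, List

# Hypothetical lexicon of symbols already in the system: word -> unified concept code.
LEXICON: Dict[str, str] = {"dog": "C0102", "young": "C0103", "small": "C0104"}


def encode(symbol: str, description: List[str]) -> List[str]:
    """Return concept codes for a symbol, using the user's description if it is unknown."""
    if symbol in LEXICON:
        return [LEXICON[symbol]]
    # Open template: an unknown symbol may only be described with symbols already included.
    missing = [word for word in description if word not in LEXICON]
    if not description or missing:
        raise ValueError(f"cannot describe {symbol!r}; unknown words: {missing}")
    return [LEXICON[word] for word in description]


print(encode("dog", []))                  # ['C0102']
print(encode("puppy", ["young", "dog"]))  # ['C0103', 'C0102']
```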
The present method subjects the concept systems of several natural languages to mandatory commensuration restriction, which differs essentially from the traditional interlingua method. Interlingua technology faces completely unrestricted natural language systems and achieves multilingual translation through an intermediate concept system shared by many languages. However, because each natural language's concept system is open, the intermediate system can never be exhaustive. Mandatory commensuration restriction instead uses human-machine dialogue to impose the necessary restrictions and commensuration on vocabulary and word senses. This reasonably limits the differences and openness among the concept systems, so the lexical and syntactic concepts of several natural languages can be exchanged equivalently.
Referring again to Fig. 2, we now continue with how the source-text user solves the semantic information of the source text. The figure shows the commensuration-restricted candidate semantic information items that the human-machine dialogue template offers the source-text user for each source-language symbol. Solving the source text's semantic information means selecting, confirming, and supplementing these candidate items in the template.
Selecting lexical information items should make full use of the complementary strengths of human and machine. The computer's automatic preselection follows these basic principles, each based on large-scale statistical analysis of real texts:
- Order the lexical information items of polysemous words by frequency of use, narrowing the range the user must search.
- Prefer lexical items according to the correlation between syntactic and lexical information items, narrowing the choice further. For example, any word that can serve as subject has its noun sense preferred, as with "telescope" in Fig. 2.
- Use collocation probabilities to refine the preference. In the Chinese phrase "好漂亮的一朵花" ("what a beautiful flower"), "好" is polysemous, but before the adjective "漂亮" ("beautiful") its most likely sense is the degree adverb "非常" ("very").
- For written symbols that express part of speech explicitly, derive the lexical item from the part of speech to narrow the choice. The English root "spring" is polysemous, but the past tense form "sprang" clearly limits the choice to its verb senses.
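The four preselection principles can be combined into one ranking function, sketched below. The frequencies and collocation probabilities are invented for illustration. The order of priority (syntactic compatibility, then collocation, then frequency) is an assumption, since the text does not say how the principles are weighted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sense:
    gloss: str
    pos: str     # part of speech of this sense
    freq: float  # relative corpus frequency (hypothetical numbers)


def rank_senses(senses: List[Sense], role: Optional[str] = None,
                inflected_pos: Optional[str] = None,
                collocation: Optional[Dict[str, float]] = None) -> List[Sense]:
    """Order candidate senses so the most probable one is offered first."""
    pool = senses
    if inflected_pos is not None:
        # An explicit inflection such as past-tense "sprang" rules out other parts of speech.
        pool = [s for s in pool if s.pos == inflected_pos]
    colloc = collocation or {}

    def score(s: Sense):
        # Assumed priority: syntactic compatibility, then collocation probability, then frequency.
        prefers_noun = 1 if role == "subject" and s.pos == "noun" else 0
        return (prefers_noun, colloc.get(s.gloss, 0.0), s.freq)

    return sorted(pool, key=score, reverse=True)


spring = [Sense("season", "noun", 0.5), Sense("leap", "verb", 0.3),
          Sense("coiled device", "noun", 0.2)]
print(rank_senses(spring, inflected_pos="verb")[0].gloss)  # leap

hao = [Sense("good", "adj", 0.6), Sense("very", "adv", 0.4)]
# Hypothetical probability of each sense of "好" right before an adjective such as "漂亮".
print(rank_senses(hao, collocation={"very": 0.9, "good": 0.1})[0].gloss)  # very
```

The function only reorders or filters candidates and never discards the user's options outright, except when morphology rules them out. This matches the idea that most lexical choices reduce to confirming the first item.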
With the automatic preselection described above, most of the lexical information items the user actually needs are already ranked first. The user already knows which lexical items the intended meaning requires. For the user, selecting most lexical items is therefore simply a matter of confirming the top-ranked item in each slot of the template.
In every natural language, syntactic information, whether expressed implicitly or explicitly, broadly comprises three kinds of information: part of speech, syntactic components, and superordinate semantic (case) information. Of these, syntactic-component information is the only syntactic organizing system that has complete organizing power and is common to all languages. Once the syntactic-component items are determined, the semantic grouping relations of a natural-language symbol string are in effect determined as well. The selection of syntactic information items likewise exploits human-machine complementarity, under the following basic principle. Large-scale statistical analysis of real text establishes how word order, part of speech, and superordinate semantic (case) information correspond to syntactic information, and this correspondence is used to preselect syntactic information items automatically. For example, a word in position 1 whose part of speech is noun and whose superordinate semantic role is agent can be judged to be the subject. The user makes the final determination of the syntactic-component items through selection operations.
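The statistical correspondence described above can be sketched as rule learning from a tagged corpus. This is a hypothetical illustration. The feature triple (word position, part of speech, case role), the dominance threshold `min_share`, and the toy corpus are assumptions. When no component clearly dominates for a feature, no rule is produced, and the choice is left to the user in the template.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

Feature = Tuple[int, str, Optional[str]]  # (word position, part of speech, case role)


def learn_rules(tagged: Iterable[Tuple[int, str, Optional[str], str]],
                min_share: float = 0.9) -> Dict[Feature, str]:
    """Keep a feature -> component rule only when one component dominates the counts."""
    counts: Dict[Feature, Counter] = defaultdict(Counter)
    for position, pos, role, component in tagged:
        counts[(position, pos, role)][component] += 1
    rules = {}
    for feature, counter in counts.items():
        component, n = counter.most_common(1)[0]
        if n / sum(counter.values()) >= min_share:
            rules[feature] = component
    return rules


def preselect_component(rules: Dict[Feature, str], position: int, pos: str,
                        role: Optional[str]) -> Optional[str]:
    """Return the preselected component, or None to leave the choice to the user."""
    return rules.get((position, pos, role))


# Toy corpus: 19 of 20 sentence-initial agent nouns are subjects; a prepositional
# phrase in position 5 splits evenly between modifying the object and the predicate.
corpus = ([(1, "noun", "agent", "subject")] * 19 + [(1, "noun", "agent", "object")]
          + [(5, "prep", None, "object modifier")] * 5
          + [(5, "prep", None, "predicate modifier")] * 5)
rules = learn_rules(corpus)
print(preselect_component(rules, 1, "noun", "agent"))  # subject
print(preselect_component(rules, 5, "prep", None))     # None -> the user decides
```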
The actual semantic information of the source text is resolved by selecting lexical and syntactic information items on the template through human-machine dialogue. The simplest form of dialogue is for the user to select, directly on the holographic dialogue template, the lexical and syntactic items that each natural-language symbol string actually carries. One way to do this is to mark the confirmed items in bold, as shown in FIG. 1.
Human and machine jointly select and confirm the lexical and syntactic information items of each sentence in the holographic dialogue template. This already accomplishes the task of resolving the information carried by natural language, so semantic analysis of the sentence no longer has to rely on its context.
For users, analyzing and determining abstract syntactic relations is far harder than judging the sense of a polysemous word. To make syntactic-component items easier to select, the linearly arranged items can in practice be converted into a spatial layout, as shown in FIG. 5, which supports the human-machine dialogue for these items. A syntactic information dialogue frame is built with the modifier, core, and complement zones of syntactic information on the horizontal axis and the subject, predicate, and object zones on the vertical axis. In this frame, the user selects what "with a telescope" modifies.
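The frame of FIG. 5 can be modelled as a grid of cells indexed by zone, in which the user places a phrase by choosing a cell. The sketch below is a hypothetical rendering. The cell labels follow the axes named in the text, while the placement API and the text layout are assumptions. Placing "with a telescope" in the modifier column of the object row records the reading in which the boy has the telescope.

```python
from typing import Dict, Tuple

COLUMNS = ("modifier", "core", "complement")  # horizontal axis in FIG. 5
ROWS = ("subject", "predicate", "object")     # vertical axis in FIG. 5
Frame = Dict[str, Tuple[str, str]]            # phrase -> (row, column)


def place(frame: Frame, phrase: str, row: str, column: str) -> Frame:
    """Record the user's choice of cell for a phrase; reject cells outside the frame."""
    if row not in ROWS or column not in COLUMNS:
        raise ValueError(f"no such cell: {row}/{column}")
    frame[phrase] = (row, column)
    return frame


def render(frame: Frame, width: int = 20) -> str:
    """Draw the frame as a text grid, one line per syntactic zone plus a header."""
    lines = [" " * 10 + "".join(c.ljust(width) for c in COLUMNS)]
    for r in ROWS:
        cells = [", ".join(p for p, cell in frame.items() if cell == (r, c)).ljust(width)
                 for c in COLUMNS]
        lines.append(r.ljust(10) + "".join(cells))
    return "\n".join(lines)


frame: Frame = {"I": ("subject", "core"), "saw": ("predicate", "core"),
                "a boy": ("object", "core")}
place(frame, "with a telescope", "object", "modifier")  # user's reading: the boy has it
print(render(frame))
```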
In the actual human-machine dialogue, a partial display of the template and a virtual template can also be used. FIG. 4a shows a full display of the syntactic information, where a "?" marks an item for the user to select again. FIG. 4b shows the dialogue display for "I see a boy with a telescope near the bank" once the dialogue template has been made virtual. Those skilled in the art will understand that dialogue information can be displayed in many ways during human-machine dialogue, and the display is not limited to the examples in this specification.
The method of the invention systematically imposes commensuration constraints on grammatical and general concepts. Human and machine then jointly make a complete selection within the range of constrained information items. Together, these steps give the method the information it needs to convert automatically into the expression forms of many natural languages. Users nonetheless always omit some syntactic components. Logically, once all information items of the written symbols present are determined, the reader can supply most omitted parts from context, such as an omitted subject or object. To convey the meaning accurately, however, sentence components that cannot be omitted must be added through the holographic dialogue template to ensure translation quality. For example, if a subject and an object have been selected among a sentence's candidate items, the related verb may not be omitted.
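The requirement that some components may not be omitted can be checked before translation with a small rule table. This sketch encodes only the single rule stated above: a subject plus an object requires the verb (predicate). The table format and the function name are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# Each rule: if every component in the first set is present, the second may not be omitted.
# Only the example rule from the text is encoded; a real rule base would hold many more.
NON_OMISSIBLE: List[Tuple[FrozenSet[str], str]] = [
    (frozenset({"subject", "object"}), "predicate"),
]


def components_to_add(confirmed: Set[str]) -> List[str]:
    """List components the template must ask the user to add before translation."""
    return [needed for trigger, needed in NON_OMISSIBLE
            if trigger <= confirmed and needed not in confirmed]


print(components_to_add({"subject", "object"}))    # ['predicate']
print(components_to_add({"predicate", "object"}))  # [] (omitted subject left to the reader)
```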
New ambiguities may be discovered after the target-language translation is generated. To resolve them, the intermediate translation result produced by the holographic dialogue is supplied along with the translation, so the target-language user can query it directly, and any new ambiguity in the target language can be fully resolved. If the user deliberately wants to keep an expression vague or punning, several items can be selected at once when choosing information items.
Referring to FIG. 6, the flowchart shows the basic human-machine interactive information processing of the open holographic template-type human-machine dialogue translation method of the invention. The figure has three columns:
- Boxes 11 to 17 in the middle column form the main flow of the translation system computer.
- Boxes 21 to 26 in the left column show the user's participation.
- Boxes 31 to 35 in the right column show how the interaction relates to the internal databases and rule bases.

One-way arrows indicate the direction of human-machine interaction. Two-way arrows indicate calls on data and rules during translation. N marks a point where system processing requires human-machine interaction, and Y marks an automatic advance to the next step of the system flow. The line "# # # #" marks the information-processing interface between this translation system and the Internet. The source-text client lies above it and the target-text client below it.
Processing begins with step 11, in which the source user enters, in order, the natural-language symbols to be translated.
Referring also to FIG. 2, the ten natural-language symbols of this example, "I saw a boy with a telescope near the bank", are entered in template positions 1 to 10.
In step 12 of the main program, the system searches the extensible multilingual lexical information item symbol library 31 for candidate lexical items for each natural-language symbol. If nothing is found, then in step 21 the source user can describe the meaning of the symbol on the template, using semantic symbols already included in the system. This process ultimately produces the candidate lexical items of the template, which consist of concept definition items, semantic attribute items, tense items, voice items, and so on. A concept definition item may be blank under some symbol, as with the "?" under "bank". In that case the source user can describe it semantically with lexical symbols for which the system already provides information items, which yields the concept definition item "institution for keeping or lending money" in the template.
In step 13 of the main program, the computer automatically preselects among the candidate lexical items of each natural-language symbol in the template, using the rules in the lexical information item probabilistic preselection rule base 32. The results are the items shown in bold in the template. In step 22, the source user can select and confirm any semantic item for which no definite preference was obtained.
In step 14 of the main program, the system calls the syntactic-component information item automatic tagging rule base 33 and automatically tags the syntactic information items of each natural-language symbol in the template. This produces the syntactic-component items, part-of-speech items, and superordinate "case" items of the template.
In step 15 of the main program, the system calls the syntactic-component information item automatic preselection rule base 34 to preselect the syntactic-component items of each natural-language symbol. During this step, the three-dimensional structure model library 23 of syntactic information items can be called through step 24. The source user then selects and confirms on the template any syntactic item for which no unique preferred result was obtained; these are the items shown in bold in the template. At this point the main program can transmit the determined information items over the network in a self-defined encoding.
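The specification leaves the "self-defined encoding" unspecified. The sketch below uses JSON purely as a stand-in to show what one sentence's confirmed items might look like in transit. The record fields follow the item types named in the text (concept definition, part of speech, syntactic component, case, tense), but the field names and layout are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ConfirmedItems:
    position: int               # slot number in the template (1..n)
    symbol: str                 # source-language symbol
    concept: str                # confirmed concept definition
    pos: str                    # part of speech
    component: str              # confirmed syntactic component
    case: Optional[str] = None  # superordinate "case" item, if any
    tense: Optional[str] = None


def encode(lang: str, items: List[ConfirmedItems]) -> str:
    """Serialise one sentence's confirmed items (JSON stands in for the unspecified format)."""
    return json.dumps({"lang": lang, "items": [asdict(i) for i in items]}, ensure_ascii=False)


def decode(payload: str) -> List[ConfirmedItems]:
    """Rebuild the item records on the receiving client."""
    return [ConfirmedItems(**d) for d in json.loads(payload)["items"]]


# Three of the ten slots of the FIG. 2 example, for brevity.
sentence = [
    ConfirmedItems(1, "I", "the speaker", "pronoun", "subject", case="agent"),
    ConfirmedItems(2, "saw", "perceive with the eyes", "verb", "predicate", tense="past"),
    ConfirmedItems(10, "bank", "institution for keeping or lending money", "noun",
                   "object of preposition"),
]
payload = encode("en", sentence)
assert decode(payload) == sentence  # the receiving client recovers exactly what was confirmed
```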
The dialogue template includes all the information items a natural-language symbol can carry. Its candidate items cover concept definitions, tense, voice, syntactic information, superordinate "case" information, part of speech, number (singular or plural), and gender (masculine or feminine). Other manually designed and tagged information can also be added at the bottom of the open template.
When the source user resolves a source symbol by semantic description in step 21 of FIG. 6, the system program also automatically counts how often that description is used. When the frequency reaches a certain level, the new natural-language symbol or new information item is added simultaneously to the natural-language symbol libraries of every language in the translation system. For example, once manual descriptions of "bank" reach a certain frequency, the system adds the new symbol "banque" to the French natural-language symbol library. It describes the new symbol semantically with French symbols already in the system and supplies its other related candidate items. Other languages are extended in the same way.
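The frequency-triggered extension can be sketched as a counter that promotes a user-described concept into every language library once a threshold is reached. This is hypothetical in several respects. The threshold value is not given in the text. The `equivalents` mapping stands in for however the new entries such as "banque" are actually obtained, which the text does not state.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


class SymbolLibraries:
    """Hypothetical per-language symbol libraries that grow from user descriptions."""

    def __init__(self, languages: Iterable[str], threshold: int,
                 equivalents: Dict[Tuple[str, str], str]):
        self.libraries: Dict[str, Dict[str, str]] = {lang: {} for lang in languages}
        self.usage: Counter = Counter()
        self.threshold = threshold
        # (concept, language) -> symbol; stands in for the unspecified source of new entries.
        self.equivalents = equivalents

    def record_description(self, concept: str) -> bool:
        """Count one user description; return True on the call that promotes the concept."""
        self.usage[concept] += 1
        if self.usage[concept] != self.threshold:
            return False
        for lang, library in self.libraries.items():
            symbol = self.equivalents.get((concept, lang))
            if symbol is not None:
                library[symbol] = concept
        return True


concept = "institution for keeping or lending money"
libs = SymbolLibraries(["en", "fr"], threshold=3,
                       equivalents={(concept, "en"): "bank", (concept, "fr"): "banque"})
print([libs.record_description(concept) for _ in range(4)])  # [False, False, True, False]
print(libs.libraries["fr"])  # {'banque': 'institution for keeping or lending money'}
```

Comparing with `!=` rather than `<` makes the promotion happen exactly once, on the call that reaches the threshold.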
Referring also to FIG. 7, step 16 of the main program on the target client calls the target-text automatic conversion and generation rule base 35. Following the multilingual symbol and word-order conversion rules, the system automatically converts the information items confirmed by the source user into the natural-language translation the target user requires. FIG. 7 shows one such result, the Chinese output "在银行附近我看见一男孩带望远镜" ("Near the bank I saw a boy with a telescope"). In step 17, the main program asks the user whether the translation is free of ambiguity. If it is ambiguous, then in step 26 the target user can define the query scope of the relevant information items through human-machine interaction, calling the multilingual corresponding information item symbol library 25 as needed. For example, the target user may want to know whether "带望远镜" ("with a telescope") modifies the subject or the object, as marked by "?" in FIG. 7. The target user can query the syntactic information item that this symbol actually carries and so determine that it modifies the object. This concludes the translation process.
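The target-side query can be pictured as a lookup in the intermediate result that travels with the translation. This is a minimal sketch under assumed data. The dictionary keyed by target-language phrase and its field names are hypothetical. The content mirrors the FIG. 7 example, where the source user confirmed that "with a telescope" modifies the object "boy".

```python
from typing import Dict

# Hypothetical intermediate result attached to the translation: for each phrase of the
# target text, the syntactic item the source user confirmed.
INTERMEDIATE: Dict[str, Dict[str, str]] = {
    "带望远镜": {"source": "with a telescope", "role": "modifier", "zone": "object", "head": "boy"},
}


def query(intermediate: Dict[str, Dict[str, str]], phrase: str) -> str:
    """Answer a target user's question about what a phrase of the translation attaches to."""
    entry = intermediate.get(phrase)
    if entry is None:
        return f"{phrase}: no confirmed item"
    return f'{phrase} ({entry["source"]}) is a {entry["role"]} of the {entry["zone"]} "{entry["head"]}"'


print(query(INTERMEDIATE, "带望远镜"))
# 带望远镜 (with a telescope) is a modifier of the object "boy"
```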
The quality of semantic information transfer is the fundamental obstacle to machine translation technology winning the large international market of the global networked information age. A substantive breakthrough requires human-machine dialogue. The translation scheme of the invention, in which human and machine complement each other through dialogue, can effectively improve translation quality and has practical value. The method has several advantages:
- it transfers semantic information accurately;
- it is not restricted by linguistic context;
- it is convenient for users to operate;
- it can convert and generate several target languages simultaneously;
- one dialogue scheme serves all languages;
- its technical means are simple and reliable.

It therefore has broad application value in network information exchange and a wide market in online machine translation services.
Based on the concept of the above method, the invention also provides a holographic semantic tagging system, comprising:
a necessary semantic information base, storing a basic vocabulary with its concept definitions, and syntactic information items;
a text input device for entering the text to be semantically tagged;
a text storage device for storing the text entered through the text input device;
a text display device for displaying a text stored in the text storage device;
a sentence selection device for selecting a sentence in the text displayed by the text display device;
an automatic sentence-structure analysis device for automatically analyzing the structure of the selected sentence on the basis of statistical experience;
a semantic tagging template display device for displaying a semantic tagging template. When a sentence is selected through the sentence selection device, the template is displayed for that sentence. It contains, for each word of the sentence, a lexical information element item and a syntactic information element item. Each lexical information element item displays the concept definitions and all synonyms of the corresponding word contained in the necessary semantic information base. Each syntactic information element item displays all possible syntactic information items of the corresponding word, according to the result of the automatic sentence-structure analysis device; these syntactic information items are stored in the necessary semantic information base;
a semantic tagging device with which a person selects, in the semantic tagging template, the concept definitions and synonyms in each lexical information element item and the syntactic information items in each syntactic information element item;
a tagged-text storage device for storing text together with its tagging information;
a tagging instruction device for instructing that the tagging of a sentence in the text displayed by the text display device be displayed;
a tagging display device for displaying, in the form of the tagging template, the tagging information stored in the tagged-text storage device for the instructed sentence.
One application of the holographic semantic tagging system of the invention is same-language holographic semantic tagging. Take the legal profession as an example. Law has many branches, each of which needs its own knowledge base, so developing expert systems has wide application value. One common need is for ordinary users to understand and identify the meaning of legal provisions. Existing expert systems, both in China and abroad, use a "question-and-answer" human-machine interface. The system asks a long series of questions, and the user answers "Yes" or "No" to each or enters simple data. The system then searches its knowledge base, infers a conclusion from how the answers match the stored knowledge, and reports it to the user.
Such a question-and-answer interface is rigid and tedious. The questions are fixed in advance, so the interface is inflexible, and such systems appear to have very low intelligence.
Suppose same-language semantic tagging is used when legal interpretations, contracts, agreements, and pleadings are entered, so that the holographic data of the language symbols used are entered once. Users will then find it much easier to query and classify these texts.
Same-language semantic tagging is suitable not only for developing expert-level knowledge systems of all kinds. It also has general practical value for making the semantic expression of legal interpretations, contract terms, and technical documentation more precise.
Implementation of same-language semantic tagging:
Same-language semantic tagging can be achieved simply by applying the source-text processing of the holographic translation template and providing a specialized lexicon.
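The tagged-text storage and tagging display devices of the tagging system can be pictured as a store that keeps per-sentence word tags and renders them on request. This is a hypothetical sketch. The class names, the line format, and the legal example sentence are assumptions, and a real system would draw definitions and synonyms from the necessary semantic information base and a specialized lexicon.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WordTag:
    word: str
    concept: str                                       # concept definition chosen by the tagger
    synonyms: List[str] = field(default_factory=list)
    syntax: str = ""                                   # chosen syntactic information item


class TaggedTextStore:
    """Hypothetical stand-in for the tagged-text storage and tagging display devices."""

    def __init__(self, sentences: List[str]):
        self.sentences = list(sentences)
        self.tags: Dict[int, List[WordTag]] = {}

    def tag(self, index: int, words: List[WordTag]) -> None:
        """Store the tags a person selected for one sentence."""
        if not 0 <= index < len(self.sentences):
            raise IndexError(f"no sentence {index}")
        self.tags[index] = list(words)

    def show(self, index: int) -> List[str]:
        """Render one sentence's tags in template order, one line per word."""
        return [f"{t.word}: {t.concept}; synonyms: {', '.join(t.synonyms) or '-'} [{t.syntax}]"
                for t in self.tags.get(index, [])]


store = TaggedTextStore(["The lessee shall pay the rent monthly."])
store.tag(0, [WordTag("lessee", "party holding property under a lease", ["tenant"], "subject"),
              WordTag("rent", "payment for the use of property", [], "object")])
print("\n".join(store.show(0)))
```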
Another application of the holographic semantic tagging system of the invention is a holographic foreign-language teaching system.
Computer-assisted instruction is already widely used. In foreign-language teaching, it mainly takes two forms: multimedia teaching (listening, speaking, reading, and writing in parallel) and test-bank drills. The holographic language template gives foreign-language teaching a computer-assisted means of systematically presenting both what different languages share conceptually and how each expresses concepts in its own symbols.
When the user enters a sentence in the native language:
If the user selects the concept definition of a native-language word, the holographic template can retrieve all corresponding words in multiple languages through the unified multilingual coding provided by the system.
If the user selects the tense, voice, and syntactic-component items of the native-language sentence, the holographic teaching system can use the interface technology and internal conversion rules of the holographic translation system. With these it shows, step by step, how the symbols are inflected and reordered in any language.
If the user enters a foreign-language sentence directly, the holographic template can use the unified multilingual coding provided by the system in two ways. It can provide a holographic semantic tagging of the foreign-language sentence, or it can convert that tagging directly into the native language.
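The unified multilingual coding described in these cases can be sketched as a table from concept codes to words in each language. The code strings and word lists below are hypothetical, since the text does not specify a code format. Because a surface word such as "bank" may carry several codes, the learner first picks the concept in the template. The chosen code then yields the corresponding word in any language.

```python
from typing import Dict, List

# Hypothetical unified concept codes; the specification does not define a code format.
LEXICON: Dict[str, Dict[str, str]] = {
    "bank/financial": {"en": "bank", "fr": "banque", "zh": "银行"},
    "bank/river":     {"en": "bank", "fr": "rive", "zh": "河岸"},
    "telescope":      {"en": "telescope", "fr": "télescope", "zh": "望远镜"},
}


def candidate_concepts(word: str, lang: str) -> List[str]:
    """All concept codes a surface word may carry; the learner picks one in the template."""
    return [code for code, words in LEXICON.items() if words.get(lang) == word]


def render_in(code: str, lang: str) -> str:
    """Word for the chosen concept in the requested language."""
    return LEXICON[code][lang]


print(candidate_concepts("bank", "en"))                       # ['bank/financial', 'bank/river']
print({lang: render_in("bank/financial", lang) for lang in ("fr", "zh")})
# {'fr': 'banque', 'zh': '银行'}
```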

Claims

1. An open holographic template-type human-machine dialogue language translation method, characterized by comprising the following steps:
a. imposing commensuration constraints on the various natural languages;
b. establishing a sentence-based human-machine dialogue template containing the necessary semantic information elements of the various natural languages;
c. providing, through the human-machine dialogue template, all commensuration-constrained candidate semantic information items corresponding to the source-language symbols, together with blank information items for extension by the user;
d. having the computer of the translation system first automatically preselect among all the commensuration-constrained candidate semantic information items, after which the source user manually adjusts and confirms the preselection result on the human-machine dialogue template;
e. generating, by the translation system, a translation from the semantic information items determined through human-machine complementarity, converting those semantic information items into target-language symbols, and providing them together with the translation to the target user for query.
2. The open holographic template-type human-machine dialogue language translation method according to claim 1, characterized in that the necessary semantic information elements in step b include concept definition, tense information, voice information, and syntactic-component information items.
3. The open holographic template-type human-machine dialogue language translation method according to claim 1 or 2, characterized in that the commensuration constraints on the various natural languages in step a include: a1. unifying and merging syntactic concepts that have the same function but different objects; a2. deleting dispensable syntactic concepts as far as possible; a3. establishing a general multilingual set of basic concepts through statistical analysis of word frequency in the major languages and merging of synonyms; a4. attaching the near-synonyms of each basic concept in the various natural languages as near-synonym codes, so that when a natural language lacks a corresponding near-synonym, the basic concept word is substituted for it; a5. providing, through the dialogue template, blank information items for natural-language words or concepts that cannot be expressed uniformly by basic concepts.
4. The open holographic template-type human-machine dialogue language translation method according to claim 1, characterized in that, in step c, when a blank appears in a same-language candidate information item corresponding to a source-language symbol item, the user can describe it using natural-language symbols already included in the system.
5. The open holographic template-type human-machine dialogue language translation method according to claim 4, characterized in that the method further comprises: counting the frequency of use of information items extended by users; determining new general basic concepts from the frequency statistics; and adding the natural-language symbol items and corresponding information items simultaneously to the human-machine dialogue templates of all languages of the translation system.
6. The open holographic template-type human-machine dialogue language translation method according to claim 1, characterized in that, in step d, the automatic preselection result is manually adjusted and confirmed by the user manually selecting the uncertain information items on the holographic dialogue template.
7. The open holographic template-type human-machine dialogue language translation method according to claim 1, characterized in that the sentence-based human-machine dialogue template of step b is a dialogue frame with three-dimensional spatially positioned syntax.
8. The open holographic template-type human-machine dialogue language translation method according to claim 1, characterized in that the sentence-based human-machine dialogue template of step b is virtual.
9. The open holographic template-type human-machine dialogue language translation method according to claim 3, characterized in that the commensuration constraints on the various natural languages further include: a6. connotation-centered fuzzy commensuration; and a7. unified commensuration of concepts regardless of differences in part of speech.
10. The open holographic template-type human-machine dialogue language translation method according to claim 1, characterized in that, in step d, the user can manually adjust and confirm the preselection results on the holographic dialogue template one item at a time or several items at once.
11. A holographic semantic tagging system, comprising:
a necessary semantic information base, storing a basic vocabulary with its concept definitions, and syntactic information items;
a text input device for entering the text to be semantically tagged;
a text storage device for storing the text entered through the text input device;
a text display device for displaying a text stored in the text storage device;
a sentence selection device for selecting a sentence in the text displayed by the text display device;
an automatic sentence-structure analysis device for automatically analyzing the structure of the selected sentence on the basis of statistical experience;
a semantic tagging template display device for displaying a semantic tagging template. When a sentence is selected through the sentence selection device, the template is displayed for that sentence. It contains, for each word of the sentence, a lexical information element item and a syntactic information element item. Each lexical information element item displays the concept definitions and all synonyms of the corresponding word contained in the necessary semantic information base. Each syntactic information element item displays all possible syntactic information items of the corresponding word, according to the result of the automatic sentence-structure analysis device; these syntactic information items are stored in the necessary semantic information base;
a semantic tagging device with which a person selects, in the semantic tagging template, the concept definitions and synonyms in each lexical information element item and the syntactic information items in each syntactic information element item;
a tagged-text storage device for storing text together with its tagging information;
a tagging instruction device for instructing that the tagging of a sentence in the text displayed by the text display device be displayed;
a tagging display device for displaying, in the form of the tagging template, the tagging information stored in the tagged-text storage device for the instructed sentence.
12. The holographic semantic tagging system according to claim 11, characterized in that the necessary semantic information base stores, in correspondence across multiple languages, commensuration-constrained words with their concept definitions and commensuration-constrained syntactic information items.
13. The holographic semantic tagging system according to claim 11, characterized in that the lexical information element item of a word also displays the word of a specified language that the necessary semantic information base stores in correspondence with that word. The syntactic information element item of the word likewise displays the syntactic information item of the specified language that the base stores in correspondence with the word's syntactic information item.
14. The holographic semantic tagging system according to claim 11, characterized in that, in addition to the selectable content, the content of the lexical information element item can be changed to other information explaining the meaning of the word.
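Claims 12 to 14 extend the semantic base to several languages and let the lexical item be overridden. The Python sketch below assumes one plausible reading of "stored in correspondence": entries in different languages share a concept ID (for words and their definitions) or a syntactic-item ID, so the counterpart in a specified language is a direct lookup. The IDs `C001` and `S-NOUN`, the tables `LEXICON` and `SYNTAX`, and the class `AnnotatedWord` are hypothetical, and the free-text `note` field is only one way to model the editable lexical content of claim 14.

```python
# Hypothetical sketch of claims 12-14; all names and IDs are illustrative.
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Aligned multilingual lexicon: a shared concept ID links each language's
# (word, concept definition) pair, so the restricted vocabularies correspond (claim 12).
LEXICON: Dict[str, Dict[str, Tuple[str, str]]] = {
    "C001": {"en": ("bank", "financial institution"),
             "zh": ("银行", "金融机构")},
}

# Syntactic information items aligned across languages the same way (claim 12).
SYNTAX: Dict[str, Dict[str, str]] = {
    "S-NOUN": {"en": "noun", "zh": "名词"},
}


@dataclass
class AnnotatedWord:
    concept_id: str
    syntax_id: str
    # Claim 14: optional free text that replaces the selectable lexical content.
    note: Optional[str] = None

    def display(self, lang: str, target: str) -> str:
        """Render the word in `lang` with its counterpart in the specified `target` language (claim 13)."""
        if self.note is not None:
            lexical = self.note
        else:
            word, definition = LEXICON[self.concept_id][lang]
            counterpart, _ = LEXICON[self.concept_id][target]
            lexical = f"{word} ({definition}) -> {counterpart}"
        syntax = SYNTAX[self.syntax_id]
        return f"{lexical} | {syntax[lang]} -> {syntax[target]}"


item = AnnotatedWord("C001", "S-NOUN")
print(item.display("en", "zh"))  # bank (financial institution) -> 银行 | noun -> 名词
item.note = "the money-lending sense, not a riverbank"
print(item.display("en", "zh"))  # the money-lending sense, not a riverbank | noun -> 名词
```

Keying both tables on shared IDs rather than on surface words keeps polysemous words apart: a second sense of "bank" would get its own concept ID instead of colliding with `C001`.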
PCT/CN1999/000046 1998-04-06 1999-04-06 Opening and holographic template type of language translation method having man-machine dialogue function and holographic semanteme marking system WO1999052041A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU33249/99A AU3324999A (en) 1998-04-06 1999-04-06 Opening and holographic template type of language translation method having man-machine dialogue function and holographic semanteme marking system

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN 98101156 CN1231453A (en) 1998-04-06 1998-04-06 Whole information, selection and process template type man-machine interaction language translating method
CN98101156.X 1998-04-06
CN98125015.7 1998-11-20
CN 98125015 CN1254895A (en) 1998-11-20 1998-11-20 Open full-information full-selection full-procedure template type man-machine complementary language translation method

Publications (1)

Publication Number Publication Date
WO1999052041A1 (en) 1999-10-14

Family ID: 25744605

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN1999/000046 WO1999052041A1 (en) 1998-04-06 1999-04-06 Opening and holographic template type of language translation method having man-machine dialogue function and holographic semanteme marking system

Country Status (3)

Country Link
CN (1) CN1111814C (en)
AU (1) AU3324999A (en)
WO (1) WO1999052041A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1618491B1 (en) * 2003-04-18 2006-11-15 International Business Machines Corporation System and method in a data table for creating recursive scalable template instances
CN104598443B (en) * 2013-10-31 2018-05-18 腾讯科技(深圳)有限公司 Language service providing method, apparatus and system
CN109219812B (en) * 2016-06-03 2023-12-12 微软技术许可有限责任公司 Natural language generation in spoken dialog systems

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN87106964A (en) * 1986-10-03 1988-06-01 英国电信公司 Language translation system
US5285386A (en) * 1989-12-29 1994-02-08 Matsushita Electric Industrial Co., Ltd. Machine translation apparatus having means for translating polysemous words using dominated codes
EP0622742A1 (en) * 1993-04-28 1994-11-02 International Business Machines Corporation Language processing system
US5418717A (en) * 1990-08-27 1995-05-23 Su; Keh-Yih Multiple score language processing system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100346337C (en) * 2002-12-27 2007-10-31 联想(北京)有限公司 Dynamic forming system of open type natural language
CN110852113A (en) * 2019-10-10 2020-02-28 林原 Translation method, device, equipment and storage medium
CN111738024A (en) * 2020-07-29 2020-10-02 腾讯科技(深圳)有限公司 Entity noun tagging method and device, computing device and readable storage medium
CN111738024B (en) * 2020-07-29 2023-10-27 腾讯科技(深圳)有限公司 Entity noun labeling method and device, computing device and readable storage medium

Also Published As

Publication number Publication date
AU3324999A (en) 1999-10-25
CN1296588A (en) 2001-05-23
CN1111814C (en) 2003-06-18

Similar Documents

Publication Publication Date Title
US20100121630A1 (en) Language processing systems and methods
US20210319344A1 (en) Natural language question answering
Davydov et al. Mathematical method of translation into Ukrainian sign language based on ontologies
KR101818598B1 (en) Server and method for automatic translation
WO2009103208A1 (en) Sentence component device and reading foreign languages and producing universal language and text conversion method
US7401016B2 (en) Communication support system, communication support method, and computer program
Banik et al. Statistical-based system combination approach to gain advantages over different machine translation systems
Boguslavsky et al. Creating a Universal Networking Language module within an advanced NLP system
Kang Spoken language to sign language translation system based on HamNoSys
Mo Design and implementation of an interactive English translation system based on the information-assisted processing function of the internet of things
WO1999052041A1 (en) Opening and holographic template type of language translation method having man-machine dialogue function and holographic semanteme marking system
Qian et al. Ontological approach for Chinese language interface design
Jung et al. Word reordering for translation into Korean sign language using syntactically-guided classification
Mukherjee et al. Natural language query handling using extended knowledge provider system
JP2007087157A (en) Translation system, translation device, translation method, and program
Hämäläinen et al. An open online dictionary for endangered uralic languages
Zhang Russian speech conversion algorithm based on a parallel corpus and machine translation
Chakrawarti et al. Phrase-Based Statistical Machine Translation of Hindi Poetries into English
Hu et al. Exploring discourse structure in document-level machine translation
Narita A corpus-based English language assistant to Japanese software engineers
Chakrawarti et al. Phrase-Based Statistical Machine Translation of Hindi Poetries into English by incorporating Word Sense Disambiguation
JP3768157B2 (en) Other language ontology dictionary utilization apparatus and method, and program
Morgan et al. Translation by Meaning and Style in LOLITA
Boitet A roadmap for MT: four «keys» to handle more languages, for all kinds of tasks, while making it possible to improve quality (on demand)
Liu et al. Construction of Medical Academic English Translation Model Driven by Bilingual Corpus-Based Data

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 99804904.2

Country of ref document: CN

AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 2004-01-01)
NENP Non-entry into the national phase

Ref country code: KR

WWE WIPO information: entry into national phase

Ref document number: 09647875

Country of ref document: US

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: CA