KR101506757B1

KR101506757B1 - Method for the formation of an unambiguous model of a text in a natural language

Info

Publication number: KR101506757B1
Application number: KR1020107013115A
Authority: KR
Inventors: Ivaylo Popov; Krasimir Nikolaev Popov
Original assignee: Ivaylo Popov; Krasimir Nikolaev Popov
Priority date: 2007-11-14
Filing date: 2008-11-12
Publication date: 2015-03-27
Also published as: EP2220572A4; CA2705345A1; EA201070614A1; KR20100108338A; JP2011503730A; EP2220572A1; WO2009062271A1; CN101855630A; JP2014139799A; BG109996A; BG66255B1

Abstract

The method relates to the generation of an unambiguous model of a text in a natural language. Basic notions are defined for the meanings (contents) that natural languages express. To each basic notion a number, or a name and a description, is attached, together with a list of the words that can denote the basic notion in each natural language used. The unambiguous model uses basic notions only. A machine can therefore interpret the unambiguous model, enter information and data into a base, or use the unambiguous model to generate a text in another natural language. A text in an artificial language, such as a programming language, can also be generated.

Description

METHOD FOR THE FORMATION OF AN UNAMBIGUOUS MODEL OF A TEXT IN A NATURAL LANGUAGE

The present invention relates to the input of information into a machine using natural language and to the machine translation of natural languages.

Document US6014680 by SATO YOSHIFUMI et al., "Method and Apparatus for Generating Structured Documents", is known. It discloses a method in which a predefined document structure is modified to match a structure built from keywords derived from a non-structured document; the document is then interpreted according to the modified structure, and data are placed into the finally modified structure, so that a structured document conforming to the predefined structure is produced. Although the keywords could be regarded as unambiguously defined, and the defined document as interpretable in one single way, that document differs from the present invention. In the present method the unambiguous model is built from basic notions, which are not only formulated in natural language but also carry a tag, a natural-language description, and lists of the words and texts that denote the basic notion in various natural languages. In addition, the generated unambiguous model of a text does not merely reflect the structure of the text (its lines and sentences); it indicates the notions found in the text and the relationships between them.
Also known is the document by FASS DANIEL C et al., "Method and system for describing and identifying concepts in natural language text for information retrieval and processing". It discloses concepts that are predefined by the user and consist of a word and a description. The description is expressed in a specialized concept-description language that makes it possible to find synonyms of a given word, texts, or search rules. The basic idea is that a concept defined by the user can be found in a given text with the help of its description. Texts processed with this method can be marked with the concepts found, to make later retrieval easier. In that document a concept is defined by the user and is expressed in a form that depends on the search method employed.
A basic notion according to the present invention is a content or an action that all people understand in one and the same way, regardless of the language they use. A basic notion is defined by a description in natural language. The natural-language description is not used for searching; it serves as feedback to a human, while the tag of the basic notion serves to address the basic notion in the machine. The attached lists of words and texts that denote the notion in various languages serve for finding the notion when natural-language text is processed and for generating natural-language text. When basic notions are defined, the dictionary descriptions of the synonyms of a given word are compared. Even if these descriptions are regarded as "user" definitions, they cannot serve for finding notions in a given text. The descriptions are compared, and similar passages are taken as potentially describing a basic notion; whether passages that contain similar words are also semantically similar is not determined automatically but is decided by the person who creates the basic notion.
The most common systems in which a machine interprets a number of words taken from natural language are, in practice, all artificial languages. Attempts have been made to define the grammatical meaning of words. In machine translation, for example, there are developments in which a semantic range (content) is entered in order to define the preferred meaning of the words for a given text, which gives better results. There have also been attempts to define the meaning of a given word by statistical analysis of the words that accompany it in the text and of their frequency of occurrence. There are further attempts to assign a digital marking to the words of one language and the same marking to words of other natural languages, so that words of two languages with one and the same meaning become equivalents.

The problem of unambiguous interpretation of natural language by a machine has not been solved, and this is an obstacle to entering information and data into a machine using the means of natural language. One consequence is that machines cannot be relied on for the translation of official documents. No text can be composed in natural language in such a way that different people interpret it unambiguously, which is especially needed in textbooks and patent applications. No computer can be programmed in natural language, because a sentence in natural language can have several formally acceptable meanings; that is, grammatically correct sentences can be interpreted in different ways. Because there is no formal tool with which a machine can process a piece of information written in natural language, accumulated human knowledge cannot be used optimally.

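Interpreting natural language always involves building a machine model of the interpreted information. A natural-language text is processed by various means so that the sentences, the meanings of the words in them, and the grammatical parts of speech can be determined. The problem is that there is no feedback, so the model formed cannot be corrected. The reason is the lack of a basis for comparing the model with the natural-language text. In other words, the model is itself a structure that can again be interpreted in more than one way. The technical essence of the present proposal is a method for forming an unambiguous model. A model formed in this way can be interpreted in one and only one way.
The method consists of five steps.
In the first step, a significant number of languages are studied in order to define the set of basic notions used by humankind. Note that a word of a given language cannot be regarded as a "basic" notion. A "basic" notion is a sign for a given semantic content or action, and in most cases one and the same word of a natural language is used to denote several different basic notions; that is, words have multiple meanings. Technical proposals that mark words, for example "sun = 1" with the Bulgarian word for "sun" also marked 1, can be used in machine translation but will not yield an adequate, unambiguous translation. Such systems produce translations of the form "user rights = rights of the drug addict" (in Bulgarian), when the phrase means nothing other than "user rights". Numbering words in this way only creates a kind of intermediate language that can itself be interpreted in several ways. The present proposal assigns numbers not to words but to the meanings of words. According to the method, these meanings have not only binary numbers but also unique names, which may be words of a widely used natural language. A given word of that natural language may, however, be used only once to name a semantic content. Thus the word "sun" can name only the star, and other words will be chosen for all other meanings of the word "sun". This labelling of meanings has no effect on the natural language itself. According to the method, the contents are characterized by their descriptions, which are written in natural language in the same way as the entries of a dictionary of that language. Attached to each content is a list of the words that can express it in a given natural language, that is, a thesaurus of meanings (contents) rather than of words.
The structure that represents a semantic content, with its unique label (a name or a number), its description, and its list of words, compound words, or texts that denote that content in natural language, is hereinafter called a Basic Notion.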
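The Basic Notion structure described above can be sketched as a small data type. This is only an illustration under stated assumptions: the class names `BasicNotion` and `NotionBase`, the field names, and the sample entries are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BasicNotion:
    """One meaning (content), independent of any particular language."""
    number: int       # unique numeric label
    name: str         # unique name; a word may name one notion only
    description: str  # dictionary-style description in natural language
    # language code -> words that can denote this notion in that language
    words: Dict[str, List[str]] = field(default_factory=dict)


class NotionBase:
    """Hypothetical registry that keeps numbers and names unique."""

    def __init__(self) -> None:
        self._by_number: Dict[int, BasicNotion] = {}
        self._by_name: Dict[str, BasicNotion] = {}

    def register(self, notion: BasicNotion) -> None:
        if notion.number in self._by_number or notion.name in self._by_name:
            raise ValueError("label already used by another basic notion")
        self._by_number[notion.number] = notion
        self._by_name[notion.name] = notion

    def candidates(self, word: str, language: str) -> List[BasicNotion]:
        """All notions that the word can denote in the given language."""
        return [n for n in self._by_number.values()
                if word in n.words.get(language, [])]


# Illustrative entries: the word "sun" can denote two different contents,
# but only one notion is *named* "sun".
base = NotionBase()
base.register(BasicNotion(1, "sun", "the star at the centre of the solar system",
                          {"en": ["sun"]}))
base.register(BasicNotion(2, "sunlight", "light and warmth received from the sun",
                          {"en": ["sun", "sunlight", "sunshine"]}))
```

Keeping the word lists per language on the notion, rather than a number on each word, mirrors the point of the text: the label belongs to the meaning, and one word may appear in the lists of several notions.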
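The second step of the method is to build a model of the natural-language text using basic notions only. In this step all applicable technical methods are used to determine the grammatical and semantic meanings of the words, and an individual model is formed. In forming the model, global statistics on the use of words in their various meanings can be used, or local statistics kept separately for each user of the method. Similar texts in which the meanings of the words have already been determined can also be used. Human translations of a text from one language into another can likewise be used to determine the basic notions used in the text, by evaluating the words chosen in the translation and comparing them and their meanings with the words of the original text.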
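One way to picture the use of global and per-user statistics in this step is a simple frequency rule. The sketch below is an assumption, not the patent's procedure: the function `choose_notion`, the `min_local` threshold, and the counts are all made up for illustration.

```python
from collections import Counter


def choose_notion(word, candidates, global_counts, local_counts=None, min_local=1):
    """Pick the candidate notion seen most often for this word.

    Per-user (local) counts are preferred when they hold at least
    `min_local` observations for the word; otherwise global counts decide.
    """
    local_counts = local_counts or Counter()
    local_total = sum(local_counts[(word, c)] for c in candidates)
    counts = local_counts if local_total >= min_local else global_counts
    return max(candidates, key=lambda c: counts[(word, c)])


# Hypothetical usage statistics: (word, notion name) -> number of occurrences.
global_counts = Counter({("sun", "sun"): 90, ("sun", "sunlight"): 10})
local_counts = Counter({("sun", "sunlight"): 5})
```

With these counts, `choose_notion("sun", ["sun", "sunlight"], global_counts)` selects the star, while passing `local_counts` as well selects "sunlight" for this particular user. Whatever the rule proposes remains provisional, since the third step lets the operator correct it.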
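The third step of the method is feedback. In this step the model formed in the second step is used as the basis for generating a text in the same natural language as the original. The operator can use a computer program to change the generated model so that it matches the operator's understanding of the text. This can be done by changing the model directly, working with the meanings (contents) themselves, presented for example as a branching structure together with the relationships between the contents. That approach requires considerable training; alternatively, the change can be made by trying to explain to the computer which meanings have to be changed. The original text can be compared with the generated text and the differences marked. For each marked word, a list of synonyms is obtained from a dictionary of synonyms, and the list is filtered to remove synonyms that have already been rejected and synonyms with unsuitable content. The operator chooses from the list of synonyms, and the process is repeated in real time: a new text is generated and, possibly, a further correction is made. Choosing synonyms is not always enough to define a given meaning, however. Means can therefore be provided for changing the interpretation of the relationship between two basic notions in the text. A relationship can be defined by visual means, for example marking and underlining. For instance, it is possible to define which part of a sentence is the subject, which is the predicate, and which is the attribute. Means can be created for marking temporal relations in the text. External characteristics of the text can be specified so that interpretation and generation can be controlled more effectively. For example, where the true interpretation differs from the standard one, as in wordplay or satire, both interpretations should be given, so that the standard interpretation and its modification according to the external characteristics both become part of the unambiguous model. Many such means can be used, since the aim is to help a person of average education show the computer exactly what he or she means. The goal is an unambiguous model that conveys the meaning of the message as precisely as possible.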
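The comparison and synonym-filtering part of the feedback step can be sketched with the standard `difflib` module. This is a minimal sketch assuming a word-level comparison; the function names, the thesaurus content, and the sample sentences are hypothetical.

```python
import difflib


def differing_words(original, generated):
    """Return (original_words, generated_words) pairs where the texts differ."""
    a, b = original.split(), generated.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def synonym_choices(word, thesaurus, rejected):
    """Synonyms offered to the operator, without those already rejected."""
    return [s for s in thesaurus.get(word, []) if s not in rejected]


# Hypothetical example: the machine regenerated the sentence with a
# different word, so that word is marked for the operator.
original = "the bank of the river was steep"
generated = "the shore of the river was steep"
thesaurus = {"shore": ["bank", "coast", "beach"]}
```

Here `differing_words(original, generated)` marks the pair `bank` / `shore`, and `synonym_choices("shore", thesaurus, {"beach"})` offers `bank` and `coast`. After the operator chooses, the model would be updated and the text generated again, which is the repeated loop the text describes.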
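In the fourth step of the method, the generated Unambiguous Model is attached to the file that contains the natural-language text. This makes the interpretation of the natural-language text unambiguous, which is especially useful for patent applications and machine translation. When the text of a textbook is produced with an unambiguous model attached, a computer program can generate explanations at practically any level of detail, because it can use the definitions of the meanings used in the text and, recursively, the definitions of the contents that are used to define higher-level contents.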
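The recursive use of definitions to produce explanations of adjustable detail can be illustrated as follows. The sketch assumes that each notion stores its description together with the names of the notions used in that description; the `definitions` table and the `explain` function are hypothetical.

```python
def explain(notion, definitions, depth):
    """Expand a notion's definition, recursively explaining the notions it uses.

    `depth` bounds the recursion, so the same model can yield a short
    or a detailed explanation.
    """
    description, used = definitions[notion]
    lines = [f"{notion}: {description}"]
    if depth > 0:
        for sub in used:
            lines.extend("  " + line
                         for line in explain(sub, definitions, depth - 1))
    return lines


# Hypothetical definitions: notion -> (description, notions used in it).
definitions = {
    "planet": ("a large body that orbits a star", ["star", "orbit"]),
    "star": ("a luminous sphere of plasma", []),
    "orbit": ("the curved path of one body around another", []),
}
```

Calling `explain("planet", definitions, 0)` gives only the top-level definition, while a depth of 1 adds the indented definitions of "star" and "orbit".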
The fifth step of the method comprises the use of unambiguous models of natural-language texts for machine learning and for the artificial formation of notions and theories on the basis of formalized knowledge derived from the unambiguous models of natural-language texts.

The invention can be applied in machine translation and in information retrieval, where the search is based not on the words that a text contains, as in the current state of the art, but on finding unambiguous models similar to the unambiguous model of the query text. A search can also be carried out by analysing the unambiguous models of texts, so that a search engine could answer a question such as a request for information on transferring property to a foreign citizen under Bulgarian law.
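The patent does not say how similarity between unambiguous models is measured. As a purely illustrative stand-in, the sketch below reduces each model to the set of basic-notion numbers it uses and ranks documents by Jaccard overlap with the query; the function names and the index are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of notion numbers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query_notions, indexed_models):
    """Rank documents by the overlap between their notions and the query's."""
    scored = [(jaccard(query_notions, notions), doc)
              for doc, notions in indexed_models.items()]
    return [doc for score, doc in sorted(scored, reverse=True) if score > 0]


# Hypothetical index: document name -> notion numbers used in its model.
indexed_models = {"doc_a": {1, 2, 3}, "doc_b": {3, 4}, "doc_c": {9}}
```

Because the comparison is over notions rather than words, two texts that use different words for the same meanings would still match, which is the advantage the text claims over word-based search. A fuller version would also compare the relationships between notions, not only their presence.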

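An exemplary implementation of the first step of the invention
With the aid of a computer program, the basic notions of a language are determined: for each word of the natural language under study, its list of synonyms is examined. The description of each word in an explanatory dictionary of the language is compared with the descriptions of its synonyms in the same dictionary. The descriptions are compared by simple text comparison and by searching for similar passages. The aim is to determine the different meanings of a given word according to the synonyms for each meaning. This is done by comparing the description of the word with the descriptions of its synonyms, as given in the dictionary; similar passages can be identified in the two descriptions, and these constitute a meaning (content). The description of this content is formed, in principle, from the similar passages in the descriptions of the two synonyms. When such a content is found, the database can be checked to establish whether it has already been registered, by comparing the descriptions of the registered contents with the description of the newly found content. If the new content is not yet in the database, it is subsequently registered.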
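The comparison of dictionary descriptions can be sketched as a search for word sequences shared by two descriptions. This is one simple reading of "searching for similar passages", using the standard `difflib` module; the function name, the `min_words` threshold, and the sample descriptions are assumptions.

```python
import difflib


def shared_passages(description_a, description_b, min_words=2):
    """Word sequences of at least `min_words` that occur in both descriptions."""
    a = description_a.lower().split()
    b = description_b.lower().split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [" ".join(a[m.a:m.a + m.size])
            for m in matcher.get_matching_blocks()
            if m.size >= min_words]


# Hypothetical dictionary descriptions of two synonyms.
bank = "a raised strip of land along the edge of a river"
shore = "the strip of land along the edge of the sea"
```

For these two descriptions the shared passage is "strip of land along the edge of", which would become the draft description of a candidate content. Whether the two words really share a meaning is still a human decision, as the text stresses.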
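After the database of contents and their descriptions has been formed automatically, specialists are called on to define labels for the contents and to refine their descriptions. To each content are attached the words that can denote it regardless of the type of text, as well as the words that can denote it only under certain conditions that depend on the text, for example scientific text or wordplay, together with the corresponding characteristics. A database of all contents in a given language can thus be built, and each content can be described by an unambiguous model of its natural-language description. This can be done by professional linguists, who can create the unambiguous model from the automatically formed natural-language description of the content, using the basic notions of the language. Once the basic notions of one natural language have been found, the next natural language uses the database of basic notions already formed. A linguist simply defines how the registered contents are expressed in the language in question, and, probably, a number of contents will have to be added to the database. The linguists who maintain the correspondence between each natural language and the base must be informed of every new content added, so that they can propose suitable designations for it in the languages for which they are responsible. The designation of a content may be entirely descriptive.
It is also possible to automate the study of the second and each subsequent natural language. In that case the same procedure is applied as for the first natural language, which produces a new database of registered contents. The designations of the contents in this second database are words of the second language. Using a dictionary from the second language to the first, the possible translations of each designation of a content in the second base are produced. For each translation, which is a word of the first language, the contents that this word can denote are extracted from the first base. A pseudo-translation of the description of the content from the second language is produced by generating all combinations of the possible first-language translations of each word in the description. The pseudo-translations of the description are compared with the descriptions of the contents extracted from the first base. The best matches are found and marked. Each such match must be approved by a linguist. After all matches have been processed, the contents remaining in the second database are either registered as new contents in the first base or have their counterparts found in the first database.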
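The pseudo-translation procedure can be sketched with `itertools.product`, which enumerates every combination of per-word translations. Everything concrete here is hypothetical: the placeholder second-language words `aa` and `bb`, the tiny dictionary, and the word-overlap score used to compare descriptions.

```python
from itertools import product


def pseudo_translations(description, dictionary):
    """All word-by-word renderings of a description; unknown words are kept."""
    options = [dictionary.get(word, [word]) for word in description.split()]
    return [" ".join(choice) for choice in product(*options)]


def best_match(description, dictionary, first_base):
    """Name of the first-base content whose description shares the most
    words with any pseudo-translation of the given description."""
    candidates = pseudo_translations(description, dictionary)

    def score(name):
        target = set(first_base[name].split())
        return max(len(set(p.split()) & target) for p in candidates)

    return max(first_base, key=score)


# Placeholder second-language words mapped to possible first-language words.
dictionary = {"aa": ["bright", "light"], "bb": ["star"]}
first_base = {"sun": "bright star near earth", "lamp": "device giving light"}
```

With these inputs, `pseudo_translations("aa bb", dictionary)` yields "bright star" and "light star", and the best match in the first base is "sun". The number of combinations grows multiplicatively with description length, and, as the text requires, a proposed match is only a suggestion until a linguist approves it.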
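For official documents, a single natural-language text must be generated from an unambiguous model. This can be achieved by using a simplified language for generation. From a linguistic point of view, many natural-language texts can be generated that carry the meaning and information conveyed by an unambiguous model, even though the linguists' task is to add enough characteristics of the text to the unambiguous model for generation to be unambiguous. Such an approach is particularly important when official documents are translated from one language into another, and especially when patent applications are translated.
In the translation of fiction, on the other hand, it is better to generate several natural-language texts from the unambiguous model and to select the construction most appropriate to the target language, using statistical data derived from fiction written in that language.
An exemplary implementation of the second step of the invention
A text can be represented as a list of branching structures (trees), in which each tree represents one sentence of the text. Links between different trees are possible. Each element of a tree is an object with characteristics that are either derived automatically from the text or added manually by the operator. Some elements of a tree, for example pronouns, can have links to elements of other trees, which represent other sentences of the text. The order of the trees in the list matters, because it determines the order of the sentences in the original text and in the text generated from the unambiguous model.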
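The list-of-trees representation can be sketched as follows. The `Node` class, its field names, and the feature keys are assumptions chosen for illustration; the two sample sentences are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One element of a sentence tree."""
    notion: str                                   # label of the basic notion
    features: dict = field(default_factory=dict)  # derived or operator-added
    children: List["Node"] = field(default_factory=list)
    refers_to: Optional["Node"] = None            # link into another tree


# Sentence 1: "Maria bought a book."
maria = Node("Maria", {"role": "subject"})
book = Node("book", {"role": "object"})
s1 = Node("buy", {"role": "predicate", "tense": "past"}, [maria, book])

# Sentence 2: "She read it." The pronouns link back to elements of tree 1;
# here the links are marked as added manually by the operator.
she = Node("she", {"role": "subject", "source": "operator"}, refers_to=maria)
it = Node("it", {"role": "object", "source": "operator"}, refers_to=book)
s2 = Node("read", {"role": "predicate", "tense": "past"}, [she, it])

# The list order is the sentence order of the text.
model = [s1, s2]
```

Storing the pronoun link as a reference to the actual node, rather than to a word, keeps the model unambiguous even if the referenced sentence is later regenerated with different wording.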
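An exemplary implementation of the third step of the invention
A superstructure is built on top of a text editor, with additional functions for simplifying the formatting of the text and with support for making changes to the automatically formed unambiguous model of the text. For example, the screen is divided into three zones. The first zone contains the complete original text, as in an ordinary text editor. The second zone provides the feedback while the unambiguous model is being built: it contains the machine-generated text of the sentence being processed. When the mouse cursor rests on a word of the machine-generated text, a tooltip (hint) gives the description of the basic meaning (content) that the word denotes. The same sentence is marked accordingly in the original text. The third zone is a set of tools for changing the unambiguous model shown in the second zone. These tools include changing the interpreted content of a word by displaying its synonyms together with the synonyms of the other contents that the word can name; the description of the basic content named by each synonym can also be shown as a hint. They also include selecting the specific type of text, such as wordplay, joke, poem, or scientific text, and tools for specifying the exact content of the pronouns used, such as he, she, or it. The exact meaning can be given within the framework of the whole text by indicating the relationship between the pronoun and earlier sentences of the text. The text is analysed sequentially from beginning to end, and all the characteristics and relationships needed to form the unambiguous model are entered. A sentence is processed until the machine generates a text that has the same meaning as the original. Processing consists of a number of corrections and generations.
An exemplary implementation of the fourth step of the invention
The generated unambiguous model of a text is attached to the original file. The attachment can be made in several ways. A link to the unambiguous model of the text can be added to the original file. The file with the original text and the file with the unambiguous model can be stored together in one archive. Note that, in principle, a natural-language text can have several unambiguous models. This is because the many possible interpretations of a natural-language text are filtered by a human, the operator, who applies his or her own understanding when translating the text from natural language into an unambiguous machine model. It is therefore possible to link a natural-language text to several unambiguous models. In the case of a patent application, the subject of protection is naturally the single unambiguous model of the application text, as filed.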
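The archive variant of the attachment can be sketched with the standard `zipfile` module. The archive layout (`text.txt` plus numbered `model_N.json` entries) and the JSON encoding of a model are assumptions for illustration; the patent only says that text and model can be stored in one archive.

```python
import io
import json
import zipfile


def pack(text, models):
    """Return the bytes of a zip archive holding the text and its models."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("text.txt", text)
        for i, model in enumerate(models, start=1):
            archive.writestr(f"model_{i}.json", json.dumps(model))
    return buffer.getvalue()


def unpack(data):
    """Read back the text and all attached models from archive bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        text = archive.read("text.txt").decode("utf-8")
        names = sorted(n for n in archive.namelist() if n.startswith("model_"))
        return text, [json.loads(archive.read(n)) for n in names]


# Hypothetical model: a list of sentences, each a list of notion numbers.
packed = pack("Maria bought a book.", [[[17, 42, 8]], [[17, 43, 8]]])
```

Allowing several model entries per archive reflects the remark that one text can have several unambiguous models, one per operator's interpretation. For a patent application, a stricter format would permit exactly one.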
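An exemplary implementation of the fifth step of the invention
The unambiguous model of a natural-language text can be subjected to formal processing. Different representations of the unambiguous model can be generated to suit different kinds of machine processing. Because formal interpretation can be applied to them, unambiguous models can be regarded as a new kind of computer software. In this way machine learning can be realized by extracting facts and relationships from the unambiguous models of natural-language texts. All the learning mechanisms studied in artificial intelligence can be applied unambiguously and formally. Conventional software can thus be replaced by expert systems that communicate with ordinary users in natural language, with easy attachment of unambiguous models, and that offer services for generating application software according to the users' needs.
The disclosed method is carried out by means of special computer software. One computer program is used by specialists to create and maintain the database of basic notions used by humankind. Another computer program is available to all users who create and use unambiguous models of natural-language texts. The latter program must be able to establish a connection with the database of basic notions.
The invention can be applied in machine translation and in the retrieval of knowledge, where the search is carried out not by the words contained in a text, as the current state of the art presupposes, but by finding unambiguous models similar to the unambiguous model of the query text. Searching by analysing the unambiguous models of texts is also possible. In this way a search engine could answer questions such as a request for information on the transfer of property to foreign citizens under Bulgarian law. Applying the method is particularly important in the field of the patent system, not only because it allows automated search and examination and an unambiguous definition of the subject of protection, but also because it allows machine processing of humankind's newest and most valuable information, which can lead to the automatic generation of new information for humankind.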
With the aid of a computer program, the basic notion of language is defined and a list of synonyms is searched for each word of the natural language being examined. The description of each word from the language presented in the individual dictionary is compared with the description of the synonym presented in the same dictionary. Comparisons of descriptions are made through searches in text similar to simple comparisons of texts. The purpose is to define the various meanings of the presented words according to the synonyms of each meaning. This is done by comparing the description of the word with the description of the word's synonyms, from which similar texts can be defined, as suggested in the preface. They also form meaning (content). The description of this content is, in principle, formed by similar texts in the description of the two synonyms. When such content is found, a database check may be performed to determine whether the content has not already been registered, based on a comparison of the description of the registered content with the description of the newly discovered content. If the new content is not registered in the database, it will be registered slowly later.
After automatically forming a database of content and its description, the specialist will invite them to define labels for their content and to specify their descriptions. Regardless of the text - may be attached to the contents that may represent the contents under any condition depending on the text including, for example, scientific text, language play, words and features. It is possible to present a database of all contents in the presented language, and each content can be explained through a clear model of description in natural language. This can be accomplished by professional linguists capable of generating a clear model based on an automatically formed description of content in a natural language, using basic notions of the language. After finding the basic idea in the proposed natural language, the next natural language will use the formed database of the basic idea. The linguist will simply define how to express the registered contents in a separate language, and - perhaps - many of the contents have to be added to the database. Linguists who maintain a natural accord with the base should be made aware of each addition of new content so that they can propose and display the content appropriately in the language they are responsible for. The display of the contents can be made completely descriptive.
It is also possible to automate the investigation of the second, and subsequent natural language. In such a case, the same procedure as the first natural language is applied. Thus, a new database of registered contents will be created. The indications whose content is to be obtained from the second database are words from the second language. From the dictionary in the second language to the first language, possible translations are made for each representation of the content from the second base. For the translation of each of the words from the first language in the first base, the content that can be represented by this translation will be subtracted. Pseudo-translation will produce a description of its content from a second language by generating all combinations of the contents of each word in the descriptions with all possible translations of the first language. The physician translation for the description of the content in the second language will be compared to the description of the content subtracted from the first base. The best correlation is found and displayed. Each of these conformities must be approved by a linguist. After processing all matches, the remaining content in the second database will either be registered as new content in the first base or will find their correlation in the first database.
In the case of official documents, the unity of the generated text in a natural language from a clear model must be achieved. This can also be done in order to simplify the texts generated in natural language, that is, when the work of linguists is to add many features of the text to a definite model when there is sufficient achievement of clear production, It is possible for a large number of natural language texts to be conveyed and meaningful to convey information. Such an approach is particularly important in translating official documents from one language to another and especially in translating patent applications.
On the other hand, in the novel translation, it is better to generate a large number of texts in natural language from the definite model, in order to select the most appropriate structure for the actual language using the statistical data derived from the novel literature written in the presented language.
An exemplary implementation of the second stage of the present invention
The text can be represented as a branch-out list in which each tree represents a sentence from the text. Connection between different trees is possible. Each element in the tree is an object with features automatically derived from the body text or manually added by the operator. Some elements of the tree - sentences in the text - such as pronouns - can have links to elements of other trees. The arrangement of the trees in the list is important because it points to the body of the original body, as well as the array of sentences and the body from the explicit model.
An exemplary implementation of the third step of the present invention
The text editor add-on consists of additional functions and aids that allow changes to be made to the automatically created unambiguous model of the text, presented in a simplified text form. For example, the screen is divided into three zones. The first zone contains the entire original text, as in an ordinary text editor. The second zone provides feedback while the unambiguous model is being built: it contains the machine-generated text of the sentence being processed. When the mouse cursor rests on a word of the machine-generated text, a hint describes the basic meaning (content) that the word represents. The same sentence is displayed correspondingly in the original text. The third zone is a group of tools used to change the unambiguous model shown in the second zone. These tools allow the interpreted content to be changed by displaying the synonyms of a word together with the other contents that the word in question can name; hints can also display descriptions of the basic contents named by the synonyms. The tools also include the selection of special types of text, such as wordplay, jokes, poetry, or scientific text. They are also used to indicate the exact meaning of the pronouns actually used, such as "he", "she", and the like. The exact meaning can be given within the framework of the entire text by indicating the relationship between the given pronoun and the preceding sentences of the text. The text is analyzed sequentially from beginning to end, and all the features and relationships needed to form the unambiguous model are entered. The sentence in question is processed until the machine-generated text has at least the same meaning as the original text. The process thus consists of repeated modification and generation.
An exemplary implementation of the fourth step of the present invention
The generated unambiguous model of the text is attached to the original file. This attachment can be made in various ways. A link to the unambiguous model of the text can be attached to the original file. Alternatively, the file with the original text and the file with the unambiguous model can be stored together in a single archive package. In principle, it should be borne in mind that a text presented in natural language can have a large number of unambiguous models. This is because the many possible interpretations of a natural language text are filtered by the human operator, who relies on his or her own understanding when translating the text from a human, that is, natural, language into an unambiguous machine model. It is therefore possible to link a text presented in natural language to several different models. In the case of a patent application, the protected subject matter is, of course, the single unambiguous model submitted with the filed text.
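The archive variant can be sketched with the Python standard library. The file names `original.txt` and `model_N.json`, the JSON encoding of the model, and the function names are assumptions for illustration only; the patent does not specify an archive format.

```python
import io
import json
import zipfile


def pack(text, models):
    """Return the bytes of a zip archive holding the text and its models."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("original.txt", text)
        # One text may be linked to several unambiguous models
        for i, model in enumerate(models, start=1):
            archive.writestr(f"model_{i}.json", json.dumps(model))
    return buffer.getvalue()


def unpack(data):
    """Read the text and all attached models back from the archive bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        text = archive.read("original.txt").decode("utf-8")
        names = sorted(n for n in archive.namelist() if n.startswith("model_"))
        models = [json.loads(archive.read(n)) for n in names]
    return text, models


original = "The bank of the river"
model_a = [{"word": "bank", "notion": 102}]
model_b = [{"word": "bank", "notion": 101}]

restored_text, restored_models = unpack(pack(original, [model_a, model_b]))
```

Keeping the text and the models in one package means that they travel together, which matches the idea that the model is attached to the original file.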
An exemplary implementation of the fifth step of the present invention
An unambiguous model of a natural language text can be subjected to formal processing. Various representations of an unambiguous model can be generated to suit various kinds of machine processing. Because formal interpretations can be applied to these models, unambiguous models can be regarded as a new kind of computer software. In this way, machine learning can be realized by extracting facts and relationships from the unambiguous model of a natural language text. All mechanisms known from artificial intelligence can be applied unambiguously and formally. In this way, conventional software can be replaced by an expert system that provides a service for generating application software according to the user's needs, which are stated in natural language and supplemented with an unambiguous model.
The disclosed method is implemented by special computer software. One computer program is used by specialists to create and maintain the database of the basic notions used by mankind. Another computer program is available to all users who create and use unambiguous models of natural language texts. The latter program must be able to connect to the database of basic notions.
The present invention can be applied in machine translation and in the search for knowledge. Instead of searching for the words contained in a text, as the current state of the art does, it is better to search for unambiguous models that are similar to the unambiguous model of the required text. It is also possible to search by analyzing the unambiguous models of texts. In that way, a search engine can answer, for example, a question about the transfer of property to foreign citizens under Bulgarian law. The method makes possible not only automated search and investigation but also the unambiguous definition of protected subject matter, as well as machine processing of mankind's most up-to-date and valuable information, which will lead to the automatic generation of new information for mankind. This is especially important in the field of patent systems.
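Searching by model rather than by words can be sketched as follows. The sketch assumes that each model is reduced to the set of numbers of the basic notions it contains and uses Jaccard overlap as the similarity measure; both are simplifying assumptions chosen for illustration and are not prescribed by the patent. All document names and notion numbers are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two sets of basic-notion numbers (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query_model, corpus):
    """Rank documents by the similarity of their model to the query model."""
    scored = [(jaccard(query_model, model), name) for name, model in corpus.items()]
    # Highest similarity first; documents without any shared notion are dropped
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Each document is represented by the notion numbers of its model (hypothetical values)
corpus = {
    "doc_a": {301, 302, 303},
    "doc_b": {301, 404},
    "doc_c": {500},
}

results = search({301, 302}, corpus)
```

Because the comparison works on notion numbers rather than on words, two texts that express the same notions with different words, or in different languages, would still match in this sketch.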

Claims

A method of forming an unambiguous model of a natural language text, including the creation of a data structure that contains the words of the natural language text and the semantic and grammatical relationships between them, the method comprising:
a) generating, using a computer, an unambiguous model of the natural language text by: identifying the words and/or texts that are associated with basic notions and selecting the basic notions named by those words or texts; selecting one basic notion for each word or text according to a predefined criterion; and establishing the relationships between the notions, so that the model of the text includes only basic notions, each basic notion being a designation of an independent meaning of an entity or an action and having a unique label, a description of said independent meaning in natural language, and an associated list of words or texts in each natural language used in said method, each of said words or texts being able to represent said independent meaning;
b) using the unambiguous model as the basis for computer generation of a new text in the same natural language or in another natural language, wherein a predefined selection criterion is applied and each basic notion in the model is expressed by one of the words or texts in the list of words or texts associated with that notion;
c) comparing, using the computer, the generated text with the natural language text processed in step a), and displaying the differences between the two texts in the same language;
d) applying user feedback for the selection of alternative basic notions when there is no semantic conformity between the unambiguous model and the natural language text processed in step a), and then either repeating step b) with the modified unambiguous model or proceeding to step e); and
e) linking, using said computer, said generated unambiguous model of said text to a file containing said text in said natural language.

(Claim 2 deleted)

The method according to claim 1,
wherein a list of semantic and grammatical properties, together with the type of the text (scientific, prose, or poetry) and, in the case of satire and wordplay, indicators of alternative basic notions, is attached to the concept.

The method according to claim 1,
wherein the application of the feedback comprises:
editing the unambiguous model using a display of alternative basic notions whose expressions in the respective language are the same as those of the automatically selected basic notions; and/or
editing the unambiguous model using a display of synonyms of the expression, in the respective language, of the automatically selected basic notion; and/or
displaying the unambiguous model using a representation of the dependencies in the processed natural language text.

The method according to claim 1,
wherein the natural language text and the unambiguous model are linked either by a link or by an archive package comprising a file with the natural language text and a file with the data structure.

The method according to claim 1,
further comprising generating a text in a language different from the language of the original text.

The method according to claim 1,
wherein the artificial language comprises a language for programming, an information language used in various data processing systems, a formalized language for the symbolic recording of scientific facts, and a language that uses the theories of a scientific domain.

The method according to claim 1,
wherein the natural language descriptions of the basic notions are also represented by unambiguous models.

A method of defining a set of basic notions in a natural language, including finding the synonyms of each word of the natural language in a dictionary and storing them in a database, the method comprising:
comparing the definitions of each word with the definitions of the synonyms of that word given in the dictionary, and identifying the similar texts among the compared definitions, wherein the definitions correspond to different independent meanings and there is one basic notion for each independent meaning;
deriving a related basic notion for each pair of similar texts in which the similar words or word-synonyms exceed a given percentage of the total content;
for each related basic notion, comparing the similar texts found in the previous step with the descriptions of the basic notions in the database in order to check whether the basic notion is already registered, and, if the similar words or word-synonyms match by more than a given percentage, presenting the basic notion as already registered, together with the description found for it and the two similar texts that gave rise to the check;
applying user feedback to check whether the presented texts match semantically, as their similarity suggests, and, if so, adding the one or two word-synonyms to the registered expressions of the basic notion; and
adding a new basic notion to the database when the check performed in the previous step fails, wherein one of the two similar texts is selected as its description or a new description is entered.

The method of claim 9,
wherein, in order to add a further language to the pre-formed set of basic notions, the method comprises:
a first step of defining a new set of basic notions for the natural language, insofar as the basic notions can be defined automatically;
a second step of finding, for each basic notion defined in the previous step, all possible translations of each of its expressions from the language into a language already present in the pre-formed set of basic notions, using a dictionary stored in a machine-readable memory;
a third step of finding, in the pre-formed set of basic notions, the basic notions whose attached expressions match the translations found in the second step;
a fourth step of generating pseudo-translations of the description from the language processed in the second step, by generating every possible combination obtained by replacing each word of the description with all of its possible translations into a language present in the pre-formed set of basic notions, using the dictionary in the machine-readable memory;
a fifth step of comparing the pseudo-translations generated in the fourth step with the descriptions of the basic notions from the pre-formed set that were found in the third step;
a sixth step of arranging the similar texts in descending order of similarity;
a seventh step of applying user feedback to each correlation found in the fifth step to determine whether the similar descriptions are semantically similar;
an eighth step of, where semantic similarity was found in the previous step, deleting the basic notion from the new set, adding the description and the expressions in the language of the deleted basic notion to the corresponding existing notion in the pre-formed set, and marking the added list of expressions with the language;
a ninth step of, for all basic notions remaining in the new set after the previous steps, registering them as new basic notions in the pre-formed set or applying user feedback to find a correlation in the pre-formed set; and
a tenth step of finding, using user feedback, an appropriate expression in the language for every basic notion of the pre-formed set that has not received an expression after the previous steps.

A method of forming feedback in natural language processing, including the automatic suggestion of alternative words and/or texts, the feedback being applied to determine the meaning of the words and to generate speech and semantic information, the method comprising:
using global statistics on the use of the words in their different meanings, and/or local statistics for each individual user of the method; and/or
using similar texts in which the meanings of the words are predefined; and/or
using human-made translations of the original text into one or more languages.

An apparatus comprising an unambiguous model of a natural language text formed according to any one of claims 1 and 3 to 11, the apparatus being configured to perform machine processing of that natural language text.

The apparatus of claim 12,
wherein the machine processing is machine translation from a first natural language into a second natural language, using the generation of an unambiguous model of the text in the first natural language and the generation of the text in the second natural language from that unambiguous model.

The apparatus of claim 12,
wherein the machine processing is the generation of a text of a predetermined complexity, in which the basic notions in the unambiguous model of the processed natural language text are replaced with their descriptions, and the basic notions in the unambiguous models of the descriptions so used are in turn repeatedly replaced with their descriptions.

The apparatus of claim 12,
wherein the machine processing is the generation of a text in an artificial language.

The apparatus of claim 12,
wherein the machine processing is a search in the unambiguous models of natural language texts, carried out in order to generate a text in a natural or artificial language from an unambiguous model that is found, or to display the text corresponding to an unambiguous model that is found.

The apparatus of claim 12,
wherein said unambiguous model of the natural language text is processed, data, facts, and correlations are extracted from it, and the results are entered into an information base.

The apparatus of claim 12,
wherein the user operates one computer device while the machine processing is performed by another device, and/or the data from the machine processing are stored on a device other than the one on which the machine processing is performed.