WO2009062271A1 - Formalization of a natural language - Google Patents

Formalization of a natural language

Publication number
WO2009062271A1
WO2009062271A1 PCT/BG2008/000022 BG2008000022W WO2009062271A1 WO 2009062271 A1 WO2009062271 A1 WO 2009062271A1 BG 2008000022 W BG2008000022 W BG 2008000022W WO 2009062271 A1 WO2009062271 A1 WO 2009062271A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
natural language
basic
language
notions
Prior art date
Application number
PCT/BG2008/000022
Other languages
English (en)
French (fr)
Inventor
Ivaylo Popov
Original Assignee
Ivaylo Popov
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ivaylo Popov
Priority to US12/740,106 (published as US20120101803A1)
Priority to JP2010533390A (published as JP2011503730A)
Priority to CA2705345A (published as CA2705345A1)
Priority to CN200880115885A (published as CN101855630A)
Priority to EP08850309A (published as EP2220572A4)
Priority to KR1020107013115A (published as KR101506757B1)
Priority to EA201070614A (published as EA201070614A1)
Publication of WO2009062271A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G06F40/56 Natural language generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • The invention concerns the input of knowledge into a machine using a natural language. It can be used as a machine translator of a natural language.
  • A machine cannot be used for an official translation of a document because machine translation is not reliable. Nor is it possible to create a text in a natural language that different people interpret unambiguously, although this is very important when writing textbooks or patent applications.
  • A computer cannot be programmed using a natural language because, from a formal point of view, one sentence of a natural language has many possible meanings, so grammatically correct sentences can be interpreted in different ways.
  • Existing human knowledge cannot be used optimally because there is no formalized way for a machine to interpret directly knowledge written in a natural language.
  • The interpretation of a natural language always includes building a machine model of the interpreted knowledge.
  • The text in a natural language is interpreted by different means in order to determine the grammatical parts of speech, the meaning of the sentence, and the meaning of the words in it.
  • The problem is that there is no backward relation (feedback), so a person cannot influence the formed model. This is because there is no basis for comparison between the model and the text in the natural language. As a result, the model is also a structure that cannot be interpreted in only one way.
  • The technical essence of the proposal is a method for creating an unambiguous model. A model formed in this way can be interpreted in one unique way only.
  • The method has five steps.
  • 'User rights prava na narkomana' ('prava na narkomana' means 'the rights of a drug addict' in English), but in the given context 'user rights' in fact means the rights of the customer.
  • Numbering words in this way creates merely an intermediate language whose meaning is still ambiguous.
  • The proposal is to number the entities, not the words.
  • According to the method, the entities have unique names.
  • The names can be numbers, but they can also be words from a widely used natural language. Note that a given word of a natural language can be used in only one way to denote an entity.
  • The structure describing an entity, which has a unique label (a name or a number), a description, and a list of words representing the entity in a natural language, is hereinafter called a basic notion.
  • The second step of the method is to create the model of the text in a natural language using only basic notions.
  • In this step of the method, all applicable methods from the background art are used that make it possible to determine the grammatical and semantic meanings of the words in the text and to create the model.
  • During the creation of the model, global statistics on the usage of words in their different meanings can be used, or local statistics for each user of the method. Similar texts in which the meanings of the words have already been specified can also be used.
  • Human translations of a given text from one language into another can also be used to determine the basic notions used in the text: the words used in the translations are examined and compared with the words of the original text, taking their meanings into account.
  • The third step of the method is a backward relation (feedback).
  • The model created in the second step is used as a basis for generating a text in the same natural language as the original text.
  • Using a computer program, an operator can change the generated model so that it matches his or her understanding of the text. This can be done by changing the model directly, working with the represented entities themselves, for example with a tree of the relations between the entities. This manner of working requires serious training.
  • Alternatively, the model can be changed by explaining to the computer which entity should be changed. The original text can be compared with the generated text and the differences between them marked.
  • For each marked word, a list of synonyms is output from a thesaurus dictionary; synonyms that have already been rejected as having an inappropriate meaning can be filtered out.
  • The operator chooses from the list of synonyms and the process repeats in real time: a new text is generated and a new correction is possible.
  • The choice of synonyms, however, is not always enough to define a given entity. Means can therefore be provided for changing the interpretation of the relationship between two basic notions in a given text. Such a relationship can be specified using visual means for marking and identification. For example, it can be specified which is the subject of the sentence, which is the means and which is the explanation. A means can also be provided for indicating the tense relations in the text.
  • The fourth step of the method: the generated unambiguous model of the text in a natural language is attached to the file containing the text in the natural language. This makes the interpretation of the text unambiguous, which is useful in patent applications and in machine translation.
  • If a text in a textbook is created using the method, with an unambiguous model attached, a computer program can generate an explanation at an arbitrary level of complexity. It does so by using the definitions of the entities used in the text and, recursively, the definitions of the entities used in defining the entities of the level above.
  • The fifth step of the method is the use of unambiguous models of texts in a natural language for machine learning and for the creation of concepts and theories by a machine, on the basis of the formalized knowledge obtained from the unambiguous models.
  • The invention can be applied in machine translation and in searching for knowledge, where the search is not based on the words the text contains, as in the current state of the art, but looks for unambiguous models similar to that of the searched text. A search can also be made by analysing the unambiguous models of texts, so that the search tool can answer a request such as finding information about transferring property to foreign citizens under Bulgarian law.
  • Pseudo-translations of the description of the entity from the second language are compared with the descriptions of the entities extracted from the first base. The best match is found and marked. Each match found in this way should be approved by a philologist. After a match is approved, the entity is erased from the second base. The list of names for this entity in the second language is marked as being in the second language and is added to the entity in the first base. After all matches have been processed, the entities still remaining in the second base are either registered as new entities in the first base or matched by a human to entities in the first base.
  • The text can be represented as a list of trees, where each tree is one sentence of the text. Relationships between separate trees are possible.
  • Each element of a tree is an object with additional characteristics that are extracted automatically from the text or added manually by an operator. Some of these characteristics are relationships between the element and the other elements of the tree.
  • Some elements of a tree representing a sentence, for example pronouns, can have a relationship with elements belonging to other trees.
  • The order of the trees in the list is significant. It represents the order of the sentences in the original text and, possibly, in the text generated from the unambiguous model.
  • The screen is to be divided into three areas.
  • The first area is for the whole original text: an ordinary text editor.
  • The second area is for the backward relation once the unambiguous model has been created. It contains the machine-generated text of the sentence being processed.
  • The third area is a toolbar for changing the unambiguous model; its tools act on the second area.
  • These tools include changing the interpreted entity by giving a synonym that names another of the entities named by the word in hand. The description of the basic notion named by the synonym can be given as a hint. The tools include means to choose a characteristic of the text, such as wordplay, a jest, poetry or scientific text. They also include defining the exact referents of the pronouns used, for example who He or She actually is, or what It is. The exact referent can be defined within the scope of the whole text by setting a relationship from a given pronoun to the preceding sentences of the text. The text is examined consecutively from beginning to end, and all needed characteristics and relationships are supplied so that an unambiguous model is formed. A sentence is processed until machine generation produces a text that has at least the same meaning as the original text. The process consists of a series of changes and generations.
  • The generated unambiguous model for a given text is attached to the original file.
  • Such an attachment can be made in many ways. A link to the unambiguous model of the text can be added to the original file. Alternatively, the file with the original text and the file with the unambiguous model can be written into one archive package. It must be kept in mind that, in general, multiple unambiguous models can be formed for one text in a natural language. This is because the multitude of interpretations of a given text is filtered by a human operator, who uses his or her own understanding to translate the text into an unambiguous machine model. It is therefore possible to provide for attaching a text in a natural language to many unambiguous models. In the case of a patent application, it is natural for the object of protection to be only one unambiguous model of the text of the application as filed.
  • Unambiguous models of texts in a natural language lend themselves to formal processing. Different kinds of representation of the unambiguous model can be created that are suitable for different kinds of machine processing. Unambiguous models can be defined as a new kind of computer software because they can be subject to formal interpretation. In this way machine learning can be realized, as facts and relationships are extracted from the unambiguous models of texts in a natural language. All mechanisms studied in artificial intelligence can be applied unambiguously and formally. In this way traditional software will be replaced by expert systems that communicate with an ordinary user in a natural language, with easy addition of an unambiguous model, and that provide services for generating application software according to the needs of the user.
  • The disclosed methods are executed by special computer software.
  • One computer program can be used by professionals to create and maintain the database of basic notions used by the human race.
  • Another computer program can be used by all users who create and use unambiguous models of natural language texts.
  • The latter program must be able to connect to the database of basic notions.
  • The methods can be used in machine translation from a natural language into another natural language or into an artificial language, e.g. a programming language.
  • The methods can be used in searching and processing natural language.
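
The basic notion described above (a unique label, a description, and a list of words naming the entity in a natural language) could be sketched as a small data structure. The Python below is only an illustrative sketch under stated assumptions, not the applicant's implementation: the names BasicNotion and NotionBase, the language code "en", and the two example entries are hypothetical and were introduced here to mirror the 'user rights' example.

```python
from dataclasses import dataclass, field


@dataclass
class BasicNotion:
    """One entity: a unique label, a description, and the words that name it."""
    label: str                      # unique name or number of the entity
    description: str
    words: dict[str, list[str]] = field(default_factory=dict)  # language -> words


class NotionBase:
    """Hypothetical in-memory database of basic notions."""

    def __init__(self) -> None:
        self._by_label: dict[str, BasicNotion] = {}

    def add(self, notion: BasicNotion) -> None:
        # A label may denote exactly one entity, so duplicates are rejected.
        if notion.label in self._by_label:
            raise ValueError(f"label already in use: {notion.label}")
        self._by_label[notion.label] = notion

    def candidates(self, word: str, language: str) -> list[str]:
        """Return the labels of all entities that the word may denote."""
        return sorted(
            n.label
            for n in self._by_label.values()
            if word in n.words.get(language, [])
        )


base = NotionBase()
base.add(BasicNotion("customer", "a person who uses a product or service",
                     {"en": ["user", "customer"]}))
base.add(BasicNotion("drug_addict", "a person dependent on narcotics",
                     {"en": ["user", "addict"]}))

# The word "user" is ambiguous; the labels are not.
print(base.candidates("user", "en"))  # ['customer', 'drug_addict']
```

In this sketch the ambiguity lives only in the word lists; once the operator has picked a label, the model refers to the entity and no longer to the word.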
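
The representation of a text as an ordered list of sentence trees, with pronouns linked to elements of other trees, might look roughly as follows. This is a minimal sketch under assumptions: the Node class, its fields, the role names, the notion labels and the two example sentences are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One element of a sentence tree, bound to a basic notion."""
    word: str
    notion: str                                 # label of the basic notion
    role: str                                   # e.g. "predicate", "subject"
    children: list["Node"] = field(default_factory=list)
    refers_to: Optional["Node"] = None          # link into another tree


def resolve(node: Node) -> str:
    """Return the notion a node finally denotes, following pronoun links."""
    while node.refers_to is not None:
        node = node.refers_to
    return node.notion


# Sentence 1: "The inventor filed an application."
inventor = Node("inventor", "inventor", "subject")
s1 = Node("filed", "file_document", "predicate",
          [inventor, Node("application", "patent_application", "object")])

# Sentence 2: "He paid the fee."  The operator links "He" to "inventor".
he = Node("He", "pronoun_he", "subject", refers_to=inventor)
s2 = Node("paid", "pay", "predicate", [he, Node("fee", "fee", "object")])

model = [s1, s2]   # list order preserves the order of the sentences
print(resolve(model[1].children[0]))  # inventor
```

Keeping the cross-tree link on the node itself is one possible design; a separate table of relationships between trees would serve equally well.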
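
One of the attachment options mentioned above is writing the original text and its unambiguous model into one archive package. A hedged sketch using the Python standard library is shown below; the member names text.txt and model.json, the JSON layout of the model, and the functions pack and unpack are assumptions made for illustration only.

```python
import io
import json
import zipfile


def pack(text: str, model: dict) -> bytes:
    """Write the original text and its model into one zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("text.txt", text)
        archive.writestr("model.json", json.dumps(model, ensure_ascii=False))
    return buffer.getvalue()


def unpack(package: bytes) -> tuple[str, dict]:
    """Read the text and the model back from the archive."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        text = archive.read("text.txt").decode("utf-8")
        model = json.loads(archive.read("model.json"))
    return text, model


package = pack("User rights", {"notions": ["customer", "right"]})
text, model = unpack(package)
print(text, model)
```

Because several unambiguous models can exist for one text, a real package would probably hold more than one model file; the sketch stores a single model for brevity.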
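
The generation of an explanation at an arbitrary level of complexity, by recursively using the definitions of the entities that appear in other definitions, can be illustrated with a short recursive function. This is a sketch only: the NOTIONS table, its three entries and the explain function are hypothetical, and the depth parameter stands in for the chosen level of complexity.

```python
# Hypothetical notion base: label -> (description, labels used in the description)
NOTIONS = {
    "patent": ("an exclusive right granted for an invention", ["invention"]),
    "invention": ("a new technical solution to a problem", ["problem"]),
    "problem": ("a situation that needs to be resolved", []),
}


def explain(label: str, depth: int, indent: int = 0) -> list[str]:
    """Expand a definition down to `depth` further levels of definitions."""
    description, used = NOTIONS[label]
    lines = [f"{'  ' * indent}{label}: {description}"]
    if depth > 0:
        # Recurse into the entities used when defining this entity.
        for inner in used:
            lines.extend(explain(inner, depth - 1, indent + 1))
    return lines


print("\n".join(explain("patent", 1)))
```

The depth bound also keeps the recursion finite if two definitions happen to refer to each other.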

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
PCT/BG2008/000022 2007-11-14 2008-11-12 Formalization of a natural language WO2009062271A1 (en)

Priority Applications (7)

Application Number Publication Number Priority Date Filing Date Title
US12/740,106 US20120101803A1 (en) 2007-11-14 2008-11-12 Formalization of a natural language
JP2010533390A JP2011503730A (ja) 2007-11-14 2008-11-12 自然言語の定式化
CA2705345A CA2705345A1 (en) 2007-11-14 2008-11-12 Method for the creation of an unambiguous model of a text in a natural language
CN200880115885A CN101855630A (zh) 2007-11-14 2008-11-12 自然语言的形式化
EP08850309A EP2220572A4 (en) 2007-11-14 2008-11-12 FORMALIZING A NATURAL LANGUAGE
KR1020107013115A KR101506757B1 (ko) 2007-11-14 2008-11-12 자연어로 된 본문의 명확한 모델을 형성하는 방법
EA201070614A EA201070614A1 (ru) 2007-11-14 2008-11-12 Формализация естественного языка

Applications Claiming Priority (2)

Application Number Publication Number Priority Date Filing Date Title
BG109996 2007-11-14
BG10109996A BG66255B1 (en) 2007-11-14 2007-11-14 Natural language formalization

Publications (1)

Publication Number Publication Date
WO2009062271A1 (en) 2009-05-22

Family

ID=40638266

Family Applications (1)

Application Number Publication Number Priority Date Filing Date Title
PCT/BG2008/000022 WO2009062271A1 (en) 2007-11-14 2008-11-12 Formalization of a natural language

Country Status (8)

Country Link
EP (1) EP2220572A4 (bg)
JP (2) JP2011503730A (bg)
KR (1) KR101506757B1 (bg)
CN (1) CN101855630A (bg)
BG (1) BG66255B1 (bg)
CA (1) CA2705345A1 (bg)
EA (1) EA201070614A1 (bg)
WO (1) WO2009062271A1 (bg)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11928435B2 (en) 2020-03-19 2024-03-12 Beijing Baidu Netcom Science Technology Co., Ltd. Event extraction method, event extraction device, and electronic device

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013098701A1 (en) * 2011-12-27 2013-07-04 Koninklijke Philips Electronics N.V. Text analysis system
US10303769B2 (en) * 2014-01-28 2019-05-28 Somol Zorzin Gmbh Method for automatically detecting meaning and measuring the univocality of text
CN112861548B (zh) * 2021-02-10 2023-06-23 百度在线网络技术(北京)有限公司 自然语言生成及模型的训练方法、装置、设备和存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6014680A (en) * 1995-08-31 2000-01-11 Hitachi, Ltd. Method and apparatus for generating structured document
WO2002027524A2 (en) * 2000-09-29 2002-04-04 Gavagai Technology Incorporated A method and system for describing and identifying concepts in natural language text for information retrieval and processing
US20020128816A1 (en) * 1997-09-30 2002-09-12 Haug Peter J. Probabilistic system for natural language processing
WO2006128238A1 (en) * 2005-06-02 2006-12-07 Newsouth Innovations Pty Limited A method for summarising knowledge from a text

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1484217A (zh) * 2003-07-11 2004-03-24 中国科学院声学研究所 层次分类与逻辑相结合的自然口语对话描述方法

Also Published As

Publication number Publication date
CN101855630A (zh) 2010-10-06
KR20100108338A (ko) 2010-10-06
EP2220572A1 (en) 2010-08-25
EA201070614A1 (ru) 2010-10-29
KR101506757B1 (ko) 2015-03-27
CA2705345A1 (en) 2009-05-22
BG66255B1 (en) 2012-09-28
JP2014139799A (ja) 2014-07-31
BG109996A (bg) 2009-05-29
EP2220572A4 (en) 2011-03-09
JP2011503730A (ja) 2011-01-27

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200880115885.2

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08850309

Country of ref document: EP

Kind code of ref document: A1

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2705345

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 2010533390

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2008850309

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 4217/DELNP/2010

Country of ref document: IN

Ref document number: 201070614

Country of ref document: EA

ENP Entry into the national phase

Ref document number: 20107013115

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: A201007428

Country of ref document: UA

WWE Wipo information: entry into national phase

Ref document number: 12740106

Country of ref document: US