WO2009062271A1 - Formalization of a natural language - Google Patents
Formalization of a natural language
- Publication number
- WO2009062271A1 (PCT/BG2008/000022)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- natural language
- basic
- language
- notions
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/55—Rule-based translation
- G06F40/56—Natural language generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Definitions
- The invention concerns the input of knowledge into a machine using a natural language. It can be used as a machine translator of a natural language.
- A machine cannot be used for the official translation of a document because machine translation is not reliable. A natural-language text that different people interpret in exactly the same way cannot be created, yet this is very important when writing textbooks or patent applications.
- A computer cannot be programmed using a natural language because, from a formal point of view, one sentence of a natural language has many possible meanings, so grammatically correct sentences can be interpreted in different ways.
- Existing human knowledge cannot be used optimally because there is no formalized way for a machine to interpret directly knowledge written in a natural language.
- Interpreting a natural language always involves building a machine model of the interpreted knowledge.
- The natural-language text is interpreted by various means in order to determine the grammatical parts of speech and the meaning of the sentence and of the words in it.
- The problem is that there is no feedback (backward relation), so a person cannot influence the model that is formed. This is because there is no basis for comparing the model with the natural-language text. The model is therefore also a structure that cannot be interpreted in only one way.
- The technical essence of the proposal is a method for creating an unambiguous model. A model formed in this way can be interpreted in one way only.
- The method has five steps.
- 'User rights' → 'prava na narkomana' ('prava na narkomana' means 'the rights of the drug addict' in English), whereas in the given context 'user rights' means the rights of the customer.
- Numbering words in this way merely creates an intermediate language that is still ambiguous.
- The proposal is to number the entities rather than the words.
- Under the method, each entity has a unique name.
- The names can be numbers, but they can also be words from a widely spoken natural language. Note that a given word of a natural language can be used as the name of only one entity.
- A structure describing an entity, consisting of a unique label (a name or a number), a description, and a list of the words that represent the entity in a natural language, is hereinafter called a basic notion.
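The basic notion described above can be pictured as a small record type. The following is only a minimal sketch: the class names `BasicNotion` and `NotionBase`, the label format `user#customer`, and the per-language word lists are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class BasicNotion:
    """One entity: a unique label, a description, and the words that name it."""
    label: str                                   # unique name or number (format is hypothetical)
    description: str
    words: dict[str, list[str]] = field(default_factory=dict)  # language code -> words


class NotionBase:
    """In-memory stand-in for a database of basic notions."""

    def __init__(self) -> None:
        self._by_label: dict[str, BasicNotion] = {}

    def add(self, notion: BasicNotion) -> None:
        # Labels must be unique, so a second notion with the same label is refused.
        if notion.label in self._by_label:
            raise ValueError(f"label already in use: {notion.label}")
        self._by_label[notion.label] = notion

    def candidates(self, word: str, language: str) -> list[BasicNotion]:
        """All notions that the given word may denote in the given language."""
        w = word.lower()
        return [n for n in self._by_label.values()
                if w in (x.lower() for x in n.words.get(language, []))]


base = NotionBase()
base.add(BasicNotion("user#customer", "a person who uses a product or service",
                     {"en": ["user", "customer"]}))
base.add(BasicNotion("user#addict", "a person dependent on drugs",
                     {"en": ["user", "addict"]}))
labels = [n.label for n in base.candidates("user", "en")]
```

Here the single word "user" maps to two candidate notions, which is the ambiguity that the later steps of the method are meant to resolve.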
- The second step of the method is to create a model of the natural-language text using only basic notions.
- In this step, all applicable methods from the background art that make it possible to determine the grammatical and semantic meanings of the words in the text and to create the model are used.
- While the model is being created, global statistics on how often words are used in their different meanings can be used, as can local statistics for each user of the method. Similar texts in which the meanings of the words have already been specified can also be used.
- Human translations of a given text from one language into another can also be used to determine the basic notions used in the text: the words used in the translation are examined and compared, with regard to their meanings, with the words of the original text.
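One possible way to combine global and per-user usage statistics when choosing a basic notion for a word is sketched below. The text does not say how the two kinds of statistics are weighed; the weighted sum, the `user_weight` parameter, and all counts are assumptions made for illustration.

```python
from collections import Counter


def pick_notion(word, candidates, global_counts, user_counts, user_weight=3.0):
    """Return the candidate label with the highest weighted usage count."""
    def score(label):
        # Counter returns 0 for pairs that were never observed.
        return global_counts[(word, label)] + user_weight * user_counts[(word, label)]
    return max(candidates, key=score)


# Hypothetical counts of how often "user" was used in each meaning.
global_counts = Counter({("user", "user#customer"): 90, ("user", "user#addict"): 10})
user_counts = Counter({("user", "user#addict"): 40})

candidates = ["user#customer", "user#addict"]
default_choice = pick_notion("user", candidates, global_counts, Counter())
personal_choice = pick_notion("user", candidates, global_counts, user_counts)
```

With no local statistics the globally more frequent meaning wins; for a user who mostly writes about addiction, the local counts tip the choice the other way. Either result is only an initial guess that the operator can correct in the third step.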
- The third step of the method is feedback (a backward relation).
- The model created in the second step is used as the basis for generating a text in the same natural language as the original text.
- Using a computer program, an operator can change the generated model so that it matches his or her understanding of the text. This can be done by changing the model directly, working with the represented entities themselves, for example with a tree of the relations between the entities. Working in this way requires serious training.
- Alternatively, the model can be changed by explaining to the computer which entity should be changed. The original text can be compared with the generated text and the differences between them marked.
- For each marked word, a list of synonyms is output from a thesaurus dictionary; synonyms that have already been rejected as having an inappropriate meaning can be filtered out.
- The operator chooses from the list of synonyms and the process repeats in real time: a new text is generated and a further correction may follow.
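The synonym-based correction described above can be sketched as a small loop. This is an illustration under stated assumptions: the thesaurus is a plain dictionary, the operator is represented by a hypothetical callback `choose`, and rejections are stored as (word, synonym) pairs.

```python
def offer_synonyms(word, thesaurus, rejected):
    """Synonyms of `word`, minus those the operator has already rejected."""
    return [s for s in thesaurus.get(word, []) if (word, s) not in rejected]


def correct(marked_words, thesaurus, choose, rejected=None):
    """Ask the operator (via `choose`) to pick a synonym for each marked word.

    `choose(word, options)` returns one of `options`, or None to reject them all.
    Returns the chosen replacements and the updated set of rejections.
    """
    rejected = set() if rejected is None else set(rejected)
    chosen = {}
    for word in marked_words:
        options = offer_synonyms(word, thesaurus, rejected)
        pick = choose(word, options) if options else None
        if pick is None:
            # Remember the refusal so these synonyms are not offered again.
            rejected.update((word, s) for s in options)
        else:
            chosen[word] = pick
    return chosen, rejected


thesaurus = {"user": ["customer", "addict", "consumer"]}
offered = offer_synonyms("user", thesaurus, {("user", "addict")})
chosen, rejected = correct(["user"], thesaurus, lambda w, opts: opts[0],
                           rejected={("user", "addict")})
```

In an interactive tool each call to `correct` would be followed by regenerating the text from the updated model, and the cycle would repeat until the operator accepts the result.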
- However, choosing synonyms is not always enough to define a given entity. Means for changing the interpretation of the relationship between two basic notions in a given text can therefore be provided. A relationship can be set using visual means for marking and identification. For example, it can be specified which is the subject of the sentence, which is the means and which is the explanation. A means of indicating the tense relations in the text can also be provided.
- The fourth step of the method: the generated unambiguous model of the natural-language text is attached to the file containing the text. This allows the text to be interpreted unambiguously, which is useful in patent applications and in machine translation.
- If a text in a textbook is created using the method and an unambiguous model is attached, a computer program can generate an explanation at an arbitrary level of complexity, using the definitions of the entities used in the text and, recursively, the definitions of the entities used in defining the entities at the level above.
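The recursive use of definitions to produce explanations at different levels of detail might look like the sketch below. The `definitions` table, its labels, and the `depth` parameter are hypothetical; the text only states that definitions are expanded recursively.

```python
def explain(label, definitions, depth, _seen=None):
    """Expand the definition of `label`, following references `depth` levels down.

    `definitions` maps a label to (description, labels used in that description).
    Labels already explained are skipped, which also guards against cycles.
    """
    seen = set() if _seen is None else _seen
    if label in seen or label not in definitions:
        return []
    seen.add(label)
    description, used = definitions[label]
    lines = [f"{label}: {description}"]
    if depth > 0:
        for child in used:
            lines.extend(explain(child, definitions, depth - 1, seen))
    return lines


definitions = {
    "customer": ("a person who buys a product", ["person", "product"]),
    "person": ("a human being", []),
    "product": ("a thing offered for sale", ["thing"]),
    "thing": ("an object", []),
}
brief = explain("customer", definitions, depth=0)
detailed = explain("customer", definitions, depth=2)
```

A larger `depth` yields a longer explanation suited to a reader who knows fewer of the underlying notions; `depth=0` gives only the top-level definition.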
- The fifth step of the method is the use of unambiguous models of natural-language texts for machine learning and for the creation of concepts and theories by a machine, using the base of formalized knowledge obtained from those models.
- The invention can be applied in machine translation and in knowledge search, where the search is based not on the words the text contains, as in the current state of the art, but on unambiguous models similar to that of the searched text. A search can also be made by analysing the unambiguous models of texts, so that the search engine can answer a question such as a request for information about transferring property to foreign citizens under Bulgarian law.
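Searching by model rather than by words could, in its simplest form, compare the sets of basic notions that two models contain. The text does not specify a similarity measure; the Jaccard overlap used here, the function names, and the sample index are all assumptions for illustration.

```python
def notion_similarity(a, b):
    """Jaccard overlap of two sets of basic-notion labels."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def search(query_notions, indexed, top=3):
    """Rank indexed documents by notion overlap with the query, best first."""
    ranked = sorted(indexed,
                    key=lambda doc: notion_similarity(query_notions, indexed[doc]),
                    reverse=True)
    return ranked[:top]


# Hypothetical index: document name -> notions found in its unambiguous model.
indexed = {
    "law_a": {"property", "transfer", "foreign_citizen"},
    "recipe": {"flour", "oven"},
    "law_b": {"property", "tax"},
}
hits = search({"property", "transfer", "foreign_citizen", "bulgaria"}, indexed)
```

Because the comparison is made on notion labels, two texts that use different words for the same entity would still match, which a plain keyword search would miss. A fuller implementation would also compare the relationships between notions, not only their presence.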
- Pseudo-translations of the description of each entity from the second language are compared with the descriptions of the entities extracted from the first base. The best match is found and marked. Each match found in this way should be approved by a philologist. After a match is approved, the entity is erased from the second base. The list of names for this entity in the second language is marked as being in the second language and is added to the entity in the first base. After all matches have been processed, the entities still remaining in the second base are either registered as new entities in the first base or matched by a human to entities in the first base.
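The merging procedure above can be sketched as follows. Several parts are assumptions: descriptions are compared with `difflib.SequenceMatcher` (the text names no similarity measure), the pseudo-translation is a dictionary lookup standing in for a real translator, and the philologist's approval is replaced by a score threshold.

```python
from difflib import SequenceMatcher


def best_match(description, first_base):
    """Label in `first_base` whose description is most similar, with its score."""
    def similarity(label):
        return SequenceMatcher(None, description,
                               first_base[label]["description"]).ratio()
    label = max(first_base, key=similarity)
    return label, similarity(label)


def merge_bases(first_base, second_base, language, pseudo_translate, approve):
    """Move approved matches from `second_base` into `first_base`, in place."""
    for label2 in list(second_base):
        entity2 = second_base[label2]
        label1, score = best_match(pseudo_translate(entity2["description"]), first_base)
        if approve(label1, label2, score):           # stands in for human approval
            first_base[label1]["words"][language] = entity2["words"]
            del second_base[label2]
    return second_base                               # entities still unmatched


first = {
    "E1": {"description": "a person who buys a product", "words": {"en": ["customer"]}},
    "E2": {"description": "a large body of salt water", "words": {"en": ["sea"]}},
}
second = {
    "B7": {"description": "човек, който купува продукт", "words": ["клиент"]},
    "B9": {"description": "музикален инструмент със струни", "words": ["китара"]},
}
glossary = {  # hypothetical pseudo-translations of the descriptions
    "човек, който купува продукт": "person who buys product",
    "музикален инструмент със струни": "musical instrument with strings",
}
left_over = merge_bases(first, second, "bg", glossary.get,
                        approve=lambda l1, l2, score: score >= 0.8)
```

After the call, the Bulgarian names of the matched entity are attached to `E1`, while `B9` stays in the second base to be registered as a new entity or matched by hand.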
- The text can be represented as a list of trees, where each tree is one sentence of the text. Relationships between separate trees are possible.
- Each element of a tree is an object with additional characteristics, which are extracted automatically from the text or added manually by an operator. Some of these characteristics are relationships between the element and the other elements of the tree.
- Some elements of a tree representing a sentence, for example pronouns, can have relationships with elements belonging to other trees.
- The order of the trees in the list is significant. It represents the order of the sentences in the original text and, ultimately, in the text generated from the unambiguous model.
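The list-of-trees representation described above can be sketched with two small classes. The names `Node` and `TextModel`, the `kind` field, and the `refers_to` link are illustrative assumptions; the text only requires ordered sentence trees whose elements may relate to elements of other trees.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    notion: str                                    # label of a basic notion
    kind: str                                      # e.g. "noun", "verb", "pronoun"
    children: list["Node"] = field(default_factory=list)
    refers_to: "Node | None" = None                # link that may cross trees


@dataclass
class TextModel:
    sentences: list[Node] = field(default_factory=list)   # list order = text order

    def unresolved_pronouns(self):
        """Pronoun nodes that still lack a referent, in text order."""
        found = []

        def walk(node):
            if node.kind == "pronoun" and node.refers_to is None:
                found.append(node)
            for child in node.children:
                walk(child)

        for root in self.sentences:
            walk(root)
        return found


# "Maria bought a book. She read it."
maria, book = Node("Maria", "noun"), Node("book", "noun")
she = Node("she", "pronoun", refers_to=maria)      # resolved across sentences
it = Node("it", "pronoun")                         # not yet resolved
model = TextModel([Node("buy", "verb", [maria, book]),
                   Node("read", "verb", [she, it])])
pending = [n.notion for n in model.unresolved_pronouns()]
```

A model with an empty `pending` list has every pronoun tied to an entity, which is one of the conditions for the model to be unambiguous.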
- The screen is to be divided into three areas.
- The first area holds the whole original text; it is an ordinary text editor.
- The second area provides feedback once the unambiguous model has been created. It shows the machine-generated text of the sentence being processed.
- The third area is a toolbar for changing the unambiguous model; it acts on the second area.
- These tools include changing the interpreted entity by giving a synonym of the word that names another entity also named by the word in hand. The description of the basic notion named by the synonym can be shown as a hint. The tools include means to choose a characteristic of the text, such as wordplay, jest, poetry or scientific text. They include defining the exact referents of the pronouns used, for example who He or She actually is, or what It is. The exact meaning can be defined within the scope of the whole text, by setting the relationship of a given pronoun to the previous sentences in the text. The text is examined consecutively from beginning to end, and all needed characteristics and relationships are supplied so that an unambiguous model is formed. A sentence is processed until machine generation produces a text that has at least the same meaning as the original text. The process consists of a series of changes and generations.
- The generated unambiguous model of a given text is attached to the original file.
- Such an attachment can be made in many ways. A link to the unambiguous model can be added to the original file. Alternatively, the file with the original text and the file with the unambiguous model can be stored in one archive package. It must be borne in mind that a single natural-language text may have multiple unambiguous models. This is because the multitude of interpretations of a natural-language text is filtered by a human operator, who uses his or her own understanding to translate the text into an unambiguous machine model. Provision can therefore be made for attaching many unambiguous models to one natural-language text. In the case of a patent application, the object of protection is naturally only one unambiguous model of the text of the application, as filed.
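The archive-package variant of the attachment can be sketched with Python's standard `zipfile` module. The entry names `text.txt` and `models/<name>.json`, and the choice of JSON for the model, are assumptions; the text prescribes no file layout.

```python
import io
import json
import zipfile


def pack(text, models):
    """Return the bytes of a zip archive holding the text and its models."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("text.txt", text)               # str is stored as UTF-8
        for name, model in models.items():
            archive.writestr(f"models/{name}.json",
                             json.dumps(model, ensure_ascii=False))
    return buffer.getvalue()


def unpack(data):
    """Inverse of `pack`: return (text, {name: model})."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        text = archive.read("text.txt").decode("utf-8")
        models = {}
        for entry in archive.namelist():
            if entry.startswith("models/") and entry.endswith(".json"):
                name = entry[len("models/"):-len(".json")]
                models[name] = json.loads(archive.read(entry))
    return text, models


package = pack("She read it.",
               {"operator_1": {"she": "Maria", "it": "book"},
                "operator_2": {"she": "Maria", "it": "letter"}})
text, models = unpack(package)
```

Keeping one entry per operator reflects the point that a single text may carry several unambiguous models, each recording a different reader's interpretation.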
- Unambiguous models of natural-language texts lend themselves to formal processing. Different representations of the unambiguous model can be created, each suited to a different kind of machine processing. Unambiguous models can be regarded as a new kind of computer software because they can be subject to formal interpretation. In this way machine learning can be realized, as facts and relationships are extracted from the unambiguous models of natural-language texts. All mechanisms studied in artificial intelligence can be applied to them unambiguously and formally. In this way traditional software will be replaced by expert systems that communicate with an ordinary user in natural language, with easy addition of an unambiguous model, and that provide services for generating application software according to the needs of the user.
- The disclosed methods are executed by special computer software.
- One computer program can be used by professionals to create and maintain the database of basic notions used by the human race.
- Another computer program can be used by all users, that is, those creating and using unambiguous models of natural-language texts.
- The latter program must be able to connect to the database of basic notions.
- The methods can be used in machine translation from one natural language into another natural language or into an artificial language, e.g. a programming language.
- The methods can be used in searching and processing natural language.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/740,106 US20120101803A1 (en) | 2007-11-14 | 2008-11-12 | Formalization of a natural language |
JP2010533390A JP2011503730A (ja) | 2007-11-14 | 2008-11-12 | 自然言語の定式化 |
CA2705345A CA2705345A1 (en) | 2007-11-14 | 2008-11-12 | Method for the creation of an unambiguous model of a text in a natural language |
CN200880115885A CN101855630A (zh) | 2007-11-14 | 2008-11-12 | 自然语言的形式化 |
EP08850309A EP2220572A4 (en) | 2007-11-14 | 2008-11-12 | FORMALIZING A NATURAL LANGUAGE |
KR1020107013115A KR101506757B1 (ko) | 2007-11-14 | 2008-11-12 | 자연어로 된 본문의 명확한 모델을 형성하는 방법 |
EA201070614A EA201070614A1 (ru) | 2007-11-14 | 2008-11-12 | Формализация естественного языка |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BG109996 | 2007-11-14 | ||
BG10109996A BG66255B1 (en) | 2007-11-14 | 2007-11-14 | Natural language formalization |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2009062271A1 (en) | 2009-05-22 |
Family
ID=40638266
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/BG2008/000022 WO2009062271A1 (en) | 2007-11-14 | 2008-11-12 | Formalization of a natural language |
Country Status (8)
Country | Link |
---|---|
EP (1) | EP2220572A4 (bg) |
JP (2) | JP2011503730A (bg) |
KR (1) | KR101506757B1 (bg) |
CN (1) | CN101855630A (bg) |
BG (1) | BG66255B1 (bg) |
CA (1) | CA2705345A1 (bg) |
EA (1) | EA201070614A1 (bg) |
WO (1) | WO2009062271A1 (bg) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11928435B2 (en) | 2020-03-19 | 2024-03-12 | Beijing Baidu Netcom Science Technology Co., Ltd. | Event extraction method, event extraction device, and electronic device |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013098701A1 (en) * | 2011-12-27 | 2013-07-04 | Koninklijke Philips Electronics N.V. | Text analysis system |
US10303769B2 (en) * | 2014-01-28 | 2019-05-28 | Somol Zorzin Gmbh | Method for automatically detecting meaning and measuring the univocality of text |
CN112861548B (zh) * | 2021-02-10 | 2023-06-23 | 百度在线网络技术(北京)有限公司 | 自然语言生成及模型的训练方法、装置、设备和存储介质 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6014680A (en) * | 1995-08-31 | 2000-01-11 | Hitachi, Ltd. | Method and apparatus for generating structured document |
WO2002027524A2 (en) * | 2000-09-29 | 2002-04-04 | Gavagai Technology Incorporated | A method and system for describing and identifying concepts in natural language text for information retrieval and processing |
US20020128816A1 (en) * | 1997-09-30 | 2002-09-12 | Haug Peter J. | Probabilistic system for natural language processing |
WO2006128238A1 (en) * | 2005-06-02 | 2006-12-07 | Newsouth Innovations Pty Limited | A method for summarising knowledge from a text |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1484217A (zh) * | 2003-07-11 | 2004-03-24 | 中国科学院声学研究所 | 层次分类与逻辑相结合的自然口语对话描述方法 |
- 2007-11-14 BG BG10109996A patent/BG66255B1/bg unknown
- 2008-11-12 CA CA2705345A patent/CA2705345A1/en not_active Abandoned
- 2008-11-12 KR KR1020107013115A patent/KR101506757B1/ko active IP Right Grant
- 2008-11-12 EA EA201070614A patent/EA201070614A1/ru unknown
- 2008-11-12 CN CN200880115885A patent/CN101855630A/zh active Pending
- 2008-11-12 EP EP08850309A patent/EP2220572A4/en not_active Withdrawn
- 2008-11-12 WO PCT/BG2008/000022 patent/WO2009062271A1/en active Application Filing
- 2008-11-12 JP JP2010533390A patent/JP2011503730A/ja active Pending
- 2014-02-21 JP JP2014031296A patent/JP2014139799A/ja active Pending
Also Published As
Publication number | Publication date |
---|---|
CN101855630A (zh) | 2010-10-06 |
KR20100108338A (ko) | 2010-10-06 |
EP2220572A1 (en) | 2010-08-25 |
EA201070614A1 (ru) | 2010-10-29 |
KR101506757B1 (ko) | 2015-03-27 |
CA2705345A1 (en) | 2009-05-22 |
BG66255B1 (en) | 2012-09-28 |
JP2014139799A (ja) | 2014-07-31 |
BG109996A (bg) | 2009-05-29 |
EP2220572A4 (en) | 2011-03-09 |
JP2011503730A (ja) | 2011-01-27 |
Legal Events
Code | Title | Description |
---|---|---|
WWE | Wipo information: entry into national phase | Ref document number: 200880115885.2; Country of ref document: CN |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 08850309; Country of ref document: EP; Kind code of ref document: A1 |
DPE2 | Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101) | |
WWE | Wipo information: entry into national phase | Ref document number: 2705345; Country of ref document: CA |
WWE | Wipo information: entry into national phase | Ref document number: 2010533390; Country of ref document: JP |
NENP | Non-entry into the national phase | Ref country code: DE |
DPE2 | Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101) | |
WWE | Wipo information: entry into national phase | Ref document number: 2008850309; Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 4217/DELNP/2010; Country of ref document: IN. Ref document number: 201070614; Country of ref document: EA |
ENP | Entry into the national phase | Ref document number: 20107013115; Country of ref document: KR; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: A201007428; Country of ref document: UA |
WWE | Wipo information: entry into national phase | Ref document number: 12740106; Country of ref document: US |