WO2016117920A1 - Knowledge representation expansion method and apparatus - Google Patents

Knowledge representation expansion method and apparatus

Info

Publication number
WO2016117920A1
Authority
WO
WIPO (PCT)
Prior art keywords
predicate
knowledge
knowledge expression
text
expression
Prior art date
Application number
PCT/KR2016/000579
Other languages
French (fr)
Korean (ko)
Inventor
최기선
함영균
서지우
Original Assignee
한국과학기술원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020150139189A (KR101685053B1)
Application filed by 한국과학기술원
Priority to US15/545,054 (US20180144049A1)
Publication of WO2016117920A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F40/00 Handling natural language data

Definitions

  • the present invention relates to a method and apparatus for extending knowledge representation.
  • the semantic web is a web of meaning in which the relationships between pieces of information and their semantic information (semanteme) are expressed in an ontology that a computer can process in a distributed environment such as the Internet.
  • many studies are being conducted to build an ontology-based knowledge database.
  • knowledge is traditionally written in natural language, and some studies have shown that unstructured data contains more knowledge than structured databases. Therefore, research on automatically generating instances of ontology schemas from unstructured data, including natural language text, is under way to extend the knowledge database.
  • the Semantic Web must express the knowledge of the Web in a structured format that can be understood by a computer, that is, as Resource Description Framework (RDF) triples.
  • for this purpose, an ontology is required whose properties can fully describe the various attributes of the knowledge elements.
  • the RDF triple is an international standard governed by the World Wide Web Consortium (W3C). It represents knowledge and information as a set of three elements: subject (resource), predicate (property), and object (literal). Here, the property corresponds to the predicate of the RDF triple and expresses the relationship between the subject and the object.
  • DBpedia, a recent Semantic Web technology, is a knowledge database built automatically from Wikipedia, an encyclopedic text.
  • DBpedia uses the DBpedia ontology, which originated from Wikipedia's infoboxes, to express Wikipedia's knowledge.
  • although the DBpedia ontology may be sufficient to express Wikipedia's summarized knowledge, it is difficult to guarantee that it can express all the knowledge in Wikipedia's text. Therefore, an ontology is needed that can express the various attributes of knowledge elements appearing in natural language text, together with a technology that expands knowledge by automatically building a knowledge database on that basis.
  • An object of the present invention is to provide a knowledge expression extension method and apparatus that, when knowledge extracted from a text cannot be expressed in the knowledge expression language used by the knowledge expression ontology, extend the knowledge expression using a semantic expression language.
  • An apparatus for expanding knowledge expression comprises: a predicate-argument structure analyzer that extracts a first predicate and at least one argument from text using a semantic expression language; an ontology unit that expresses knowledge using a knowledge expression language, which is a structured format that can be understood by a computer, and extracts a second predicate corresponding to the first predicate extracted by the predicate-argument structure analyzer; and a knowledge expression unit that, when the similarity between the first predicate and the second predicate is equal to or less than a reference value, expresses the knowledge extracted from the text using the first predicate.
  • the knowledge expression unit may extract the second predicate related to the at least one argument from the ontology unit.
  • the knowledge expression unit may extract, from the domains of the knowledge expression language, a first domain whose similarity to the lexical type assigned to the at least one argument is equal to or greater than a reference value; extract, from the ranges of the knowledge expression language, a first range whose similarity to the lexical type assigned to the at least one argument is equal to or greater than the reference value; and extract the predicate related to the first domain and the first range as the second predicate.
  • the knowledge expression unit may generate a string that combines the first predicate with information related to any one of the at least one argument, and add the string to the knowledge expression language of the ontology unit.
  • the knowledge expression language may be a language expressed in a Resource Description Framework (RDF) ternary relationship.
  • a method for extending a knowledge expression comprises: receiving text including at least one sentence; expressing the text as a first predicate and at least one argument based on a semantic expression language; extracting a second predicate corresponding to the first predicate from a knowledge expression ontology; comparing the similarity between the first predicate and the second predicate; and, if the similarity is equal to or less than a reference value, expressing the knowledge extracted from the text using the first predicate.
  • the second predicate corresponding to the first predicate may be extracted from the knowledge expression ontology using the vocabulary type assigned to the at least one argument.
  • the knowledge expression ontology uses a knowledge expression language that expresses knowledge in a ternary relation of subject, predicate, and object, and the extracting of the second predicate may extract, as the second predicate, a predicate whose subject is similar to the lexical type assigned to the at least one argument by the reference value or more and whose object is similar to the lexical type assigned to the at least one argument by the reference value or more.
  • the expressing using the first predicate may generate a string that combines the first predicate with information related to any one of the at least one argument, and express the knowledge extracted from the text using the string.
  • the method may further include adding the character string to a knowledge expression language of the knowledge representation ontology.
  • A method by which an apparatus extends a knowledge expression comprises: interpreting the predicate-argument structure of text; matching the predicate-argument structure of the text with a ternary relation of the knowledge expression language; and adding the first predicate extracted from the predicate-argument structure of the text as a predicate of the knowledge expression language based on the matching similarity.
  • the adding as a predicate of the knowledge expression language may include: extracting, from the ternary relation of the knowledge expression language, a second predicate matching the first predicate of the predicate-argument structure of the text; comparing the similarity of the first predicate and the second predicate; and, if the similarity is less than the reference value, adding the first predicate to the knowledge expression language.
  • the method may further include expressing the text in a ternary relationship using the first predicate.
  • the matching with the ternary relation of the knowledge expression language may match the predicate-argument structure of the text to the ternary relation based on the similarity between the arguments extracted from the predicate-argument structure of the text and the domains and ranges of the ternary relations.
  • according to an embodiment of the present invention, when the knowledge extracted from a text cannot be expressed in the knowledge expression language used by the knowledge expression ontology, the knowledge expression can be extended using the semantic expression language. That is, the embodiment can solve the problem that the knowledge representation ontology lacks sufficient coverage when a knowledge database is built from web text.
  • the knowledge database can be expanded quickly and easily by expressing the knowledge contained in unstructured data, such as natural language, in a computer-understandable knowledge expression language based on the predicate-argument structure that carries the meaning of a sentence.
  • the "relationship" ontology of the knowledge database can be expanded to increase knowledge expression power and can be applied to CGC (Collaboratively Generated Content) oriented knowledge forms and interpretations.
  • FIG. 1 is an illustration of a semantic expression language according to an embodiment of the present invention.
  • FIG. 2 is a block diagram of an apparatus for expanding knowledge representation according to an embodiment of the present invention.
  • FIG. 3 is an exemplary diagram illustrating a result of analyzing a predicate-argument structure according to an embodiment of the present invention.
  • FIG. 4 is an exemplary diagram illustrating a ternary relation knowledge expression structure according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of a method of expanding an expression of knowledge according to an embodiment of the present invention.
  • FIG. 6 is a flowchart illustrating a method of extending knowledge representation according to an embodiment of the present invention.
  • FIG. 7 is a diagram illustrating a result of analyzing a predicate-argument structure of an example sentence according to an embodiment of the present invention.
  • FIG. 8 is a diagram illustrating a ternary relation knowledge expression structure of an example sentence according to an embodiment of the present invention.
  • the knowledge database stores structured information in the knowledge expression language.
  • Ontology represents knowledge in a structured format that can be understood by a computer.
  • the knowledge expression language may vary, but may be, for example, an RDF triple.
  • RDF triples represent knowledge and information in a ternary relation of subject (resource), predicate (property), and object (literal), where the property corresponds to the predicate.
  • FIG. 1 is an illustration of a semantic expression language according to an embodiment of the present invention.
  • the ontology of the knowledge database allows the entity (interferon) to be expressed as being of type "glycoprotein" in structured information (RDF).
  • however, although predicates such as "infected", "generating", "retarding", "acting", "produced", and "used in therapy" carry important information, it is difficult to express them in the ontology of the knowledge database.
  • the present invention enhances the expressive power of knowledge using a semantic expression language.
  • the semantic expression language is a language for expressing the meaning of a sentence based on a relationship between a predicate (Property / Predicate) and an argument (Argument).
  • Predicate-argument structure refers to the relationship of arguments that a predicate requires in constructing a sentence. The number of arguments depends on the predicates. A predicate can require one essential argument to create a clause or sentence, and a predicate can require two or three arguments.
  • the semantic expression language can describe the causes, consequences, opinions, behaviors, and conditions of a particular entity, which are difficult to express in the DBpedia ontology.
  • the predicate-argument structure may be extracted using FrameNet, but is not limited thereto.
  • FrameNet is a language resource constructed by annotating, in the form of semantic frames, how vocabulary is used in sentences.
  • a query statement may be expressed as an RDF-structured graph of its FrameNet structure.
  • the query statement can be expressed in a predicate-argument structure.
  • for example, "infected" can be expressed as the FrameNet frame "Influence_of_event_on_cognizer", "create" as "Creating", "inhibiting" as "Intercepting", and "treat" as "Cure".
  • FIG. 2 is a block diagram of an apparatus for expanding knowledge representation according to an embodiment of the present invention
  • FIG. 3 is an exemplary view illustrating a result of analyzing a predicate-argument structure according to an embodiment of the present invention
  • the knowledge expression expanding apparatus (hereinafter referred to as "device") 100 may include a text input unit 110, a predicate-argument structure analysis unit 130, a knowledge expression ontology unit 150, and a knowledge expression unit 170.
  • the text input unit 110 receives text including at least one sentence.
  • the predicate-argument structure analysis unit 130 divides the text into a predicate and at least one argument based on the semantic expression language.
  • a semantic expression language specifies at least one argument that must accompany a given word of a sentence (e.g., the word corresponding to the predicate), and expresses the meaning of the sentence using a predicate-argument structure.
  • the predicate-argument structure analysis unit 130 finds a predicate (predicate.L) in the text, and finds at least one argument (argument 1 to argument n) corresponding to the predicate.
  • the predicate-argument structure analyzer 130 may output lexical types T.1 to T.n of each argument.
  • the semantic expression language may be FrameNet.
  • the predicate-argument structure analysis unit 130 identifies the frame target in the sentence and finds the frame elements.
  • the frame target corresponds to the predicate of the sentence
  • the frame element corresponds to the argument related to the predicate.
  • the predicate-argument structure analysis unit 130 may output annotated text as the FrameNet analysis result.
  • the knowledge representation ontology unit 150 expresses knowledge in a structured format that can be understood by a computer. To this end, the knowledge representation ontology unit 150 describes the attributes of the knowledge elements using the knowledge expression language.
  • the knowledge expression language may be the Resource Description Framework (RDF), in which knowledge is expressed as an RDF triple, that is, a ternary relationship <S, P, O>.
  • the knowledge expression ontology unit 150 expresses the text in a predefined ternary relationship.
  • the knowledge expression language may be RDF, and a ternary relationship may be expressed as <Domain (D), Predicate, Range (R)>.
  • the domain D is a class of the domain related to the predicate, and corresponds to the class of the subject in the ternary relationship.
  • the range R is the class of the range related to the predicate, and corresponds to the class of the object in the ternary relationship.
  • for example, from the sentence "Cheol-su was born in 1944 in Korea", the DBpedia ontology can extract the ternary-relation knowledge expressions <People: "Cheol-su", dbo:birthPlace, Place: "Korea"> and <People: "Cheol-su", dbo:birthDay, Time: "1944">.
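The domain/predicate/range schema described above can be sketched as a small data structure. This is a minimal illustration, not the implementation of the embodiment: the `Triple` and `PredicateSchema` types, the `SCHEMA` table, and the `predicates_for` helper are hypothetical names, and the table holds only the two predicates named in the example sentence.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One knowledge expression in ternary form <S, P, O>."""
    subject: str
    predicate: str
    obj: str


class PredicateSchema(NamedTuple):
    """A predicate together with the classes of its subject and object."""
    domain: str      # class of the subject
    predicate: str
    range: str       # class of the object


# Hypothetical schema holding only the predicates named in the example.
SCHEMA = [
    PredicateSchema("People", "dbo:birthPlace", "Place"),
    PredicateSchema("People", "dbo:birthDay", "Time"),
]


def predicates_for(domain: str, range_: str) -> List[str]:
    """Return every predicate whose domain and range match the given classes."""
    return [s.predicate for s in SCHEMA
            if s.domain == domain and s.range == range_]


# Triples for the example sentence about a person born in 1944 in Korea.
triples = [
    Triple("Cheol-su", "dbo:birthPlace", "Korea"),
    Triple("Cheol-su", "dbo:birthDay", "1944"),
]
```

Looking up a predicate by its domain and range, as `predicates_for("People", "Time")` does, mirrors how the device later finds a candidate predicate from the lexical types of the arguments.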
  • the knowledge expression unit 170 converts the predicate-argument structure of the text into the format of the knowledge expression ontology unit 150.
  • the knowledge expression unit 170 compares the similarity of the knowledge expressions and determines whether the knowledge interpreted by the predicate-argument structure analysis unit 130 can be expressed in the format of the knowledge expression ontology unit 150.
  • if it can, the knowledge expression unit 170 expresses the knowledge in the format of the knowledge expression ontology unit 150.
  • if it cannot, the knowledge expression unit 170 expresses the knowledge using the knowledge interpreted by the predicate-argument structure analysis unit 130. In other words, the knowledge expression unit 170 extracts knowledge from the text based on the semantic expression language when it is difficult to properly express the meaning of the text in a predefined ternary relationship. In addition, the knowledge expression unit 170 may transmit the attribute generated using the semantic expression language (an ontology instance, corresponding to the predicate) to the knowledge expression ontology unit 150. The knowledge expression ontology unit 150 may add the information (ontology instances) generated using the semantic expression language to the knowledge expression language.
  • the knowledge expression extension apparatus 100 may extend the knowledge expression of the knowledge expression ontology using the semantic expression language.
  • FIG. 5 is a flowchart of a method of expanding an expression of knowledge according to an embodiment of the present invention.
  • the device 100 receives text including at least one sentence (S110).
  • the apparatus 100 expresses the text as a predicate and at least one argument based on the semantic expression language (S120).
  • the apparatus 100 finds the predicate (predicate.L) and the arguments (argument 1 to argument n) in the text as shown in FIG. 3.
  • the device 100 may output the lexical types T.1 to T.n of each argument.
  • the apparatus 100 extracts, from the knowledge expression ontology, a predicate (predicate.K) corresponding to the predicate (predicate.L) extracted with the semantic expression language (S130).
  • the device 100 matches the predicate-argument structure of the text to a ternary relationship of the knowledge expression language.
  • the predicate (predicate.K) is assigned to the domain D and the range R as shown in FIG. 4.
  • the device 100 may find a domain D and a range R that are the same as or similar to the lexical type of the argument.
  • the apparatus 100 determines the similarity between the predicate (predicate.L) extracted with the semantic expression language and the predicate (predicate.K) of the knowledge expression language (S140). In this case, the apparatus 100 may determine the similarity between a string combining the predicate (predicate.L) with the lexical type of the argument, and the predicate (predicate.K) of the knowledge expression language.
  • Methods of determining similarity include 1) similarity at the string level, 2) similarity in word semantics (measuring similarity over a concept hierarchy built from language resources), and 3) corpus-based word similarity. 1) At the string level, similarity can be measured by counting the number of edits needed to convert a string into a target string; the Levenshtein distance is the traditional example. 2) Similarity in word semantics is calculated by measuring the similarity between words in a hierarchical structure using a semantic lexical database such as WordNet.
  • hierarchy-based measures include path similarity, which measures the minimum distance between nodes in the WordNet hierarchy; Leacock & Chodorow similarity, which uses the minimum distance between nodes and the maximum depth; and Wu & Palmer similarity, which uses the depth of the nodes and the distance from their lowest common upper node.
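The hierarchy-based measures can be illustrated on a toy taxonomy. The sketch below is an assumption-laden example: the `PARENT` table is a made-up hierarchy rather than WordNet, the function names are hypothetical, and the root is assigned depth 1. It shows path similarity as 1 / (shortest path length + 1) and Wu & Palmer similarity as 2 * depth(lowest common ancestor) / (depth(a) + depth(b)).

```python
from typing import Dict, List, Optional

# Made-up taxonomy for illustration only (child -> parent); not WordNet.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "agent": "entity",
    "person": "agent",
    "organisation": "agent",
    "place": "entity",
}


def path_to_root(node: str) -> List[str]:
    """Return the chain of nodes from `node` up to the root, inclusive."""
    path = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def depth(node: str) -> int:
    """Depth of a node, counting the root as depth 1 (an assumption)."""
    return len(path_to_root(node))


def lowest_common_ancestor(a: str, b: str) -> str:
    """Deepest node that lies on both root paths."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes do not share a root")


def path_similarity(a: str, b: str) -> float:
    """1 / (number of edges on the shortest path between a and b, plus 1)."""
    lca = lowest_common_ancestor(a, b)
    distance = (depth(a) - depth(lca)) + (depth(b) - depth(lca))
    return 1.0 / (distance + 1)


def wu_palmer(a: str, b: str) -> float:
    """2 * depth(lca) / (depth(a) + depth(b))."""
    lca = lowest_common_ancestor(a, b)
    return 2.0 * depth(lca) / (depth(a) + depth(b))
```

With a real lexical database, the same comparisons would be made between the lexical type of an argument and a domain or range class of the ontology.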
  • 3) in corpus-based measurement, each word in the corpus is assigned a vector in a multidimensional space, and the similarity between words is measured in that vector space.
  • approaches using word embeddings have been used for this purpose.
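For the corpus-based approach, a common way to compare two word vectors is cosine similarity. The sketch below assumes that choice; the source does not name a specific vector measure, and the vectors in the usage lines are invented values, not real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector has zero length, so that an
    all-zero vector is treated as similar to nothing.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented three-dimensional vectors, for illustration only.
vector_birthday = [0.9, 0.1, 0.3]
vector_being_born = [0.8, 0.2, 0.4]
score = cosine_similarity(vector_birthday, vector_being_born)
```

In practice the vectors would come from an embedding model trained on a corpus, and `score` would be compared against the reference value.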
  • if the similarity is greater than the reference value, the device 100 extracts knowledge from the text using the knowledge expression language already stored (S150). Since the knowledge interpreted in the semantic expression language can be sufficiently represented in the format of the knowledge expression ontology, the apparatus 100 expresses the knowledge of the text in the format of the knowledge expression language. That is, because the predicate (predicate.K) of the knowledge expression language is similar to the predicate (predicate.L) extracted with the semantic expression language by more than the reference value, the apparatus 100 judges that the input text can be represented sufficiently without extending the format of the knowledge expression language. The knowledge may be expressed as <vocabulary corresponding to the domain (D), predicate.K, vocabulary corresponding to the range (R)>.
  • if not, the apparatus 100 generates a predicate that includes the predicate (predicate.L) extracted with the semantic expression language (S160).
  • the apparatus 100 extracts knowledge from the text using the generated predicate (S170). That is, if the text can be expressed in a ternary relation that exists in the knowledge expression ontology, the device 100 expresses the input text based on the stored knowledge expression ontology; if the text cannot be expressed in the knowledge expression ontology, the device 100 expresses the input text in an extended ternary relation that uses the predicate of the predicate-argument structure. The knowledge may be expressed as <vocabulary corresponding to the domain (D), predicate.L, vocabulary corresponding to the range (R)> or <vocabulary corresponding to the domain (D), predicate.L + lexical type corresponding to the range (R), vocabulary corresponding to the range (R)>.
  • the device 100 adds the generated predicate to the knowledge expression ontology (S180).
  • the generated predicate is added as a new knowledge representation instance.
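Steps S130 to S180 can be summarized as a single decision function. The following is a minimal sketch under stated assumptions: the ontology is modeled as a hypothetical dictionary from (domain, range) pairs to predicate names, the default similarity is the ratio from Python's `difflib.SequenceMatcher` (a stand-in, not the measure of the embodiment), and the threshold value 0.5 is arbitrary.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

# Hypothetical ontology model: (domain, range) -> predicates defined for them.
Ontology = Dict[Tuple[str, str], List[str]]


def string_ratio(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; any of the measures above could be used."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_predicate(
    frame_predicate: str,
    domain: str,
    range_: str,
    ontology: Ontology,
    threshold: float = 0.5,
    similarity: Callable[[str, str], float] = string_ratio,
) -> str:
    """Return the predicate to use, extending the ontology when needed."""
    # Combine predicate.L with the type of the range, e.g. "being_bornTime".
    combined = frame_predicate + range_
    # S130: candidate predicates (predicate.K) for this domain and range.
    candidates = ontology.get((domain, range_), [])
    best = max(candidates, key=lambda c: similarity(combined, c), default=None)
    # S140/S150: reuse the stored predicate when it is similar enough.
    if best is not None and similarity(combined, best) > threshold:
        return best
    # S160 to S180: otherwise generate a new predicate and store it.
    ontology.setdefault((domain, range_), []).append(combined)
    return combined
```

Passing the similarity function as a parameter keeps the decision logic separate from the choice of measure, which the source leaves open.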
  • FIG. 6 is a flowchart illustrating a knowledge expression extension method according to an embodiment of the present invention.
  • FIG. 7 is a view illustrating a result of analyzing a predicate-argument structure of an example sentence according to an embodiment of the present invention, and FIG. 8 is a diagram illustrating a ternary relation knowledge expression structure of an example sentence according to an embodiment of the present invention.
  • the device 100 receives text ("Cheol-su was born in 1944 in Korea.") (S210).
  • the apparatus 100 classifies the text into a predicate and arguments based on the semantic expression language as shown in FIG. 7. If the arguments of the predicate ("born") are "who", "where", and "when", the strings corresponding to the arguments are "Cheol-su", "Korea", and "1944". When FrameNet is used, the frame target is "born" and the frame is "being_born".
  • the frame arguments for the frame "being_born" are defined as "Child", "Place", and "Time", so the frame argument-string pairs are Child: "Cheol-su", Place: "Korea", and Time: "1944".
  • the vocabulary type of each argument is also determined: the vocabulary type of "Child" is "people", the vocabulary type of "Place" is "place", and the vocabulary type of "Time" is "time".
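The analysis result for the example sentence can be held in a small record type. This is an illustrative sketch only: `FrameArgument`, `FrameAnalysis`, and `argument_of_type` are hypothetical names, and the values are the ones given in the example (frame target "born", frame "being_born", and three arguments with their lexical types).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FrameArgument:
    role: str          # frame element, e.g. "Child"
    text: str          # string found in the sentence
    lexical_type: str  # type assigned to the argument


@dataclass(frozen=True)
class FrameAnalysis:
    target: str                           # word acting as the predicate
    frame: str                            # frame evoked by the target
    arguments: Tuple[FrameArgument, ...]

    def argument_of_type(self, lexical_type: str) -> Optional[FrameArgument]:
        """Return the first argument with the given lexical type, if any."""
        for argument in self.arguments:
            if argument.lexical_type == lexical_type:
                return argument
        return None


# Analysis of the example sentence, using the values stated in the text.
analysis = FrameAnalysis(
    target="born",
    frame="being_born",
    arguments=(
        FrameArgument("Child", "Cheol-su", "people"),
        FrameArgument("Place", "Korea", "place"),
        FrameArgument("Time", "1944", "time"),
    ),
)
```

Looking up arguments by lexical type, as `analysis.argument_of_type("time")` does, is what the later matching against ontology domains and ranges relies on.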
  • the apparatus 100 compares the arguments with the domains of the ternary relations, and extracts the argument that matches a domain of the ternary relations (S230).
  • the device 100 may find a domain of ternary relation similar to the lexical type of the arguments.
  • to convert the predicate-argument structure into a ternary relationship, the device 100 finds the domain and range related to the arguments, and may first measure the similarity between the arguments and the domains.
  • the device 100 may determine that "people", the lexical type of the argument, is similar to "people", which is a domain of the ternary relation.
  • the device 100 compares the arguments with the ranges of the ternary relations, and extracts the argument that matches a range of the ternary relations (S240).
  • the device 100 may determine that "time", the lexical type of the argument, is similar to "Time", which is a range of the ternary relation.
  • having extracted the subject (domain) and the object (range) required by the ternary relation knowledge expression, the apparatus 100 extracts a predicate related to the subject (domain) and the object (range) (S250). Referring to FIG. 8, the predicate related to the domain "people" and the range "Time" is "birthday".
  • the apparatus 100 measures the similarity between the predicate ("being_born") of the semantic expression language and the predicate ("birthday") of the ternary relation (S260). At this time, the device 100 may combine the predicate "being_born" with "time", which is the lexical type of the related argument (the related range), to generate a combined string ("being_bornTime"), and compare "being_bornTime" with "birthday".
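The string comparison in step S260 could, for example, use a normalized Levenshtein distance. The sketch below is one possible choice, not necessarily the measure used in the embodiment; the function names and the normalization by the length of the longer string are assumptions.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `source` into `target`."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            substitution_cost = 0 if source_char == target_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def string_similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Compare the combined string with the ontology predicate from the example.
score = string_similarity("being_bornTime", "birthday")
```

The resulting `score` would then be compared with the reference value to choose between the two branches that follow.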
  • if the similarity is greater than the reference value, the device 100 expresses the knowledge extracted from the text using the predicate ("birthday") of the ternary relationship (S270).
  • the knowledge extracted from the text may be <Cheol-su, birthday, 1944>, and "Cheol-su" and "1944" may be linked to URIs.
  • if the similarity is equal to or less than the reference value, the apparatus 100 expresses the knowledge extracted from the text using the predicate "being_born" of the semantic expression language (S280). That is, since the predicate currently defined in the knowledge expression language ("birthday") does not sufficiently express the meaning of the sentence, the apparatus 100 uses the predicate of the semantic expression language instead of the predicate of the ternary relation.
  • the newly generated predicate may be a string including "being_born", for example, "being_bornTime".
  • the knowledge extracted from the text is expressed in an extended ternary relationship and may be, for example, <Cheol-su, being_born, 1944> or <Cheol-su, being_bornTime, 1944>. "Cheol-su" and "1944" may be linked to URIs.
  • the device 100 stores the new predicate as a predicate related to the domain "people" and the range "Time".
  • the new predicate is a string including "being_born" and may be, for example, "being_bornTime".
  • the predicate currently defined in the knowledge expression language ("birthday") carries time information similar to "1944", but "1944" is a birth year, not a birthday, so "birthday" expresses the knowledge inadequately.
  • the device 100 may use "being_born" or, more specifically, "being_bornTime" as the predicate instead of "birthday".
  • the apparatus 100 may automatically extend the limited expressive power of the knowledge expression language using the semantic expression language, and thereby, may construct a knowledge expression language capable of extracting more accurate knowledge.
  • the device 100 may determine that "place” of the lexical type of the argument is similar to "Place” which is the range of the ternary relationship.
  • the predicate associated with the domain "people" and the range "Place" is "birthplace".
  • the apparatus 100 may extract knowledge by using "birthplace" as it is or by using a predicate extended to "being_bornPlace".
  • the device 100 may extend the knowledge expression power of any ontology-based knowledge database, not only DBpedia.
  • the semantic expression language used by the apparatus 100 may also be extended to any ontology that, like FrameNet, designates the classification of a word of a sentence and designates the arguments related to that word.
  • according to an embodiment of the present invention, when the knowledge extracted from any text cannot be expressed in the knowledge expression language used by the knowledge expression ontology, the knowledge expression can be extended using the semantic expression language. That is, the embodiment can solve the problem that the knowledge representation ontology lacks sufficient coverage when a knowledge database is built from web text.
  • the knowledge database can be expanded quickly and easily by expressing the knowledge contained in unstructured data, such as natural language, in a computer-understandable knowledge expression language based on the predicate-argument structure that carries the meaning of a sentence.
  • the "relationship" ontology of the knowledge database can be expanded to increase knowledge expression power and can be applied to CGC (Collaboratively Generated Content) oriented knowledge forms and interpretations.
  • the knowledge expression expansion apparatus 100 may include a storage device that stores instructions for performing the knowledge expression expansion method described with reference to FIGS. 1 to 8, a memory that loads the instructions from the storage device and temporarily stores them, and a processor that performs the knowledge expression extension method of the present invention by executing the stored or loaded instructions. The instructions for performing the knowledge expression extension method described with reference to FIGS. 1 to 8 are implemented as a program that can be processed by the processor.
  • the embodiments of the present invention described above are not implemented only through the apparatus and the method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which the program is recorded.

Abstract

A knowledge representation expansion apparatus includes: a predicate-argument structure analyzing unit for extracting a predicate and at least one argument from a text using a meaning representation language; an ontology unit for representing knowledge using a knowledge representation language, which is a structured format understandable by a computer, and for extracting a second predicate corresponding to a first predicate, which is extracted from the predicate-argument structure analyzing unit; and a knowledge representation unit for representing knowledge extracted from the text using the first predicate, when the similarity of the first predicate and the second predicate is equal to or less than a threshold value.

Description

지식표현 확장 방법 및 장치Knowledge expression extension method and device
본 발명은 지식표현 확장 방법 및 장치에 관한 것이다.The present invention relates to a method and apparatus for extending knowledge representation.
최근 시맨틱 웹(semantic web)과 빅데이터 기반으로 질의응답 시스템에 대한 연구가 활발하다. 시맨틱 웹은 인터넷과 같은 분산환경에서 정보들 사이의 관계와 의미 정보(Semanteme)를 컴퓨터가 처리할 수 있는 온톨로지로 표현하는 의미론적인 웹이다. 또한 온톨로지 기반 지식데이터베이스를 구축하는 많은 연구들이 진행되고 있다. 그러나 전통적으로 지식은 자연 언어로 작성되어 있으며, 특히나 몇몇 연구에 의하면 구조화된 데이터베이스보다 비구조 데이터에서 많은 지식이 포함되어 있다고 알려져 있다. 따라서 자연 언어 텍스트를 포함하는 비구조 데이터로부터 온톨로지 스키마의 인스턴스들을 자동으로 생성하는 연구들이 지식데이터베이스를 확장을 위해 진행되고 있다.Recently, research on question and answer system based on semantic web and big data is active. The semantic web is a semantic web that expresses relationships between information and semantic information (Semanteme) in ontology that can be processed by a computer in a distributed environment such as the Internet. In addition, many studies are being conducted to build an ontology-based knowledge database. Traditionally, however, knowledge is written in natural language, and some studies have shown that more knowledge is contained in unstructured data than in structured databases. Therefore, researches for automatically generating instances of ontology schemas from unstructured data including natural language texts are being conducted to extend the knowledge database.
In particular, the semantic web must express the knowledge on the web in a structured format that a computer can understand, namely RDF (Resource Description Framework) triples, and this requires an ontology whose properties can sufficiently describe the various attributes of knowledge elements. The RDF triple is an international standard overseen by the World Wide Web Consortium (W3C); it is a format that represents knowledge and information as a triple of subject [Subject (resource)], predicate [Predicate (property)], and object [Object (literal)]. Here, a property corresponds to the predicate of an RDF triple, that is, to the relationship between the subject and the object.
DBpedia, a state-of-the-art semantic web technology, is a knowledge database built automatically from Wikipedia, an encyclopedic text. To express the knowledge in Wikipedia, DBpedia uses the DBpedia ontology, which originated from Wikipedia infoboxes. However, although the DBpedia ontology may be sufficient to express the summarized knowledge of Wikipedia, it is difficult to guarantee that it can express all the knowledge in Wikipedia text. An ontology that can express the various attributes of knowledge elements appearing in natural language text is therefore needed, along with a technique that expands knowledge by automatically building a knowledge database on that basis.
The problem to be solved by the present invention is to provide a knowledge representation expansion method and apparatus that expand knowledge representation using a meaning representation language when knowledge extracted from a text cannot be expressed in the knowledge representation language used by a knowledge representation ontology.
A knowledge representation expansion apparatus according to an embodiment of the present invention includes: a predicate-argument structure analyzing unit that extracts a predicate and at least one argument from a text using a meaning representation language; an ontology unit that represents knowledge using a knowledge representation language, which is a structured format understandable by a computer; and a knowledge representation unit that extracts, from the ontology unit, a second predicate corresponding to a first predicate extracted by the predicate-argument structure analyzing unit and, when the similarity between the first predicate and the second predicate is equal to or less than a threshold value, represents the knowledge extracted from the text using the first predicate.
The knowledge representation unit may extract, from the ontology unit, the second predicate related to the at least one argument.
The knowledge representation unit may extract, from among the domains of the knowledge representation language, a first domain whose similarity to a lexical type assigned to the at least one argument is equal to or greater than a threshold value; extract, from among the ranges of the knowledge representation language, a first range whose similarity to a lexical type assigned to the at least one argument is equal to or greater than a threshold value; and extract a predicate related to the first domain and the first range as the second predicate.
The knowledge representation unit may generate a character string that combines the first predicate with information related to any one of the at least one argument, and may add the character string to the knowledge representation language of the ontology unit.
The knowledge representation language may be a language expressed as RDF (Resource Description Framework) ternary relations.
A method by which an apparatus expands knowledge representation according to another embodiment of the present invention includes: receiving a text including at least one sentence; expressing the text as a first predicate and at least one argument based on a meaning representation language; extracting, from a knowledge representation ontology, a second predicate corresponding to the first predicate; comparing the similarity between the first predicate and the second predicate; and, when the similarity is equal to or less than a threshold value, representing the knowledge extracted from the text using the first predicate.
The extracting of the second predicate corresponding to the first predicate may extract the second predicate from the knowledge representation ontology using a lexical type assigned to the at least one argument.
The knowledge representation ontology may use a knowledge representation language that expresses knowledge as a ternary relation of subject, predicate, and object, and the extracting of the second predicate corresponding to the first predicate may extract, as the second predicate, a predicate associated with a subject that, among the subjects of the knowledge representation language, is similar by at least a threshold value to a lexical type assigned to the at least one argument, and with an object that, among the objects of the knowledge representation language, is similar by at least a threshold value to a lexical type assigned to the at least one argument.
The representing using the first predicate may generate a character string that combines the first predicate with information related to any one of the at least one argument, and may represent the knowledge extracted from the text using the character string.
The method may further include adding the character string to the knowledge representation language of the knowledge representation ontology.
A method by which an apparatus expands knowledge representation according to yet another embodiment of the present invention includes: analyzing a predicate-argument structure of a text; matching the predicate-argument structure of the text to a ternary relation of a knowledge representation language; and adding, based on the matching similarity, a first predicate extracted from the predicate-argument structure of the text as a predicate of the knowledge representation language.
The adding as a predicate of the knowledge representation language may include: extracting, from the ternary relations of the knowledge representation language, a second predicate matched to the first predicate of the predicate-argument structure of the text; comparing the similarity between the first predicate and the second predicate; and adding the first predicate to the knowledge representation language when the similarity is equal to or less than a threshold value.
The method may further include expressing the text as a ternary relation using the first predicate.
The matching to a ternary relation of the knowledge representation language may match the predicate-argument structure of the text to the ternary relation based on the similarity between the arguments extracted from the predicate-argument structure of the text and the domain and range of the ternary relation.
According to an embodiment of the present invention, when knowledge extracted from a text cannot be expressed in the knowledge representation language used by a knowledge representation ontology, the knowledge representation can be expanded using a meaning representation language. That is, an embodiment of the present invention can solve the problem that a knowledge representation ontology lacks sufficient coverage when a knowledge database is built from web text.
According to an embodiment of the present invention, knowledge contained in unstructured data such as natural language can be expressed, on the basis of the predicate-argument structure of sentence meaning, in a knowledge representation language whose format a computer can understand, so that a knowledge database can be expanded quickly and easily.
According to an embodiment of the present invention, the "relation" ontology of a knowledge database is enriched, which increases the expressive power of knowledge representation and can be applied to the form and interpretation of CGC (Collaboratively Generated Content)-oriented knowledge.
FIG. 1 is an example of a meaning representation language according to an embodiment of the present invention.
FIG. 2 is a block diagram of a knowledge representation expansion apparatus according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram illustrating the result of analyzing a predicate-argument structure according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram illustrating a ternary relation knowledge representation structure according to an embodiment of the present invention.
FIG. 5 is a flowchart of a knowledge representation expansion method according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a knowledge representation expansion method by way of example according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the result of analyzing the predicate-argument structure of an example sentence according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating the ternary relation knowledge representation structure of an example sentence according to an embodiment of the present invention.
Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted in order to describe the present invention clearly, and similar parts are given similar reference numerals throughout the specification.
Throughout the specification, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them.
A knowledge database stores information structured in a knowledge representation language. An ontology represents knowledge in a structured format that a computer can understand. Various knowledge representation languages may be used; one example is the RDF triple. An RDF triple is a format that represents knowledge and information as a ternary relation of subject [Subject (resource)], predicate [Predicate (property)], and object [Object (literal)]. The predicate, or property, of an RDF triple expresses the relationship or property that holds between the entity in the subject position and the entity or value in the object position.
Because an ontology is thus confined to structured information, it has difficulty expressing knowledge extracted from unstructured knowledge sources. In particular, an examination of whether sufficient knowledge can be extracted from text, carried out by calculating the coverage of the ontology of DBpedia, the hub of linked data, shows that expressive power is limited when new knowledge is extracted with unstructured text as the knowledge source.
The following describes a method of expanding knowledge representation based on a meaning representation language, that is, a method of expanding knowledge representation by creating a new ontology instance when knowledge extracted from a text cannot be expressed in the current knowledge representation language.
FIG. 1 is an example of a meaning representation language according to an embodiment of the present invention.
Referring to FIG. 1, the description uses the following question as an example. "This" in the question is "interferon".
Question: This is a glycoprotein produced by animal cells infected with a virus. It acts to block the infection and proliferation of viruses. It is mass-produced thanks to advances in genetic engineering and is used to treat viral diseases such as type B infection and herpes.
Answer: Interferon
The ontology of a knowledge database can express, as structured information (RDF), that this (interferon) has the type "glycoprotein". However, although predicates in the unstructured question such as "infected", "produced", "block", "acts", "is mass-produced", and "is used to treat" carry important information, they are difficult to express in the knowledge representation language.
The present invention increases the expressive power of knowledge by using a meaning representation language. Here, a meaning representation language is a language that expresses the meaning of a sentence based on the relationship between a predicate (Property/Predicate) and its arguments (Argument). The predicate-argument structure represents the relationship among the arguments that a predicate requires when it forms a sentence. The number of arguments is determined by the predicate: some predicates require one obligatory argument to form a clause or sentence, while others require two or three.
A meaning representation language can describe the causes, results, opinions, actions, states, and so on of a particular entity, which are difficult to express with the DBpedia ontology. For example, the predicate-argument structure may be extracted using FrameNet, but is not limited thereto. FrameNet is a language resource built by annotating, in the form of semantic frames, how words are used in sentences.
Referring to FIG. 1, the question can be expressed as a FrameNet structure graph in RDF structure. In this way, the question can be expressed as a predicate-argument structure. For example, "infected" can be expressed as the FrameNet frame "Influence_of_event_on_cognizer", "produce" as "Creating", "block" as "Intercepting", and "treat" as "Cure".
FIG. 2 is a block diagram of a knowledge representation expansion apparatus according to an embodiment of the present invention, FIG. 3 is an exemplary diagram illustrating the result of analyzing a predicate-argument structure according to an embodiment of the present invention, and FIG. 4 is an exemplary diagram illustrating a ternary relation knowledge representation structure according to an embodiment of the present invention.
Referring to FIG. 2, a knowledge representation expansion apparatus (hereinafter, the "apparatus") 100 includes a text input unit 110, a predicate-argument structure analyzing unit 130, a knowledge representation ontology unit 150, and a knowledge representation unit 170.
The text input unit 110 receives a text including at least one sentence.
The predicate-argument structure analyzing unit 130 divides the text into a predicate and at least one argument based on a meaning representation language. The meaning representation language specifies at least one argument that a given word of a sentence (for example, the word serving as the predicate) must have, and expresses the meaning of the sentence using the predicate-argument structure. Referring to FIG. 3, the predicate-argument structure analyzing unit 130 finds a predicate (predicate.L) in the text and finds at least one argument (argument 1 to argument n) belonging to that predicate. The predicate-argument structure analyzing unit 130 may also output the lexical type (T.1 to T.n) of each argument. For example, the meaning representation language may be FrameNet. When FrameNet is used to analyze the predicate-argument structure, the predicate-argument structure analyzing unit 130 identifies the frame target in the sentence and finds the frame elements. Here, the frame target corresponds to the predicate of the sentence, and the frame elements correspond to the arguments related to the predicate. The predicate-argument structure analyzing unit 130 may output annotation text for the FrameNet analysis result.
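The output of the predicate-argument structure analysis (a predicate, its arguments, and a lexical type per argument) can be pictured as a small record type. The following is only a minimal sketch under stated assumptions: the class names `Argument` and `PredicateArgumentStructure` are hypothetical, and the sample values (the predicate "born" and the types Person, Place, Time) are illustrative choices based on the example sentence used later in the description, not the actual output of a FrameNet parser.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    text: str          # surface form of the argument in the sentence
    lexical_type: str  # lexical type T.i assigned to the argument


@dataclass
class PredicateArgumentStructure:
    predicate: str  # predicate.L (the frame target)
    arguments: List[Argument] = field(default_factory=list)

    def lexical_types(self) -> List[str]:
        """Return T.1 .. T.n in argument order."""
        return [arg.lexical_type for arg in self.arguments]


# Illustrative values only (assumed, not taken from a real analyzer).
pas = PredicateArgumentStructure(
    predicate="born",
    arguments=[
        Argument("Cheolsu", "Person"),
        Argument("Korea", "Place"),
        Argument("1944", "Time"),
    ],
)
print(pas.lexical_types())
```

Keeping the lexical type next to each argument matters because the later matching step compares these types with the domain and range of ontology predicates.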
The knowledge representation ontology unit 150 represents knowledge in a structured format that a computer can understand. To this end, the knowledge representation ontology unit 150 describes the attributes of knowledge elements using a knowledge representation language. For example, the knowledge representation language may be RDF (Resource Description Framework), in which case knowledge is expressed as RDF triples, that is, ternary relations <S, P, O>. The knowledge representation ontology unit 150 expresses text in predefined ternary relations. Referring to FIG. 4, the knowledge representation language may be RDF and may be expressed as <domain (D), predicate, range (R)>. Here, the domain (D) is the class of the domain related to the predicate and corresponds to the class of the subject in the ternary relation. The range (R) is the class of the range related to the predicate and corresponds to the class of the object in the ternary relation. For example, from the sentence "Cheolsu was born in Korea in 1944.", the DBpedia ontology can extract <Person:"Cheolsu", dbo:birthPlace, Place:"Korea"> and <Person:"Cheolsu", dbo:birthDay, Time:"1944"> as knowledge representation ternary relations.
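The relationship between a triple and the domain and range of its predicate can be sketched as follows. This is an illustration under assumptions, not the apparatus itself: `Triple`, `PropertySchema`, `SCHEMA`, and `conforms` are hypothetical names, and the two schema entries simply restate the `dbo:birthPlace` and `dbo:birthDay` examples from the text with the classes Person, Place, and Time.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class PropertySchema(NamedTuple):
    predicate: str
    domain_class: str  # class expected in the subject position (domain D)
    range_class: str   # class expected in the object position (range R)


# Schema entries restating the two examples in the text.
SCHEMA: List[PropertySchema] = [
    PropertySchema("dbo:birthPlace", "Person", "Place"),
    PropertySchema("dbo:birthDay", "Person", "Time"),
]


def conforms(triple: Triple, subject_class: str, object_class: str,
             schema: List[PropertySchema] = SCHEMA) -> bool:
    """True if the triple's predicate is declared with this domain and range."""
    return any(
        p.predicate == triple.predicate
        and p.domain_class == subject_class
        and p.range_class == object_class
        for p in schema
    )


birth_place = Triple("Cheolsu", "dbo:birthPlace", "Korea")
print(conforms(birth_place, "Person", "Place"))
```

The check is deliberately strict (exact class names); the description allows similar rather than identical classes, which is a separate matching concern.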
The knowledge representation unit 170 converts the predicate-argument structure of the text into the format of the knowledge representation ontology unit 150. The knowledge representation unit 170 compares the similarity of the knowledge representations to determine whether the knowledge analyzed by the predicate-argument structure analyzing unit 130 can be expressed in the format of the knowledge representation ontology unit 150. If that knowledge can be sufficiently expressed in the format of the knowledge representation ontology unit 150, the knowledge representation unit 170 extracts knowledge from the text in that format. If it cannot, the knowledge representation unit 170 expresses the text using the knowledge analyzed by the predicate-argument structure analyzing unit 130. In other words, when it is difficult to express the meaning of the text properly with the predefined ternary relations, the knowledge representation unit 170 extracts knowledge from the text based on the meaning representation language. The knowledge representation unit 170 may then pass the property generated using the meaning representation language (an ontology instance, corresponding to a predicate) to the knowledge representation ontology unit 150, and the knowledge representation ontology unit 150 may add the information generated using the meaning representation language (the ontology instance) to the knowledge representation language.
In this way, the knowledge representation expansion apparatus 100 can expand the knowledge representation of a knowledge representation ontology using a meaning representation language.
FIG. 5 is a flowchart of a knowledge representation expansion method according to an embodiment of the present invention.
Referring to FIG. 5, the apparatus 100 receives a text including at least one sentence (S110).
The apparatus 100 expresses the text as a predicate and at least one argument based on a meaning representation language (S120). As shown in FIG. 3, the apparatus 100 finds a predicate (predicate.L) and the arguments of the predicate (argument 1 to argument n) in the text. The apparatus 100 may also output the lexical type (T.1 to T.n) of each argument.
The apparatus 100 extracts, from the knowledge representation ontology, a predicate (predicate.K) corresponding to the predicate extracted with the meaning representation language (predicate.L) (S130). The apparatus 100 matches the predicate-argument structure of the text to a ternary relation of the knowledge representation language. Once the predicate-argument structure analysis yields arguments corresponding to the domain (D) and range (R) of a ternary relation, the apparatus 100 can extract the predicate (predicate.K) associated with that domain (D) and range (R), as shown in FIG. 4. The apparatus 100 may look for a domain (D) and range (R) that are the same as or similar to the lexical types of the arguments.
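Step S130 can be sketched as a lookup of ontology predicates whose domain and range match the lexical types of the arguments. The sketch below is an assumption-laden illustration: `PropertySchema`, `find_candidate_predicates`, and `exact_match` are hypothetical names, the schema restates the `dbo:` examples from the description, and exact string equality stands in for whatever type-similarity measure an implementation would actually use.

```python
from typing import Callable, List, NamedTuple


class PropertySchema(NamedTuple):
    predicate: str
    domain_class: str
    range_class: str


def exact_match(a: str, b: str) -> float:
    """Placeholder type similarity (assumption): 1.0 if equal, else 0.0."""
    return 1.0 if a == b else 0.0


def find_candidate_predicates(
    schema: List[PropertySchema],
    subject_type: str,
    object_type: str,
    type_similarity: Callable[[str, str], float] = exact_match,
    threshold: float = 1.0,
) -> List[str]:
    """Return predicates (predicate.K) whose domain and range are both
    similar to the argument lexical types by at least the threshold."""
    return [
        p.predicate
        for p in schema
        if type_similarity(p.domain_class, subject_type) >= threshold
        and type_similarity(p.range_class, object_type) >= threshold
    ]


schema = [
    PropertySchema("dbo:birthPlace", "Person", "Place"),
    PropertySchema("dbo:birthDay", "Person", "Time"),
]
print(find_candidate_predicates(schema, "Person", "Place"))
```

Passing the similarity function as a parameter leaves room for a looser comparison of classes, which the description permits when it speaks of a "same or similar" domain and range.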
The apparatus 100 determines the similarity between the predicate extracted with the meaning representation language (predicate.L) and the predicate of the knowledge representation language (predicate.K) (S140). Here, the apparatus 100 may determine the similarity between a character string that combines predicate.L with the lexical type of an argument and predicate.K.
Methods of determining similarity include 1) string-level similarity (edit distance), 2) word-sense similarity (similarity measured over a concept hierarchy using language resources), and 3) corpus-based word similarity. 1) To measure string-level similarity, one counts the number of edit operations needed to transform one string into the target string; a classic method is the Levenshtein distance. 2) Word-sense similarity is computed with a semantic lexical database such as WordNet by measuring the distance between words within the hierarchy. Classic methods include path similarity, which measures the shortest distance between nodes in the WordNet hierarchy; Leacock & Chodorow similarity, which uses the shortest distance between nodes and the maximum depth; and Wu & Palmer similarity, which uses the depth of the nodes and their distance to the lowest common ancestor node. 3) Corpus-based word similarity assigns each word in a corpus a vector in a multidimensional space and measures the similarity between words from their vectors. Recently, approaches using word embeddings have been used.
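Two of the similarity measures named above can be sketched in a few lines: the Levenshtein edit distance for string-level similarity and cosine similarity for vectors such as word embeddings. This is a minimal sketch, not the method of the embodiment; in particular, rescaling the edit distance to a score between 0 and 1 by dividing by the longer string length is an assumption made here for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1] (normalization is an assumption)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


print(levenshtein("kitten", "sitting"))  # 3
```

The WordNet-based measures (path, Leacock & Chodorow, Wu & Palmer) need a lexical database and are therefore not reproduced here.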
If the predicates are similar, the apparatus 100 extracts knowledge from the text using the already stored knowledge representation language (S150). Because the knowledge analyzed with the meaning representation language can be sufficiently expressed in the format of the knowledge representation ontology, the apparatus 100 expresses the knowledge of the text in the format of the knowledge representation language. That is, since the predicate extracted with the meaning representation language (predicate.L) is similar to the predicate of the knowledge representation language (predicate.K) by at least the threshold value, the apparatus 100 determines that the input text can be sufficiently expressed in the format of the knowledge representation language without expanding the knowledge representation. The knowledge may be expressed as <word corresponding to domain (D), predicate.K, word corresponding to range (R)>.
If the predicates are not similar, the apparatus 100 generates a predicate that includes the predicate extracted with the meaning representation language (predicate.L) (S160).
장치(100)는 생성한 술어를 이용하여 텍스트로부터 지식을 추출한다(S170). 즉, 장치(100)는 텍스트를 지식표현 온톨로지에 존재하는 삼항 관계로 표현할 수 있으면, 저장된 지식표현 온톨로지 기반으로 입력 텍스트를 표현하고, 지식표현 온토롤지로 표현할 수 없는 경우, 입력 텍스트를 술어-논항 구조의 술어를 이용하여 확장된 삼항 관계로 표현한다. 지식은 <도메인(D)에 해당하는 어휘, 술어.L, 범위(R)에 해당하는 어휘> 또는 <도메인(D)에 해당하는 어휘, 술어.L + 범위(R)에 해당하는 어휘 타입, 범위(R)에 해당하는 어휘>로 표현될 수 있다.The apparatus 100 extracts knowledge from the text using the generated predicate (S170). That is, if the device 100 can express the text in the ternary relation existing in the knowledge expression ontology, the input device expresses the input text based on the stored knowledge expression ontology, and if the text cannot be expressed in the knowledge expression ontology, the input text is predicate-determined. Expressed in extended ternary relation using structure predicates. Knowledge is: vocabulary corresponding to domain (D), predicate.L, vocabulary corresponding to range (R)> or vocabulary corresponding to domain (D), predicate.L + vocabulary type corresponding to range (R), Vocabulary corresponding to the range (R)>.
장치(100)는 생성한 술어를 지식표현 온톨로지에 추가한다(S180). 생성한 술어는 새로운 지식표현 인스턴스로 추가된다.The device 100 adds the generated predicate to the knowledge expression ontology (S180). The generated predicate is added as a new knowledge representation instance.
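The branch between steps S150 and S160 to S180 can be sketched as follows. This is only an illustration under stated assumptions: the function name, its parameters, the use of a plain set to stand in for the predicates of the knowledge representation ontology, and the treatment of a score exactly equal to the reference value as "similar" are all hypothetical choices, not details given in the specification.

```python
from typing import Callable, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def express_knowledge(
    domain_word: str,
    range_word: str,
    predicate_l: str,               # predicate from the semantic representation language
    predicate_k: str,               # candidate predicate from the knowledge representation language
    ontology_predicates: Set[str],  # hypothetical stand-in for the stored ontology predicates
    similarity: Callable[[str, str], float],
    threshold: float,
    range_type: Optional[str] = None,
) -> Triple:
    """Return a <domain word, predicate, range word> triple for one sentence."""
    if similarity(predicate_l, predicate_k) >= threshold:
        # S150: the stored predicate expresses the text well enough.
        return (domain_word, predicate_k, range_word)
    # S160: generate a predicate that includes predicate.L,
    # optionally suffixed with the lexical type of the range.
    new_predicate = predicate_l + (range_type or "")
    # S180: add the generated predicate to the ontology.
    ontology_predicates.add(new_predicate)
    # S170: express the knowledge with the generated predicate.
    return (domain_word, new_predicate, range_word)
```

Any similarity function that returns a score comparable with the reference value can be passed in, for example one based on edit distance or on word embeddings.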
In the following, a method of extracting knowledge is described using the example sentence "Cheolsu was born in Korea in 1944."
FIG. 6 is a flowchart illustrating a knowledge representation expansion method according to an embodiment of the present invention, FIG. 7 is a diagram illustrating the result of analyzing the predicate-argument structure of the example sentence according to an embodiment of the present invention, and FIG. 8 is a diagram illustrating the ternary relation knowledge representation structure of the example sentence according to an embodiment of the present invention.
Referring to FIG. 6, the apparatus 100 receives the text ("Cheolsu was born in Korea in 1944.") (S210).
As shown in FIG. 7, the apparatus 100 classifies the text into a predicate and arguments based on the semantic representation language (S220). When the arguments of the predicate ("was born") are "who", "where", and "when", the strings corresponding to these arguments are "Cheolsu", "Korea", and "1944". When FrameNet is used, the frame target is "was born" and the frame predicate class is "being_born". The frame arguments for the frame predicate class "being_born" are predefined as "Child", "Place", and "Time", so the frame argument-string pairs are Child-Cheolsu, Place-Korea, and Time-1944. A lexical type is also defined for each argument: the lexical type of "Child" may be "person (people)", that of "Place" may be "place", and that of "Time" may be "time".
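The result of step S220 for the example sentence can be held in a small data structure such as the one below. This is a sketch only: the class and field names are hypothetical, and the strings are English renderings of the example sentence rather than the output of any real FrameNet parser.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Argument:
    role: str          # frame argument name, e.g. "Child"
    text: str          # string found in the sentence for this role
    lexical_type: str  # lexical type defined for the role


@dataclass(frozen=True)
class FrameParse:
    target: str                      # word in the sentence that evokes the frame
    predicate_class: str             # frame predicate class
    arguments: Tuple[Argument, ...]  # arguments attached to the predicate


# Predicate-argument structure of "Cheolsu was born in Korea in 1944."
parse = FrameParse(
    target="was born",
    predicate_class="being_born",
    arguments=(
        Argument(role="Child", text="Cheolsu", lexical_type="person"),
        Argument(role="Place", text="Korea", lexical_type="place"),
        Argument(role="Time", text="1944", lexical_type="time"),
    ),
)

# Frame argument-string pairs, as listed in the description.
pairs = {argument.role: argument.text for argument in parse.arguments}
```

Keeping the lexical type next to each argument makes the later comparison against the domains and ranges of the ternary relations a simple lookup.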
The apparatus 100 compares the arguments with the domains of the ternary relations and extracts, from among the arguments, the argument that matches a domain (S230). The apparatus 100 may find a ternary-relation domain that is similar to the lexical types of the arguments. To convert the predicate-argument structure into a ternary relation, the apparatus 100 looks for the domain and range related to the arguments, and may measure argument-domain similarity first. The apparatus 100 may determine that, among the lexical types of the arguments, "person" is similar to "people", which is a domain of the ternary relation.
The apparatus 100 compares the arguments with the ranges of the ternary relations and extracts, from among the arguments, the argument that matches a range (S240). The apparatus 100 may determine that, among the lexical types of the arguments, "time" is similar to "Time", which is a range of the ternary relation.
Having extracted the subject (domain) and the object (range) required by the ternary relation knowledge representation, the apparatus 100 extracts the predicate related to that subject (domain) and object (range) (S250). Referring to FIG. 8, the predicate related to the domain "people" and the range "Time" is "birthday".
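Steps S230 to S250 can be sketched as a search over a small schema of (domain, predicate, range) rows. Everything here is an assumption made for illustration: the schema contents beyond the two predicates named in the description, the hand-written synonym table that stands in for a real similarity measure such as a WordNet-based one, and the reference value of 0.8.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a real lexical similarity resource.
SYNONYMS = {"person": {"people"}}


def type_similarity(lexical_type: str, schema_term: str) -> float:
    """Toy similarity: 1.0 for equal terms, 0.9 for listed synonyms, else 0.0."""
    a, b = lexical_type.lower(), schema_term.lower()
    if a == b:
        return 1.0
    return 0.9 if b in SYNONYMS.get(a, set()) else 0.0


def candidate_predicates(
    lexical_types: Sequence[str],
    schema: Sequence[Tuple[str, str, str]],
    threshold: float = 0.8,
) -> List[Tuple[str, str, str]]:
    """Return (predicate, domain type, range type) for rows matched by two different arguments."""
    results = []
    for domain, predicate, range_ in schema:
        for d_type in lexical_types:
            if type_similarity(d_type, domain) < threshold:
                continue  # S230: this argument does not match the domain
            for r_type in lexical_types:
                # S240: a different argument must match the range
                if r_type != d_type and type_similarity(r_type, range_) >= threshold:
                    results.append((predicate, d_type, r_type))  # S250
    return results


schema = [("people", "birthday", "Time"), ("people", "birthplace", "Place")]
matches = candidate_predicates(["person", "place", "time"], schema)
```

With the lexical types of the example sentence, `matches` pairs "birthday" with the person and time arguments and "birthplace" with the person and place arguments.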
The apparatus 100 measures the similarity between the predicate of the semantic representation language ("being_born") and the predicate of the ternary relation ("birthday") (S260). Here, the apparatus 100 may append to the predicate ("being_born") the related argument, the lexical type of the related argument, or the related range, in this case "time", to generate a combined string ("being_bornTime"), and may compare "being_bornTime" with "birthday".
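The comparison in step S260 can be sketched as follows. The helper names are hypothetical, and `difflib.SequenceMatcher` from the Python standard library is used here only as a convenient stand-in for whichever similarity measure an implementation actually adopts; its ratio is not the Levenshtein distance.

```python
from difflib import SequenceMatcher


def combine(predicate: str, related_term: str) -> str:
    """Append the related term to the predicate with its first letter capitalized."""
    return predicate + related_term[:1].upper() + related_term[1:]


def predicate_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


combined = combine("being_born", "time")  # "being_bornTime"
score = predicate_similarity(combined, "birthday")
```

Under this stand-in measure the score for "being_bornTime" and "birthday" stays below 0.5, so with an assumed reference value of 0.5 the two predicates would be treated as not similar.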
If the predicates are similar, the apparatus 100 expresses the knowledge extracted from the text using the predicate of the ternary relation ("birthday") (S270). The knowledge extracted from the text may be <Cheolsu, birthday, 1944>, and "Cheolsu" and "1944" may each be linked to a URI.
If the predicates are not similar, the apparatus 100 expresses the knowledge extracted from the text using the predicate of the semantic representation language ("being_born") (S280). That is, because the predicate currently defined in the knowledge representation language ("birthday") does not adequately express the meaning of the sentence, the apparatus 100 uses the predicate of the semantic representation language instead of the predicate of the ternary relation. The newly generated predicate may be a string that includes "being_born", for example "being_bornTime". The knowledge extracted from the text is expressed as an extended ternary relation, for example <Cheolsu, being_born, 1944> or <Cheolsu, being_bornTime, 1944>. "Cheolsu" and "1944" may each be linked to a URI.
The apparatus 100 stores the new predicate as a predicate related to the domain "people" and the range "Time". Here, the new predicate is a string that includes "being_born", for example "being_bornTime".
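Storing the new predicate under its domain and range can be sketched with a dictionary keyed by (domain, range). The structure and the function name are hypothetical; a real knowledge representation ontology would normally be kept in a triple store or a similar system rather than in memory like this.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Schema = DefaultDict[Tuple[str, str], List[str]]


def register_predicate(schema: Schema, domain: str, range_: str, predicate: str) -> None:
    """Record a predicate for a (domain, range) pair, ignoring duplicates."""
    predicates = schema[(domain, range_)]
    if predicate not in predicates:
        predicates.append(predicate)


# Existing predicate from the description, then the newly generated one.
schema: Schema = defaultdict(list)
register_predicate(schema, "people", "Time", "birthday")
register_predicate(schema, "people", "Time", "being_bornTime")
```

After registration, a later lookup for the domain "people" and the range "Time" returns both "birthday" and "being_bornTime" as candidate predicates.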
The predicate currently defined in the knowledge representation language ("birthday") carries time information similar to "1944", but "1944" is only the year of birth, not a birthday, so using "birthday" could express the knowledge inadequately. Therefore, the apparatus 100 may replace "birthday" with "being_born" or, more specifically, "being_bornTime" as the predicate.
In this way, the apparatus 100 can automatically extend the limited expressive power of the knowledge representation language using the semantic representation language, and can thereby build a knowledge representation language from which more accurate knowledge can be extracted.
Meanwhile, the apparatus 100 may determine that, among the lexical types of the arguments, "place" is similar to "Place", which is a range of the ternary relation. The predicate related to the domain "people" and the range "Place" is "birthplace". In the same manner as described above, the apparatus 100 may extract knowledge using "birthplace" as is, or using an extended predicate such as "being_bornPlace".
The apparatus 100 can extend the knowledge representation power not only of DBpedia but of any ontology-based knowledge database. The apparatus 100 can also be extended to any semantic representation language, such as FrameNet, in which the classification of each word of a sentence is formalized as an ontology in a specified format and the arguments related to each word are specified.
As described above, according to an embodiment of the present invention, when knowledge extracted from a text cannot be expressed in the knowledge representation language used by the knowledge representation ontology, the knowledge representation can be extended using the semantic representation language. That is, an embodiment of the present invention can solve the problem that a knowledge representation ontology lacks sufficient coverage when a knowledge database is built from web text.
According to an embodiment of the present invention, knowledge contained in unstructured data such as natural language is expressed, based on the semantic predicate-argument structure of sentences, in a knowledge representation language with a computer-understandable format, so that the knowledge database can be extended quickly and easily.
According to an embodiment of the present invention, the "relation" ontology of the knowledge database is enriched, which increases its knowledge representation power, and the approach can be applied to CGC (Collaboratively Generated Content)-oriented knowledge forms and their interpretation.
The knowledge representation expansion apparatus 100 includes a memory that stores instructions for performing the knowledge representation expansion method described with reference to FIGS. 1 to 8, or that loads such instructions from a storage device and stores them temporarily; a processor that executes the instructions stored in or loaded into the memory to carry out the knowledge representation expansion method of the present invention; and a communication device. The instructions for performing the knowledge representation expansion method described with reference to FIGS. 1 to 8 are implemented as a program that the processor can process.
The embodiments of the present invention described above are not implemented only through an apparatus and a method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which that program is recorded.
Although the embodiments of the present invention have been described in detail above, the scope of the present invention is not limited to them. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

Claims (14)

  1. A knowledge representation expansion apparatus, comprising:
    a predicate-argument structure analysis unit that extracts a predicate and at least one argument from text using a semantic representation language;
    an ontology unit that expresses knowledge using a knowledge representation language, which is a structured format that a computer can understand; and
    a knowledge representation unit that extracts, from the ontology unit, a second predicate corresponding to a first predicate extracted by the predicate-argument structure analysis unit and, when the similarity between the first predicate and the second predicate is equal to or less than a reference value, expresses knowledge extracted from the text using the first predicate.
  2. The apparatus of claim 1,
    wherein the knowledge representation unit
    extracts, from the ontology unit, the second predicate related to the at least one argument.
  3. The apparatus of claim 2,
    wherein the knowledge representation unit
    extracts, from among the domains of the knowledge representation language, a first domain that is similar, at or above a reference value, to the lexical type assigned to the at least one argument; extracts, from among the ranges of the knowledge representation language, a first range that is similar, at or above a reference value, to the lexical type assigned to the at least one argument; and extracts a predicate related to the first domain and the first range as the second predicate.
  4. The apparatus of claim 3,
    wherein the knowledge representation unit
    generates a string in which the first predicate is combined with information related to any one of the at least one argument, and adds the string to the knowledge representation language of the ontology unit.
  5. The apparatus of claim 1,
    wherein the knowledge representation language is a language expressed in RDF (Resource Description Framework) ternary relations.
  6. A method by which an apparatus expands knowledge representation, the method comprising:
    receiving text including at least one sentence;
    expressing the text as a first predicate and at least one argument based on a semantic representation language;
    extracting, from a knowledge representation ontology, a second predicate corresponding to the first predicate;
    comparing the similarity between the first predicate and the second predicate; and
    when the similarity is equal to or less than a reference value, expressing knowledge extracted from the text using the first predicate.
  7. The method of claim 6,
    wherein extracting the second predicate corresponding to the first predicate comprises
    extracting, from the knowledge representation ontology, the second predicate corresponding to the first predicate using the lexical type assigned to the at least one argument.
  8. The method of claim 6,
    wherein the knowledge representation ontology uses a knowledge representation language that expresses knowledge as a ternary relation of a subject, a predicate, and an object, and
    extracting the second predicate corresponding to the first predicate comprises
    extracting, as the second predicate, a predicate whose subject, among the subjects of the knowledge representation language, is similar at or above a reference value to the lexical type assigned to the at least one argument, and whose object, among the objects of the knowledge representation language, is similar at or above a reference value to the lexical type assigned to the at least one argument.
  9. The method of claim 6,
    wherein expressing using the first predicate comprises
    generating a string in which the first predicate is combined with information related to any one of the at least one argument, and expressing the knowledge extracted from the text using the string.
  10. The method of claim 9,
    further comprising adding the string to the knowledge representation language of the knowledge representation ontology.
  11. A method by which an apparatus expands knowledge representation, the method comprising:
    analyzing the predicate-argument structure of text;
    matching the predicate-argument structure of the text to a ternary relation of a knowledge representation language; and
    adding, based on the matching similarity, a first predicate extracted from the predicate-argument structure of the text as a predicate of the knowledge representation language.
  12. The method of claim 11,
    wherein adding as a predicate of the knowledge representation language comprises:
    extracting, from the ternary relations of the knowledge representation language, a second predicate matched to the first predicate of the predicate-argument structure of the text;
    comparing the similarity between the first predicate and the second predicate; and
    when the similarity is equal to or less than a reference value, adding the first predicate to the knowledge representation language.
  13. The method of claim 11,
    further comprising expressing the text as a ternary relation using the first predicate.
  14. The method of claim 11,
    wherein matching to the ternary relation of the knowledge representation language comprises
    matching the predicate-argument structure of the text to the ternary relation based on the similarity between the arguments extracted from the predicate-argument structure of the text and the domain and range of the ternary relation.
PCT/KR2016/000579 2015-01-20 2016-01-20 Knowledge represention expansion method and apparatus WO2016117920A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/545,054 US20180144049A1 (en) 2015-01-20 2016-01-20 Knowledge represention expansion method and apparatus

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2015-0009518 2015-01-20
KR20150009518 2015-01-20
KR1020150139189A KR101685053B1 (en) 2015-01-20 2015-10-02 Method and apparatus for knowledge representation enrichment
KR10-2015-0139189 2015-10-02

Publications (1)

Publication Number Publication Date
WO2016117920A1 true WO2016117920A1 (en) 2016-07-28

Family

ID=56417382

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2016/000579 WO2016117920A1 (en) 2015-01-20 2016-01-20 Knowledge represention expansion method and apparatus

Country Status (1)

Country Link
WO (1) WO2016117920A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552813A (en) * 2020-03-18 2020-08-18 国网浙江省电力有限公司 Power knowledge graph construction method based on power grid full-service data
WO2024011813A1 (en) * 2022-07-15 2024-01-18 山东海量信息技术研究院 Text expansion method and apparatus, device, and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040111408A1 (en) * 2001-01-18 2004-06-10 Science Applications International Corporation Method and system of ranking and clustering for document indexing and retrieval
US20110153673A1 (en) * 2007-10-10 2011-06-23 Raytheon Bbn Technologies Corp. Semantic matching using predicate-argument structure
US20110225167A1 (en) * 2010-03-15 2011-09-15 International Business Machines Corporation Method and system to store rdf data in a relational store
WO2011129481A1 (en) * 2010-04-16 2011-10-20 한국과학기술정보연구원 System and method for providing a question and answer service on the basis of an rdf search
US20130232143A1 (en) * 2012-03-02 2013-09-05 Xerox Corporation Efficient knowledge base system
KR20140052328A (en) * 2012-10-24 2014-05-07 에스케이텔레콤 주식회사 Apparatus and method for generating rdf-based sentence ontology

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16740393

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 15545054

Country of ref document: US

122 Ep: pct application non-entry in european phase

Ref document number: 16740393

Country of ref document: EP

Kind code of ref document: A1