KR101086996B1

KR101086996B1 - Apparatus for generating ontology and method thereof

Info

Publication number: KR101086996B1
Application number: KR1020080107143A
Authority: KR
Inventors: 한동일; 최지영; 유치훈; 최호준
Original assignee: KT Corporation (주식회사 케이티)
Priority date: 2008-10-30
Filing date: 2008-10-30
Publication date: 2011-11-29
Also published as: KR20100048122A

Abstract

The present invention relates to an ontology generation apparatus and method. Its goals are to minimize human intervention, shorten ontology construction time as much as possible, and extend the ontology continuously. To meet them, the apparatus and method generate a semantic-based ontology by matching syntactic-based association information (e.g., an association-word graph) with a semantic-based ontology-like system (e.g., an ontology).

To this end, the ontology generation method of the present invention includes three steps. The first is a language-based matching step, which matches terms of the association-word graph with terms of the ontology and takes the characteristics of language resources into account. The second is a context-based matching step, which refines the relationships matched in the language-based matching step on the basis of context information. The third is a weighting step, which assigns semantic weights to the link relationships between the association-word graph and the ontology, based on the relationships matched in the context-based matching step.

Keywords: ontology generation, ontology matching, association-word graph, syntactic relationship, association information, ontology-like system, semantic relationship, semantic-based ontology

Description

Ontology generation apparatus and method thereof {APPARATUS FOR GENERATING ONTOLOGY AND METHOD THEREOF}

The present invention relates to an ontology generation apparatus and method. More particularly, it relates to an apparatus and method that generate a semantic-based ontology by matching association information (e.g., an association-word graph) with an ontology-like system that includes an ontology. The association information represents syntactic relationships between terms appearing in web documents, and the ontology-like system represents semantic relationships.

The following embodiment uses terms contained in general web documents as an example, but the present invention is not limited thereto.

In the present invention, an ontology-like system refers to an ontology, a dictionary, a thesaurus, a taxonomy, a folksonomy, or the like.

Likewise, the following embodiment uses an association-word graph as an example of association information and an ontology as an example of an ontology-like system, but the present invention is not limited thereto.

In general, an ontology is used to support shared semantic understanding and interoperability among different meanings. The Semantic Web, like the Web itself, is a heterogeneous, distributed environment, so ontology generation inevitably requires ontology matching technology. The ontologies produced in this way are themselves heterogeneous. Ontology matching has therefore developed as a way of discovering semantic relationships between different ontologies. To date, however, ontology construction remains very difficult and depends on experts in each field, so matching methods have been proposed for only a small number of ontologies.

Consequently, ontologies are a core component of the Semantic Web, yet no large-scale ontology capable of supporting commercial services has been built.

This is described in more detail with reference to FIG. 1.

FIG. 1 illustrates an example of a conventional matching method between ontologies.

Referring to FIG. 1, the conventional method matches ontologies under the assumption that every matching target is itself an ontology.

Such methods choose their technique according to several factors: whether the domains are identical, whether the structure may change, and whether a new ontology is to be created. They then match the ontologies using language-resource-based techniques, probabilistic inference techniques, or a combination of approaches.

However, the relationships in an association-word graph do not have the properties of an ontology. Even where such a graph has a structure similar to an ontology, ontology-to-ontology matching techniques cannot be applied to it as is.

A vast and varied body of web documents is now being produced. For these documents to be usable on the Semantic Web, there is an urgent need for a way to combine two resources into a single large ontology. The first is association-word graphs, which contain syntactic relationships. The second is ontologies that both humans and machines can understand.

As described above, the prior art offers only techniques for matching ontologies with one another. Building a large-scale ontology for diverse Semantic Web applications therefore consumes excessive time and manpower. In practice, this makes it difficult both to construct ontologies and to extend them with continuously generated information. The object of the present invention is to solve this problem and meet the above need.

Accordingly, an object of the present invention is to provide an ontology generation apparatus and method that minimize human intervention, shorten ontology construction time as much as possible, and extend the ontology continuously. They do so by generating a semantic-based ontology through matching between syntactic-based association information (e.g., an association-word graph) and a semantic-based ontology-like system (e.g., an ontology).

Another object of the present invention is to provide an ontology generation apparatus and method that generate a semantic-based ontology in the following stages:
(1) cluster an association-word graph containing the syntactic relationships between terms in web documents;
(2) perform language-based matching and context-based matching against an ontology-like system that includes an ontology;
(3) assign weights to the resulting matches;
(4) check that the matching is adequate;
(5) match any missing relationships.

The objects of the present invention are not limited to those mentioned above. Other objects and advantages not mentioned here can be understood from the following description and will become clearer from the embodiments of the present invention. It will also be readily appreciated that the objects and advantages of the present invention can be realized by the means indicated in the claims and combinations thereof.

To achieve the above objects, the ontology generation apparatus of the present invention includes three units. A language-based matching unit performs language-based matching between association information and an ontology-like system. A context-based matching unit refines, on the basis of context information, the relationships matched by the language-based matching unit. A weighting unit assigns semantic weights between the association information and the ontology-like system, based on the relationships matched by the context-based matching unit.

The apparatus further includes three units. A clustering unit receives pre-built association information and an ontology-like system, clusters the association information, and passes the result to the language-based matching unit. An inspection unit receives the matching relationships from the weighting unit and checks, for each cluster produced by the clustering unit, whether its terms are consistently matched to the ontology-like system. A forced matching unit acts on information from an external expert: it modifies the ontology-like system received from the inspection unit or forcibly matches missing terms to it.

To achieve the above objects, the ontology generation method of the present invention includes three steps. A language-based matching step matches terms of the association-word graph with terms of the ontology and takes the characteristics of language resources into account. A context-based matching step refines, on the basis of context information, the relationships matched in the language-based matching step. A weighting step assigns semantic weights to the link relationships between the association-word graph and the ontology, based on the relationships matched in the context-based matching step.

The method further includes three steps. A clustering step receives a pre-built association-word graph and ontology, clusters the association-word graph, and passes the result to the language-based matching step. An inspection step receives the matching relationships from the weighting step and checks, for each cluster produced by the clustering step, whether its terms are consistently matched to the ontology. A forced matching step acts on information from an external expert: it modifies the ontology received from the inspection step or forcibly matches missing terms to it.


The association-word graph captures the syntactic relationships among the many terms in web documents. To match it with an ontology-like system that includes an ontology, the present invention proceeds in four stages. It first clusters the association-word graph. It then forms 1:N relationships by language-based matching. Next, it decomposes each 1:N relationship semantically into 1:1 relationships by context-based matching. Finally, it assigns weights to the newly formed relationships between the association-word graph and the ontology.

The input is a syntax-based association-word graph, built from the syntactic relationships (degrees of association between terms) among key terms in general web documents. By matching this graph with an ontology-like system in which semantic relationships are defined, the present invention can generate a new large-scale semantic-based ontology.

The present invention generates a large-scale semantic-based ontology by matching syntactic-based association information (e.g., an association-word graph) with a semantic-based ontology-like system (e.g., an ontology). This minimizes human intervention, shortens ontology construction time as much as possible, and allows the ontology to be extended continuously.

The present invention serves four aims: automatic generation of semantic structures for automated ontology construction, minimal construction manpower, shorter construction time, and continuous extension of the ontology. To this end, it builds a syntax-based association-word graph from the syntactic relationships (degrees of association between terms) in existing web documents. It then matches this graph with an ontology-like system in which semantic relationships are defined, producing a new large-scale semantic-based ontology. The resulting ontology can then be used to provide Semantic Web application services.

Searches over web documents have recently grown exponentially. With the growth of images and multimedia content, vast numbers of comments and tags are also being generated. Searching and managing such web documents costs users time, cognitive burden, and psychological burden. The present invention can minimize these costs while maximizing the benefits of search, namely efficiency and convenience. It also uses Semantic Web technology that understands and processes the meaning of content (text, images, multimedia, etc.). This makes it easy to build an ontology that both humans and machines (software) can understand.

The ontology generated according to the present invention is also well suited to Semantic Web applications, in particular semantic search and semantic data integration.

The above objects, features, and advantages will become clearer from the following detailed description with reference to the accompanying drawings. Those of ordinary skill in the art to which the present invention pertains will then be able to practice its technical idea easily. Where a detailed description of related known technology might unnecessarily obscure the gist of the invention, that description is omitted.

First, to aid understanding of the present invention, Semantic Web technology is reviewed.

The World Wide Web (WWW) was proposed by Tim Berners-Lee in 1989. It represents information in a simple markup language, HTML (HyperText Markup Language), so that information can be shared across the Internet. This was a breakthrough for Internet-based information sharing and delivery. A huge information space interconnecting diverse information resources was built, and the Internet spread rapidly into everyday life, transforming the information society.

However, as the volume of web information grew enormously, existing web technology ran into many problems. In particular, the existing Web allows access to information only by keyword. Simple keyword search returns countless irrelevant results and creates a flood of information. The existing Web is also a human-centered information-processing technology: it does not adequately let computers extract, interpret, and process the information they need on their own.

The Semantic Web therefore emerged as an extension of the existing Web. It achieves semantic interoperability based on well-defined meanings that computers can understand, and it supports effective cooperation between humans and computers.

On the Semantic Web, machines (computers) as well as people can grasp the meaning of information on the Web. The Semantic Web performs semantic search that returns only results suited to the user's needs. It also supports smooth collaboration between people and machines and between machines, so that services can run automatically on people's behalf.

That is, the Semantic Web is a next-generation web technology that enables computers to understand, automate, integrate, and reuse the meaning of information resources. It consists of the following three main elements.

1) Ontology

An ontology is a formal specification of a shared conceptualization, and it provides semantic information about a domain's vocabulary. As a form of knowledge representation, it lets a computer understand the concepts it expresses and perform knowledge processing. Processing such as inference also requires the ontology's axioms and rule system.

2) Semantically annotated Web

A semantically annotated Web is a Web annotated with ontologies, and it forms a kind of knowledge base. On the Semantic Web, one can build a huge knowledge base that semantically integrates the Internet's distributed information resources. In a narrower sense, one can build a knowledge base of the information resources of a company or institution.

3) Agent

An agent is an intelligent agent that acts on behalf of a person (user). It collects, retrieves, and reasons over information resources and exchanges information with other agents. Intelligent agents are at the core of Semantic Web application systems.

Using ontology and agent technology, the Semantic Web achieves semantic interoperability. It thereby moves the Web from a focus on information representation to a focus on knowledge-based meaning.

Next, to aid understanding, the conventional ontology-to-ontology matching method is compared with the matching between an association-word graph and an ontology (ontology-like system) according to the present invention. They differ as follows.

First, in conventional ontology-to-ontology matching, each ontology is a semantic relationship graph. The association-word graph used in the present invention, by contrast, is a syntactic relationship graph. Conventional matching is thus the problem of matching semantic graphs with each other. The matching of the present invention is the problem of matching a syntactic graph with a semantic graph.

Second, each ontology targeted by conventional matching has ontology properties (e.g., concepts, attributes, relationships, constraints, axioms, instances). The association-word graph in the present invention is merely a graph built from how often terms co-occur, so it has no ontology properties. The matching of the present invention is therefore the problem of matching an ontology with a non-ontology.

Third, in conventional matching all target ontologies form complete semantic relationships (e.g., parent-child and sibling relationships). In the present invention, the associated terms are related only syntactically. After matching, the result therefore forms semantic relationships only in part, and the remainder stays syntactic.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 2 is a block diagram of an ontology generation system to which the present invention is applied, and of an ontology generation apparatus according to a preferred embodiment of the present invention.

First, an example ontology generation system to which the present invention is applied is described with reference to FIG. 2.

As shown in FIG. 2, the ontology generation system to which the present invention is applied includes three components.
(1) An ontology generation apparatus 30 receives two inputs: an ontology 10 pre-built from domain experts' knowledge, and an association-word graph 20 pre-built from the co-occurrence frequencies of terms in various documents on the Web. It generates a semantic-based ontology by matching the association-word graph to the ontology (ontology-like system).
(2) An ontology repository 40 stores the ontology generated by the apparatus 30.
(3) A Semantic Web application user terminal, or a semantic application for Semantic Web applications (e.g., a software agent) 50, uses the ontology stored in the repository 40 to provide various Semantic Web application services.

Next, as shown in FIG. 2, the ontology generation apparatus 30 according to the present invention includes three units. A language-based matching unit 32 performs language-based matching between the externally input association-word graph 20 and ontology 10. A context-based matching unit 33 refines, on the basis of context information, the relationships matched by the language-based matching unit 32. A semantic weighting unit 34 assigns semantic weights between the association-word graph and the ontology, based on the relationships matched by the context-based matching unit 33.

The ontology generation apparatus 30 further includes three units. An association-word graph clustering unit 31 receives the pre-built ontology 10 and association-word graph 20, clusters the association-word graph, and passes the result to the language-based matching unit 32. An inspection unit 35 receives the semantically weighted matching relationships from the semantic weighting unit 34 and checks, for each association-word graph cluster, whether its terms are consistently matched to the ontology. A forced matching unit 36 acts on correction or supplementary information from an external domain expert: it modifies the ontology received from the inspection unit 35 or forcibly matches certain missing terms to the pre-built ontology.

Each of these components is described in more detail below with reference to FIG. 2.

First, the association-word graph clustering unit 31 receives the pre-built ontology 10 and association-word graph 20 and clusters the association-word graph. This preparatory step lets later stages determine whether the associated terms have been matched to the ontology with a consistent distribution.
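The patent does not say which clustering algorithm unit 31 uses. The following is a minimal sketch under one assumption: the association-word graph is given as weighted `(term_a, term_b, weight)` edges, and terms are grouped into the connected components of the edges whose weight meets a threshold. The function name `cluster_association_graph` and its input format are hypothetical.

```python
from collections import defaultdict


def cluster_association_graph(edges, min_weight=0.0):
    """Cluster terms as connected components of sufficiently strong edges.

    edges: iterable of (term_a, term_b, weight) tuples (hypothetical format).
    Edges below min_weight are ignored, but their terms are still kept,
    possibly as singleton clusters.
    """
    parent = {}

    def find(term):
        # Union-find lookup; registers unseen terms as their own root.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for a, b, weight in edges:
        root_a, root_b = find(a), find(b)  # also registers both terms
        if weight >= min_weight and root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for term in list(parent):
        clusters[find(term)].add(term)
    # Deterministic output: each cluster sorted, clusters ordered by first term.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


edges = [("apple", "fruit", 0.8), ("fruit", "banana", 0.6),
         ("python", "code", 0.9), ("apple", "code", 0.1)]
print(cluster_association_graph(edges, min_weight=0.5))
# [['apple', 'banana', 'fruit'], ['code', 'python']]
```

Connected components are used only because they are simple and deterministic. A production system might prefer a community-detection method that also splits densely linked regions.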

The language-based matching unit 32 takes the characteristics of language resources into account and matches the terms of the association-word graph with the terms (concepts, attributes, relationships, instances, etc.) of the ontology (ontology-like system).
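The following is a minimal sketch of language-based matching. It assumes that "characteristics of language resources" can be approximated by Unicode normalization, case folding, and a per-concept list of labels and synonyms. The label dictionary format and the function names are hypothetical. One graph term can hit several concepts, which yields the 1:N relationship that the next stage resolves.

```python
import unicodedata
from collections import defaultdict


def normalize(term: str) -> str:
    # NFKC folds compatibility forms (e.g. full-width letters); casefold
    # removes case differences; strip trims surrounding whitespace.
    return unicodedata.normalize("NFKC", term).casefold().strip()


def language_based_match(graph_terms, ontology_labels):
    """Return {graph_term: sorted candidate concept ids}, a 1:N relation.

    ontology_labels: {concept_id: [label, synonym, ...]} (hypothetical format).
    Graph terms with no matching label are omitted.
    """
    index = defaultdict(set)
    for concept_id, labels in ontology_labels.items():
        for label in labels:
            index[normalize(label)].add(concept_id)

    matches = {}
    for term in graph_terms:
        candidates = index.get(normalize(term))
        if candidates:
            matches[term] = sorted(candidates)
    return matches


ontology = {"Apple_fruit": ["apple", "사과"],
            "Apple_Inc": ["Apple", "Apple Inc."],
            "Banana": ["banana"]}
print(language_based_match(["Apple", "banana", "kiwi"], ontology))
# {'Apple': ['Apple_Inc', 'Apple_fruit'], 'banana': ['Banana']}
```

For Korean text, a morphological analyzer would normally replace the simple normalization step.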

The context-based matching unit 33 resolves each 1:N relationship (N a natural number) produced by the language-based matching unit 32, using context information. It generates new 1:1 relationships and thereby decomposes the association-word graph.
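The patent does not define the context information or the scoring rule. The following minimal sketch makes two assumptions. A term's context is its set of neighbours in the association-word graph, and a concept's context is a set of terms describing its ontology neighbourhood (a hypothetical input). Each 1:N candidate list is reduced to the single concept with the highest Jaccard overlap. Ambiguous cases (zero overlap or a tie) stay unresolved.

```python
def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def context_based_match(candidates, graph_neighbors, concept_context):
    """Resolve 1:N candidates to 1:1 links by context overlap.

    candidates: {term: [concept_id, ...]} from language-based matching.
    graph_neighbors: {term: set of neighbouring terms in the graph}.
    concept_context: {concept_id: set of context terms} (hypothetical).
    Returns {term: (concept_id, score)}; ambiguous terms are left out.
    """
    resolved = {}
    for term, concepts in candidates.items():
        if not concepts:
            continue
        neighbors = graph_neighbors.get(term, set())
        scored = sorted(
            ((jaccard(neighbors, concept_context.get(c, set())), c) for c in concepts),
            reverse=True,
        )
        best_score, best_concept = scored[0]
        tied = len(scored) > 1 and scored[1][0] == best_score
        if best_score > 0 and not tied:
            resolved[term] = (best_concept, best_score)
    return resolved


candidates = {"apple": ["Apple_Inc", "Apple_fruit"]}
neighbors = {"apple": {"banana", "fruit", "juice"}}
context = {"Apple_Inc": {"iphone", "company"}, "Apple_fruit": {"fruit", "banana", "tree"}}
print(context_based_match(candidates, neighbors, context))
# {'apple': ('Apple_fruit', 0.5)}
```

Leaving ties unresolved keeps the output strictly 1:1. The unresolved terms then surface in the later inspection and forced-matching steps rather than being guessed.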

The semantic weighting unit 34 assigns semantic weights to the link relationships between the association-word graph and the ontology, based on the relationships decomposed by the context-based matching unit 33.
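No weighting formula is given in the source. As an illustrative assumption only, the sketch below blends two signals into a link weight in [0, 1]. The first is the context score from the previous step. The second is the term's total association strength in the graph, normalized by the maximum. The linear blend, the `alpha` parameter, and the function name are hypothetical.

```python
def weight_links(resolved, term_strength, alpha=0.5):
    """Assign a weight in [0, 1] to each term -> concept link.

    resolved: {term: (concept_id, context_score)} with scores in [0, 1].
    term_strength: {term: summed association strength in the graph}
        (hypothetical signal; normalized by the maximum value).
    alpha: share of the weight taken from the context score (illustrative).
    Returns [(term, concept_id, weight)] sorted by descending weight.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    max_strength = max(term_strength.values(), default=0.0) or 1.0
    links = []
    for term, (concept, ctx_score) in resolved.items():
        strength = term_strength.get(term, 0.0) / max_strength
        weight = alpha * ctx_score + (1 - alpha) * strength
        links.append((term, concept, round(weight, 4)))
    return sorted(links, key=lambda link: link[2], reverse=True)


resolved = {"apple": ("Apple_fruit", 0.5), "banana": ("Banana", 1.0)}
strength = {"apple": 4.0, "banana": 1.0}
print(weight_links(resolved, strength))
# [('apple', 'Apple_fruit', 0.75), ('banana', 'Banana', 0.625)]
```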

A statistically generated, large-scale association-word graph (e.g., 3 to 4 million words for Korean) is matched against the ontology terms (hundreds of thousands of words for Korean). Because of this scale mismatch, the inspection unit 35 must determine whether the graph's terms have been matched to the ontology consistently. To do so, it receives the semantically weighted matching relationships from the semantic weighting unit 34. For each cluster produced by the association-word graph clustering unit 31, it then checks whether some of the cluster's terms are consistently matched to the ontology.
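The per-cluster check can be sketched as a coverage ratio: the fraction of each cluster's terms that received an ontology link. In the sketch below, the `min_ratio` threshold and the three status labels are assumptions made for illustration. The "unmatched" status marks the zero-match case that the patent routes to forced matching.

```python
def inspect_clusters(clusters, matched_terms, min_ratio=0.2):
    """Report how consistently each cluster's terms were matched.

    clusters: list of term lists from the clustering step.
    matched_terms: set of terms that received an ontology link.
    min_ratio: hypothetical threshold below which coverage is "sparse".
    Returns [(cluster_index, ratio, status)].
    """
    report = []
    for index, cluster in enumerate(clusters):
        hits = sum(1 for term in cluster if term in matched_terms)
        ratio = hits / len(cluster) if cluster else 0.0
        if hits == 0:
            status = "unmatched"  # candidate for forced matching
        elif ratio < min_ratio:
            status = "sparse"
        else:
            status = "ok"
        report.append((index, ratio, status))
    return report


clusters = [["apple", "banana", "fruit"], ["code", "python"], ["a", "b", "c", "d", "e", "f"]]
for row in inspect_clusters(clusters, {"apple", "banana", "a"}):
    print(row)
```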

When not a single term in an association-word graph cluster has been matched to the ontology, the forced matching unit 36 acts on correction or supplementary information from an external domain expert. It either modifies the ontology or forcibly matches certain terms of the unmatched cluster to the pre-built ontology.
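The following is a minimal sketch of the forced-matching step. It assumes that the expert's decisions arrive as a plain `{term: concept_id}` mapping, a hypothetical format. Only clusters flagged "unmatched" by an inspection report of `(index, ratio, status)` tuples are considered. An expert-supplied concept that is not yet in the ontology is added, which models the "modify the ontology" branch. Terms the expert does not map are returned so they remain visible.

```python
def forced_match(clusters, report, expert_mapping, ontology_concepts):
    """Apply expert decisions to clusters with no ontology match at all.

    clusters: list of term lists from the clustering step.
    report: [(cluster_index, ratio, status)] from the inspection step.
    expert_mapping: {term: concept_id} from a domain expert (hypothetical).
    ontology_concepts: mutable set of concept ids; unknown ids are added.
    Returns (new_links, still_missing).
    """
    new_links, still_missing = [], []
    for index, _ratio, status in report:
        if status != "unmatched":
            continue
        for term in clusters[index]:
            concept = expert_mapping.get(term)
            if concept is None:
                still_missing.append(term)
                continue
            ontology_concepts.add(concept)  # no-op if it already exists
            new_links.append((term, concept))
    return new_links, still_missing


concepts = {"Apple_fruit"}
links, missing = forced_match(
    [["apple"], ["code", "python"]],
    [(0, 1.0, "ok"), (1, 0.0, "unmatched")],
    {"python": "ProgrammingLanguage"},
    concepts,
)
print(links, missing, sorted(concepts))
```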

Next, each component of the ontology generation apparatus according to the present invention, and its operation, is described in detail.

First, to match association information against an ontology-like system, the ontology 10 and the association word graph 20 to be matched are collected or obtained from external sources.

The ontology-like system here is not limited to an ontology 10 previously built by domain experts. It also includes dictionaries that express the meanings of existing vocabulary, and the thesauri, taxonomies, and folksonomies used to classify vocabulary.

The association word graph 20 is generated as follows.

First, key terms that can serve as association words are extracted from web documents, excluding stopwords and duplicates. A morphological analyzer or an n-gram method is used to select these key terms.

Next, the degree of association between the extracted words is measured. Candidate algorithms include mutual information (MI), TF/IDF (term frequency: the number of times a word appears in a document; inverse document frequency: the reciprocal of the number of documents in the collection in which the word appears), and C-value/NC-value. The one that best measures mutual information is selected.

The measured association values must be normalized and must satisfy a threshold condition, for example a mutual information value of 40% to 50%, or at most M nodes connected to any single node. The resulting association words and association values are then fed into node-arc graph theory to produce an association word graph for each domain.

Finally, the per-domain graphs are merged into one large graph through node co-occurrence, the creation of meaningful node relationships, and similar methods. The result is a syntax-based association word graph.
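The graph-building step above can be sketched as follows. The sketch rests on several assumptions that are not stated in the text:

- Documents arrive already tokenized.
- Association strength is normalized pointwise mutual information (NPMI) over document-level co-occurrence. This is one possible "MI plus normalization" choice.
- The hypothetical parameters `min_npmi` and `max_degree` stand in for the threshold and the "at most M connected nodes" limit.

```python
import math
from collections import Counter
from itertools import combinations


def build_association_graph(docs, stopwords, min_npmi=0.4, max_degree=5):
    """Build an undirected, weighted association-word graph (illustrative sketch).

    docs: list of token lists; key-term extraction (morphological analysis or
          n-grams) is assumed to have been done already.
    min_npmi: hypothetical threshold on normalised PMI, which lies in [-1, 1].
    max_degree: hypothetical "at most M connected nodes" limit.
    Returns {term: {neighbour: weight}}.
    """
    n_docs = len(docs)
    term_df, pair_df = Counter(), Counter()
    for tokens in docs:
        # Drop stopwords and duplicates, then count document-level co-occurrence.
        terms = sorted({t for t in tokens if t not in stopwords})
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))

    edges = {}
    for (a, b), n_ab in pair_df.items():
        p_ab = n_ab / n_docs
        p_a, p_b = term_df[a] / n_docs, term_df[b] / n_docs
        if p_ab == 1.0:
            npmi = 1.0  # the pair occurs in every document
        else:
            # Normalised PMI: log(p_ab / (p_a * p_b)) scaled into [-1, 1].
            npmi = math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)
        if npmi >= min_npmi:
            edges[(a, b)] = npmi

    # Keep an arc only if it is among the max_degree strongest arcs of both ends.
    ranked = {}
    for (a, b), w in edges.items():
        ranked.setdefault(a, []).append((w, b))
        ranked.setdefault(b, []).append((w, a))
    top = {node: {other for _, other in sorted(nbrs, reverse=True)[:max_degree]}
           for node, nbrs in ranked.items()}

    graph = {}
    for (a, b), w in edges.items():
        if b in top[a] and a in top[b]:
            graph.setdefault(a, {})[b] = w
            graph.setdefault(b, {})[a] = w
    return graph
```

Keeping an arc only when it is in the top M of both endpoints guarantees that no node exceeds M neighbours. A one-sided rule would be an equally valid reading of the text.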

The ontology generating device 30 then matches the ontology 10 against the association word graph 20.

First, the association word graph clustering unit 31 clusters the association word graph to be matched.

The graph is clustered first to cope with the difference in size between the two vocabularies: the ontology terms to be matched (concepts, properties, relations, instances, etc.) and the association word graph terms (e.g., terms linked to one another on the basis of mutual information). When the ratio of ontology terms to graph terms is 1:100 or 1:1000, clustering makes it possible to judge whether the ontology terms are matched to the association word graph at a consistent ratio.

Compared with an ontology created by experts, the terms of an association word graph are far more varied. They include neologisms, compound words, and current buzzwords, or are used only for a limited period. As a result, they may fail to match ontology terms, which are defined from an objective viewpoint.

The association word graph may therefore contain regions made up solely of recent terms. If such a region forms a cluster, clustering lets the device check, cluster by cluster, whether the region has been matched to the ontology. A region that has not been matched can then be forcibly (manually) matched to the ontology.

The association word graph can be clustered on its association coefficients using methods such as HCS (Highly Connected Subgraphs) or Chameleon.
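HCS, one of the clustering methods named above, can be sketched in plain Python. The sketch makes these assumptions:

- The graph has already been thresholded on its association coefficients, so edges are unweighted `frozenset` pairs.
- Minimum cuts are found with the Stoer-Wagner algorithm.
- Singletons are returned rather than discarded.

Chameleon is not shown.

```python
def min_cut(nodes, edges):
    """Stoer-Wagner global minimum cut of an unweighted undirected graph.

    nodes: iterable of vertices; edges: iterable of frozenset({u, v}).
    Returns (cut_size, one_side) where one_side is a set of original vertices.
    """
    nodes = list(nodes)
    w = {u: {} for u in nodes}
    for e in edges:
        u, v = tuple(e)
        w[u][v] = w[u].get(v, 0) + 1
        w[v][u] = w[v].get(u, 0) + 1
    groups = {u: {u} for u in nodes}  # original vertices merged into each vertex
    active = list(nodes)
    best_cut, best_side = float("inf"), set()
    while len(active) > 1:
        # One minimum-cut phase: grow a set by maximum-adjacency ordering.
        order = [active[0]]
        conn = {v: w[active[0]].get(v, 0) for v in active[1:]}
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            order.append(nxt)
            for v in conn:
                conn[v] += w[nxt].get(v, 0)
        s, t = order[-2], order[-1]
        if cut_of_phase < best_cut:
            best_cut, best_side = cut_of_phase, set(groups[t])
        # Contract t into s, summing parallel edge weights.
        groups[s] |= groups.pop(t)
        for v, wt in w.pop(t).items():
            if v != s:
                del w[v][t]
                w[s][v] = w[s].get(v, 0) + wt
                w[v][s] = w[v].get(s, 0) + wt
        w[s].pop(t, None)
        active.remove(t)
    return best_cut, best_side


def hcs(nodes, edges):
    """Highly Connected Subgraphs clustering: split on minimum cuts until every
    part has edge connectivity greater than half its vertex count."""
    nodes = set(nodes)
    edges = {e for e in edges if e <= nodes}  # keep edges inside this subgraph
    if len(nodes) <= 1:
        return [nodes]  # singletons are returned as-is in this sketch
    cut, side = min_cut(nodes, edges)
    if cut > len(nodes) / 2:
        return [nodes]  # highly connected: accept as one cluster
    clusters = []
    for part in (side, nodes - side):
        clusters.extend(hcs(part, edges))
    return clusters
```

For example, two four-word cliques joined by a single arc come back as two clusters. The bridging arc is the minimum cut, and each clique's connectivity (3) exceeds half its size (2).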

Next, the language-based matching unit 32 matches the terms of the association word graph with the terms of the ontology purely from a linguistic point of view. It performs two main processes: string-based matching and linguistic-resource-based matching.

String-based matching assumes that similar terms use similar names or expressions. The expressions used for ontology concepts and instances are compared as strings with the target terms in the association word graph, and exact matches are recorded as 1:N relationships.

Linguistic-resource-based matching uses linguistic resources to determine the meanings of terms and the relationships between them. Drawing on linguistic relations (near-synonyms, synonyms, abbreviations, etc.), it identifies 1:N matching relationships between the expressions of ontology concepts and instances and the association word terms.

For example, the association word term 'Kim Hee-sun' (김희선) may be matched 1:N to the singer Kim Hee-sun, the TV actress Kim Hee-sun, the basketball player Kim Hee-sun, and the politician Kim Hee-sun. A list of ambiguous terms (homonyms) may also be maintained, so that this kind of matching is restricted to the terms on that list. The example is described in detail below with reference to FIG. 3, which shows language-based matching between the association word graph and the ontology.
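A minimal sketch of this language-based step follows. It makes these assumptions:

- Exact string equality implements string-based matching.
- A plain dictionary of variant forms (`variants`, hypothetical) stands in for the linguistic resource.
- Ontology entries are identified by hypothetical `class/label` strings.

Each graph term ends up with zero (score 0) or more (score 1) candidate entries, i.e. a 1:N link.

```python
def language_based_match(graph_terms, ontology_labels, variants):
    """Link each association-graph term to every ontology entry it matches.

    ontology_labels: {entry_id: surface label}, e.g. {"talent/김희선": "김희선"}.
    variants: {label: set of synonyms/abbreviations}; a hypothetical stand-in
              for a linguistic resource.
    Returns {term: [entry_id, ...]}; an empty list means "no link" (score 0).
    """
    index = {}
    for entry_id, label in ontology_labels.items():
        # String-based match on the label itself plus linguistic-resource variants.
        for form in {label} | variants.get(label, set()):
            index.setdefault(form, []).append(entry_id)
    return {term: sorted(index.get(term, [])) for term in graph_terms}
```

With the entries from FIG. 3, '김희선' receives three candidate links, and '중앙대' reaches '중앙대학교' through its abbreviation.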

Next, the contextual-information-based matching unit 33 matches the terms of the association word graph with the terms of the ontology on the basis of context information. It performs two main processes: semantic-based matching and constraint-based matching.

Semantic-based matching rests on the premise that different contexts are expressed through different structures, properties, and relations. Matching therefore uses two kinds of context information: the neighbouring concepts and properties in the ontology, and the first- and second-order associated terms in the association word graph.

- Ontology side: the context information of a target term consists of its superclass concept, its sibling concepts, and its properties (Property, ObjectProperty). If the term is an instance, information related to the concept one level above it is used.
- Association word graph side: the context information consists of one of the target term's first-order associated words, together with the words associated with that selected first-order word.

When the association word graph is matched against the ontology, the match with the highest similarity is selected and the remaining links are released, leaving a 1:1 relationship. A Dice coefficient function such as Equation 1 may be used to select the most similar match. For example, the similarity takes as its denominator the union of the context information E of a given ontology term and the context information E' of a given association word graph term, and as its numerator their intersection; only the matching relationship with the highest similarity is kept. This example is described in detail below with reference to matching example (1) of FIG. 4, which shows context-based matching between the association word graph and the ontology.
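The selection of the most similar match can be sketched as follows. The sketch uses the standard Dice coefficient, 2|A∩B| / (|A| + |B|). The union-denominator formulation in the prose is the Jaccard index J, and because Dice equals 2J/(1+J), a monotone function of J, both choose the same best match.

The sketch also assumes two things not fixed by the text:

- When several graph nodes compete for the same entries, pairs are accepted greedily from the highest score down.
- A zero score never produces a link.

```python
def dice(a, b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) between two context sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def resolve_one_to_one(links, graph_ctx, onto_ctx):
    """Turn 1:N language-based links into 1:1 links (illustrative sketch).

    links: {graph_node: [ontology_entry, ...]} from the language-based step.
    graph_ctx / onto_ctx: context sets for graph nodes / ontology entries.
    Pairs are accepted greedily from the highest Dice score down; every other
    link touching an already-matched node or entry is released.
    """
    scored = sorted(
        ((dice(graph_ctx[g], onto_ctx[o]), g, o)
         for g, entries in links.items() for o in entries),
        reverse=True,
    )
    used_nodes, used_entries, result = set(), set(), {}
    for score, g, o in scored:
        if score > 0 and g not in used_nodes and o not in used_entries:
            result[g] = (o, score)
            used_nodes.add(g)
            used_entries.add(o)
    return result
```

The hypothetical node names `g1` and `g2` below stand for two 'Kim Hee-sun' nodes in the graph. Their context sets follow the FIG. 4 description.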

Constraint-based matching follows semantic-based matching. It rests on the same premise that different contexts are expressed through different structures, properties, and relations. Consider a term x that has a matching link and coincides with an ontology property y. If the first- and second-order associated words of x (the context of x) also coincide with the context of y (the domain and range of y), then x and y are matched as referring to essentially the same object. This example is described in detail below with reference to matching example (2) of FIG. 5.
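The constraint check can be sketched as below. The sketch makes these assumptions:

- The association word graph is an adjacency mapping.
- A property is a hypothetical dict with `label`, `domain`, and `range` keys.
- "Coincides" means the labels are equal and both the domain and the range appear among the term's first- or second-order associated words.

```python
def context_up_to(graph, term, depth=2):
    """First- to depth-order associated words of `term` in an adjacency mapping."""
    seen, frontier = {term}, {term}
    for _ in range(depth):
        frontier = {n for f in frontier for n in graph.get(f, ())} - seen
        seen |= frontier
    return seen - {term}


def constraint_valid(graph, term, prop):
    """prop: hypothetical {"label": ..., "domain": ..., "range": ...} record.

    Valid when the labels agree and both the domain and the range of the
    property appear among the term's first/second-order associated words.
    """
    if term != prop["label"]:
        return False
    ctx = context_up_to(graph, term)
    return prop["domain"] in ctx and prop["range"] in ctx
```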

Next, the semantic weighting unit 34 assigns semantic weights to the relationships between the terms that the context-based matching process matched 1:1.

In more detail, each link decomposed into a 1:1 relationship by the context-based matching process is a newly created relation (arc). Treating the ontology as a graph, each such link must receive a weight that corresponds to the existing arc weights, so that the weights can be used by various graph search algorithms. The weights must therefore reflect the reliability of each link.

As the basic criterion, arc weights take values in [0, 1], with a link weighted higher than a superclass arc and lower than a property arc. The reasoning is as follows. An ontology contains concepts, properties, instances, and so on, built from experts' domain knowledge. Within that knowledge, the superclass and the property are judged to be more conceptual-level terms than the subclass, the instances, and the association terms.

Semantic weights are therefore assigned in the order superclass < link < property < subclass. The detailed link weighting scheme and function are described below with reference to the semantic weighting process of FIG. 6.

Next, the inspection unit 35 checks whether the matching is adequate. It works on the units produced by the association word graph clustering process, that is, clusters made up of the words connected to a given term in the association word graph up to the N-th order. For each such cluster it determines whether even one term has been matched to the ontology. For example, the terms of a cluster built around a neologism may not appear in an ontology based on domain experts' knowledge. The inspection unit 35 identifies such clusters in the association word graph.
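A minimal sketch of this inspection step follows. It assumes clusters are non-empty sets of terms and that the matching result is a hypothetical mapping from term to matched ontology entry. The per-cluster coverage ratio is an extra diagnostic, not something the text prescribes. Clusters with zero coverage are the ones handed to forced matching.

```python
def inspect_clusters(clusters, links):
    """clusters: list of non-empty term sets from the clustering step.
    links: {term: matched ontology entry}; unmatched terms are absent or None.
    Returns (coverage ratio per cluster, clusters with no match at all).
    """
    coverage, unmatched = [], []
    for cluster in clusters:
        hits = sum(1 for term in cluster if links.get(term))
        coverage.append(hits / len(cluster))
        if hits == 0:
            unmatched.append(cluster)  # candidate for expert-driven forced matching
    return coverage, unmatched
```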

Next, the forced matching unit 36 takes each cluster that the inspection unit 35 found to have no match and forcibly (manually) links a representative term from it to the ontology. This is necessary because the association word graph is a syntactic graph while the ontology is a semantic graph. To combine them into a single large-scale ontology, the terms of the association word graph must be matched to ontology terms with a consistent distribution.

Finally, the large-scale ontology built by the ontology generating device 30 is stored in the ontology repository 40. There it can be used by user terminals or semantic applications 50.

FIG. 3 illustrates how the language-based matching unit of FIG. 2 matches the association word graph with the ontology, according to a preferred embodiment of the present invention.

Referring to FIG. 3, a single large-scale ontology is built by matching the parts divided into the ontology region 301 and the association word graph region 302.

- The term 'Kim Hee-sun' (김희선) in the association word graph region 302 is matched and linked 1:N to 'Kim Hee-sun (Talent)', 'Kim Hee-sun (basketball player)', and 'Kim Hee-sun (politician)' in the ontology region 301.
- 'Basketball player' in region 302 is matched to 'basketball player', a subclass of the class 'professional athlete' in region 301.
- '중앙대' in region 302 is matched to '중대' and '중앙대'. These are the synonyms (abbreviations) of '중앙 대학교' (Chung-Ang University), an instance of 'school' in region 301.

In particular, for 'Kim Hee-sun', which is matched 1:N, it must still be determined which class in the ontology region 301 the term is actually an instance of. The language-based matching process of unit 32 cannot decide this on its own. It only runs the string-based and linguistic-resource-based matching described above. Under those processes, a graph term and an ontology term are linked 1:N when they match exactly, or when they are linguistically related (near-synonym, synonym, abbreviation, etc.) with semantic similarity. The matching result is recorded as either '0' or '1'.

FIG. 4 is the first of two diagrams illustrating how the contextual-information-based matching unit of FIG. 2 matches the association word graph with the ontology on the basis of context information, according to a preferred embodiment of the present invention.

FIG. 4 is divided into an ontology region 401 and an association word graph region 402. As noted in the description of FIG. 3, the goal here is to separate, on the basis of context information, the results of the language-based matching performed by the language-based matching unit 32. A term in the association word graph may be matched to several ontology terms (concepts, properties, instances, etc.). It nevertheless stands in a relation to one specific ontology term, not to all of them.

For example, language-based matching links the association word graph term 'Kim Hee-sun' in FIG. 4 to three ontology terms (1:3). The term can still be separated semantically using the context information of the ontology and of the association word graph:

- Ontology context information uses the target term's superclass concept, sibling concepts, relations (Property, ObjectProperty), and, for an instance, its parent concept.
- Association word graph context information uses one of the first-order associated words and the words associated with that selected word.

In FIG. 4, the graph term 'Kim Hee-sun' in region 402 can use as context the selected first-order associated word 'Lee Hyori' (이효리) and that word's associated words, such as 'singer' and 'marriage'.

On the ontology side, consider 'Kim Hee-sun (Talent)', one of the terms in region 401. Its context information comprises:

- 'Talent', the concept of which the instance 'Kim Hee-sun' is a member;
- 'marriage', a property of Talent;
- 'singer', the concept related to Talent through marriage;
- 'entertainer', the superclass of Talent.

As a result, the graph term 'Kim Hee-sun' in region 402 is linked to the ontology term 'Kim Hee-sun (Talent)', whose meaning is partly similar. The other 'Kim Hee-sun' terms in region 402 can likewise be matched in order of decreasing similarity. For instance, another 'Kim Hee-sun' node in region 402, with the chain (Kim Hee-sun → elected → campaign pledge), is matched by repeating the procedure against the ontology chain (Kim Hee-sun → politician → elected). As described for the contextual-information-based matching unit 33 of FIG. 2, a Dice coefficient function such as Equation 1 can be used for this matching.
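A worked illustration uses the context sets named in the FIG. 4 description with the standard Dice form. Two assumptions are involved: the listed words are taken to be exactly the context sets, and the politician context is read off the (Kim Hee-sun → politician → elected) chain. $C_G$ is the graph node's context, $C_T$ the Talent instance's context, and $C_P$ the politician instance's context:

```latex
\begin{aligned}
C_G &= \{\text{Lee Hyori}, \text{singer}, \text{marriage}\},\\
C_T &= \{\text{Talent}, \text{marriage}, \text{singer}, \text{entertainer}\},\\
C_P &= \{\text{politician}, \text{elected}\},\\
D(C_G, C_T) &= \frac{2\,|C_G \cap C_T|}{|C_G| + |C_T|} = \frac{2 \cdot 2}{3 + 4} = \frac{4}{7} \approx 0.571,\\
D(C_G, C_P) &= \frac{2 \cdot 0}{3 + 2} = 0.
\end{aligned}
```

The Talent link scores higher, so it is kept and the politician link for this node is released.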

FIG. 5 is the second of two diagrams illustrating how the contextual-information-based matching unit of FIG. 2 matches the association word graph with the ontology on the basis of context information, according to a preferred embodiment of the present invention.

FIG. 5 is divided into an ontology region 501 and an association word graph region 502. Building on the context-based matching results described for FIG. 4, each term is first separated semantically in detail. It is then additionally checked whether each linked pair is semantically near-identical. In other words, for each link produced by the process of FIG. 4, this step determines whether the association word graph term and the ontology term at its two ends are semantically almost the same.

For example, the association word graph term 'singer' has 'marriage' as a first-order associated word in its context, and the ontology concept 'singer' has 'marriage' as a property. The two terms can therefore be judged to be nearly identical. The details are as described above for the contextual-information-based matching unit 33 of FIG. 2.

FIG. 6 illustrates the semantic weighting process in the semantic weighting unit of FIG. 2, according to a preferred embodiment of the present invention.

Referring to FIG. 6, the process described for FIG. 5 leaves the ontology and the association word graph matched 1:1. To use the result as one homogeneous large-scale ontology, the links must then be given consistent semantic weights.

Basically, a link is weighted lower than the ontology's property weights (superclass < link < property < subclass). Link weights are further differentiated by the result of the constraint-based check, one of the context-based matching techniques (see the description of the context-based matching process of FIG. 2). A link that passes the check receives a value comparable to the ontology's property weights. Even then, its weight stays lower than both the maximum property weight and the weight of properties that have documents.

If the ontology has no property values, weights are assigned as in case 2) below. In link case 2 of that case in particular, a correction coefficient must be applied. The correction coefficient takes a value between 0 and 1, preferably 0.7.

The link weighting function used in the semantic weighting process of FIG. 6 is described in more detail below. In the function, $L_w$ denotes the weight of a link, $P_w$ the weight of a property, and $P_d$ the weight of a property that has a document.

1) When at least one ontology property exists

   - Link case 1: the constraint-based check on the context information is valid. The link weight is set higher than the minimum property weight and lower than the smaller of the maximum property weight and the minimum weight of properties that have documents:

     . $\min P_w < L_w < \min(\max P_w,\ \min P_d)$

   - Link case 2: the constraint-based check on the context information is not valid. The link weight is set higher than the superclass weight and lower than the minimum property weight:

     . $w_{\text{superclass}} < L_w < \min P_w$

2) When no ontology property exists

   - Link case 1: the constraint-based check on the context information is valid. The link weight is set higher than the superclass weight and lower than the subclass weight:

     . $w_{\text{superclass}} < L_w < w_{\text{subclass}}$

   - Link case 2: the constraint-based check on the context information is not valid. The link weight is chosen so that its product with the correction coefficient $c$ is greater than the superclass weight and less than the subclass weight:

     . $w_{\text{superclass}} < L_w \times c < w_{\text{subclass}}$

Here the threshold used as the correction coefficient takes a value between 0 and 1, preferably 0.7.
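The four link-weight cases above can be collected into one function that returns the admissible open interval for $L_w$. The sketch makes these assumptions:

- Picking the midpoint as the concrete weight is an illustrative choice.
- When no property has a document, the $\min P_d$ bound is simply dropped.
- In case 2-2 the interval is not clipped to [0, 1].

```python
def link_weight_range(valid, prop_weights, doc_prop_weights,
                      superclass_w, subclass_w, correction=0.7):
    """Return (low, high, chosen) for a link weight L_w under the four cases.

    valid: result of the constraint-based check for this link.
    prop_weights: weights P_w of the ontology properties (may be empty).
    doc_prop_weights: weights P_d of properties that have documents.
    `chosen` is the midpoint of the open interval (an illustrative choice).
    """
    if prop_weights:                        # 1) at least one property exists
        if valid:                           # min P_w < L_w < min(max P_w, min P_d)
            low, high = min(prop_weights), max(prop_weights)
            if doc_prop_weights:            # assumption: no P_d -> bound by max P_w only
                high = min(high, min(doc_prop_weights))
        else:                               # superclass < L_w < min P_w
            low, high = superclass_w, min(prop_weights)
    elif valid:                             # 2) no property: superclass < L_w < subclass
        low, high = superclass_w, subclass_w
    else:                                   # superclass < L_w * c < subclass
        low, high = superclass_w / correction, subclass_w / correction
    if not low < high:
        raise ValueError("empty interval for L_w; check the input weights")
    return low, high, (low + high) / 2
```

Dividing both bounds by the correction coefficient in the last case is simply the inequality $w_{\text{superclass}} < L_w \times c < w_{\text{subclass}}$ solved for $L_w$.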

FIG. 7 is a flowchart of an ontology generation method according to a preferred embodiment of the present invention.

Specific embodiments of the ontology generation method have already been described above, so only the main operations of the method are summarized here.

First, the association word graph clustering unit 31 receives the pre-built ontology 10 and the association word graph 20 from outside and clusters the association word graph (701). This preliminary step allows the later inspection step (705) to determine whether the association word terms have been matched to the ontology with a consistent distribution.

Next, the language-based matching unit 32 matches the association word graph 20 and the ontology 10 on a linguistic basis (702). Reflecting the characteristics of the linguistic resources, it matches the target terms of the association word graph with the terms (concepts, properties, relations, instances, etc.) of the ontology or ontology-like system.

Next, the contextual-information-based matching unit 33 matches the relationships produced by the language-based matching unit 32 on the basis of context information (703). It resolves the 1:N relationships (N a natural number) into new 1:1 relationships, thereby decomposing the association word graph.

Next, the semantic weighting unit 34 assigns semantic weights between the association word graph and the ontology (704). Specifically, it weights the link relationships between the two, based on the relationships decomposed by the contextual-information-based matching unit 33.

Next, the inspection unit 35 receives the semantically weighted matching relationships from the semantic weighting unit 34. It checks, for each cluster of the association word graph, whether the cluster's terms are consistently matched to the ontology (705).

This check is needed because a large, statistically generated association word graph (e.g., 3 to 4 million words for Korean) is matched against a much smaller set of ontology terms (hundreds of thousands of words for Korean). Working on the clusters produced by the association word graph clustering unit 31, the inspection unit 35 determines whether at least some of the terms in each cluster have been consistently matched to the ontology.

Next, the forced matching unit 36 acts on correction or supplementary information from an external domain expert. It either modifies the ontology received from the inspection unit 35 or forcibly matches missing terms to the pre-built ontology (706). Specifically, this happens when not a single term in an association word graph cluster has been matched to the ontology. In that case, the forced matching unit 36 follows the expert's correction or supplementary information and either modifies the ontology or forcibly matches selected terms of the unmatched cluster to the pre-built ontology.
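The forced matching half of this step can be sketched as applying an expert-supplied term-to-concept table to the unmatched clusters. This sketch assumes the expert input arrives as a plain dictionary. It does not model the other branch, in which the expert edits the ontology itself. The names `force_match` and `expert_map` are hypothetical.

```python
from __future__ import annotations


def force_match(
    unmatched: list[str],
    clusters: dict[str, list[str]],
    matched: dict[str, str],
    expert_map: dict[str, str],
) -> dict[str, str]:
    """Return a copy of `matched` extended with expert-supplied matches
    for terms in clusters that had no match at all. Terms the expert
    does not cover remain unmatched."""
    result = dict(matched)  # leave the caller's matching result untouched
    for cid in unmatched:
        for term in clusters.get(cid, []):
            if term in expert_map:
                result[term] = expert_map[term]
    return result


if __name__ == "__main__":
    print(force_match(
        ["c2"],
        {"c2": ["kimchi", "bibimbap"]},
        {"apple": "Apple_Inc"},
        {"kimchi": "Korean_food"},
    ))
```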

As described above, the present invention generates an ontology by matching an association word graph against an existing ontology. Existing approaches require extensive human intervention and long construction times. Compared with them, this approach is expected to add practical, real-world value to semantic web applications.

Meanwhile, the method of the present invention described above can be written as a computer program. Computer programmers skilled in the art can readily infer the code and code segments that make up the program. The program is stored on a computer-readable recording medium (information storage medium) and is read and executed by a computer to implement the method of the present invention. The recording medium includes all types of computer-readable recording media.

The present invention described above is not limited to the foregoing embodiments and the accompanying drawings. Those of ordinary skill in the art to which the present invention pertains can make various substitutions, modifications, and changes without departing from the technical spirit of the present invention.

The present invention relates to an ontology generation scheme based on matching an association word graph with an ontology. Ontologies are indispensable to the semantic web, and this scheme makes them easier to build. It can therefore be applied in practice to various semantic web applications, such as semantic search and semantic data integration.

FIG. 1 is a diagram illustrating an example of a conventional method of matching between ontologies;

FIG. 2 is a block diagram of an ontology generation system to which the present invention is applied, including an ontology generation device according to a preferred embodiment of the present invention;

FIG. 3 is a diagram illustrating the language-based matching process between an association word graph and an ontology in the language-based matching unit of FIG. 2, according to a preferred embodiment of the present invention;

FIG. 4 is a first diagram illustrating the contextual-information-based matching process between an association word graph and an ontology in the contextual-information-based matching unit of FIG. 2, according to a preferred embodiment of the present invention;

FIG. 5 is a second diagram illustrating the contextual-information-based matching process between an association word graph and an ontology in the contextual-information-based matching unit of FIG. 2, according to a preferred embodiment of the present invention;

FIG. 6 is a diagram illustrating the semantic weighting process in the semantic weighting unit of FIG. 2, according to a preferred embodiment of the present invention.

* Explanation of reference numerals for the main parts of the drawings

10: ontology
20: association word graph
30: ontology generation device
31: association word graph clustering unit
32: language-based matching unit
33: contextual-information-based matching unit
34: semantic weighting unit
35: inspection unit
36: forced matching unit
40: ontology repository
50: user terminal or semantic application

Claims

An ontology generation device, comprising:

a language-based matching unit that matches language-based association information with an ontology-like system;

a contextual-information-based matching unit that matches, based on contextual information, the relationships matched by the language-based matching unit; and

a weighting unit that assigns semantic weights between the association information and the ontology-like system based on the relationships matched by the contextual-information-based matching unit.

The ontology generation device of claim 1, further comprising:

a clustering unit that receives pre-built association information and a pre-built ontology-like system, clusters the association information, and delivers the result to the language-based matching unit;

an inspection unit that receives the matching relationships from the weighting unit and checks, for each cluster produced by the clustering unit, whether the terms are consistently matched to the ontology-like system; and

a forced matching unit that, according to correction or supplementary information from an external expert, modifies the ontology-like system received from the inspection unit or forcibly matches missing terms to the ontology-like system.

The ontology generation device of claim 1 or 2, wherein

the association information is an association word graph, and

the ontology-like system is an ontology.

The ontology generation device of claim 3, wherein

the language-based matching unit matches target terms of the association word graph with terms of the ontology, reflecting the characteristics of language resources;

the contextual-information-based matching unit decomposes the association word graph by resolving the 1:N relationships (N is a natural number) generated by the language-based matching unit into 1:1 relationships based on contextual information; and

the weighting unit assigns semantic weights to the link relationships between the association word graph and the ontology based on the relationships decomposed by the contextual-information-based matching unit.

The ontology generation device of claim 4, wherein

the language-based matching unit

matches terms of the ontology with target terms of the association word graph on a string basis, and

matches terms of the ontology with target terms of the association word graph in a 1:N manner based on linguistic resources.

The ontology generation device of claim 4, wherein

the contextual-information-based matching unit

matches the 1:N relationships (N is a natural number) generated by the language-based matching unit on a semantic basis, using the contextual information of the association word graph together with the contextual information of the ontology, and checks and matches the resulting connection relationships on a constraint basis.

The ontology generation device of claim 4, wherein

the weighting unit

assigns semantic weights to link relationships in the order superclass < link < attribute < subclass.

The ontology generation device of claim 1 or 2, wherein

the association information is an association word graph, and

the ontology-like system is any one of a dictionary, a thesaurus, a taxonomy, or a folksonomy.

An ontology generation method, comprising:

a language-based matching step of matching terms of an association word graph with terms of an ontology, reflecting the characteristics of language resources;

a contextual-information-based matching step of matching, based on contextual information, the relationships matched in the language-based matching step; and

a weighting step of assigning semantic weights to the link relationships between the association word graph and the ontology based on the relationships matched in the contextual-information-based matching step.

The ontology generation method of claim 9, further comprising:

a clustering step of receiving a pre-built association word graph and a pre-built ontology, clustering the association word graph, and passing the result to the language-based matching step;

an inspection step of receiving the matching relationships from the weighting step and checking, for each cluster of the clustering step, whether the terms are consistently matched to the ontology; and

a forced matching step of modifying the ontology received from the inspection step, or forcibly matching missing terms to the ontology, according to correction or supplementary information from an external expert.

The ontology generation method of claim 9 or 10, wherein

the language-based matching step comprises:

matching terms of the ontology with target terms of the association word graph on a string basis; and

matching terms of the ontology with target terms of the association word graph in a 1:N manner based on linguistic resources.

The ontology generation method of claim 9 or 10, wherein

the contextual-information-based matching step

generates 1:1 relationships by matching, based on contextual information, the 1:N relationships (N is a natural number) generated in the language-based matching step, thereby decomposing the association word graph.

The ontology generation method of claim 12, wherein

the contextual-information-based matching step comprises:

matching the 1:N relationships (N is a natural number) generated in the language-based matching step on a semantic basis, using contextual information of the association word graph together with contextual information of the ontology; and

checking and matching the semantically matched relationships on a constraint basis.

(Deleted)