KR20160066237A

KR20160066237A - Method and system for constructing ontology instance based on semi-structured data

Info

Publication number: KR20160066237A
Application number: KR1020140170334A
Authority: KR
Inventors: 이경일; 함영경; 이종민
Original assignee: 주식회사 솔트룩스
Priority date: 2014-12-02
Filing date: 2014-12-02
Publication date: 2016-06-10
Also published as: KR101675946B1

Abstract

The present invention relates to a method for constructing an ontology instance. The method includes: acquiring a first value for a first input field that has a data refinement rule determined in advance to extract the value of an input field according to an attribute defined for information to be obtained; comparing the first value with a second value of a second input field according to an attribute added or changed for the information; and defining the data refinement rule for the second value that matches the first value as a mapping to the first value. According to the present invention, rule information that conventionally had to be maintained continuously in order to construct instances from an online dictionary is managed dynamically, so that costs are reduced and instance information can be maintained effectively.

Description

Method and system for constructing a dynamic ontology instance based on semi-structured data

The present invention relates to a method for constructing an ontology instance and, more particularly, to a method for updating an ontology instance in response to an attribute change.

Conventionally, knowledge extraction from a network has focused on accuracy at the time of extraction rather than on a design that keeps the extracted knowledge continuously up to date. As a result, it is difficult to respond immediately when information is supplemented or changed. For example, a Wikipedia infobox is itself a kind of functional document, so the name of the infobox or the names of its attributes may change. Conventionally, an administrator had to check and correct all such changes. In addition, because each attribute value is analyzed only by rules defined in advance, information can no longer be extracted properly once attribute values start to be written in a different notation. Overcoming these limitations requires continuous management.

To solve the above problems, an object of the present invention is to propose a method capable of dynamically updating the rules used for knowledge extraction.

More specifically, an object is to propose a method of dynamically updating a rule into an optimal form when data cannot be extracted by the existing rules or is extracted in an incomplete form.

To solve the above technical problem, a method for constructing an ontology instance according to an attribute change according to the present embodiment includes: obtaining a first value for a first input field having a predetermined data refinement rule for extracting the value of an input field according to an attribute defined for information to be acquired; comparing the first value with a second value of a second input field according to an attribute added or changed for the information; and defining the data refinement rule for the second value that matches the first value as a mapping to the first value.

The ontology updating method further includes updating, for other information, the value of the second input field according to the added or changed attribute to a value extracted through the data refinement rule defined for the second value.

The first and second values are ontology instances and are preferably generated by: separating, from information received over a network, semi-structured data expressed as values according to the attributes; refining the semi-structured data according to the predetermined data refinement rule; and generating an ontology instance by setting data type or attribute information for an instance extracted from the refined semi-structured data.

The step of separating the semi-structured data preferably separates the data into first-format data, which has table-form information in which the input fields are defined, and second-format data, which has classification information for the information received over the network.

The step of generating the ontology instance preferably generates metadata for the instance by setting the data type of the instance according to mapping rule information for the instance.

To solve the above technical problem, a system for constructing an ontology instance according to an attribute change according to the present embodiment includes: a parsing unit that separates, from information received over a network, semi-structured data expressed as values according to defined attributes; a generation unit that refines the semi-structured data according to the predetermined data refinement rule and generates an ontology instance by setting data type or attribute information for an instance extracted from the refined semi-structured data; and a rule conversion unit that compares the generated first ontology instance with a second ontology instance of an input field according to an attribute added or changed for the semi-structured data, and defines the data refinement rule for the second ontology instance as a mapping to the first ontology instance.

The ontology instance construction system further includes an update unit that updates, for other semi-structured data, the value of the second input field according to the added or changed attribute to an ontology instance extracted through the data refinement rule defined for the second ontology instance.

According to the present invention, rule information that previously had to be maintained continuously in order to construct instances from an online dictionary is managed dynamically, which reduces the associated cost and allows instance information to be maintained effectively.

FIG. 1 is a flowchart illustrating a method for constructing an ontology instance according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating instance generation in the method for constructing an ontology instance according to an embodiment of the present invention.
FIGS. 3 and 4 are block diagrams illustrating an ontology instance construction system according to an embodiment of the present invention.
FIGS. 5 and 6 are exemplary diagrams illustrating an attribute change of an infobox according to an embodiment of the present invention.

The following merely illustrates the principles of the invention. Those skilled in the art will therefore be able to devise various apparatuses that, although not explicitly described or shown herein, embody the principles of the invention and fall within its concept and scope. Furthermore, all conditional terms and embodiments recited in this specification are, in principle, expressly intended only to aid understanding of the inventive concept, and should be understood as not limiting the invention to the specifically recited embodiments and conditions.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that a person of ordinary skill in the art to which the invention pertains can readily practice the technical idea of the invention.

In describing the invention, detailed descriptions of known technologies related to the invention are omitted where they would unnecessarily obscure the gist of the invention. Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a method for constructing an ontology instance according to an embodiment of the present invention. Referring to FIG. 1, the method according to the present embodiment includes an ontology instance acquisition step, an ontology instance comparison step, a data refinement rule definition step, and an ontology instance update step.

First, the ontology instance acquisition step acquires a first value for a first input field that has a predetermined data refinement rule for extracting the value of an input field according to an attribute defined for the information to be acquired.

In the present embodiment, an ontology instance is an instance generated by analyzing knowledge information present on a network into a knowledge entity with a concrete form, such as a concrete example of a thing or concept, or an event. It denotes a knowledge entity for which the relationships between instances are defined.

That is, the ontology instance used in the present embodiment is formed by having an instance analyzer analyze the refined data into an instance, which is the final knowledge-form structure, and then converting that instance into the ontology instance format by setting its type and attributes.

In the present embodiment, instances are generated from information present on a network, preferably from semi-structured data. Semi-structured data is a form of structured data that does not conform to the formal data models associated with relational databases or other kinds of data tables. It nevertheless contains tags or other markers that separate semantic elements and enforce hierarchies of records and fields within the data.

In semi-structured data, entities belonging to the same class may have different attributes even though they are grouped together, and the order of the attributes is not important.

Semi-structured data has been increasing since the advent of the Internet, where full-text documents and databases are no longer the only forms of data and separate applications need a medium for exchanging information. Semi-structured data is also often found in object-oriented databases.

An online dictionary is a representative example of semi-structured data. An online dictionary such as Wikipedia is one whose content and presentation can change and whose content can be viewed through online access.

Accordingly, in the present invention, data held in a semi-structured state, as in an online dictionary, is the main target of automatic extraction. In Wikipedia documents, the infobox, which is a table in which input fields are defined, and the category links, which record the classification information of a page, constitute such semi-structured data.

Accordingly, in the present embodiment, an attribute defined for information may correspond to an item of a Wikipedia infobox. The predetermined data refinement rule for extracting the value of an input field according to an attribute is a rule for extracting a desired value from information received over the network, and it may be created in advance through a one-time manual rule creation process.

In other words, rule information can be written that specifies which attributes an infobox contains and which refinement rule is used to parse the values of each attribute.

For example, if the 'Yi Sun-sin' page contains an infobox named 'Person information', the predefined rules indicate that the 'Person information' infobox has an attribute named 'Life', and the rules may also specify that a module is needed to extract the 'date of birth' and 'date of death' values from that attribute's value.
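The rule information described above can be pictured as a lookup table from an (infobox, attribute) pair to an extraction module. The following is a minimal sketch under stated assumptions: the names `REFINEMENT_RULES`, `extract_life_span`, and `refine`, the `YYYY-MM-DD ~ YYYY-MM-DD` notation, and the sample value are all hypothetical and are not taken from the patent.

```python
import re

DATE = r"\d{4}-\d{2}-\d{2}"  # assumed date notation, for illustration only


def extract_life_span(raw):
    """Hypothetical refinement module: split a 'Life' value into two dates."""
    match = re.search(rf"({DATE})\s*~\s*({DATE})", raw)
    if match is None:
        return {}  # the notation is not covered by this rule
    return {"date_of_birth": match.group(1), "date_of_death": match.group(2)}


# Rule information: which infobox has which attribute, and which module parses it.
REFINEMENT_RULES = {
    ("Person information", "Life"): extract_life_span,
}


def refine(infobox_name, attribute, raw):
    """Apply the predefined rule for an (infobox, attribute) pair, if one exists."""
    rule = REFINEMENT_RULES.get((infobox_name, attribute))
    return rule(raw) if rule is not None else None
```

In this sketch, an attribute without a registered rule yields `None`, which is the situation the rest of the description sets out to handle dynamically.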

Thus, the instance acquisition step (S100) acquires the first value for the first input field as an ontology instance of the infobox.

Next, the ontology instance comparison step (S200) compares the first value with the second value of the second input field according to an attribute added or changed for the information.

As in the example above, a Wikipedia infobox is itself a kind of functional document, so the name of the infobox or the names of its attributes may change. Conventionally, an administrator had to check and correct all such changes. In addition, because each attribute value is analyzed only by rules defined in advance, information can no longer be extracted properly once attribute values start to be written in a different notation. Overcoming these limitations requires continuous management. Therefore, to recognize an attribute change and update the rules dynamically, the present embodiment compares the first value, which has a predetermined data refinement rule, with the second value according to the changed attribute, and finds the matching attribute.

The present embodiment uses previously built instance information to update refinement rules dynamically. Even when document data changes, the amount changed at any one time is small compared with the information held by all built instances. Therefore, checking the changed infobox or category information against the information held by the previously built instances makes it possible to infer the meaning of the changed semi-structured data.

Referring to FIG. 5, suppose for example that, in the infobox (54) of the 'Yi Sun-sin' document (52) of an online dictionary, the date of birth and the date of death are recorded as the first value under the attribute (55) named 'Life'. When this attribute is changed (56) so that the information is recorded separately under 'Date of birth' and 'Date of death', the data refinement rule for that information previously had to be modified by hand. In the present invention, however, it can be confirmed that the date of birth, which was part of the previously recorded first value, is recorded as a second value in the 'Date of birth' attribute of the changed infobox, and that the date of death is recorded in the 'Date of death' attribute.

Next, the data refinement rule definition step (S300) defines the data refinement rule for the second value that matches the first value as a mapping to the first value.

That is, the rules for an infobox and its sub-attributes defined in the existing refinement rules can be applied unchanged when instances are rebuilt later. Infobox attributes may, however, be added or changed, and the infobox itself may be replaced. Such cases are handled by means of the ontology instances.

In the above example, an infobox named 'General information' (for a military general), which did not exist in the existing rules, has been created. The 'General information' infobox has the attributes 'Date of birth' and 'Date of death', and their values match the values of the existing 'Life' attribute. In this case, mapping information is constructed between the data refinement rule for the 'Life' attribute of the existing 'Person information' infobox and the data refinement rules for the 'Date of birth' and 'Date of death' attributes of the 'General information' infobox.

Because a table such as that of FIG. 6 confirms that the first value of the existing first input field is identical to the second value of the second input field of the changed attribute, the data refinement rule for the first value is mapped to serve as the data refinement rule for the second value.
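The comparison and rule definition steps (S200, S300) can be sketched as a search for new attributes whose values equal values already held by a built instance. This is only an illustration under assumptions: the function name `infer_rule_mappings`, the dictionary layout, the whitespace normalization, and the sample values are hypothetical, and the patent does not specify how equality is tested.

```python
def infer_rule_mappings(known_values, source_rule, infobox_name, new_fields):
    """Map new (infobox, attribute) pairs to the rule that produced a matching value.

    known_values: property name -> value from the previously built instance
    source_rule:  (infobox, attribute) pair whose rule produced those values
    new_fields:   attribute name -> raw value in the added or changed infobox
    """
    mappings = {}
    for attribute, raw in new_fields.items():
        for prop, value in known_values.items():
            if raw.strip() == value:  # second value matches a first value
                mappings[(infobox_name, attribute)] = (source_rule, prop)
    return mappings


# First values, extracted earlier with the rule for ('Person information', 'Life').
known = {"date_of_birth": "1545-04-28", "date_of_death": "1598-12-16"}

# Second values, found in the changed infobox (illustrative data).
changed = {
    "Date of birth": "1545-04-28",
    "Date of death": "1598-12-16",
    "Rank": "Admiral",
}

mappings = infer_rule_mappings(
    known, ("Person information", "Life"), "General information", changed
)
```

Attributes with no matching value, such as `Rank` here, receive no mapping and would still need a rule from some other source.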

Next, the ontology instance update step (S400) updates, for other information, the value of the second input field according to the added or changed attribute to a value extracted through the data refinement rule defined for the second value.

That is, for other documents that use the same infobox, a rule is generated that checks the 'Date of birth' and 'Date of death' attributes and extracts their data, so that the previously manual process is handled dynamically.
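As a rough illustration of the update step (S400), a mapping learned from one document can be reused on any other document that carries the same infobox. The names `LEARNED_MAPPINGS` and `extract_with_mappings`, and the second document's field values, are made up for this sketch.

```python
# Mapping assumed to have been learned from one document (hypothetical layout):
# (infobox, attribute) -> property name of the ontology instance.
LEARNED_MAPPINGS = {
    ("General information", "Date of birth"): "date_of_birth",
    ("General information", "Date of death"): "date_of_death",
}


def extract_with_mappings(infobox_name, fields, mappings):
    """Build instance properties for another document using the learned mappings."""
    instance = {}
    for attribute, raw in fields.items():
        prop = mappings.get((infobox_name, attribute))
        if prop is not None:
            instance[prop] = raw.strip()
    return instance


# A different document that uses the same infobox (made-up values).
other_document = {"Date of birth": " 1537-01-01 ", "Date of death": "1599-01-01"}
result = extract_with_mappings("General information", other_document, LEARNED_MAPPINGS)
```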

The method of generating the ontology instance used in the present embodiment is described in more detail below with reference to FIG. 2.

Referring to FIG. 2, the ontology instance generation method includes an online dictionary data input step (S10), a semi-structured data separation step (S20), a data refinement step (S30), and an ontology instance generation step (S40).

The online dictionary data input step (S10) receives information over the network. It checks the version of the published dump file of the online dictionary and, when a change to a newer version is confirmed, receives the original data of the online dictionary.
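The version check in step S10 might look like the following sketch. It assumes, purely for illustration, that dump versions are labeled with a `YYYYMMDD` date string; the patent does not state the version format, and `has_newer_dump` is a hypothetical name.

```python
from datetime import datetime


def parse_version(label):
    """Parse an assumed 'YYYYMMDD' dump version label into a date."""
    return datetime.strptime(label, "%Y%m%d").date()


def has_newer_dump(last_processed, published):
    """Return True when the published dump is newer than the one already processed."""
    return parse_version(published) > parse_version(last_processed)
```

Only when this check succeeds would the original data be downloaded and passed on to the separation step.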

The semi-structured data separation step (S20) separates, from the information received over the network, semi-structured data expressed as values according to attributes. In the present embodiment, this step can separate the data into first-format data, which has table-form information in which the input fields are defined, and second-format data, which has classification information for the information received over the network. That is, it separates out the portion of the input data that is in semi-structured form and can be analyzed automatically, and it may consist of a step of separating the infobox as the first-format data and a step of separating the categories as the second-format data.
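The separation step can be illustrated on a simplified wikitext page. This is a sketch rather than a real wikitext parser: it assumes a single infobox with no nested templates or piped links inside it, and the function name `split_semi_structured` and the sample page are hypothetical.

```python
import re


def split_semi_structured(wikitext):
    """Return (infobox name, infobox fields, category names) from simplified wikitext."""
    name, fields = None, {}
    box = re.search(r"\{\{\s*Infobox\s+([^|]+?)\s*\|(.*?)\}\}", wikitext, re.S)
    if box:
        name = box.group(1).strip()
        for part in box.group(2).split("|"):
            if "=" in part:  # keep only 'key = value' entries
                key, value = part.split("=", 1)
                fields[key.strip()] = value.strip()
    categories = [c.strip() for c in re.findall(r"\[\[Category:([^\]|]+)", wikitext)]
    return name, fields, categories
```

The first two return values correspond to the first-format data (the infobox) and the third to the second-format data (the categories).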

The data refinement step (S30) refines the semi-structured data according to the predetermined data refinement rule. Next, the ontology instance generation step (S40) generates an ontology instance by setting data type or attribute information for the instance extracted from the refined semi-structured data.

Using the ontology instances generated through the above steps, the ontology instance construction method according to the present embodiment checks infobox or category information and infers the meaning of changed semi-structured data. The present invention can thus automate the whole sequence in which online dictionary data is continuously updated and instantiated.

A system that performs the ontology instance construction method according to the above embodiment is described below with reference to FIGS. 3 and 4.

The ontology instance construction system according to the present embodiment includes an online dictionary extraction unit (100), an online dictionary parsing unit (200), an ontology instance generation unit (400), and a data refinement rule analysis unit (300).

In the present embodiment, the online dictionary extraction unit (100) uses a version checker (110) to check the version of the published dump file of the online dictionary (10). When a change to a newer version is confirmed, it uses a data extractor (120) to extract the original data of the online dictionary and records it in the original data database (130).

The online dictionary parsing unit (200) separates, from the recorded data, the data that is in semi-structured form and can be analyzed automatically. Specifically, it includes a parser (210) that separates infoboxes and a parser (220) that separates categories. The data separated by the parsers is stored in the infobox database (411) and the category database (421), respectively.

The ontology instance generation unit (400) refines the semi-structured data according to the predetermined data refinement rule and generates an ontology instance by setting data type or attribute information for the instance extracted from the semi-structured data.

Referring to FIG. 4, the ontology instance generation unit (400) according to the present embodiment may include an infobox refiner (410), a category refiner (420), an integrated refined online dictionary database (430), an instance analyzer (440), and an instance generator (450).

The infobox refiner (410) refines the data recorded in the infobox database (411) using refinement modules (413, 414) to which the infobox refinement rule database (412) is applied.

The integrated infobox refinement module (413) performs refinement that is common to all infobox information, while the infobox-attribute pair refinement module (414) performs refinement that is needed only for specific infobox-attribute pairs.
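The division of labor between the two modules can be sketched as a two-stage pipeline: a cleanup applied to every value, followed by an optional rule keyed by the infobox-attribute pair. The specific cleanups (removing HTML-like tags and wiki link markup) and the names `common_refine`, `PAIR_RULES`, and `refine_field` are assumptions for illustration; the patent does not list what each module does.

```python
import re


def common_refine(raw):
    """Stage 1 (assumed): cleanup applied to every infobox value."""
    text = re.sub(r"<[^>]+>", " ", raw)  # drop HTML-like tags such as <br />
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]", r"\1", text)  # [[a|b]] -> b
    return " ".join(text.split())  # collapse whitespace


def split_range(text):
    """Stage 2 example: split a 'start ~ end' value into its two parts."""
    return [part.strip() for part in text.split("~")]


# Rules needed only for specific (infobox, attribute) pairs.
PAIR_RULES = {("Person information", "Life"): split_range}


def refine_field(infobox, attribute, raw):
    """Run the common stage, then the pair-specific stage if one is registered."""
    text = common_refine(raw)
    pair_rule = PAIR_RULES.get((infobox, attribute))
    return pair_rule(text) if pair_rule else text
```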

The category refiner (420) refines the data recorded in the category database (421) using a refinement module (423) to which the refinement rule database (422) is applied.

The data refined by the infobox and category refiners (410, 420) is recorded in the integrated refined online dictionary database (430) and is shaped into the final knowledge-form structure by the instance analyzer (440).

The instance analyzer (440) operates in two parts: setting the type of an instance and setting its other attributes. For type setting, the infobox-based and category-based type analyzers (443, 444) apply the type mapping rule database (442) to build instance metadata. The other attributes are set by the attribute mapper (445). The infobox and category instance integrator (446) merges the results of the two construction processes and records them in the instance metadata database (447).
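One way to picture the type setting and the merge performed by the instance analyzer is shown below. The rule table, the type names `Person` and `MilitaryPerson`, and the function names are hypothetical, chosen only to show infobox-based and category-based type analysis feeding one metadata record.

```python
# Hypothetical type mapping rules: infobox name or category name -> instance type.
TYPE_MAPPING_RULES = {
    "infobox": {"Person information": "Person", "General information": "MilitaryPerson"},
    "category": {"Admirals": "MilitaryPerson"},
}


def analyze_types(infobox_name, categories, rules=TYPE_MAPPING_RULES):
    """Collect the types implied by the infobox and by each category."""
    types = set()
    if infobox_name in rules["infobox"]:
        types.add(rules["infobox"][infobox_name])
    for category in categories:
        if category in rules["category"]:
            types.add(rules["category"][category])
    return types


def build_metadata(title, infobox_name, categories, properties):
    """Merge type information and mapped properties into one metadata record."""
    return {
        "title": title,
        "types": sorted(analyze_types(infobox_name, categories)),
        "properties": dict(properties),
    }
```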

To convert the data in the instance metadata database (447) into the ontology instance format, the instance generator (450) converts the data into an ontology structure with the ontology converter (451), the URI (Uniform Resource Identifier) refiner (452) refines the URIs into a recognizable form, and the instance verifier (453) verifies the result.
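The three stages of the instance generator can be sketched with plain tuples standing in for ontology statements. The namespace `http://example.org/resource/`, the triple layout, and the verification criterion are assumptions; the patent does not name a serialization format or describe what the verifier checks.

```python
from urllib.parse import quote

BASE = "http://example.org/resource/"  # hypothetical namespace


def refine_uri(title):
    """URI refiner (sketch): turn a page title into a percent-encoded URI."""
    return BASE + quote(title.strip().replace(" ", "_"), safe="_")


def to_triples(metadata):
    """Ontology converter (sketch): express metadata as (subject, predicate, object)."""
    subject = refine_uri(metadata["title"])
    triples = [(subject, "rdf:type", t) for t in metadata["types"]]
    triples += [(subject, p, v) for p, v in metadata["properties"].items()]
    return triples


def verify(triples):
    """Instance verifier (sketch): every triple needs a valid subject and no empty part."""
    return all(s.startswith(BASE) and p and o for s, p, o in triples)
```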

The verified ontology instances are stored in the instance database (500). Specifically, instances built by the automatic online dictionary extraction process are stored in the automatically extracted online dictionary instance database (510), and instances built by other means, such as curation or conversion from other ontologies, are stored in the other instance database (520).

Next, the data refinement rule analysis unit (300) carries out the definition of data refinement rules described above with reference to FIG. 1, and includes a rule verifier (310) and a rule converter (320) that analyzes and redefines rules. That is, the rule verifier (310) can verify the validity of a data refinement rule using previously built ontology instances, and when a change in infobox attributes requires a data refinement rule to be redefined, the rule converter (320) converts the rule and builds the ontology instance.
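A rule verifier of this kind could, for instance, re-run a rule on current raw values and compare the output with what the built instances already hold. The sketch below assumes that approach; `verify_rule`, `birth_date_rule`, and the sample pairs are hypothetical.

```python
def verify_rule(rule, samples):
    """Check a rule against (raw value, expected value) pairs from built instances.

    Returns (is_valid, failures), where failures lists the raw values for which
    the rule no longer reproduces the stored instance value.
    """
    failures = [raw for raw, expected in samples if rule(raw) != expected]
    return len(failures) == 0, failures


def birth_date_rule(raw):
    """Example rule: take the part before '~' as the date of birth."""
    return raw.split("~")[0].strip()


samples = [
    ("1545-04-28 ~ 1598-12-16", "1545-04-28"),  # notation the rule was written for
    ("born 1545-04-28", "1545-04-28"),          # changed notation the rule misses
]
is_valid, failures = verify_rule(birth_date_rule, samples)
```

A non-empty `failures` list is the signal that the rule converter would need to redefine the rule.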

According to the above embodiment, the generated first ontology instance is compared with a second ontology instance of an input field according to an attribute added or changed for the semi-structured data, and the data refinement rule for the second ontology instance is defined as a mapping to the first ontology instance.

Furthermore, although not shown, the ontology instance construction system according to the present invention may further include an update unit that updates, for other semi-structured data, the value of the second input field according to the added or changed attribute to an ontology instance extracted through the data refinement rule defined for the second ontology instance.

According to the present invention as described above, data refinement rule information that previously had to be maintained continuously in order to construct instances from an online dictionary is managed dynamically, which reduces the associated cost and allows instance information to be maintained effectively.

The above description merely illustrates the technical idea of the present invention, and a person of ordinary skill in the art to which the present invention pertains will be able to make various modifications, changes, and substitutions without departing from the essential characteristics of the invention.

Therefore, the embodiments disclosed in the present invention and the accompanying drawings are intended to illustrate, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments or drawings. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present invention.

Claims

A method for constructing a semi-structured data-based dynamic ontology instance according to an attribute change, the method comprising:
obtaining a first value for a first input field having a predetermined data refinement rule for extracting the value of an input field according to an attribute defined for information to be acquired;
comparing the first value with a second value of a second input field according to an attribute added or changed with respect to the information; and
defining a data refinement rule for the second value that matches the first value as a mapping to the first value.

The method according to claim 1, further comprising:
updating the value of the second input field, according to an attribute added or changed with respect to other information, to a value extracted through the data refinement rule defined for the second value.

The method according to claim 1,
wherein the first and second values are ontology instances, each constructed by:
separating semi-structured data represented by a value according to the attribute from information received on a network;
refining the semi-structured data according to the predetermined data refinement rule; and
generating an ontology instance by setting data type or attribute information for an instance extracted from the refined semi-structured data.

The method of claim 3,
wherein separating the semi-structured data comprises separating the data into first type data having table-type information in which the input field is defined,
and second type data having classification information of the information input on the network.

The method of claim 3,
wherein generating the ontology instance comprises generating metadata for the instance by setting a data type for the instance according to mapping rule information of the instance.

A system for constructing a semi-structured data-based dynamic ontology instance according to an attribute change, the system comprising:
a parser that separates semi-structured data represented by a value according to an attribute defined in information input on a network;
a generator that generates an ontology instance by refining the semi-structured data according to a predetermined data refinement rule and setting data type or attribute information for an instance extracted from the refined semi-structured data; and
a rule conversion unit that compares the generated first ontology instance with a second ontology instance of an input field according to an attribute added or changed with respect to the semi-structured data, and defines the data refinement rule for the second ontology instance as a mapping to the first ontology instance.

The system according to claim 6, further comprising:
an update unit that updates the value of the second input field, according to an attribute added or changed with respect to other semi-structured data, to the ontology instance extracted through the data refinement rule defined for the second ontology instance.