KR102121504B1

KR102121504B1 - System and method for building integration knowledge data base based a plurality of data sources

Info

Publication number: KR102121504B1
Application number: KR1020180151227A
Authority: KR
Inventors: 이경일; 장정호
Original assignee: 주식회사 솔트룩스
Priority date: 2018-11-29
Filing date: 2018-11-29
Publication date: 2020-06-10

Abstract

A system for building a knowledge database according to the present invention includes: a service providing unit that provides requested service data to an application by referring to the knowledge database; a service data management unit that manages an ontology schema defined according to the service purpose; an interface unit that receives open data from different external data sources; and a data integration unit that generates a plurality of integrated data based on a data integration rule generated by analyzing the relationships among the plurality of open data, and constructs the knowledge database based on the integrated data and the ontology schema. The data integration unit includes: an instance distributor that distributes a plurality of instances of the ontology schema for the knowledge database construction work, based on curation module information representing the characteristics of a curation module and feedback information representing the result of the work performed by the curation module; a curation module group having a plurality of curation modules that each receive instances distributed by the instance distributor and individually perform the knowledge database construction work; and a verifier that generates the feedback information by verifying the result of the knowledge database construction work of the curation module group.

Description

System and method for building a knowledge database based on multiple data sources{SYSTEM AND METHOD FOR BUILDING INTEGRATION KNOWLEDGE DATA BASE BASED A PLURALITY OF DATA SOURCES}

The technical idea of the present invention relates to a knowledge database construction system and, more particularly, to a knowledge database construction system and method that efficiently build a knowledge database based on data collected from a plurality of data sources.

The present invention is derived from research led and conducted by Saltlux Co., Ltd. as part of the information, communication, and broadcasting technology development project of the Ministry of Science and ICT. [Research period: 2018.01.01~2018.12.31; research management agency: Information and Communication Technology Research Promotion Center; research project name: WiseKB: Development of self-learning knowledge base and reasoning technology based on big data understanding; project identification number: 2013-0-00109]

Systems built by research and commercial linked data projects are being implemented with data formats and protocols that follow standards established by the W3C. The W3C proposed the 'Linked Data Platform 1.0' recommendation in October 2012 and continued to revise it through February 2015. 'Linked Data Platform 1.0' defines, among other things, the definition format and communication protocol for linked data. Based on these specifications, a variety of tools and services for converting and managing linked data are being developed. In addition, data service systems that convert open data stored in external databases into linked data and manage it are being actively developed.

The problem to be solved by the technical idea of the present invention is to provide a knowledge database construction system and method for efficiently constructing a knowledge database based on a plurality of data sources.

A knowledge database construction system according to the present disclosure includes: a service providing unit that provides requested service data to an application by referring to a knowledge database; a service data management unit that manages an ontology schema defined according to the service purpose; an interface unit that receives open data from different external data sources; and a data integration unit that generates a plurality of integrated data based on a data integration rule generated by analyzing the relationships among the plurality of open data, and constructs the knowledge database based on the integrated data and the ontology schema. The data integration unit includes: an instance distributor that distributes a plurality of instances of the ontology schema for the knowledge database construction work, based on curation module information indicating the characteristics of a curation module and feedback information indicating the result of the work performed by the curation module; a curation module group having a plurality of curation modules that each receive the instances distributed by the instance distributor and individually perform the knowledge database construction work; and a verifier that verifies the result of the knowledge database construction work of the curation module group and generates the feedback information.

In addition, in one embodiment of the present disclosure, the instance distributor divides the plurality of instances of the ontology schema according to the type of instance that each curation module can process, as indicated by the curation module information, and distributes them to the curation modules.

In addition, in one embodiment of the present disclosure, the instance distributor selects only some of the plurality of properties corresponding to an instance of the ontology schema, in consideration of the type of the instance, and distributes the construction work accordingly.

In addition, in one embodiment of the present disclosure, the curation module determines whether it can process the plurality of instances of the knowledge database construction work received from the instance distributor, provides the determination result to the verifier, and performs the knowledge database construction work based on the determination result.

In addition, in one embodiment of the present disclosure, the curation module further includes an additional instance selection unit that, based on the ontology schema, adds at least one processable instance in addition to the plurality of instances of the received knowledge database construction work.

In addition, in one embodiment of the present disclosure, the curation module further includes a status output unit that, upon completing the knowledge database construction work for the plurality of instances, provides the instance distributor with a signal indicating completion of the work. In response to that signal, the instance distributor redistributes to the curation module at least one instance that another curation module in the curation module group has determined it cannot process.

In addition, in one embodiment of the present disclosure, verification results are accumulated for each curation module in the curation module group, information on the performance of the plurality of curation modules is continuously updated based on the verification results, and the feedback information includes the information on the performance.

In addition, in one embodiment of the present disclosure, the verifier verifies the knowledge database construction work performed by each curation module in the curation module group and provides the verification result to the curation modules, and the curation modules either perform the knowledge database construction work again or complete it, based on the verification result.

The knowledge database construction system according to an embodiment of the present disclosure distributes instances of the ontology schema to the curation modules according to the characteristics of each curation module and the results of the work it has performed, and can thereby carry out the knowledge database construction work efficiently.

FIG. 1 is a block diagram schematically illustrating a knowledge database construction system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating an interface unit according to an embodiment of the present disclosure.
FIG. 3 is a block diagram illustrating a service data management unit according to an embodiment of the present disclosure.
FIG. 4 is a diagram illustrating a data integration unit according to an embodiment of the present disclosure.
FIG. 5 is a block diagram illustrating an instance distributor according to an embodiment of the present disclosure.
FIG. 6 is a block diagram illustrating a curation module group according to an embodiment of the present disclosure.
FIG. 7 is a block diagram illustrating a verifier according to an embodiment of the present disclosure.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. The embodiments are provided to describe the present invention more completely to those of ordinary skill in the art. The present invention may be modified in various ways and may take various forms, so specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present invention to a specific disclosed form, and it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention. In describing each drawing, similar reference numerals are used for similar components. In the accompanying drawings, the dimensions of structures are shown enlarged or reduced relative to their actual size for clarity.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms may be used to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention pertains. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in this application.

Before the embodiments of the present invention are described, the terms and concepts used below are briefly explained.

Semantic technology refers to intelligent technology that enables communication between computers by defining languages and rules that a computer can understand, much as a person reads a screen and understands its meaning. Semantic technology aims to express the relationship and meaning information (semanteme) among the objects in a given environment in the form of an ontology that a machine, that is, a computer, can process, and to have automated machines process it. An ontology is a model that abstracts and shares what people think about things, and is a formal description in which the types of concepts and the constraints on their use are explicitly defined. In computer science, an ontology is a data model representing a specific domain, defined as structured data describing the concepts belonging to that domain and the relationships between them. An ontology is a tool for implementing semantic technology and for linking data semantically, and it allows a computer to process the concepts people hold about things by organizing them into a kind of database.

The components of an ontology can be divided into instances, classes, relations, and properties. A class can be described as the name we generally give to a thing or concept. "Keyboard", "monitor", and "love" are all classes. An instance, by contrast, is the thing or concept itself as it appears in a concrete form such as a specific object or event. Thus "LG Electronics ST-500 Slim Keyboard", "Samsung SyncMaster Wide LCD monitor", and "the love of Romeo and Juliet" can generally be regarded as instances. The distinction between classes and instances can vary considerably with the application and purpose of use: an entity with the same expression may be a class in one case and an instance in another.

Relations are the relationships that exist between classes and instances, and can generally be divided into taxonomic relations and non-taxonomic relations. A taxonomic relation expresses classes and instances hierarchically, separating broader concepts from more specific ones in order to classify them. An example is the "isA" relation, which expresses inclusion between concepts, as in "a person is an animal". Relations that are not taxonomic are called non-taxonomic relations. For example, "exercise leads to health" is expressed using a "cause" relation (causality).

A property associates a class or instance with a specific value in order to express a particular characteristic or tendency of that class or instance. For example, a property such as hasSize can be defined to express that "the Samsung SyncMaster Wide LCD monitor is 24 inches".

In general, relations and properties are often not distinguished from each other. A relation or property declared between actual classes or instances, such as "isA(person, animal)", "cause(exercise, health)", or "hasSize(Samsung SyncMaster Wide LCD, 24 inches)", is sometimes called a relation/property instance, to distinguish it from the named relation or property definitions themselves, such as "isA", "cause", and "hasSize".
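The four ontology components described above can be illustrated with a minimal Python sketch. This is not part of the patent: the names `OntClass`, `Instance`, `Assertion`, and `values_of` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OntClass:
    """A class: the name given to a kind of thing or concept."""
    name: str

@dataclass(frozen=True)
class Instance:
    """An instance: one concrete member of a class."""
    name: str
    cls: OntClass

@dataclass(frozen=True)
class Assertion:
    """A relation/property instance such as hasSize(monitor, "24 inches")."""
    predicate: str
    subject: object
    value: object

monitor_cls = OntClass("Monitor")
syncmaster = Instance("Samsung SyncMaster Wide LCD monitor", monitor_cls)

assertions = [
    Assertion("isA", OntClass("Person"), OntClass("Animal")),      # taxonomic relation
    Assertion("cause", OntClass("Exercise"), OntClass("Health")),  # non-taxonomic relation
    Assertion("hasSize", syncmaster, "24 inches"),                 # property
]

def values_of(predicate, subject):
    """Return every value asserted for the given predicate and subject."""
    return [a.value for a in assertions
            if a.predicate == predicate and a.subject == subject]
```

Here "isA", "cause", and "hasSize" are the named relations and properties, while each `Assertion` object plays the role of a relation/property instance.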

In the field of semantic technology, a form of expression called a triple is used to express relations. A triple expresses a concept in the form of a subject, a predicate, and an object. Each subject, predicate, and object can be expressed as a URI (Uniform Resource Identifier). Standard languages currently used to describe semantic web ontologies include RDF and OWL, proposed by the W3C, and Topic Maps, proposed by ISO.
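As a rough illustration of the triple form, the sketch below stores subject-predicate-object triples as plain Python tuples and looks them up by pattern. The namespace `http://example.org/` and the helper `match` are assumptions made for this example; a real system would more likely use an RDF library and store.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str

EX = "http://example.org/"  # hypothetical namespace, for illustration only

triples = [
    Triple(EX + "Person", EX + "isA", EX + "Animal"),
    Triple(EX + "Exercise", EX + "cause", EX + "Health"),
    Triple(EX + "SyncMasterWideLCD", EX + "hasSize", "24 inches"),  # literal object
]

def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.object == o)]
```

Calling `match(triples, p=EX + "hasSize")` returns only the monitor-size triple, which is the kind of pattern lookup a triple store performs.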

FIG. 1 is a block diagram schematically illustrating a knowledge database construction system 100 according to an embodiment of the present invention.

Referring to FIG. 1, the knowledge database construction system 100 includes an interface unit 110, a data integration unit 120, a service data management unit 130, and a service providing unit 140. The interface unit 110 may receive open data OPD from an external data pool. The interface unit 110 may provide the plurality of open data OPD received from the data pool to the data integration unit 120 as an open data group OPDG. The data pool that provides open data to the interface unit 110 includes various data sources and may represent anything in which open data can be created, held, and distributed, such as the Internet, databases, cloud sourcing, and social networks. The data pool may also include open data provided directly by the public or by individuals. Open data may refer to public data released for users to use. The open data OPD may take the form of a static file such as a CSV file, an Excel file, or RDF, a data catalog system such as CKAN, an open API, and so on.

Open data including unstructured data (informal data) or structured data (formal data) is received from the data pool. Unstructured data is data that does not have a fixed form, in contrast to structured data, in which content is stored in corresponding fields. Data such as XML or HTML, which is not stored in fixed fields but includes metadata or a schema, may be classified as semi-structured data. Note, however, that the present invention may treat semi-structured data as one type of unstructured data.

The data integration unit 120 may receive the open data group OPDG containing a plurality of open data and convert the format of each open data into a single unified format. In one embodiment, the open data may be in a format such as CSV (Comma-Separated Values), Excel, HTML (Hypertext Markup Language), PDF (Portable Document Format), or XML (Extensible Markup Language). The data integration unit 120 may use a conversion template to convert open data in these various formats into the unified format. The data integration unit 120 may analyze the structure of the open data, such as its fields and format, and generate the conversion template based on the analyzed structure. In one embodiment, the data integration unit 120 may convert the format of each open data received from outside into the RDF (Resource Description Framework) data format.
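The template-based conversion described above can be sketched as follows. The patent does not specify what a conversion template contains, so the dictionary layout, the function names `build_template` and `csv_to_triples`, the `http://example.org/` base, and the sample data are all assumptions. The output is a list of RDF-like triples held as Python tuples, not a serialized RDF document.

```python
import csv
import io

def build_template(header, subject_field, base="http://example.org/"):
    """Derive a conversion template from the analyzed field structure."""
    return {
        "base": base,
        "subject_field": subject_field,
        # Every non-subject field becomes a predicate.
        "predicates": {f: base + f for f in header if f != subject_field},
    }

def csv_to_triples(text, subject_field):
    """Convert CSV text to (subject, predicate, object) tuples via a template."""
    reader = csv.DictReader(io.StringIO(text))
    template = build_template(reader.fieldnames, subject_field)
    out = []
    for row in reader:
        subject = template["base"] + row[subject_field]
        for field_name, predicate in template["predicates"].items():
            out.append((subject, predicate, row[field_name]))
    return out

# Hypothetical open data: two air-quality stations.
sample = "station,city,pm10\nS1,Seoul,41\nS2,Busan,35\n"
triples = csv_to_triples(sample, "station")
```

A second source format (for example an Excel sheet) would get its own template but produce the same tuple shape, which is what makes later integration possible.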

The data integration unit 120 may analyze the relationships among the individual open data and generate an integrated data rule. Based on the integrated data rule, it may generate integrated data that combines the open data. In one embodiment, the open data may be converted into the RDF data format by the data integration unit 120, and the integrated data that combines the converted open data may also have the RDF data format.

To provide a data service to users, the service data management unit 130 may define an ontology schema that defines, according to the service purpose, the relationships among space, time, users, service policies, the service itself, and so on. The defined ontology schema may be stored in an ontology schema storage unit (not shown). For example, a spatial ontology may be defined as a schema from which an administrative district can be derived from GPS coordinates; a time ontology as a schema from which the season, solar term, dawn, afternoon, and the like can be derived from input time information; and a healthcare service ontology as a schema from which a health condition can be determined from personal medical history, blood pressure, and blood sugar measurements. The service data management unit 130 may also generate mapping rules that define how open data received from outside is mapped to the ontology schema. The service data management unit 130 may provide ontology information OI, including at least one of the ontology schema and the mapping rules, to the data integration unit 120.
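To make the time ontology example concrete, the following sketch derives a season and a part of the day from a timestamp and emits them as triples. The month-to-season table (Northern Hemisphere), the hour boundaries, and the predicate names `hasSeason` and `hasPartOfDay` are illustrative assumptions; the patent does not define them.

```python
from datetime import datetime

# Assumed month-to-season mapping (Northern Hemisphere, meteorological seasons).
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def part_of_day(ts):
    """Classify the hour using assumed boundaries at 06:00, 12:00 and 18:00."""
    if ts.hour < 6:
        return "dawn"
    if ts.hour < 12:
        return "morning"
    if ts.hour < 18:
        return "afternoon"
    return "evening"

def derive_time_facts(ts):
    """Map raw time information onto the (hypothetical) time ontology."""
    subject = ts.isoformat()
    return [
        (subject, "hasSeason", SEASONS[ts.month]),
        (subject, "hasPartOfDay", part_of_day(ts)),
    ]

facts = derive_time_facts(datetime(2018, 11, 29, 14, 0))
```

A mapping rule in the sense of the text would tell the system which field of an open data record to feed into a function like `derive_time_facts`.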

The data integration unit 120 according to an embodiment of the present disclosure may perform the knowledge database construction work by applying the integrated data to the ontology schema. To make this work efficient, the data integration unit 120 may include a plurality of curation modules that can each perform the knowledge database construction work individually. The data integration unit 120 may also include an instance distributor that distributes a plurality of instances of the ontology schema based on curation module information indicating the characteristics of each curation module and feedback information indicating the results of the work each curation module has performed. Each curation module may perform the knowledge database construction work using the instances distributed to it by the instance distributor. The data integration unit 120 may include a verifier that verifies the knowledge database construction work performed by the curation modules in order to decide whether the knowledge data generated by that work should be applied to the knowledge database. The data integration unit 120 may store the generated knowledge database in a memory device and update the stored knowledge database.

In this way, the data integration unit 120 distributes instances of the ontology schema to the curation modules according to the characteristics of each curation module and the results of the work it has performed, and can thereby carry out the knowledge database construction work efficiently.
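The interplay of instance distributor, curation modules, and verifier feedback can be sketched as below. The patent does not give a distribution algorithm, so this is one possible policy under stated assumptions: each module advertises the instance types it supports (the curation module information), and the feedback is a running fraction of verified-correct results. The names `CurationModule`, `distribute`, and `record_verification` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class CurationModule:
    name: str
    supported_types: set   # curation module information
    score: float = 1.0     # feedback: fraction of results that passed verification
    done: int = 0
    passed: int = 0

def distribute(instances, modules):
    """Give each (instance_id, type) to the best-scoring module that supports it."""
    assignment = defaultdict(list)
    unassigned = []
    for inst_id, inst_type in instances:
        candidates = [m for m in modules if inst_type in m.supported_types]
        if not candidates:
            unassigned.append(inst_id)
            continue
        best = max(candidates, key=lambda m: m.score)  # ties go to the first module
        assignment[best.name].append(inst_id)
    return dict(assignment), unassigned

def record_verification(module, ok):
    """Verifier side: accumulate a result and refresh the module's score."""
    module.done += 1
    module.passed += int(ok)
    module.score = module.passed / module.done

a = CurationModule("A", {"Person", "Place"})
b = CurationModule("B", {"Person"})
instances = [("i1", "Person"), ("i2", "Place"), ("i3", "Event")]

first, leftover = distribute(instances, [a, b])
record_verification(a, ok=False)          # module A fails one verification
second, _ = distribute(instances, [a, b])
```

After the failed verification, the "Person" instance moves from module A to module B, while the "Place" instance stays with A because no other module supports that type.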

The data integration unit 120 may generate the service data SD to be provided to the application 1 by referring to the knowledge database generated from the integrated data and the ontology schema defined by the service data management unit 130. In one embodiment, the data integration unit 120 may map the integrated data to the ontology schema to generate mapping data, and generate the service data SD from a knowledge database that includes the mapping data. In one embodiment, the mapping data may be a triple in the RDF data format, which is described in detail later. The service data SD may be stored in the service data management unit 130.

The service providing unit 140 may receive from the application 1 a query requesting service data, analyze the query, and then provide a service data request signal RS to the service data management unit 130. In response, the service data management unit 130 provides the requested service data RSD to the service providing unit 140, and the service providing unit 140 may provide the requested service data RSD to the application 1. Furthermore, so that various applications such as a SPARQL endpoint or a RESTful API can access the data service system 100, the service providing unit 140 may include an interface corresponding to each of those applications.
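The request flow between the application, the service providing unit, and the service data management unit can be sketched as follows. The class names, the "subject predicate" query format, and the sample service data are assumptions for illustration; the patent does not define a query syntax here.

```python
class ServiceDataManager:
    """Stands in for the service data management unit 130 (hypothetical API)."""

    def __init__(self, service_data):
        self._service_data = list(service_data)

    def fetch(self, request_signal):
        """Answer a request signal RS with the requested service data RSD."""
        subject, predicate = request_signal
        return [t for t in self._service_data
                if t[0] == subject and t[1] == predicate]

class ServiceProvider:
    """Stands in for the service providing unit 140 (hypothetical API)."""

    def __init__(self, manager):
        self._manager = manager

    def handle(self, query):
        # Analyze the query; it is assumed to have the form "subject predicate".
        subject, predicate = query.split()
        request_signal = (subject, predicate)
        return self._manager.fetch(request_signal)

manager = ServiceDataManager([
    ("S1", "city", "Seoul"),
    ("S1", "pm10", "41"),
    ("S2", "city", "Busan"),
])
provider = ServiceProvider(manager)
result = provider.handle("S1 city")
```

The application only talks to `ServiceProvider`; a SPARQL or REST front end would be an additional interface layered on `handle`.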

Through the service providing unit 140, the application 1 may provide the service data management unit 130 with the service information needed to define a new ontology schema, and the service data management unit 130 may define a new ontology schema based on that service information.

FIG. 2 is a block diagram illustrating the interface unit 110 according to an embodiment of the present disclosure. Referring to FIG. 2, the interface unit 110 includes first to n-th open data adapters 110_1 to 110_n, each corresponding to a data format, in order to receive first to n-th open data OPD_1 to OPD_n.

The interface unit 110 may receive the first to n-th open data OPD_1 to OPD_n from outside. The forms of the first to n-th open data OPD_1 to OPD_n may differ from one another. For example, the first open data OPD_1 may be in a static file format such as an Excel file, the second open data OPD_2 may be in a data catalog system format such as CKAN, and the n-th open data OPD_n may take the form of an open API. Accordingly, the interface unit 110 may include a first open data adapter 110_1 having a first interface corresponding to the static file format to receive the first open data OPD_1, a second open data adapter 110_2 having a second interface corresponding to the data catalog system format to receive the second open data OPD_2, and an n-th open data adapter 110_n having an n-th interface corresponding to the open API format to receive the n-th open data OPD_n. However, the interface unit 110 is not limited thereto and may further include additional open data adapters for receiving open data in other formats. The interface unit 110 may provide the plurality of open data OPD_1 to OPD_n to the data integration unit 120 of FIG. 1 at once as an open data group OPDG.
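
The one-adapter-per-format arrangement can be sketched in Python as follows. The class names, the use of CSV as a stand-in for an Excel file, and the JSON-array response shape are all assumptions made for illustration; a CKAN adapter would follow the same `read` interface.

```python
import csv
import io
import json
from typing import Protocol


class OpenDataAdapter(Protocol):
    def read(self, raw: str) -> list[dict]: ...


class CsvFileAdapter:
    """Adapter for a static file format (CSV stands in for Excel here)."""

    def read(self, raw: str) -> list[dict]:
        return list(csv.DictReader(io.StringIO(raw)))


class JsonApiAdapter:
    """Adapter for an open API assumed to return a JSON array of records."""

    def read(self, raw: str) -> list[dict]:
        return json.loads(raw)


def collect_group(sources: list[tuple[OpenDataAdapter, str]]) -> list[list[dict]]:
    """Read every source through its adapter and return the results together."""
    return [adapter.read(raw) for adapter, raw in sources]


# The returned list plays the role of the open data group OPDG.
opdg = collect_group([
    (CsvFileAdapter(), "name,district\nHong,Jongno\n"),
    (JsonApiAdapter(), '[{"name": "Kim", "district": "Mapo"}]'),
])
```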

FIG. 3 is a block diagram illustrating the service data management unit 130 according to an embodiment of the present disclosure. Referring to FIG. 3, the service data management unit 130 includes an ontology definition unit 131, a rule information generation unit 132, an ontology information storage unit 134, and a service data storage unit 135.

The ontology definition unit 131 may define at least one ontology schema that defines the relationships among space, time, users, service policies, the service concerned, and the like, according to the purpose of the service provided to the application. The ontology definition unit 131 may receive service information SI from the application and define the ontology schema based on that service information.

The rule information generation unit 132 may generate a first mapping rule that defines how the integrated data, generated by the data integration unit 120 of FIG. 1 by integrating the open data, is mapped to the ontology schema. The first mapping rule may be generated based on the ontology schema defined by the ontology definition unit 131. In one embodiment, the first mapping rule may indicate a specific relationship between a triple resource of the integrated data in the RDF data format and an entity of the defined ontology schema, such as a same-group relationship, an equivalence relationship, a superordinate/subordinate relationship, a superordinate/subordinate group relationship, or a synonym relationship. This is only one embodiment, however, and the specific relationship is not limited to equivalence; it may also be an antonym relationship, a similarity relationship, or the like. Here, an entity of the ontology schema means a class, an instance (an object belonging to a class), or an attribute describing a property of an instance. Furthermore, the rule information generation unit 132 may generate inference rules, that is, user-defined rules for inference, which are used to perform inference based on a query when the data service system receives the query from an application.
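
One possible in-memory shape for such a mapping rule is a lookup from a resource to a schema entity plus a relation type. The sketch below is illustrative only: the `Relation` members mirror the relationships listed above, while the `src:` and `ex:` identifiers are hypothetical.

```python
from enum import Enum


class Relation(Enum):
    SAME_GROUP = "same_group"
    EQUIVALENT = "equivalent"
    SUPER_SUB = "super_sub"
    SUPER_SUB_GROUP = "super_sub_group"
    SYNONYM = "synonym"


# Hypothetical first mapping rule: integrated-data resource -> (schema entity, relation).
FIRST_MAPPING_RULE = {
    "src:assemblyman": ("ex:Legislator", Relation.SYNONYM),
    "src:seoul_district": ("ex:District", Relation.SUPER_SUB),
}


def resolve(resource: str):
    """Return the schema entity and relation for a resource, or None if no rule applies."""
    return FIRST_MAPPING_RULE.get(resource)
```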

The ontology information storage unit 134 may store the ontology schema defined by the ontology definition unit 131. It may also store the first mapping rule and the inference rules defined by the rule information generation unit 132. The ontology information storage unit 134 may provide the data integration unit 120 of FIG. 1 with ontology information OI that includes at least one of the ontology schema and the first mapping rule.

The service data storage unit 135 may receive and store the service data SD generated by the data integration unit 120 of FIG. 1. On receiving the service data request signal RS from the service providing unit 140 of FIG. 1, it may respond by providing the requested service data RSD to the service providing unit 140. In one embodiment, the service data storage unit 135 may provide the service data SD to the ontology definition unit 131 and the rule information generation unit 132 as feedback information. The ontology definition unit 131 may refer to the feedback information to determine whether appropriate service data SD was generated based on the ontology schema, and may modify the defined ontology schema if it was not. Likewise, the rule information generation unit 132 may refer to the feedback information to determine whether appropriate service data SD was generated based on the first mapping rule, and may modify the defined first mapping rule if it was not.

FIG. 4 is a diagram illustrating the data integration unit 120 according to an embodiment of the present disclosure.

Referring to FIG. 4, the data integration unit 120 may include a data format conversion unit 122, an RDF data storage unit 124, a knowledge database storage unit 126, and a data integration management unit 200.

The data integration unit 120 may receive the open data group OPDG, which includes a plurality of open data, from the interface unit 110 of FIG. 1, and may receive the ontology information OI from the service data management unit 130 of FIG. 1. The data integration unit 120 may build a knowledge database based on the ontology information OI and the integrated data, and may generate the service data SD from the knowledge database.

The data format conversion unit 122 may include a structured data conversion unit 122a and an RDF data conversion unit 122b. The open data may be in formats such as CSV (comma-separated values), Excel, HTML (HyperText Markup Language), PDF (Portable Document Format), or XML (Extensible Markup Language). The structured data conversion unit 122a may convert open data in these various formats into structured data based on a conversion template. The structured data conversion unit 122a may analyze the structure of the open data OPD, such as its fields and format, and generate the conversion template based on the analyzed structure.

The RDF data conversion unit 122b may convert the open data that has been converted into structured data so that it has the RDF data format. Furthermore, the RDF data conversion unit 122b may receive the open data group OPDG directly and convert the open data, each having a different data format, into the RDF data format. The open data converted into the RDF data format by the RDF data conversion unit 122b may be stored in the RDF data storage unit 124.
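
The two-stage conversion (source format to structured data, then structured data to RDF) might look like the following Python sketch. The `TEMPLATE` contents, column names, and `ex:` identifiers are hypothetical, and the template is hard-coded here, whereas the disclosure describes deriving it from an analysis of the open data's structure.

```python
import csv
import io

# Hypothetical conversion template: source column -> (normalized field, predicate).
TEMPLATE = {"Name": ("name", "ex:name"), "Area": ("district", "ex:district")}


def to_structured(raw_csv: str) -> list[dict[str, str]]:
    """Structured-data step: keep templated columns and rename them."""
    rows = csv.DictReader(io.StringIO(raw_csv))
    return [
        {TEMPLATE[col][0]: val for col, val in row.items() if col in TEMPLATE}
        for row in rows
    ]


def to_rdf(rows: list[dict[str, str]], prefix: str) -> list[tuple[str, str, str]]:
    """RDF step: emit one (subject, predicate, object) triple per field."""
    predicates = {field: pred for field, pred in TEMPLATE.values()}
    return [
        (f"{prefix}{i}", predicates[field], value)
        for i, row in enumerate(rows)
        for field, value in row.items()
    ]


triples = to_rdf(to_structured("Name,Area,Note\nHong,Jongno,x\n"), "ex:row")
```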

The data integration management unit 200 may generate integrated data based on a data integration rule, which is generated by analyzing the open data converted into the RDF data format. For example, when first open data and second open data converted into the RDF data format are integrated, the data integration rule may indicate a specific relationship between a triple resource of the first open data and a triple resource of the second open data, such as a same-group relationship, an equivalence relationship, a superordinate/subordinate relationship, a superordinate/subordinate group relationship, or a synonym relationship. The specific relationship is not limited to equivalence and may also be an antonym relationship, a similarity relationship, or the like.
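
To make the idea of an integration rule concrete, the sketch below merges two triple sets using only an equivalence relation between resources. It is a simplified assumption: the `a:` and `b:` identifiers are invented, and the other relation types named above are not handled.

```python
# Hypothetical data integration rule: pairs of resources judged equivalent
# (first-source resource, second-source resource).
EQUIVALENT = [("a:hong", "b:hong_gildong")]


def integrate(first, second, equivalent):
    """Merge two triple sets, rewriting second-source subjects to their
    first-source equivalents so facts about one entity share one subject."""
    canonical = {b: a for a, b in equivalent}
    merged = set(first)
    for s, p, o in second:
        merged.add((canonical.get(s, s), p, o))
    return merged


first = [("a:hong", "ex:name", "Hong Gildong")]
second = [("b:hong_gildong", "ex:district", "Jongno"), ("b:kim", "ex:name", "Kim")]
integrated = integrate(first, second, EQUIVALENT)
```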

The data integration management unit 200 may build a knowledge database based on the generated integrated data and the ontology information OI (or the ontology schema), and to this end may include an instance distributor 210, a curation module group 220, and a verifier 230. The curation module group 220 may include a plurality of curation modules, each of which can perform a knowledge database construction task individually. Here, a knowledge database construction task may refer to an operation of generating triple data accessible by a specific identifier by linking, for each of the plurality of instances included in the ontology schema, its attributes, its relationships with other instances, its class, and so on.

The instance distributor 210 may distribute the plurality of instances of the ontology schema and provide them to the respective curation modules in the curation module group 220. Specifically, the instance distributor 210 may distribute the instances based on curation module information indicating the characteristics of each curation module. For example, when a first curation module is specialized in knowledge database construction for instances on the theme of "politics" and a second curation module is specialized in knowledge database construction for instances on the theme of "entertainment", the instance distributor 210 may distribute the instances in consideration of these characteristics. The curation module group 220 may provide the instance distributor 210 with curation module information indicating the characteristics of each curation module, and the curation module information may be updated periodically.
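
Theme-based distribution can be sketched as a simple routing table. The module names, the one-theme-per-module assumption, and the representation of an instance as a `(name, theme)` pair are illustrative choices, not details from the disclosure.

```python
from collections import defaultdict

# Hypothetical curation module information: module name -> theme it specializes in.
MODULE_INFO = {"module_1": "politics", "module_2": "entertainment"}


def distribute_by_theme(instances, module_info):
    """Assign each (instance, theme) pair to the module specialized in that theme."""
    by_theme = {theme: module for module, theme in module_info.items()}
    assignment = defaultdict(list)
    for name, theme in instances:
        module = by_theme.get(theme)
        if module is not None:  # instances with no matching module stay unassigned
            assignment[module].append(name)
    return dict(assignment)


assignment = distribute_by_theme(
    [("legislator_a", "politics"), ("singer_b", "entertainment"), ("bill_c", "politics")],
    MODULE_INFO,
)
```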

In another embodiment, the instance distributor 210 may distribute the plurality of instances of the ontology schema based on feedback information generated by the verifier 230. The verifier 230 may use a verification module of assured reliability to verify whether each curation module in the curation module group 220 has performed its knowledge database construction task properly. The verifier 230 may provide the instance distributor 210 with feedback information that includes the verification result for each curation module. The verification result may include whether each curation module performed the knowledge database construction task properly and how quickly it completed the task. For subsequent knowledge database construction tasks, the instance distributor 210 may adjust the number of instances distributed to each curation module based on the feedback information, and may distribute no instances to curation modules that perform the task poorly.
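
The disclosure does not specify how the instance counts are adjusted. One plausible policy, shown purely as an assumption, weights each module by its success rate divided by its time per instance and gives nothing to modules below a minimum success rate. The field names `success_rate` and `seconds_per_instance` and the `min_success` cutoff are hypothetical.

```python
def allocate(
    total: int, feedback: dict[str, dict[str, float]], min_success: float = 0.5
) -> dict[str, int]:
    """Split `total` instances across modules in proportion to
    success_rate / seconds_per_instance; weak modules receive nothing."""
    scores = {
        m: f["success_rate"] / f["seconds_per_instance"]
        for m, f in feedback.items()
        if f["success_rate"] >= min_success
    }
    quotas = {m: 0 for m in feedback}
    weight_sum = sum(scores.values())
    if weight_sum == 0:
        return quotas
    for m, s in scores.items():
        quotas[m] = int(total * s / weight_sum)
    # Hand any remainder left by rounding down to the best-scoring module.
    best = max(scores, key=scores.get)
    quotas[best] += total - sum(quotas.values())
    return quotas


quotas = allocate(100, {
    "module_1": {"success_rate": 0.9, "seconds_per_instance": 1.0},
    "module_2": {"success_rate": 0.9, "seconds_per_instance": 3.0},
    "module_3": {"success_rate": 0.2, "seconds_per_instance": 1.0},
})
```

In this example the fast, reliable module receives the largest share and the unreliable one receives none.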

In addition, based on the verification result, the verifier 230 may control whether each curation module in the curation module group 220 re-performs or stops its knowledge database construction task. When a knowledge database construction task performed in the curation module group 220 passes verification, the verifier 230 may cause the result to be applied to the knowledge database. The data integration management unit 200 may store the generated knowledge database in the knowledge database storage unit 126, or may update the stored knowledge database.

FIG. 5 is a block diagram illustrating the instance distributor 210 according to an embodiment of the present disclosure.

Referring to FIG. 5, the instance distributor 210 may include an edit instance selection unit 211, an edit attribute selection unit 212, a curation module designation unit 213, a curation module information storage unit 214, and a feedback information collection unit 215. The edit instance selection unit 211 may select, from the integrated data, edit instances suited to the ontology schema. The edit attribute selection unit 212 may select, from the integrated data, edit attributes suited to each of the selected edit instances. For example, when the selected edit instance is "member of the National Assembly", the edit attribute selection unit 212 may select the attributes (or relationships) most relevant to it, such as "constituency", "policy", and "career history", as edit attributes.
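
How relevance between an instance and its attributes is judged is not described. The sketch below assumes a precomputed relevance ranking per instance type; the `RELEVANT_ATTRIBUTES` table and the `limit` parameter are hypothetical.

```python
# Hypothetical relevance table: instance type -> attributes ranked by relevance.
RELEVANT_ATTRIBUTES = {"legislator": ["constituency", "policy", "career_history"]}


def select_edit_attributes(
    instance_type: str, available: set[str], limit: int = 3
) -> list[str]:
    """Pick the most relevant attributes that actually occur in the integrated data."""
    ranked = RELEVANT_ATTRIBUTES.get(instance_type, [])
    return [attr for attr in ranked if attr in available][:limit]


attrs = select_edit_attributes(
    "legislator", {"career_history", "constituency", "height"}
)
```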

The curation module designation unit 213 may distribute the selected edit instances to the respective curation modules based on the curation module information received from the curation module information storage unit 214 and the feedback information received from the feedback information collection unit 215. In one embodiment, the curation module designation unit 213 may divide the edit instances of the ontology schema according to the type of instance each curation module can process, as indicated by the curation module information, and distribute them to the curation modules. In addition, the curation module designation unit 213 may adaptively distribute edit instances to the curation modules based on feedback information about the results of the tasks each curation module has performed. For example, the curation module designation unit 213 may divide the selected edit instances into a first group of instances and a second group of instances and provide them to a first curation module and a second curation module, respectively; the first curation module may then perform the knowledge database construction task for the first group, and the second curation module may perform the knowledge database construction task for the second group.

The curation module information storage unit 214 may receive curation module information from the curation module group 220 of FIG. 4 and may further store histories of the edit instances previously distributed to each curation module. The instance distributor 210 may distribute edit instances based on these histories. In addition, the instance distributor 210 may receive, from each curation module, information about its current status (for example, busy or idle) and may distribute edit instances based on that status information.
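
The text leaves open how status and history are combined. As one assumed policy, the sketch below considers only idle modules and prefers the one whose history shows the most instances of the same type; the function name and data shapes are hypothetical.

```python
def pick_module(
    instance_type: str, status: dict[str, str], history: dict[str, list[str]]
):
    """Choose an idle module, preferring the one that handled this type most often.
    Returns None when every module is busy."""
    idle = [m for m, s in status.items() if s == "idle"]
    if not idle:
        return None
    return max(idle, key=lambda m: history.get(m, []).count(instance_type))


chosen = pick_module(
    "politics",
    {"m1": "busy", "m2": "idle", "m3": "idle"},
    {"m1": ["politics"] * 5, "m2": ["politics"], "m3": ["politics", "politics"]},
)
```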

FIG. 6 is a block diagram illustrating the curation module group 220 according to an embodiment of the present disclosure.

Referring to FIG. 6, the curation module group 220 may include first to N-th curation modules 220_1 to 220_N, each capable of performing a knowledge database construction task individually. The following description focuses on the configuration of the first curation module 220_1, but the same configuration can clearly be applied to the other curation modules 220_2 to 220_N.

The first curation module 220_1 may include an instance confirmation unit 211_1, an edit instance addition selection unit 212_1, a knowledge data construction unit 213_1, and a status output unit 214_1. The instance confirmation unit 211_1 may first check whether the first curation module 220_1 itself can process the edit instances distributed to it. As described above, each of the curation modules 220_1 to 220_N may have a specialized instance theme, so the instance confirmation unit 211_1 may determine whether the distributed edit instances are processable based on the curation module information in which that theme is indicated.
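
The processability check can be sketched as a partition of the distributed instances by theme. Representing the module's specialization as a set of theme strings, and an instance as a `(name, theme)` pair, are assumptions made for this illustration.

```python
def confirm_instances(
    module_themes: set[str], instances: list[tuple[str, str]]
) -> tuple[list[str], list[str]]:
    """Split distributed (name, theme) instances into those the module can
    process and those it cannot."""
    accepted = [name for name, theme in instances if theme in module_themes]
    rejected = [name for name, theme in instances if theme not in module_themes]
    return accepted, rejected


accepted, rejected = confirm_instances(
    {"politics"}, [("legislator_a", "politics"), ("singer_b", "entertainment")]
)
```

The rejected list corresponds to the instances for which redistribution would be requested.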

The edit instance addition selection unit 212_1 may check, based on the ontology schema, whether further edit instances suited to the ontology schema exist beyond those received from the instance distributor 210 of FIG. 5, and may add at least one such edit instance. The knowledge data construction unit 213_1 may perform the knowledge database construction task based on the distributed edit instances and the ontology schema.

The status output unit 214_1 may output the result of the knowledge database construction task to the verifier 230 of FIG. 4, and may generate status information indicating the current operating state of the first curation module 220_1 and output it to the instance distributor 210. When the knowledge database construction task for the edit instances currently distributed to the first curation module 220_1 is completed, the status output unit 214_1 may notify the instance distributor 210 of FIG. 5. In response to this notification, the instance distributor 210 of FIG. 5 may distribute additional edit instances to the first curation module 220_1, including edit instances that other curation modules have confirmed they cannot process and edit instances for which the knowledge database construction task has failed.

FIG. 7 is a block diagram illustrating the verifier 230 according to an embodiment of the present disclosure.

Referring to FIG. 7, the verifier 230 may include a construction-impossible confirmation unit 231, a knowledge data verification unit 232, a knowledge data application unit 233, and a feedback generation unit 234. When the instance confirmation unit 211_1 of FIG. 6 determines that the first curation module 220_1 cannot process the edit instances distributed to it, the construction-impossible confirmation unit 231 may confirm this and request the instance distributor 210 of FIG. 5 to redistribute those edit instances. In response to the request, the instance distributor 210 may redistribute the edit instances based on the status information and characteristics of each curation module.

The knowledge data verification unit 232 may use a verification module of assured reliability to verify the knowledge data (or knowledge database) generated by the curation modules 220_1 to 220_N of FIG. 6. When the verification result is a pass, the knowledge data verification unit 232 may control the knowledge data application unit 233 so that the generated knowledge data is applied to the previously stored knowledge database.

The feedback generation unit 234 may generate feedback information including the verification result of the knowledge data verification unit 232 and provide it to the instance distributor 210 of FIG. 5. The feedback generation unit 234 may count the number of failed knowledge database construction tasks for each curation module and, for any curation module whose count exceeds a threshold, generate information indicating that the module is excluded from work and provide it to the instance distributor 210 of FIG. 5. In response, the instance distributor 210 may distribute no edit instances to the excluded curation module. The feedback generation unit 234 may accumulate the verification results for the curation modules 220_1 to 220_N of FIG. 6 and, based on them, continuously update information about the performance of the curation modules 220_1 to 220_N. The performance information may include the processing speed per instance, the number of failed construction tasks, and the like. The instance distributor 210 of FIG. 5 may distribute edit instances based on the performance information included in the feedback information.
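
The failure counting and performance tracking described above could be kept in a small accumulator such as the following Python sketch. The class name, the field names, and the threshold value are assumptions; only the behavior (count failures, flag modules over a threshold, track per-instance speed) follows the text.

```python
from collections import Counter


class FeedbackGenerator:
    """Accumulates per-module verification results into feedback information."""

    def __init__(self, failure_threshold: int):
        self.failure_threshold = failure_threshold
        self.failures = Counter()
        self.processed = Counter()
        self.total_seconds = Counter()

    def record(self, module: str, passed: bool, seconds: float) -> None:
        """Store one verification result for one processed instance."""
        self.processed[module] += 1
        self.total_seconds[module] += seconds
        if not passed:
            self.failures[module] += 1

    def feedback(self) -> dict[str, dict]:
        """Summarize performance; a module is excluded once failures exceed the threshold."""
        return {
            m: {
                "failures": self.failures[m],
                "seconds_per_instance": self.total_seconds[m] / self.processed[m],
                "excluded": self.failures[m] > self.failure_threshold,
            }
            for m in self.processed
        }


gen = FeedbackGenerator(failure_threshold=1)
gen.record("module_1", passed=True, seconds=2.0)
gen.record("module_2", passed=False, seconds=1.0)
gen.record("module_2", passed=False, seconds=3.0)
info = gen.feedback()
```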

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

Claims

A knowledge database building system comprising:
a service providing unit that provides service data requested by an application with reference to a knowledge database;
a service data management unit that manages the service data based on an ontology schema defined according to a service purpose;
an interface unit that receives open data from different external data sources; and
a data integration unit that generates a plurality of integrated data based on a data integration rule generated by analyzing relationships among the plurality of open data, and builds the knowledge database based on the integrated data and the ontology schema,
wherein the data integration unit comprises:
an instance distributor that distributes a plurality of instances of the ontology schema for a knowledge database construction task, based on curation module information indicating characteristics of a curation module and feedback information indicating a result of the task performed by the curation module;
a curation module group including a plurality of curation modules that each receive instances distributed from the instance distributor and individually perform the knowledge database construction task; and
a verifier that generates the feedback information by verifying a result of the knowledge database construction task of the curation module group.

The knowledge database building system of claim 1,
wherein the instance distributor,
based on the curation module information, divides the plurality of instances of the ontology schema according to the type of instance that each of the curation modules can process, and distributes them to the curation modules.

The knowledge database building system of claim 1,
wherein the instance distributor,
in consideration of the instance type, selects only some of a plurality of attributes corresponding to an instance of the ontology schema and distributes the construction task accordingly.

The knowledge database building system of claim 1,
wherein the curation module
determines whether it can process the plurality of instances of the knowledge database construction task received from the instance distributor, provides the determination result to the verifier, and performs the knowledge database construction task based on the determination result.

The knowledge database building system of claim 4,
wherein the curation module
further comprises an instance addition selector that, based on the ontology schema, adds at least one processable instance to the received plurality of instances of the knowledge database construction task.

The knowledge database building system of claim 5,
wherein the curation module
further comprises a status output unit that, when the knowledge database construction task for the plurality of instances is completed, provides the instance distributor with a signal indicating that the task is completed, and
wherein the instance distributor,
in response to the signal indicating that the task is completed, redistributes to the curation module at least one instance determined to be non-processable by another curation module in the curation module group.

The knowledge database building system of claim 1,
wherein the verifier
accumulates verification results for each curation module in the curation module group and continuously updates information about the performance of the plurality of curation modules based on the verification results, and
wherein the feedback information includes the information about the performance.

The knowledge database building system of claim 1,
wherein the verifier
verifies the knowledge database construction task performed by each curation module in the curation module group and provides the verification results to the curation modules, and
wherein the curation modules,
based on the verification results, re-perform the knowledge database construction task or complete it.