KR20210041365A

KR20210041365A - Distributed system and method for integrating knowledge

Info

Publication number: KR20210041365A
Application number: KR1020190123975A
Authority: KR
Inventors: 김광민; 양승원
Original assignee: 주식회사 솔트룩스
Priority date: 2019-10-07
Filing date: 2019-10-07
Publication date: 2021-04-15
Also published as: WO2021070998A1; KR102279109B1

Abstract

According to an exemplary embodiment of the present invention, a data integration system that accesses a plurality of remote data conversion systems for a distributed system for knowledge integration includes: a job selection unit that receives a request from a manager and decodes the request; a conversion management unit that, in response to a conversion request from the manager, instructs at least one data conversion system to perform a conversion job on source data and receives the converted source data from the at least one data conversion system; and a data integration unit that is configured to access a metadata storage storing metadata used for knowledge integration and that, in response to an integration request from the manager, integrates the converted source data into a knowledge base based on a mapping table between knowledge domains stored in the metadata storage.

Description

Distributed system and method for knowledge integration {DISTRIBUTED SYSTEM AND METHOD FOR INTEGRATING KNOWLEDGE}

The technical idea of the present invention relates to knowledge data and, more particularly, to a distributed system and method for integrating scattered knowledge data.

The present invention is derived from research led and carried out by Saltlux Co., Ltd. as part of the SW Computing Source Technology Development Project (SW) of the Ministry of Science, ICT and Future Planning. [Research period: 2019.01.01~2019.12.31; research management institution: Information and Communication Technology Promotion Center; research project name: WiseKB: Development of a self-learning knowledge base and reasoning technology based on big data understanding; project serial number: 2013-0-00109]

Knowledge data may be scattered across various knowledge domains. For example, knowledge data may belong to different knowledge domains depending on its field, or depending on the way the knowledge data is served. Knowledge domains may also store and manage knowledge data in different ways. Knowledge data scattered in this way can make it difficult for users to access the knowledge data, and duplicate knowledge data may be included in a plurality of knowledge domains. Accordingly, to improve the usability and efficiency of knowledge data, a method for integrating scattered knowledge data, that is, a method for knowledge integration, may be required.

The technical idea of the present invention provides a distributed system and method for knowledge integration that use a plurality of data conversion systems.

To achieve the above object, according to one aspect of the technical idea of the present invention, a data integration system that accesses a plurality of remote data conversion systems for a distributed system for knowledge integration may include: a job selection unit that receives a request from a manager and decodes the request; a conversion management unit that, in response to a conversion request from the manager, instructs at least one data conversion system to perform a conversion job on source data and receives converted source data from the at least one data conversion system; and a data integration unit that is configured to access a metadata storage storing metadata used for knowledge integration and that, in response to an integration request from the manager, integrates the converted source data into a knowledge base based on a mapping table between knowledge domains stored in the metadata storage.

According to an exemplary embodiment of the present invention, the conversion request may include identification information of the knowledge domain that contains the source data and at least one identifier of the source data included in the knowledge domain.

According to an exemplary embodiment of the present invention, the data integration unit may include: a knowledge integration unit that generates a source instance based on the mapping table and at least one first identifier of at least one knowledge entity included in the converted source data; and a knowledge application unit that refines the source instance and integrates the refined source instance into the knowledge base.

According to an exemplary embodiment of the present invention, the knowledge integration unit may include: a knowledge entity selection unit that searches the mapping table for the at least one first identifier and extracts at least one knowledge entity from the knowledge base based on the search result; and a source instance generation unit that generates the source instance from the converted source data by replacing the at least one first identifier with at least one second identifier of the extracted at least one knowledge entity.

According to an exemplary embodiment of the present invention, when the at least one first identifier is not found in the mapping table, the knowledge entity selection unit may extract at least one candidate knowledge entity from the knowledge base based on the converted source data.
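The lookup, fallback, and identifier replacement described above can be illustrated with a minimal sketch. The table contents, the names `MAPPING_TABLE`, `KB_LABELS`, `select_entity`, and `make_source_instance`, and the label-equality rule used to find candidates are all assumptions made for illustration; the specification does not prescribe a data layout or a candidate-matching rule at this point.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical mapping table: (knowledge domain, first identifier) -> second identifier.
MAPPING_TABLE = {("domain_a", "A:1001"): "kb:Yi_Sun-sin"}

# Hypothetical label index of the knowledge base: label -> second identifier.
KB_LABELS = {"Yi Sun-sin": "kb:Yi_Sun-sin", "Sejong": "kb:Sejong"}


def select_entity(domain, first_id, label):
    """Return (second_id, candidates); candidates are used only on a lookup miss."""
    second_id = MAPPING_TABLE.get((domain, first_id))
    if second_id is not None:
        return second_id, []
    # Assumed fallback: propose knowledge base entities whose label matches exactly.
    candidates = [uri for name, uri in KB_LABELS.items() if name == label]
    return None, candidates


def make_source_instance(triple, second_id):
    """Replace the first identifier (subject) with the knowledge base's second identifier."""
    return triple._replace(subject=second_id)


second_id, candidates = select_entity("domain_a", "A:1001", "Yi Sun-sin")
instance = make_source_instance(Triple("A:1001", "birthday", "1545-04-28"), second_id)
```

In this sketch a lookup miss returns only candidates and leaves the final choice open, for example to a later review step.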

According to an exemplary embodiment of the present invention, the knowledge application unit may include: a source instance post-processing unit that post-processes the source instance based on the format of the knowledge instances included in the knowledge base; and an instance comparison unit that selectively integrates the post-processed source instance into the knowledge base by comparing it with the knowledge instances included in the knowledge base.
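As a rough illustration of post-processing followed by selective integration, the sketch below normalizes a source instance and adds it to the knowledge base only if an identical instance is not already present. The normalization rules (whitespace stripping, `YYYY-MM-DD` dates for a `birthday` predicate) and the exact-match comparison are assumptions; the specification states only that the format of existing knowledge instances is used and that a comparison is made.

```python
def post_process(instance):
    """Normalize a (subject, predicate, object) tuple to an assumed knowledge base format."""
    s, p, o = (part.strip() for part in instance)
    # Assumed format rule: the knowledge base stores dates as YYYY-MM-DD.
    if p == "birthday":
        o = o.replace(".", "-").replace("/", "-")
    return (s, p, o)


def integrate_selectively(knowledge_base, instance):
    """Add the post-processed instance unless an identical one exists; return True if added."""
    processed = post_process(instance)
    if processed in knowledge_base:
        return False
    knowledge_base.add(processed)
    return True


kb = {("kb:Yi_Sun-sin", "birthday", "1545-04-28")}
added_duplicate = integrate_selectively(kb, (" kb:Yi_Sun-sin", "birthday", "1545.04.28"))
added_new = integrate_selectively(kb, ("kb:Yi_Sun-sin", "occupation", "kb:Soldier"))
```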

According to an exemplary embodiment of the present invention, the data integration system may further include a data management unit that, in response to an inquiry request from the manager, provides the manager with data generated by the conversion management unit and the data integration unit and that, in response to an update request from the manager, updates the metadata.

A distributed system for knowledge integration according to one aspect of the technical idea of the present invention may include a plurality of data conversion systems and a data integration system that, based on a mapping table between knowledge domains, integrates converted source data provided from the plurality of remote data conversion systems into a knowledge base. Each of the plurality of data conversion systems may include a plurality of conversion engines that, in response to an instruction provided from the data integration system, convert source data included in a knowledge domain based on the structure of the knowledge instances included in the knowledge base.

According to an exemplary embodiment of the present invention, each of the plurality of data conversion systems may include a conversion engine management unit that selects one of a plurality of conversion modules based on the instruction and provides the selected conversion module to at least one of the plurality of conversion engines.

According to an exemplary embodiment of the present invention, the instruction may include at least one of identification information of the knowledge domain, at least one identifier of the source data, and an index of a conversion module.
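One possible shape for such an instruction, together with module selection by index, is sketched below. The class `ConversionInstruction`, the two toy modules, and the per-domain default table are hypothetical; the specification states only which fields the instruction may carry and that a module is selected based on it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ConversionInstruction:
    # Each field is optional because the instruction carries "at least one" of them.
    domain_id: Optional[str] = None
    source_ids: Sequence[str] = ()
    module_index: Optional[int] = None


def csv_module(raw):
    """Toy conversion module: comma-separated lines to tuples."""
    return [tuple(line.split(",")) for line in raw.splitlines() if line]


def tsv_module(raw):
    """Toy conversion module: tab-separated lines to tuples."""
    return [tuple(line.split("\t")) for line in raw.splitlines() if line]


CONVERSION_MODULES = [csv_module, tsv_module]
# Assumed default used when the instruction names a domain but no module index.
DEFAULT_MODULE_BY_DOMAIN = {"domain_a": 0, "domain_b": 1}


def select_module(instruction):
    """Pick the conversion module named by index, else the domain's assumed default."""
    index = instruction.module_index
    if index is None:
        index = DEFAULT_MODULE_BY_DOMAIN[instruction.domain_id]
    return CONVERSION_MODULES[index]
```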

According to an exemplary embodiment of the present invention, each of the plurality of data conversion systems may include a conversion data storage that stores the converted source data, and each of the plurality of conversion engines may collect source data from the knowledge domain and store the converted source data in the conversion data storage.

According to the distributed system and method of the technical idea of the present invention, the data conversion and the data integration required for knowledge integration are separated, and the data conversion is processed in parallel by a plurality of data conversion systems, so that vast amounts of knowledge data can be processed in a distributed manner and knowledge integration can be achieved efficiently and easily.

In addition, according to the distributed system and method of the technical idea of the present invention, a knowledge base containing integrated knowledge data can be easily implemented, which can improve the usefulness of various services based on the knowledge base.

The effects obtainable from the embodiments of the present invention are not limited to those mentioned above, and other effects not mentioned can be clearly derived and understood, from the following description of the embodiments, by a person of ordinary skill in the art to which the present invention pertains. That is, unintended effects of practicing the present invention can also be derived from the embodiments by a person of ordinary skill in the art.

FIG. 1 is a block diagram showing a distributed system for knowledge integration according to an exemplary embodiment of the present invention.
FIG. 2 is a block diagram showing an example of a data integration unit according to an exemplary embodiment of the present invention.
FIG. 3 is a block diagram showing an example of a knowledge integration unit according to an exemplary embodiment of the present invention.
FIGS. 4A and 4B are diagrams illustrating examples in which source instances are generated according to exemplary embodiments of the present invention.
FIG. 5 is a flowchart illustrating an example of an operation of a knowledge entity selection unit according to an exemplary embodiment of the present invention.
FIG. 6 is a block diagram showing an example of a knowledge application unit according to an exemplary embodiment of the present invention.
FIG. 7 is a block diagram showing an example of a data conversion system according to an exemplary embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The embodiments are provided to describe the present invention more completely to those of ordinary skill in the art. Because the present invention may be modified in various ways and may take various forms, specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present invention to the specific forms disclosed; rather, the present invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. Similar reference numerals are used for similar elements in describing each drawing. In the accompanying drawings, the dimensions of structures are shown enlarged or reduced relative to their actual sizes for clarity.

The terms used in the present application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In the present application, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

In the following drawings and description, a component shown or described as a single block may be a hardware block or a software block. For example, each of the components may be an independent hardware block that exchanges signals with the others, or may be a software block executed on a single processor.

FIG. 1 is a block diagram showing a distributed system for knowledge integration according to an exemplary embodiment of the present invention. The distributed system may include a data integration system 110 and a plurality of data conversion systems 120, may communicate with a manager 10, and may access a metadata storage 130 and a knowledge base 140. In some embodiments, the distributed system may include the metadata storage 130 and/or the knowledge base 140. In this specification, the metadata storage 130 and the knowledge base 140 are described as being external to the distributed system, but it is noted that exemplary embodiments of the present invention are not limited thereto.

The distributed system may perform an operation of integrating scattered knowledge data (which may be referred to herein as source data) into the knowledge base 140. The manager 10 may be any entity that manages the integration of knowledge data by the distributed system (which may be referred to herein as knowledge integration). For example, the manager 10 may instruct the distributed system to perform operations for knowledge integration (e.g., the conversion operation and the integration operation described later), may query the process of knowledge data integration performed by the distributed system, and may control the integration of knowledge data by updating data used for the integration, for example, the metadata stored in the metadata storage 130. The manager 10 may communicate with the data integration system 110 in various ways. For example, the manager 10 may refer to any terminal that communicates with the data integration system 110 through a communication channel; the terminal may provide a request to the data integration system 110 through the communication channel according to input received from the manager 10, and may receive a response to the request from the data integration system 110 and provide it to the manager 10. In some embodiments, the communication channel between the manager 10 and the data integration system 110 may be established via a network such as the Internet, or may be a direct one-to-one connection.

The metadata storage 130 may store metadata used for the integration of knowledge data. For example, the metadata storage 130 may store intermediate data generated during the integration of knowledge data (e.g., 132 and 136 in FIG. 2), and may store data referenced during the integration of knowledge data (e.g., 134 in FIG. 2). In some embodiments, the manager 10 may update part of the metadata stored in the metadata storage 130 through the data integration system 110. The metadata storage 130 may be an independent system (e.g., a server) that communicates with the data integration system 110 through a network, or may be a local storage connected to the data integration system 110 through a one-to-one communication channel. An example of the metadata storage 130 is described later with reference to FIG. 2 and other figures.

The knowledge base 140 may refer to a system that contains integrated knowledge data. The knowledge base 140 may store knowledge data according to a predefined structure and may provide knowledge data to the outside in response to a request (e.g., a query). For example, the knowledge base 140 may include knowledge instances expressed in a Resource Description Framework (RDF) structure, and a set of knowledge instances may be referred to as a knowledge graph. In this specification, a knowledge instance may refer to a unit representing a piece of knowledge, such as "Yi Sun-sin's birthday is April 28, 1545." In the RDF structure, a knowledge instance may be expressed as a triple. A triple may be expressed as a subject, a predicate, and an object, for example "Yi Sun-sin / birthday / 1545-04-28". "Yi Sun-sin" may be referred to as a knowledge entity, or simply an entity, and may have a unique identifier in the knowledge base 140 (which may be referred to herein as a second identifier), for example a Uniform Resource Identifier (URI). "birthday" may be referred to as an attribute representing the relationship between the subject and the object. The object of a triple may correspond to a value, as with "1545-04-28" above, or, when the predicate represents a relationship between entities, as in "Yi Sun-sin / occupation / soldier", may correspond to an entity different from the subject. A knowledge graph may also be implemented in a knowledge domain other than the knowledge base 140, and accordingly knowledge integration may be referred to as knowledge graph integration.
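The triple structure described above can be sketched with plain Python tuples. The namespace `http://example.org/kb/`, the helper names, and the convention that objects starting with the namespace are entities while other objects are literal values are assumptions made for illustration only; they are not taken from the specification.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


KB = "http://example.org/kb/"  # hypothetical namespace for second identifiers (URIs)

knowledge_graph = {
    # Object is a literal value.
    Triple(KB + "Yi_Sun-sin", KB + "birthday", "1545-04-28"),
    # Object is another entity, so the predicate expresses an entity-to-entity relation.
    Triple(KB + "Yi_Sun-sin", KB + "occupation", KB + "Soldier"),
}


def is_entity(term):
    """Assumed convention: terms in the KB namespace are entities, others are values."""
    return term.startswith(KB)


def objects_of(subject, predicate):
    """Return every object recorded for the given subject and predicate."""
    return {t.obj for t in knowledge_graph
            if t.subject == subject and t.predicate == predicate}
```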

Knowledge data scattered across knowledge domains may be integrated into the knowledge base 140 by the distributed system. The integration of knowledge data may include a data conversion operation, which converts the scattered knowledge data according to the structure of the knowledge base 140, and a data integration operation, which integrates the converted knowledge data with the knowledge data already included in the knowledge base 140, that is, with its knowledge instances. In the distributed system, the data conversion operation may be performed in parallel by the plurality of data conversion systems 120, while the data integration operation may be performed by the data integration system 110. By separating the integration of knowledge data into a data conversion operation and a data integration operation and processing the data conversion operation in parallel, the distributed system can process vast amounts of knowledge data in a distributed manner, and the integration of knowledge data can be achieved efficiently and easily. In addition, because the knowledge base 140 containing the integrated knowledge data is easily implemented by the distributed system, various services based on the knowledge base 140 become possible, and the usefulness of such services can be significantly improved.
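The split between parallel conversion and centralized integration can be sketched as follows. Here a thread pool stands in for the remote data conversion systems 120, and the functions `convert` and `integrate` are hypothetical placeholders rather than the operations defined by the specification.

```python
from concurrent.futures import ThreadPoolExecutor


def convert(job):
    """Stand-in for one data conversion system: rewrite rows into a common tuple form."""
    domain, rows = job
    return [(f"{domain}:{s}", p, o) for s, p, o in rows]


def integrate(knowledge_base, converted_batches):
    """Stand-in for the data integration system: merge all converted batches in one place."""
    for batch in converted_batches:
        knowledge_base.update(batch)
    return knowledge_base


jobs = [
    ("domain_a", [("1001", "birthday", "1545-04-28")]),
    ("domain_b", [("7", "occupation", "Soldier")]),
]

# Conversion jobs run concurrently; integration runs afterwards in a single place.
with ThreadPoolExecutor(max_workers=2) as pool:
    converted = list(pool.map(convert, jobs))

kb = integrate(set(), converted)
```

Because `pool.map` preserves input order, the converted batches arrive in job order even though the jobs may finish in any order.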

As shown in FIG. 1, the data integration system 110 may communicate with the manager 10 and with the plurality of data conversion systems 120. In some embodiments, the data integration system 110 may communicate with the plurality of data conversion systems 120 through a network such as the Internet. For example, the plurality of data conversion systems 120 may be located remotely from the data integration system 110, and may also be located far from one another. The data integration system 110 may also communicate with the metadata storage 130 and the knowledge base 140. As shown in FIG. 1, the data integration system 110 may include a job selection unit 112, a conversion management unit 114, a data integration unit 116, and a data management unit 118.

The job selection unit 112 may receive a request from the manager 10 and decode the received request. For example, the job selection unit 112 may receive a conversion request from the manager 10 and, by decoding the conversion request, provide the conversion management unit 114 with an instruction for a conversion operation and the information needed for the conversion operation. Likewise, the job selection unit 112 may receive an integration request from the manager 10 and, by decoding the integration request, provide the data integration unit 116 with an instruction corresponding to the integration request and the information needed for the integration operation.
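A minimal sketch of such request decoding is given below. The JSON wire format, the `type` field, and the unit names used as routing targets are assumptions; the specification does not define an encoding. Routing inquiry and update requests to the data management unit follows the description of the data management unit 118 elsewhere in this specification.

```python
import json

# Assumed routing table from request type to the unit that handles it.
TARGET_UNITS = {
    "convert": "conversion_management_unit",
    "integrate": "data_integration_unit",
    "query": "data_management_unit",
    "update": "data_management_unit",
}


def decode_request(raw):
    """Decode a JSON request into (target unit, payload for that unit)."""
    request = json.loads(raw)
    kind = request.get("type")
    if kind not in TARGET_UNITS:
        raise ValueError(f"unknown request type: {kind!r}")
    payload = {key: value for key, value in request.items() if key != "type"}
    return TARGET_UNITS[kind], payload


unit, payload = decode_request(
    '{"type": "convert", "domain_id": "domain_a", "source_ids": ["A:1001"]}'
)
```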

The conversion management unit 114 may communicate with the plurality of data conversion systems 120. In response to a conversion request from the manager 10, the conversion management unit 114 may instruct at least one of the plurality of data conversion systems 120 to perform a conversion job on source data, and may receive converted source data from the at least one data conversion system. For example, in response to the conversion request, the conversion management unit 114 may instruct the first data conversion system 120_1 and/or the k-th data conversion system 120_k to perform a conversion job, and may receive converted source data from the first data conversion system 120_1 and/or the k-th data conversion system 120_k. In some embodiments, the conversion request provided by the manager 10 may include identification information of the knowledge domain that contains the source data to be converted, and at least one identifier (which may be referred to herein as a first identifier) for identifying, among the source data included in the knowledge domain, the source data to be converted. In some embodiments, the conversion management unit 114 may select at least one data conversion system in consideration of the respective loads of the plurality of data conversion systems 120, and may provide the selected at least one data conversion system with an instruction for the conversion job and the information included in the conversion request. The conversion management unit 114 may provide the converted source data received from the at least one data conversion system to the data integration unit 116.
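Load-aware selection could, for instance, pick the least-loaded systems first. The specification says only that loads are taken into consideration, so the policy below, the function name `select_systems`, and the representation of load as a single number per system are assumptions.

```python
def select_systems(loads, count=1):
    """Return the names of the `count` least-loaded systems (ties broken by name)."""
    return sorted(loads, key=lambda name: (loads[name], name))[:count]


# Hypothetical load figures, keyed by the reference numerals used in FIG. 1.
current_loads = {"120_1": 0.9, "120_2": 0.2, "120_3": 0.5}
chosen = select_systems(current_loads, count=2)
```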

The data integration unit 116 may receive the converted source data from the conversion management unit 114 and may access the metadata storage 130 and the knowledge base 140. In response to an integration request from the manager 10, the data integration unit 116 may integrate the converted source data provided by the conversion management unit 114 into the knowledge base 140 by referring to the data stored in the metadata storage 130. During the integration process, the data integration unit 116 may process data stored in the metadata storage 130 and may store the processed data in the metadata storage 130. An example of the data integration unit 116 is described later with reference to FIG. 2 and other figures.

In response to an inquiry request from the manager 10, the data management unit 118 may provide the manager 10, through the job selection unit 112, with at least part of the metadata stored in the metadata storage 130, for example data generated by the conversion management unit 114 and the data integration unit 116 (e.g., 132 and 136 in FIG. 2). In response to an update request from the manager 10, the data management unit 118 may update at least part of the metadata stored in the metadata storage 130 (e.g., 134 in FIG. 2). That is, the data management unit 118 may support curation of the distributed system by the manager 10.

Each of the plurality of data conversion systems 120 may convert source data, that is, knowledge data included in a knowledge domain, based on an instruction from the conversion management unit 114 of the data integration system 110. A knowledge domain may store its knowledge data (the source data) in its own way. For example, a knowledge domain may store knowledge data according to an RDF structure, similarly to the knowledge base 140, or according to any other structure. A knowledge domain may also store knowledge data in its own format, such as CSV, TXT, or LOD (Linked Open Data). Accordingly, so that the data integration system 110 can process the knowledge data included in a knowledge domain, the plurality of data conversion systems 120 may convert that knowledge data into a common structure and format. In this specification, source data of a knowledge domain that has been converted by the plurality of data conversion systems 120 may be referred to as converted source data. As shown in FIG. 1, the first data conversion system 120_1 may include a conversion data storage 122_1 and a plurality of conversion engines 124_1, and the k-th data conversion system 120_k may include a conversion data storage 122_k and a plurality of conversion engines 124_k.

Each of the plurality of data conversion systems 120 may include an independent conversion data storage and may thereby support large-volume processing of source data. If, unlike the arrangement shown in FIG. 1, data conversion and data integration were performed in a single system (or a single server), processing a massive amount of source data could be inefficient. Each of the plurality of data conversion systems 120 may include, in addition to the conversion data storage, a plurality of conversion engines each capable of performing a conversion operation independently, so that conversion operations may be performed in parallel even within a single data conversion system. In some embodiments, one data conversion system may convert the source data included in one knowledge domain. In other embodiments, two or more data conversion systems may each convert different source data included in one knowledge domain. The conversion operations of the plurality of data conversion systems 120 may be scheduled by the conversion management unit 114, as described above.

FIG. 2 is a block diagram illustrating an example of a data integration unit according to an exemplary embodiment of the present invention. Specifically, the block diagram of FIG. 2 shows a data integration unit 200 together with the metadata storage 130 and the knowledge base 140 that the data integration unit 200 accesses. As described above with reference to FIG. 1, the data integration unit 200 may be included in the data integration system 110 of FIG. 1 and, in response to an integration request from the manager 10, may integrate the converted source data provided by the conversion management unit 114 into the knowledge base 140. As shown in FIG. 2, the data integration unit 200 may include a knowledge integration unit 220 and a knowledge application unit 240, the metadata storage 130 may include source instances 132, a mapping table 134, and candidate knowledge entities 136, and the knowledge base 140 may include knowledge instances 142. Hereinafter, FIG. 2 will be described with reference to FIG. 1.

The knowledge integration unit 220 may generate a source instance based on at least one first identifier of at least one knowledge entity included in the converted source data. The converted source data may include at least one entity that has a unique identifier in the knowledge domain containing the source data. For example, in the example "Yi Sun-sin / occupation / soldier" described above, each of the entities "Yi Sun-sin", "occupation", and "soldier" may have a unique identifier in the knowledge domain. The mapping table 134 may define a mapping relationship between the identifiers of knowledge entities included in the source data and the identifiers of knowledge entities included in the knowledge instances 142 of the knowledge base 140. Accordingly, the knowledge integration unit 220 may generate the source instances 132 from the converted source data by referring to the mapping table 134, and may store them in the metadata storage 130. In addition, when the identifier of an entity included in the source data does not exist in the mapping table 134, the knowledge integration unit 220 may retrieve candidate knowledge entities 136 from the knowledge base 140 and store them in the metadata storage 130. An example of the knowledge integration unit 220 will be described later with reference to FIG. 3.

The knowledge application unit 240 may refine the source instances 132 generated by the knowledge integration unit 220 and may integrate the refined source instances into the knowledge base 140. For example, the knowledge application unit 240 may verify the source instances 132 and may determine whether to add them to the knowledge base 140 by comparing them with at least one knowledge instance of the knowledge base 140. An example of the knowledge application unit 240 will be described later with reference to FIG. 6.

FIG. 3 is a block diagram illustrating an example of a knowledge integration unit according to an exemplary embodiment of the present invention, and FIGS. 4A and 4B are diagrams illustrating examples in which source instances are generated according to exemplary embodiments of the present invention. As described above with reference to FIG. 2, the knowledge integration unit 300 may be included in the data integration unit 200 of FIG. 2 (or 116 of FIG. 1) and may generate source instances from the converted source data by referring to the mapping table 134. In the following description of FIGS. 3, 4A, and 4B, content that overlaps with the description of FIG. 2 will be omitted.

Referring to FIG. 3, the knowledge integration unit 300 may include a knowledge entity selection unit 320 and a source instance generation unit 340. The knowledge entity selection unit 320 may search the mapping table 134 for the identifier of a knowledge entity included in the converted source data, and may extract at least one knowledge entity from the knowledge base 140 based on the search result. For example, as shown on the right side of FIG. 4A, the knowledge entity selection unit 320 may obtain "wdata: 43" and "wdata: 30" as the identifiers of the knowledge entities "Yi Sun-sin" and "soldier", respectively, from the example 42 of converted source data, "Yi Sun-sin / occupation / soldier". The knowledge entity selection unit 320 may search for the identifiers "wdata: 43" and "wdata: 30" in the mapping table 41 shown on the left side of FIG. 4A and, as shown in FIG. 4A, may obtain the identifiers "addr: 0102" and "addr: 0001" corresponding to "wdata: 43" and "wdata: 30". Although not shown in FIG. 4A, a predicate (P) may also have an identifier as a knowledge entity in the knowledge domain, and the mapping table 41 may also include mapping relationships between predicate identifiers. Similarly, as shown on the right side of FIG. 4B, the knowledge entity selection unit 320 may obtain "wdata: 43" as the identifier of the knowledge entity "Yi Sun-sin" from the example 44 of converted source data, "Yi Sun-sin / birthday / 1545-04-28", and may obtain the identifier "addr: 0102" corresponding to "wdata: 43" by referring to the mapping table 41.

Unlike the examples shown in FIGS. 4A and 4B, when an identifier is not found in the mapping table 134 of FIG. 3, the knowledge entity selection unit 320 may access the knowledge base 140 and may generate the candidate knowledge entities 136 by extracting knowledge entities included in the knowledge instances of the knowledge base 140. An example of the operation of the knowledge entity selection unit 320 when an identifier is not found in the mapping table 134 of FIG. 3 will be described later with reference to FIG. 5.

The source instance generation unit 340 may generate the source instances 132 by replacing the identifier of each knowledge entity included in the converted source data with the identifier provided by the knowledge entity selection unit 320. For example, as shown on the right side of FIG. 4A, the source instance generation unit 340 may generate, from the example 42 of converted source data containing the knowledge domain identifiers "wdata: 43" and "wdata: 30", a source instance 43 containing the identifiers "addr: 0102" and "addr: 0001" of knowledge entities included in the knowledge base 140. Similarly, as shown on the right side of FIG. 4B, the source instance generation unit 340 may generate, from the example 44 of converted source data containing the knowledge domain identifier "wdata: 43", a source instance 45 containing the identifier "addr: 0102" of a knowledge entity included in the knowledge base 140. In some embodiments, as shown in FIG. 4B, the source instance generation unit 340 may generate the source instance 45 so that it contains the value "1545-04-28" of the example 44 of source data unchanged.
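
The identifier replacement of FIGS. 4A and 4B can be sketched as a dictionary lookup over the terms of a triple. This is a minimal illustration, not the patented implementation: the tuple representation, the function name `make_source_instance`, and the plain-text predicate labels ("occupation", "birthday") are assumptions, while the identifier pairs are those given in the description.

```python
# Mapping table entries taken from the example of FIG. 4A.
MAPPING_TABLE = {
    "wdata: 43": "addr: 0102",
    "wdata: 30": "addr: 0001",
}


def make_source_instance(triple, mapping):
    """Replace every mapped identifier in a (subject, predicate, object)
    triple; unmapped terms such as literal values are kept unchanged."""
    return tuple(mapping.get(term, term) for term in triple)


# Example 42 -> source instance 43 (both entity identifiers are rewritten).
print(make_source_instance(("wdata: 43", "occupation", "wdata: 30"), MAPPING_TABLE))
# ('addr: 0102', 'occupation', 'addr: 0001')

# Example 44 -> source instance 45 (the date literal passes through as-is).
print(make_source_instance(("wdata: 43", "birthday", "1545-04-28"), MAPPING_TABLE))
# ('addr: 0102', 'birthday', '1545-04-28')
```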

FIG. 5 is a flowchart illustrating an example of the operation of the knowledge entity selection unit according to an exemplary embodiment of the present invention. As described above with reference to FIG. 3, the knowledge entity selection unit 320 of FIG. 3 may search the mapping table 134 for the identifier of a knowledge entity included in the converted source data, and may extract at least one knowledge entity from the knowledge base 140 based on the search result. As shown in FIG. 5, the operation of the knowledge entity selection unit 320 may include a plurality of steps S51 to S59. Hereinafter, FIG. 5 will be described with reference to FIGS. 3, 4A, and 4B.

In step S51, an operation of searching the mapping table 134 for a first identifier of the source data may be performed. For example, the knowledge entity selection unit 320 may search the mapping table 134 for the first identifier of a knowledge entity included in the converted source data (e.g., "wdata: 43" and "wdata: 30" of FIGS. 4A and 4B).

In step S52, whether the search for the first identifier succeeded may be determined. For example, the knowledge entity selection unit 320 may determine that the search succeeded when the first identifier of step S51 exists in the mapping table 134, and may otherwise determine that the search failed. As shown in FIG. 5, step S59 may be performed next when the search is determined to have succeeded, whereas step S53 may be performed next when the search is determined to have failed.

When the search for the first identifier is determined to have failed, an operation of searching the knowledge base 140 for candidate knowledge entities may be performed in step S53. For example, the knowledge entity selection unit 320 may search the knowledge base 140 for at least one candidate knowledge entity in order to determine which knowledge entity included in the knowledge base 140 corresponds to the knowledge entity having the first identifier. In some embodiments, like the "similarity calculation unit" described in Korean Patent Application No. 10-2018-0151222, which was filed by the same applicant as the present application and is incorporated herein by reference in its entirety, the knowledge entity selection unit 320 may search the knowledge base 140 for knowledge instances similar to the source instance and may extract at least one candidate knowledge entity based on the similarities of the retrieved knowledge instances.

In step S54, whether the search for candidate knowledge entities succeeded may be determined. For example, the knowledge entity selection unit 320 may determine that the search failed when no knowledge entity having a similarity greater than or equal to a predefined threshold was found in the knowledge base 140 in step S53, and may determine that the search succeeded when at least one knowledge entity having a similarity greater than or equal to the predefined threshold was found in the knowledge base 140 in step S53. As shown in FIG. 5, step S57 may be performed next when the search is determined to have succeeded, whereas step S55 may be performed next when the search is determined to have failed.

When the search for candidate knowledge entities is determined to have failed, an operation of generating a new identifier may be performed in step S55. For example, when no knowledge entity similar to the knowledge entity corresponding to the first identifier is found in the knowledge base 140, the knowledge entity selection unit 320 may determine that the knowledge entity corresponding to the first identifier is a new knowledge entity that is not present in the knowledge base 140. Accordingly, the knowledge entity selection unit 320 may generate a new identifier corresponding to the first identifier by referring to the identifier scheme of the knowledge base 140.

On the other hand, when the search for candidate knowledge entities is determined to have succeeded, whether a sole candidate knowledge entity was found may be determined in step S57. For example, the knowledge entity selection unit 320 may determine whether only one knowledge entity having a similarity greater than or equal to the predefined threshold was found. In some embodiments, when two or more candidate knowledge entities having a similarity greater than or equal to the predefined threshold are found, the knowledge entity selection unit 320 may determine that the candidate knowledge entity with the highest similarity was found as the sole candidate if the difference in similarity between the candidate with the highest similarity and the candidate with the second-highest similarity is greater than or equal to a predefined reference value. As shown in FIG. 5, step S56 may be performed next when a sole candidate knowledge entity is found, whereas step S58 may be performed next when two or more candidate knowledge entities are found.
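
The sole-candidate decision of step S57, including the optional margin rule, can be sketched as follows. The function name `sole_candidate`, the `(identifier, similarity)` pair representation, and the sample scores are illustrative assumptions; the specification does not fix the similarity scale, the threshold, or the reference value.

```python
def sole_candidate(candidates, threshold, margin):
    """Return the identifier of the sole candidate, or None if the result
    is empty or ambiguous.

    candidates: iterable of (identifier, similarity) pairs.
    threshold:  minimum similarity for a candidate to qualify.
    margin:     minimum lead of the best candidate over the runner-up
                for the best one to be treated as the sole candidate.
    """
    qualified = sorted(
        (c for c in candidates if c[1] >= threshold),
        key=lambda c: c[1],
        reverse=True,
    )
    if not qualified:
        return None                      # nothing passed the threshold
    if len(qualified) == 1:
        return qualified[0][0]           # exactly one qualifying candidate
    if qualified[0][1] - qualified[1][1] >= margin:
        return qualified[0][0]           # clear winner by margin
    return None                          # ambiguous: left for curation


print(sole_candidate([("addr: 0102", 0.95), ("addr: 0300", 0.70)], 0.6, 0.2))
# addr: 0102
print(sole_candidate([("addr: 0102", 0.90), ("addr: 0300", 0.85)], 0.6, 0.2))
# None
```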

When two or more candidate knowledge entities are found, an operation of storing the candidate knowledge entities may be performed in step S58. For example, the knowledge entity selection unit 320 may store the candidate knowledge entities in the metadata storage 130, so that the metadata storage 130 may include the candidate knowledge entities 136 as described above with reference to FIG. 2. When it is difficult to determine automatically, through similarity, which knowledge entity of the knowledge base 140 corresponds to the knowledge entity of the first identifier, the candidate knowledge entities 136 may be stored in the metadata storage 130, and the stored candidate knowledge entities 136 may be provided to the manager 10 through the data management unit 118 and the job selection unit 112 of FIG. 1. Through curation, the manager 10 may select, from among the candidate knowledge entities 136, the knowledge entity corresponding to the knowledge entity of the first identifier, and the data management unit 118 may update the mapping table 134 according to the selected knowledge entity. As shown in FIG. 5, step S51 may be performed again after step S58, so that the identifier of another knowledge entity included in the converted source data may be searched for in the mapping table 134 as the first identifier.

In step S56, an operation of updating the mapping table 134 may be performed. For example, the knowledge entity selection unit 320 may map the new identifier generated in step S55, or the identifier of the sole candidate knowledge entity found in step S53, to the first identifier as a second identifier and may add the pair to the mapping table 134. Accordingly, the updated mapping table 134 may include the first identifier and the second identifier corresponding to the first identifier.

In step S59, an operation of obtaining the second identifier corresponding to the first identifier may be performed. For example, the knowledge entity selection unit 320 may obtain the second identifier corresponding to the first identifier from the mapping table 134 either when the search for the first identifier succeeds in step S52 (e.g., the examples of FIGS. 4A and 4B) or after the mapping table 134 is updated in step S56. The knowledge entity selection unit 320 may provide the obtained second identifier to the source instance generation unit 340.
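
Taken together, steps S51 to S59 can be summarized in a short sketch. Everything beyond the step order is an assumption made for illustration: the function name `resolve_identifier`, the `search_candidates` callback standing in for the knowledge base search, the generator that issues new identifiers, the default threshold, and the sample similarity values. For brevity the sketch uses the simple "exactly one qualifying candidate" rule in step S57.

```python
from itertools import count


def resolve_identifier(first_id, mapping, search_candidates, pending,
                       new_ids, threshold=0.8):
    """Return the second identifier for `first_id`, or None when the
    decision is deferred to the manager's curation."""
    if first_id in mapping:                       # S51, S52: lookup succeeded
        return mapping[first_id]                  # S59
    hits = [(kb_id, sim)                          # S53: search the knowledge base
            for kb_id, sim in search_candidates(first_id)
            if sim >= threshold]
    if not hits:                                  # S54: candidate search failed
        second_id = next(new_ids)                 # S55: issue a new identifier
    elif len(hits) == 1:                          # S57: sole candidate found
        second_id = hits[0][0]
    else:
        pending[first_id] = hits                  # S58: store candidates
        return None
    mapping[first_id] = second_id                 # S56: update the mapping table
    return mapping[first_id]                      # S59


# Hypothetical similarity results standing in for the knowledge base search.
SIMILAR = {
    "wdata: 30": [("addr: 0001", 0.93)],
    "wdata: 77": [("addr: 0200", 0.91), ("addr: 0201", 0.90)],
}


def search(first_id):
    return SIMILAR.get(first_id, [])


mapping = {"wdata: 43": "addr: 0102"}
pending = {}
new_ids = (f"addr: {n:04d}" for n in count(9000))  # assumed identifier scheme

for fid in ["wdata: 43", "wdata: 30", "wdata: 77", "wdata: 99"]:
    print(fid, "->", resolve_identifier(fid, mapping, search, pending, new_ids))
```

In this run the first identifier is already mapped, the second gains a mapping from its sole candidate, the third is deferred with two stored candidates, and the fourth receives a newly issued identifier.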

FIG. 6 is a block diagram illustrating an example of a knowledge application unit according to an exemplary embodiment of the present invention. As described above with reference to FIG. 2, the knowledge application unit 600 may refine the source instances 132 generated by the knowledge integration unit (e.g., 220 of FIG. 2) and may integrate the refined source instances into the knowledge base 140. As shown in FIG. 6, the knowledge application unit 600 may include a source instance post-processing unit 620 and an instance comparison unit 640. Hereinafter, FIG. 6 will be described with reference to FIG. 2.

The source instance post-processing unit 620 may post-process the source instances 132 based on the format of the knowledge instances included in the knowledge base 140. For example, in the source instance 45 of FIG. 4B corresponding to "Yi Sun-sin / birthday / 1545-04-28", the value "1545-04-28", which represents time information as the object of the triple, may conform to the format "YYYY-MM-DD" used to represent dates in the source domain. The source instance post-processing unit 620 may elaborate the time information by post-processing "1545-04-28". For example, the source instance post-processing unit 620 may generate "tm:1545-04-28", "tm:1545-04-28 year 1545", "tm:1545-04-28 month 04", and "tm:1545-04-28 day 28" from "1545-04-28". As a non-limiting example, when the knowledge base 140 uses "MON DD, YYYY" as the format for representing dates in knowledge instances, the source instance post-processing unit 620 may post-process the source instance 45 by changing "1545-04-28" to "APR 28, 1545" based on the elaborated time information. In some embodiments, the source instance post-processing unit 620 may post-process the source instances 132 by referring to a plurality of post-processing modules, including, for example, a post-processing module for converting the date format as in the example above. In this specification, post-processed source instances may also be referred to as refined source instances.
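
A date post-processing module of the kind described above might look like the following sketch. The function names `elaborate_date` and `to_kb_format` and the month abbreviation table are assumptions; the input value, the four "tm:" strings, and the target format "MON DD, YYYY" come from the example in the description.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def _parse(value):
    """Parse a 'YYYY-MM-DD' string and reject impossible dates."""
    year, month, day = (int(part) for part in value.split("-"))
    date(year, month, day)  # raises ValueError for e.g. 1545-02-30
    return year, month, day


def elaborate_date(value):
    """Expand a source-domain date into its elaborated time information."""
    year, month, day = _parse(value)
    base = f"tm:{value}"
    return [
        base,
        f"{base} year {year:04d}",
        f"{base} month {month:02d}",
        f"{base} day {day:02d}",
    ]


def to_kb_format(value):
    """Rewrite 'YYYY-MM-DD' as 'MON DD, YYYY'."""
    year, month, day = _parse(value)
    return f"{MONTHS[month - 1]} {day:02d}, {year:04d}"


print(elaborate_date("1545-04-28"))
print(to_kb_format("1545-04-28"))  # APR 28, 1545
```

The month table is spelled out by hand rather than taken from `strftime("%b")`, whose output depends on the active locale.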

The instance comparison unit 640 may receive a post-processed (or refined) source instance from the source instance post-processing unit 620 and may compare the post-processed source instance with the knowledge instances included in the knowledge base 140. Based on the comparison result, the instance comparison unit 640 may determine whether to add the knowledge corresponding to the post-processed source instance to the knowledge base 140, and may selectively integrate the post-processed source instance into the knowledge base 140. In some embodiments, like the "similarity calculation unit" described in Korean Patent Application No. 10-2018-0151222, the instance comparison unit 640 may search the knowledge base 140 for knowledge instances similar to the post-processed source instance and may determine whether to add the post-processed source instance based on the similarities of the retrieved knowledge instances. For example, the instance comparison unit 640 may ignore the post-processed source instance when a knowledge instance having a similarity greater than or equal to a predefined threshold is found in the knowledge base 140, and may add the post-processed source instance to the knowledge base 140 when only knowledge instances having a similarity below the threshold are found in the knowledge base 140.
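
The add-or-ignore decision can be sketched as below. The similarity measure here (the fraction of matching triple positions) is a deliberately simple placeholder and is not the "similarity calculation unit" of the referenced application; the function names, the list-based knowledge base, and the threshold value are likewise assumptions.

```python
def triple_similarity(a, b):
    """Toy similarity: fraction of positions at which two triples agree."""
    return sum(x == y for x, y in zip(a, b)) / 3


def integrate(instance, knowledge_base, threshold=0.99,
              similarity=triple_similarity):
    """Append `instance` unless a sufficiently similar instance exists.

    Returns True if the instance was added, False if it was ignored.
    """
    if any(similarity(instance, known) >= threshold for known in knowledge_base):
        return False
    knowledge_base.append(instance)
    return True


kb = [("addr: 0102", "occupation", "addr: 0001")]
print(integrate(("addr: 0102", "occupation", "addr: 0001"), kb))  # False
print(integrate(("addr: 0102", "birthday", "APR 28, 1545"), kb))  # True
print(len(kb))  # 2
```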

FIG. 7 is a block diagram illustrating an example of a data conversion system according to an exemplary embodiment of the present invention. As described above with reference to FIG. 1, the distributed system may include a plurality of data conversion systems. As one of the plurality of data conversion systems, the data conversion system 700 of FIG. 7 may convert source data, that is, knowledge data included in the knowledge domain 150, based on an instruction provided from the data integration system 110' of FIG. 1. As shown in FIG. 7, the data conversion system 700 may include a conversion engine management unit 720, a plurality of conversion modules 740, a conversion data storage 760, and a plurality of conversion engines 780.

Each of the plurality of conversion modules 740 may define the conversion rules required for the knowledge domain 150 or for a group of knowledge data included in the knowledge domain 150. In some embodiments, the data conversion system 700 may perform conversion operations without being limited to a specific knowledge domain and, to this end, may include a plurality of conversion modules 740 corresponding to a plurality of knowledge domains. In other embodiments, the data conversion system 700 may be limited to converting knowledge data included in a single knowledge domain and may include a plurality of conversion modules 740 respectively corresponding to groups of knowledge data included in that knowledge domain. In some embodiments, the plurality of conversion modules 740 may be provided from the data integration system 110' and may be stored inside the data conversion system 700 by the conversion engine management unit 720.

The conversion engine management unit 720 may receive an instruction from the data integration system 110' and control the data conversion operation based on information included in the instruction. In some embodiments, the instruction from the data integration system 110' may include identification information (e.g., a URL) of the knowledge domain 150. The conversion engine management unit 720 may select at least one of the plurality of conversion modules 740 based on the identification information of the knowledge domain 150 included in the instruction, provide the selected conversion module to at least one of the plurality of conversion engines 780, and trigger the conversion operation. In some embodiments, the instruction from the data integration system 110' may also include identifiers of source data that indicate the range of source data to be converted among the source data included in the knowledge domain 150. In that case, the conversion engine management unit 720 may recognize, based on the identifiers, the group of source data to which the source data belongs, select at least one of the plurality of conversion modules 740 according to the recognized group, provide the selected conversion module to at least one of the plurality of conversion engines 780, and trigger the conversion operation. The conversion engine management unit 720 may provide the converted source data stored in the conversion data storage 760 to the data integration system 110'. The conversion engine management unit 720 may also schedule the jobs of the plurality of conversion engines 780.
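The module selection performed by the conversion engine management unit 720 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment. The class and field names, the modeling of a conversion module as a plain function, and the convention that an identifier such as `person/1` belongs to the group `person` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumption: a conversion module is modeled as a function from a raw
# source record to a converted record.
ConversionModule = Callable[[dict], dict]


@dataclass
class Instruction:
    """Hypothetical shape of an instruction from the data integration system."""
    domain_url: str                                       # identification of the knowledge domain
    source_ids: List[str] = field(default_factory=list)   # optional range of source data


class ConversionEngineManager:
    def __init__(
        self,
        domain_modules: Dict[str, ConversionModule],
        group_modules: Dict[Tuple[str, str], ConversionModule],
    ) -> None:
        self.domain_modules = domain_modules   # one default module per knowledge domain
        self.group_modules = group_modules     # modules keyed by (domain, group)

    @staticmethod
    def group_of(source_id: str) -> str:
        # Hypothetical convention: "person/123" belongs to the group "person".
        return source_id.split("/", 1)[0]

    def select_modules(self, instruction: Instruction) -> List[ConversionModule]:
        # Raises KeyError when the knowledge domain is unknown.
        default = self.domain_modules[instruction.domain_url]
        groups = sorted({self.group_of(s) for s in instruction.source_ids})
        if not groups:
            # Only the domain is identified: select by domain.
            return [default]
        selected: List[ConversionModule] = []
        for group in groups:
            # Use the group-specific module if one is registered, else the domain default.
            module = self.group_modules.get((instruction.domain_url, group), default)
            if module not in selected:
                selected.append(module)
        return selected
```

In this sketch, an instruction that carries only a domain URL selects the domain-level module, while an instruction that also carries source identifiers selects one module per recognized group.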

Each of the plurality of conversion engines 780 may access the knowledge domain 150 and convert the source data included in the knowledge domain 150 based on the conversion module provided from the conversion engine management unit 720. The plurality of conversion engines 780 may perform conversion in parallel. To this end, the conversion engine management unit 720 may control the plurality of conversion engines 780 so that different source data are converted by different conversion engines. In some embodiments, the knowledge domain 150 may be a remote system (e.g., a server), and each of the plurality of conversion engines 780 may collect and convert source data by accessing the knowledge domain 150 over a network based on an identifier list provided from the conversion engine management unit 720. In other embodiments, the knowledge domain 150 may be a local storage of the data conversion system 700 that stores source data collected from a remote system, and the conversion engine management unit 720 may assign regions of the local storage to the respective conversion engines 780. The plurality of conversion engines 780 may store the converted source data in the conversion data storage 760.
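The parallel conversion described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the embodiment itself. Threads stand in for the conversion engines 780, a dictionary stands in for the conversion data storage 760, and the names `partition`, `run_engines`, `fetch`, and `module` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def partition(ids: List[str], n: int) -> List[List[str]]:
    """Split the identifier list into n disjoint round-robin slices,
    so that different engines receive different source data."""
    return [ids[i::n] for i in range(n)]


def run_engines(
    ids: List[str],
    fetch: Callable[[str], dict],    # assumption: reads one source record from the knowledge domain
    module: Callable[[dict], dict],  # assumption: the selected conversion module
    n_engines: int = 4,
) -> Dict[str, dict]:
    store: Dict[str, dict] = {}      # stands in for the conversion data storage

    def engine(chunk: List[str]) -> Dict[str, dict]:
        # Each engine collects and converts only the identifiers assigned to it.
        return {sid: module(fetch(sid)) for sid in chunk}

    with ThreadPoolExecutor(max_workers=n_engines) as pool:
        for result in pool.map(engine, partition(ids, n_engines)):
            store.update(result)     # results are merged in the calling thread
    return store
```

Merging results in the calling thread keeps the shared store free of concurrent writes. A deployment with remote knowledge domains would replace `fetch` with a network call.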

Exemplary embodiments have been disclosed above in the drawings and the specification. Although the embodiments have been described using specific terms, these terms are used only to explain the technical idea of the present invention and not to limit its meaning or the scope of the invention set forth in the claims. Those of ordinary skill in the art will therefore understand that various modifications and equivalent embodiments are possible. Accordingly, the true scope of technical protection of the present invention should be determined by the technical idea of the appended claims.

Claims

A data integration system configured to access a plurality of remote data conversion systems in a distributed system for knowledge integration, the data integration system comprising:
a job selection unit configured to receive a request from an administrator and decode the request;
a conversion management unit configured to instruct at least one data conversion system to convert source data in response to a conversion request from the administrator, and to receive the converted source data from the at least one data conversion system; and
a data integration unit configured to access a metadata repository storing metadata used for knowledge integration and, in response to an integration request from the administrator, to integrate the converted source data into a knowledge base based on a mapping table between knowledge domains stored in the metadata repository.

The data integration system of claim 1,
wherein the conversion request includes identification information of a knowledge domain including the source data and at least one identifier of source data included in the knowledge domain.

The data integration system of claim 1,
wherein the data integration unit comprises:
a knowledge integration unit configured to generate a source instance based on the mapping table and at least one first identifier of at least one knowledge entity included in the converted source data; and
a knowledge application unit configured to refine the source instance and integrate the refined source instance into the knowledge base.

The data integration system of claim 3,
wherein the knowledge integration unit comprises:
a knowledge entity selection unit configured to search for the at least one first identifier in the mapping table and extract at least one knowledge entity from the knowledge base based on a search result; and
a source instance generator configured to generate the source instance from the converted source data by changing the at least one first identifier into at least one second identifier of the extracted at least one knowledge entity.

The data integration system of claim 4,
wherein the knowledge entity selection unit is configured to extract at least one candidate knowledge entity from the knowledge base based on the converted source data when the at least one first identifier is not found in the mapping table.

The data integration system of claim 3,
wherein the knowledge application unit comprises:
a source instance post-processing unit configured to post-process the source instance based on the type of knowledge instances included in the knowledge base; and
an instance comparison unit configured to selectively integrate the post-processed source instance into the knowledge base by comparing the post-processed source instance with the knowledge instances included in the knowledge base.

The data integration system of claim 1,
further comprising a data management unit configured to provide data generated by the conversion management unit and the data integration unit to the administrator in response to an inquiry request from the administrator, and to update the metadata in response to an update request from the administrator.

A distributed system for knowledge integration, comprising:
a plurality of data conversion systems; and
a data integration system configured to integrate converted source data provided from the plurality of remote data conversion systems into a knowledge base, based on a mapping table between knowledge domains,
wherein each of the plurality of data conversion systems comprises
a plurality of conversion engines configured to convert source data included in a knowledge domain based on the structure of knowledge instances included in the knowledge base, in response to an instruction provided from the data integration system.

The distributed system of claim 8,
wherein each of the plurality of data conversion systems
further comprises a conversion engine management unit configured to select one of a plurality of conversion modules based on the instruction and provide the selected conversion module to at least one of the plurality of conversion engines.

The distributed system of claim 9,
wherein the instruction
includes at least one of identification information of the knowledge domain, at least one identifier of source data, and an index of a conversion module.

The distributed system of claim 8,
wherein each of the plurality of data conversion systems includes a conversion data storage configured to store the converted source data, and
each of the plurality of conversion engines is configured to collect source data from the knowledge domain and store the converted source data in the conversion data storage.