KR102279109B1

KR102279109B1 - Distributed system and method for integrating knowledge

Info

Publication number: KR102279109B1
Application number: KR1020190123975A
Authority: KR
Inventors: 김광민; 양승원
Original assignee: 주식회사 솔트룩스
Priority date: 2019-10-07
Filing date: 2019-10-07
Publication date: 2021-07-20
Also published as: WO2021070998A1; KR20210041365A

Abstract

According to an exemplary embodiment of the present invention, a data integration system that accesses a plurality of remote data transformation systems in a distributed system for knowledge integration may include: a task selection unit that receives a request from an administrator and decodes the request; a transformation management unit that, in response to a transformation request from the administrator, instructs at least one data transformation system to transform source data and receives the transformed source data from the at least one data transformation system; and a data integration unit that is configured to access a metadata repository storing metadata used for knowledge integration and that, in response to an integration request from the administrator, integrates the transformed source data into a knowledge base based on a mapping table between knowledge domains stored in the metadata repository.

Description

DISTRIBUTED SYSTEM AND METHOD FOR INTEGRATING KNOWLEDGE

The technical idea of the present invention relates to knowledge data and, more particularly, to a distributed system and method for integrating scattered knowledge data.

The present invention is derived from research led and conducted by Saltlux Inc. as part of the SW Computing Source Technology Development Program (SW) of the Ministry of Science, ICT and Future Planning. [Research period: 2019.01.01 to 2019.12.31; research management institution: Information and Communication Technology Promotion Center; project title: WiseKB: Development of self-learning knowledge base and reasoning technology based on big data understanding; project number: 2013-0-00109]

Knowledge data may be scattered across various knowledge domains. For example, knowledge data may belong to different knowledge domains depending on its field, or depending on how the knowledge data is served. Knowledge domains may also store and manage knowledge data in different ways. Knowledge data scattered in this way may be difficult for users to access, and duplicate knowledge data may be included in multiple knowledge domains. Accordingly, to improve the usability and efficiency of knowledge data, a method for integrating scattered knowledge data, that is, knowledge integration, may be required.

The technical idea of the present invention provides a distributed system and method for knowledge integration that use a plurality of data transformation systems.

To achieve the above object, according to one aspect of the technical idea of the present invention, a data integration system that accesses a plurality of remote data transformation systems in a distributed system for knowledge integration may include: a task selection unit that receives a request from an administrator and decodes the request; a transformation management unit that, in response to a transformation request from the administrator, instructs at least one data transformation system to transform source data and receives the transformed source data from the at least one data transformation system; and a data integration unit that is configured to access a metadata repository storing metadata used for knowledge integration and that, in response to an integration request from the administrator, integrates the transformed source data into a knowledge base based on a mapping table between knowledge domains stored in the metadata repository.

According to an exemplary embodiment of the present invention, the transformation request may include identification information of the knowledge domain containing the source data and at least one identifier of the source data contained in that knowledge domain.

According to an exemplary embodiment of the present invention, the data integration unit may include a knowledge integration unit that generates a source instance based on the mapping table and at least one first identifier of at least one knowledge entity included in the transformed source data, and a knowledge application unit that refines the source instance and integrates the refined source instance into the knowledge base.

According to an exemplary embodiment of the present invention, the knowledge integration unit may include a knowledge entity selection unit that searches the mapping table for the at least one first identifier and extracts at least one knowledge entity from the knowledge base based on the search result, and a source instance generation unit that generates the source instance from the transformed source data by replacing the at least one first identifier with at least one second identifier of the extracted at least one knowledge entity.

According to an exemplary embodiment of the present invention, when the at least one first identifier is not found in the mapping table, the knowledge entity selection unit may extract at least one candidate knowledge entity from the knowledge base based on the transformed source data.
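
The lookup, fallback, and identifier replacement described in the preceding paragraphs can be sketched as follows. This is only an illustration under stated assumptions: the dictionary-based mapping table, the label-matching fallback rule, and every name in the code (`MAPPING_TABLE`, `KB_LABELS`, `select_entities`, `make_source_instance`) are hypothetical and do not come from the patent.

```python
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical mapping table between knowledge domains:
# (knowledge domain, first identifier) -> second identifier in the knowledge base.
MAPPING_TABLE: Dict[Tuple[str, str], str] = {
    ("domain_a", "A:1001"): "kb:Yi_Sun-sin",
}

# Hypothetical knowledge base index: second identifier -> entity label.
KB_LABELS: Dict[str, str] = {
    "kb:Yi_Sun-sin": "Yi Sun-sin",
    "kb:Soldier": "soldier",
}


def select_entities(domain: str, first_id: str, label: str) -> List[str]:
    """Return the mapped second identifier, or candidate entities if unmapped."""
    mapped = MAPPING_TABLE.get((domain, first_id))
    if mapped is not None:
        return [mapped]
    # Fallback (assumed matching rule): compare the label carried by the
    # transformed source data with the labels in the knowledge base.
    return [uri for uri, kb_label in KB_LABELS.items() if kb_label == label]


def make_source_instance(transformed: Triple, second_id: str) -> Triple:
    """Replace the first identifier (subject) with the second identifier."""
    return transformed._replace(subject=second_id)


transformed = Triple("A:1001", "birthday", "1545-04-28")
(second_id,) = select_entities("domain_a", "A:1001", "Yi Sun-sin")
source_instance = make_source_instance(transformed, second_id)
```

In this sketch an unmapped identifier yields zero or more candidates rather than a source instance; how candidates are then resolved is left open here.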

According to an exemplary embodiment of the present invention, the knowledge application unit may include a source instance post-processing unit that post-processes the source instance based on the format of the knowledge instances included in the knowledge base, and an instance comparison unit that selectively integrates the post-processed source instance into the knowledge base by comparing it with the knowledge instances included in the knowledge base.
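
A minimal sketch of post-processing followed by selective integration is shown below. It assumes, purely for illustration, that the knowledge base stores dates in ISO 8601 form, that source dates arrive as `YYYY.MM.DD`, and that "comparison" means an exact-duplicate check; the names `post_process` and `integrate` and the `kb:` prefixes are hypothetical.

```python
from datetime import datetime

# Hypothetical knowledge base contents: a set of (subject, predicate, object) triples.
knowledge_base = {("kb:Yi_Sun-sin", "kb:birthday", "1545-04-28")}


def post_process(instance):
    """Normalize a source instance to the assumed knowledge base date format."""
    subject, predicate, obj = instance
    if predicate == "kb:birthday":
        # Assumed source format YYYY.MM.DD, rewritten as ISO 8601.
        obj = datetime.strptime(obj, "%Y.%m.%d").date().isoformat()
    return (subject, predicate, obj)


def integrate(instance, kb):
    """Add the instance only if an identical knowledge instance is absent."""
    if instance in kb:
        return False
    kb.add(instance)
    return True


duplicate = post_process(("kb:Yi_Sun-sin", "kb:birthday", "1545.04.28"))
added_duplicate = integrate(duplicate, knowledge_base)
added_new = integrate(("kb:Yi_Sun-sin", "kb:occupation", "kb:Soldier"), knowledge_base)
```

Normalizing before comparing is what lets the duplicate be detected; without it, the two date spellings would be stored as distinct instances.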

According to an exemplary embodiment of the present invention, the data integration system may further include a data management unit that, in response to an inquiry request from the administrator, provides the administrator with data generated by the transformation management unit and the data integration unit and, in response to an update request from the administrator, updates the metadata.

A distributed system for knowledge integration according to one aspect of the technical idea of the present invention may include a plurality of data transformation systems, and a data integration system that integrates transformed source data provided from the plurality of remote data transformation systems into a knowledge base based on a mapping table between knowledge domains. Each of the plurality of data transformation systems may include a plurality of transformation engines that, in response to an instruction provided from the data integration system, transform source data included in a knowledge domain based on the structure of the knowledge instances included in the knowledge base.

According to an exemplary embodiment of the present invention, each of the plurality of data transformation systems may include a transformation engine management unit that selects one transformation module from among a plurality of transformation modules based on the instruction and provides the selected transformation module to at least one of the plurality of transformation engines.

According to an exemplary embodiment of the present invention, the instruction may include at least one of identification information of the knowledge domain, at least one identifier of the source data, and an index of the transformation module.
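
As a rough illustration of how an instruction might carry these fields and how a module could be chosen by index, consider the sketch below. The `Instruction` class, the two toy modules, the list-based registry, and the default of index 0 are all assumptions made for the example, not details given in the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Instruction:
    """Fields named in the text; each is optional ("at least one of")."""
    domain_id: Optional[str] = None
    source_ids: Sequence[str] = ()
    module_index: Optional[int] = None


def split_commas(raw: str) -> List[str]:
    return raw.split(",")


def split_tabs(raw: str) -> List[str]:
    return raw.split("\t")


# Hypothetical registry of transformation modules, addressed by index.
TRANSFORMATION_MODULES: List[Callable[[str], List[str]]] = [split_commas, split_tabs]


def select_module(instruction: Instruction) -> Callable[[str], List[str]]:
    """Pick the module named by the instruction; index 0 is an assumed default."""
    index = 0 if instruction.module_index is None else instruction.module_index
    return TRANSFORMATION_MODULES[index]


instruction = Instruction(domain_id="domain_a", source_ids=["A:1001"], module_index=1)
module = select_module(instruction)
fields = module("A:1001\tbirthday\t1545-04-28")
```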

According to an exemplary embodiment of the present invention, each of the plurality of data transformation systems may include a transformation data store that stores the transformed source data, and each of the plurality of transformation engines may collect source data from the knowledge domain and store the transformed source data in the transformation data store.
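
The collect, transform, and store cycle of one transformation engine can be pictured with the sketch below. It assumes a CSV source whose first column holds the record identifier within the domain, uses a plain dictionary as a stand-in for the transformation data store, and names such as `transform_csv` are hypothetical.

```python
import csv
import io


def transform_csv(domain_id, csv_text):
    """Turn each non-key cell into a (subject, predicate, object) triple.

    Assumption: the first column is the record's identifier in the domain.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    key = reader.fieldnames[0]
    triples = []
    for row in reader:
        subject = f"{domain_id}:{row[key]}"
        for column, value in row.items():
            if column != key and value:
                triples.append((subject, column, value))
    return triples


# Stand-in for the transformation data store, keyed by knowledge domain.
transformation_data_store = {}

source_data = "id,name,birthday\n1001,Yi Sun-sin,1545-04-28\n"
transformation_data_store["domain_a"] = transform_csv("domain_a", source_data)
```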

According to the distributed system and method of the technical idea of the present invention, the data transformation and the data integration required for knowledge integration are separated, and the data transformation is processed in parallel by a plurality of data transformation systems, so that a vast amount of knowledge data can be processed in a distributed manner and knowledge integration can be achieved efficiently and easily.

In addition, according to the distributed system and method of the technical idea of the present invention, a knowledge base containing integrated knowledge data can be easily implemented, which can improve the usefulness of various services based on the knowledge base.

The effects obtainable from the embodiments of the present invention are not limited to those mentioned above, and other effects not mentioned can be clearly derived and understood by those of ordinary skill in the art to which the present invention pertains from the following description of the embodiments. That is, unintended effects of practicing the present invention may also be derived from the embodiments by those of ordinary skill in the art.

FIG. 1 is a block diagram illustrating a distributed system for knowledge integration according to an exemplary embodiment of the present invention.
FIG. 2 is a block diagram illustrating an example of a data integration unit according to an exemplary embodiment of the present invention.
FIG. 3 is a block diagram illustrating an example of a knowledge integration unit according to an exemplary embodiment of the present invention.
FIGS. 4A and 4B are diagrams illustrating examples in which source instances are generated according to exemplary embodiments of the present invention.
FIG. 5 is a flowchart illustrating an example of the operation of a knowledge entity selection unit according to an exemplary embodiment of the present invention.
FIG. 6 is a block diagram illustrating an example of a knowledge application unit according to an exemplary embodiment of the present invention.
FIG. 7 is a block diagram illustrating an example of a data transformation system according to an exemplary embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The embodiments are provided to explain the present invention more completely to those of ordinary skill in the art. Because the present invention can be modified in various ways and can take various forms, specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present invention to the specific forms disclosed; it should be understood to cover all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention. In describing the drawings, like reference numerals are used for like elements. In the accompanying drawings, the dimensions of structures are shown enlarged or reduced relative to their actual size for clarity.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

In the drawings and description below, a component shown or described as a single block may be a hardware block or a software block. For example, the components may each be independent hardware blocks that exchange signals with one another, or may be software blocks executed on a single processor.

FIG. 1 is a block diagram illustrating a distributed system for knowledge integration according to an exemplary embodiment of the present invention. The distributed system may include a data integration system 110 and a plurality of data transformation systems 120, may communicate with an administrator 10, and may access a metadata repository 130 and a knowledge base 140. In some embodiments, the distributed system may include the metadata repository 130 and/or the knowledge base 140. In this specification, the metadata repository 130 and the knowledge base 140 are described as being external to the distributed system, but it is noted that exemplary embodiments of the present invention are not limited thereto.

The distributed system may integrate scattered knowledge data (which may be referred to herein as source data) into the knowledge base 140. The administrator 10 may be any entity that manages the integration of knowledge data by the distributed system (which may be referred to herein as knowledge integration). For example, the administrator 10 may instruct the distributed system to perform operations for knowledge integration (e.g., the transformation operation and the integration operation described below), may inquire into the knowledge data integration process performed by the distributed system, and may control the integration of knowledge data by updating data used for the integration, such as the metadata stored in the metadata repository 130. The administrator 10 may communicate with the data integration system 110 in various ways. For example, the administrator 10 may refer to any terminal that communicates with the data integration system 110 over a communication channel; the terminal may provide a request to the data integration system 110 over the communication channel according to input received from the administrator 10, and may receive a response to the request from the data integration system 110 and provide it to the administrator 10. In some embodiments, the communication channel between the administrator 10 and the data integration system 110 may be formed via a network such as the Internet, or may be a one-to-one direct connection.

The metadata repository 130 may store metadata used for the integration of knowledge data. For example, the metadata repository 130 may store intermediate data generated during the integration of knowledge data (e.g., 132 and 136 in FIG. 2) and data referenced during the integration of knowledge data (e.g., 134 in FIG. 2). In some embodiments, the administrator 10 may update part of the metadata stored in the metadata repository 130 through the data integration system 110. The metadata repository 130 may be an independent system (e.g., a server) that communicates with the data integration system 110 over a network, or may be local storage connected to the data integration system 110 through a one-to-one communication channel. An example of the metadata repository 130 is described below with reference to FIG. 2 and other figures.

The knowledge base 140 may refer to a system that contains integrated knowledge data. The knowledge base 140 may store knowledge data according to a predefined structure and may provide knowledge data externally in response to a request (e.g., a query). For example, the knowledge base 140 may include knowledge instances expressed in the Resource Description Framework (RDF) structure, and a set of knowledge instances may be referred to as a knowledge graph. In this specification, a knowledge instance may refer to a unit representing a piece of knowledge, such as "Yi Sun-sin (이순신)'s birthday is April 28, 1545". In the RDF structure, a knowledge instance may be expressed as a triple. A triple may be expressed as a subject, a predicate, and an object, for example "Yi Sun-sin / birthday / 1545-04-28". "Yi Sun-sin" may be referred to as a knowledge entity, or simply an entity, and may have an identifier that is unique within the knowledge base 140 (which may be referred to herein as a second identifier), such as a Uniform Resource Identifier (URI). "Birthday" may be referred to as a property representing the relationship between the subject and the object. The object of a triple may correspond to a value, as in "April 28, 1545" above, or, when the predicate represents a relationship between entities, as in "Yi Sun-sin / occupation / soldier", may correspond to an entity different from the subject. A knowledge graph may also be implemented in a knowledge domain other than the knowledge base 140, and knowledge integration may accordingly be referred to as knowledge graph integration.
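
The two kinds of triple mentioned above, one whose object is a value and one whose object is another entity, can be modeled in a few lines. This is a plain-Python sketch rather than a real RDF library; the `Entity`, `Literal`, and `Triple` classes and the `http://example.org/kb/` URIs are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Entity:
    uri: str  # plays the role of the second identifier


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    subject: Entity
    predicate: str
    obj: Union[Entity, Literal]


yi = Entity("http://example.org/kb/Yi_Sun-sin")
soldier = Entity("http://example.org/kb/Soldier")

# A knowledge graph is modeled here as a set of knowledge instances (triples).
knowledge_graph = {
    Triple(yi, "birthday", Literal("1545-04-28")),  # object is a value
    Triple(yi, "occupation", soldier),              # object is another entity
}


def links_entities(triple: Triple) -> bool:
    """True when the predicate relates two entities rather than an entity and a value."""
    return isinstance(triple.obj, Entity)


entity_links = [t for t in knowledge_graph if links_entities(t)]
```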

Knowledge data scattered across knowledge domains may be integrated into the knowledge base 140 by the distributed system. The integration of knowledge data may include a data transformation operation, which transforms the distributed knowledge data according to the structure of the knowledge base 140, and a data integration operation, which integrates the transformed knowledge data with the knowledge data included in the knowledge base 140, that is, the knowledge instances. In the distributed system, the data transformation operation may be performed in parallel by the plurality of data transformation systems 120, while the data integration operation may be performed by the data integration system 110. By separating the integration of knowledge data into a data transformation operation and a data integration operation and processing the data transformation operation in parallel, the distributed system can process a vast amount of knowledge data in a distributed manner and can integrate knowledge data efficiently and easily. In addition, because the knowledge base 140 containing integrated knowledge data is easily implemented by the distributed system, various services based on the knowledge base 140 become possible, and the usefulness of such services can be significantly improved.
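
The split between parallel transformation and a single integration step can be sketched with a thread pool. Here local threads merely stand in for remote data transformation systems, and `transform`, `integrate`, and the sample records are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(domain_id, records):
    """Stand-in for one remote data transformation system."""
    return [(f"{domain_id}:{rid}", predicate, value) for rid, predicate, value in records]


def integrate(knowledge_base, transformed):
    """Stand-in for the single data integration step."""
    knowledge_base.update(transformed)


sources = {
    "domain_a": [("1001", "birthday", "1545-04-28")],
    "domain_b": [("77", "occupation", "soldier")],
}

knowledge_base = set()
with ThreadPoolExecutor(max_workers=len(sources)) as pool:
    # Transformation jobs run concurrently, one per knowledge domain.
    futures = [pool.submit(transform, domain, records) for domain, records in sources.items()]
    # Integration happens in one place, as results become available in order.
    for future in futures:
        integrate(knowledge_base, future.result())
```

Keeping integration in one place means only the transformation work needs to scale out, which mirrors the division of labor described above.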

As shown in FIG. 1, the data integration system 110 may communicate with the administrator 10 and with the plurality of data transformation systems 120. In some embodiments, the data integration system 110 may communicate with the plurality of data transformation systems 120 over a network such as the Internet. For example, the plurality of data transformation systems 120 may be located remotely from the data integration system 110 and may also be located far from one another. The data integration system 110 may also communicate with the metadata repository 130 and the knowledge base 140. As shown in FIG. 1, the data integration system 110 may include a task selection unit 112, a transformation management unit 114, a data integration unit 116, and a data management unit 118.

The task selection unit 112 may receive a request from the administrator 10 and decode the received request. For example, the task selection unit 112 may receive a transformation request from the administrator 10 and, by decoding the transformation request, provide the transformation management unit 114 with an instruction for the transformation operation and information for the transformation operation. Likewise, the task selection unit 112 may receive an integration request from the administrator 10 and, by decoding the integration request, provide the data integration unit 116 with an instruction for the integration operation and information for the integration operation.
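
One way to picture this decode-and-dispatch role is the sketch below. The JSON encoding, the `type` and `payload` field names, and the handler functions are assumptions chosen for illustration; the text does not specify a request format.

```python
import json


def handle_transformation(payload):
    """Stand-in for handing work to the transformation management unit."""
    return ("transformation_management_unit", payload)


def handle_integration(payload):
    """Stand-in for handing work to the data integration unit."""
    return ("data_integration_unit", payload)


# Hypothetical request types mapped to the unit that serves them.
HANDLERS = {
    "transform": handle_transformation,
    "integrate": handle_integration,
}


def select_task(raw_request: str):
    """Decode a request and forward its payload to the matching unit."""
    request = json.loads(raw_request)
    kind = request["type"]
    handler = HANDLERS.get(kind)
    if handler is None:
        raise ValueError(f"unsupported request type: {kind!r}")
    return handler(request.get("payload", {}))


routed = select_task(
    '{"type": "transform", "payload": {"domain_id": "domain_a", "source_ids": ["A:1001"]}}'
)
```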

The transformation management unit 114 may communicate with the plurality of data transformation systems 120. In response to a transformation request from the administrator 10, the transformation management unit 114 may instruct at least one of the plurality of data transformation systems 120 to transform source data and may receive the transformed source data from the at least one data transformation system. For example, in response to the transformation request, the transformation management unit 114 may instruct the first data transformation system 120_1 and/or the k-th data transformation system 120_k to perform a transformation task and may receive the transformed source data from the first data transformation system 120_1 and/or the k-th data transformation system 120_k. In some embodiments, the transformation request provided by the administrator 10 may include identification information of the knowledge domain containing the source data to be transformed and at least one identifier (which may be referred to herein as a first identifier) for identifying, among the source data contained in the knowledge domain, the source data to be transformed. In some embodiments, the transformation management unit 114 may select the at least one data transformation system in consideration of the load of each of the plurality of data transformation systems 120 and may provide the selected data transformation system or systems with the instruction for the transformation task and the information included in the transformation request. The transformation management unit 114 may provide the transformed source data received from the at least one data transformation system to the data integration unit 116.
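
Load-aware selection could be as simple as picking the systems that report the lowest load. The sketch below assumes each system reports a single numeric load value and that lower is better; the load figures, system names, and `select_systems` function are hypothetical.

```python
import heapq


def select_systems(loads, count=1):
    """Return the names of the `count` systems with the smallest reported load."""
    return heapq.nsmallest(count, loads, key=loads.get)


# Hypothetical load reports, keyed by data transformation system.
loads = {"120_1": 0.72, "120_2": 0.15, "120_k": 0.40}

single = select_systems(loads)
pair = select_systems(loads, count=2)
```

A real scheduler would likely weigh more than one metric; a single load number is used here only to keep the selection rule visible.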

The data integration unit 116 may receive the transformed source data from the transformation management unit 114 and may access the metadata repository 130 and the knowledge base 140. In response to an integration request from the administrator 10, the data integration unit 116 may integrate the transformed source data provided by the transformation management unit 114 into the knowledge base 140 by referring to data stored in the metadata repository 130. During the integration process, the data integration unit 116 may process data stored in the metadata repository 130 and may store the processed data in the metadata repository 130. An example of the data integration unit 116 is described below with reference to FIG. 2 and other figures.

In response to an inquiry request from the administrator 10, the data management unit 118 may provide at least part of the metadata stored in the metadata repository 130, for example data generated by the conversion management unit 114 and the data integration unit 116 (e.g., 132 and 136 of FIG. 2), to the administrator 10 through the task selection unit 112. In response to an update request from the administrator 10, the data management unit 118 may also update at least part of the metadata stored in the metadata repository 130 (e.g., 134 of FIG. 2). That is, the data management unit 118 may support curation of the distributed system by the administrator 10.

Each of the plurality of data conversion systems 120 may convert source data, that is, knowledge data included in a knowledge domain, based on an instruction from the conversion management unit 114 of the data integration system 110. A knowledge domain may store its knowledge data, that is, the source data, in its own way. For example, a knowledge domain may store knowledge data according to an RDF structure, similarly to the knowledge base 140, or according to any other structure. A knowledge domain may also store knowledge data in its own format, such as CSV, TXT, or LOD (Linked Open Data). Accordingly, so that the data integration system 110 can process the knowledge data included in a knowledge domain, the plurality of data conversion systems 120 may convert the knowledge data of the knowledge domain into a common structure and format. Herein, the data obtained when the source data of a knowledge domain is converted by the plurality of data conversion systems 120 may be referred to as converted source data. As shown in FIG. 1, the first data conversion system 120_1 may include a conversion data store 122_1 and a plurality of conversion engines 124_1, and the k-th data conversion system 120_k may include a conversion data store 122_k and a plurality of conversion engines 124_k.

Each of the plurality of data conversion systems 120 may include its own independent conversion data store, and may thereby support large-volume processing of source data. If, unlike the arrangement shown in FIG. 1, data conversion and data integration were performed in a single system (or a single server), processing a vast amount of source data could be inefficient. In addition to the conversion data store, each of the plurality of data conversion systems 120 may include a plurality of conversion engines, each capable of performing a conversion operation independently, so that conversion operations may be performed in parallel even within a single data conversion system. In some embodiments, one data conversion system may convert the source data included in one knowledge domain. In other embodiments, two or more data conversion systems may each convert different source data included in one knowledge domain. The conversion operations of the plurality of data conversion systems 120 may be scheduled by the conversion management unit 114, as described above.
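
Parallel conversion inside one data conversion system can be pictured with the following minimal sketch. It assumes, purely for illustration, that the source data arrives as CSV lines and that a thread pool stands in for the plurality of conversion engines; `convert_record` and `convert_in_parallel` are hypothetical names and the conversion rule is not taken from the specification.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Triple = Tuple[str, str, str]


def convert_record(line: str) -> Triple:
    """Hypothetical conversion rule: one CSV line becomes one triple."""
    subject, predicate, obj = (field.strip() for field in line.split(","))
    return (subject, predicate, obj)


def convert_in_parallel(lines: List[str], engines: int = 4) -> List[Triple]:
    """Convert records concurrently; each worker plays the role of one engine."""
    with ThreadPoolExecutor(max_workers=engines) as pool:
        # pool.map preserves the order of the input records.
        return list(pool.map(convert_record, lines))


converted = convert_in_parallel([
    "wdata: 43, occupation, wdata: 30",
    "wdata: 43, birthday, 1545-04-28",
])
```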

FIG. 2 is a block diagram illustrating an example of a data integration unit according to an exemplary embodiment of the present invention. Specifically, the block diagram of FIG. 2 shows the data integration unit 200, together with the metadata repository 130 and the knowledge base 140 that the data integration unit 200 accesses. As described above with reference to FIG. 1, the data integration unit 200 may be included in the data integration system 110 of FIG. 1 and, in response to an integration request from the administrator 10, may integrate the converted source data provided by the conversion management unit 114 into the knowledge base 140. As shown in FIG. 2, the data integration unit 200 may include a knowledge integration unit 220 and a knowledge application unit 240; the metadata repository 130 may include source instances 132, a mapping table 134, and candidate knowledge entities 136; and the knowledge base 140 may include knowledge instances 142. Hereinafter, FIG. 2 will be described with reference to FIG. 1.

The knowledge integration unit 220 may generate a source instance based on at least one first identifier of at least one knowledge entity included in the converted source data. The converted source data may include at least one entity that has a unique identifier within the knowledge domain containing the source data. For example, in the example "Yi Sun-sin / occupation / soldier" described above, each of the entities "Yi Sun-sin", "occupation", and "soldier" may have a unique identifier in the knowledge domain. The mapping table 134 may define a mapping relationship between the identifier of a knowledge entity included in the source data and the identifier of a knowledge entity included in the knowledge instances 142 of the knowledge base 140. Accordingly, the knowledge integration unit 220 may generate the source instances 132 from the converted source data by referring to the mapping table 134, and may store them in the metadata repository 130. In addition, when the identifier of an entity included in the source data does not exist in the mapping table 134, the knowledge integration unit 220 may retrieve candidate knowledge entities 136 from the knowledge base 140 and store them in the metadata repository 130. An example of the knowledge integration unit 220 will be described later with reference to FIG. 3.

The knowledge application unit 240 may refine the source instances 132 generated by the knowledge integration unit 220, and may integrate the refined source instances into the knowledge base 140. For example, the knowledge application unit 240 may verify the source instances 132, and may determine whether to add the source instances 132 to the knowledge base 140 by comparing them with at least one knowledge instance of the knowledge base 140. An example of the knowledge application unit 240 will be described later with reference to FIG. 6.

FIG. 3 is a block diagram illustrating an example of a knowledge integration unit according to an exemplary embodiment of the present invention, and FIGS. 4A and 4B are diagrams illustrating examples in which source instances are generated according to exemplary embodiments of the present invention. As described above with reference to FIG. 2, the knowledge integration unit 300 may be included in the data integration unit 200 of FIG. 2 (or 116 of FIG. 1), and may generate source instances from the converted source data by referring to the mapping table 134. Hereinafter, in describing FIGS. 3, 4A, and 4B, content that overlaps with the description of FIG. 2 will be omitted.

Referring to FIG. 3, the knowledge integration unit 300 may include a knowledge entity selection unit 320 and a source instance generation unit 340. The knowledge entity selection unit 320 may look up, in the mapping table 134, the identifier of a knowledge entity included in the converted source data, and may extract at least one knowledge entity from the knowledge base 140 based on the lookup result. For example, as shown on the right side of FIG. 4A, the knowledge entity selection unit 320 may obtain "wdata: 43" and "wdata: 30" as the identifiers of the knowledge entities "Yi Sun-sin" and "soldier", respectively, from the example 42 of converted source data, "Yi Sun-sin / occupation / soldier". The knowledge entity selection unit 320 may look up the identifiers "wdata: 43" and "wdata: 30" in the mapping table 41 shown on the left side of FIG. 4A and, as shown in FIG. 4A, may obtain the identifiers "addr: 0102" and "addr: 0001" corresponding to "wdata: 43" and "wdata: 30", respectively. Although not shown in FIG. 4A, the predicate (P) may also have an identifier as a knowledge entity in the knowledge domain, and the mapping table 41 may also include mapping relationships between identifiers of predicates (P). Similarly, as shown on the right side of FIG. 4B, the knowledge entity selection unit 320 may obtain "wdata: 43" as the identifier of the knowledge entity "Yi Sun-sin" from the example 44 of converted source data, "Yi Sun-sin / birthday / 1545-04-28", and may obtain the identifier "addr: 0102" corresponding to "wdata: 43" by referring to the mapping table 41.

Unlike the examples shown in FIGS. 4A and 4B, when an identifier is not found in the mapping table 134 of FIG. 3, the knowledge entity selection unit 320 may access the knowledge base 140 and may generate the candidate knowledge entities 136 by extracting knowledge entities included in the knowledge instances of the knowledge base 140. An example of the operation of the knowledge entity selection unit 320 when an identifier is not found in the mapping table 134 of FIG. 3 will be described later with reference to FIG. 5.

The source instance generation unit 340 may generate the source instances 132 by replacing the identifier of each knowledge entity included in the converted source data with the identifier provided by the knowledge entity selection unit 320. For example, as shown on the right side of FIG. 4A, from the example 42 of converted source data, which includes the knowledge-domain identifiers "wdata: 43" and "wdata: 30", the source instance generation unit 340 may generate a source instance 43 that includes the identifiers "addr: 0102" and "addr: 0001" of knowledge entities included in the knowledge base 140. Similarly, as shown on the right side of FIG. 4B, from the example 44 of converted source data, which includes the knowledge-domain identifier "wdata: 43", the source instance generation unit 340 may generate a source instance 45 that includes the identifier "addr: 0102" of a knowledge entity included in the knowledge base 140. In some embodiments, as shown in FIG. 4B, the source instance generation unit 340 may generate the source instance 45 so that it retains the value "1545-04-28" of the example 44 of source data unchanged.
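
The identifier substitution of FIGS. 4A and 4B can be sketched as a dictionary lookup over a triple. The identifier pairs below are the ones given in the description; the function name `make_source_instance`, the tuple representation of a triple, and the plain-text predicates are assumptions made for illustration. As a simplification, terms that are absent from the mapping table (such as the literal date value) are passed through unchanged rather than triggering the candidate search of FIG. 5.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]

# Mapping between knowledge-domain identifiers and knowledge-base identifiers,
# using the pairs from the FIG. 4A example.
MAPPING_TABLE: Dict[str, str] = {
    "wdata: 43": "addr: 0102",
    "wdata: 30": "addr: 0001",
}


def make_source_instance(triple: Triple, mapping: Dict[str, str]) -> Triple:
    """Replace each mapped identifier; keep unmapped terms and literals as they are."""
    subject, predicate, obj = triple
    return (
        mapping.get(subject, subject),
        mapping.get(predicate, predicate),
        mapping.get(obj, obj),
    )


instance_43 = make_source_instance(("wdata: 43", "occupation", "wdata: 30"), MAPPING_TABLE)
instance_45 = make_source_instance(("wdata: 43", "birthday", "1545-04-28"), MAPPING_TABLE)
```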

FIG. 5 is a flowchart illustrating an example of the operation of a knowledge entity selection unit according to an exemplary embodiment of the present invention. As described above with reference to FIG. 3, the knowledge entity selection unit 320 of FIG. 3 may look up, in the mapping table 134, the identifier of a knowledge entity included in the converted source data, and may extract at least one knowledge entity from the knowledge base 140 based on the lookup result. As shown in FIG. 5, the operation of the knowledge entity selection unit 320 may include a plurality of steps S51 to S59. Hereinafter, FIG. 5 will be described with reference to FIGS. 3, 4A, and 4B.

In step S51, an operation of looking up the first identifier of the source data in the mapping table 134 may be performed. For example, the knowledge entity selection unit 320 may look up, in the mapping table 134, the first identifier (e.g., "wdata: 43" or "wdata: 30" in FIGS. 4A and 4B) of a knowledge entity included in the converted source data.

In step S52, it may be determined whether the lookup of the first identifier succeeded. For example, the knowledge entity selection unit 320 may determine that the lookup succeeded when the first identifier of step S51 exists in the mapping table 134, and may otherwise determine that the lookup failed. As shown in FIG. 5, step S59 may be performed next when the lookup is determined to have succeeded, whereas step S53 may be performed next when the lookup is determined to have failed.

When the lookup of the first identifier is determined to have failed, an operation of searching the knowledge base 140 for candidate knowledge entities may be performed in step S53. For example, the knowledge entity selection unit 320 may search the knowledge base 140 for at least one candidate knowledge entity in order to determine which knowledge entity in the knowledge base 140 corresponds to the knowledge entity having the first identifier. In some embodiments, like the "similarity calculation unit" described in Korean Patent Application No. 10-2018-0151222, which was filed by the same applicant as the present application and is incorporated herein by reference in its entirety, the knowledge entity selection unit 320 may search the knowledge base 140 for knowledge instances similar to the source instance, and may extract at least one candidate knowledge entity based on the similarities of the retrieved knowledge instances.

In step S54, it may be determined whether the search for candidate knowledge entities succeeded. For example, the knowledge entity selection unit 320 may determine that the search failed when no knowledge entity having a similarity greater than or equal to a predefined threshold was found in the knowledge base 140 in step S53, and may determine that the search succeeded when at least one knowledge entity having a similarity greater than or equal to the predefined threshold was found in the knowledge base 140 in step S53. As shown in FIG. 5, step S57 may be performed next when the search is determined to have succeeded, whereas step S55 may be performed next when the search is determined to have failed.

When the search for candidate knowledge entities is determined to have failed, an operation of generating a new identifier may be performed in step S55. For example, when no knowledge entity similar to the knowledge entity corresponding to the first identifier is found in the knowledge base 140, the knowledge entity selection unit 320 may determine that the knowledge entity corresponding to the first identifier is a new knowledge entity that is not present in the knowledge base 140. Accordingly, the knowledge entity selection unit 320 may generate a new identifier corresponding to the first identifier by referring to the identifier scheme of the knowledge base 140.

On the other hand, when the search for candidate knowledge entities is determined to have succeeded, it may be determined in step S57 whether a single candidate knowledge entity was found. For example, the knowledge entity selection unit 320 may determine whether exactly one knowledge entity having a similarity greater than or equal to the predefined threshold was found. In some embodiments, when two or more candidate knowledge entities having similarities greater than or equal to the predefined threshold are found, the knowledge entity selection unit 320 may determine that the candidate knowledge entity having the highest similarity was found alone if the difference between the highest similarity and the second-highest similarity is greater than or equal to a predefined reference value. As shown in FIG. 5, step S56 may be performed next when a single candidate knowledge entity was found, whereas step S58 may be performed next when two or more candidate knowledge entities were found.
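
The single-candidate decision of step S57, including the optional margin rule, might look like the sketch below. The function name `single_candidate`, the dictionary of similarity scores, the numeric values, and the identifier "addr: 0207" are all hypothetical; the specification only states that a predefined threshold and a predefined reference value are used.

```python
from typing import Dict, Optional


def single_candidate(similarities: Dict[str, float], threshold: float, margin: float) -> Optional[str]:
    """Return the identifier of a sole candidate, or None if there is none or it is ambiguous."""
    # Keep candidates at or above the threshold, best first.
    qualified = sorted(
        (item for item in similarities.items() if item[1] >= threshold),
        key=lambda item: item[1],
        reverse=True,
    )
    if not qualified:
        return None  # nothing qualified (handled in step S55 in the flowchart)
    if len(qualified) == 1:
        return qualified[0][0]
    # Two or more qualified: accept the best one only if it leads by the margin.
    if qualified[0][1] - qualified[1][1] >= margin:
        return qualified[0][0]
    return None  # ambiguous; the candidates would be stored for curation (step S58)


clear_winner = single_candidate({"addr: 0102": 0.95, "addr: 0207": 0.82}, threshold=0.8, margin=0.1)
ambiguous = single_candidate({"addr: 0102": 0.95, "addr: 0207": 0.93}, threshold=0.8, margin=0.1)
```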

When two or more candidate knowledge entities are found, an operation of storing the candidate knowledge entities may be performed in step S58. For example, the knowledge entity selection unit 320 may store the candidate knowledge entities in the metadata repository 130, so that, as described above with reference to FIG. 2, the metadata repository 130 may include the candidate knowledge entities 136. When it is difficult to determine automatically, on the basis of similarity, which knowledge entity of the knowledge base 140 corresponds to the knowledge entity of the first identifier, the candidate knowledge entities 136 may be stored in the metadata repository 130, and the stored candidate knowledge entities 136 may be provided to the administrator 10 through the data management unit 118 and the task selection unit 112 of FIG. 1. Through curation, the administrator 10 may select, from among the candidate knowledge entities 136, the knowledge entity that corresponds to the knowledge entity of the first identifier, and the data management unit 118 may update the mapping table 134 according to the selected knowledge entity. As shown in FIG. 5, step S51 may be performed again following step S58, so that the identifier of another knowledge entity included in the converted source data may be looked up in the mapping table 134 as the first identifier.

In step S56, an operation of updating the mapping table 134 may be performed. For example, the knowledge entity selection unit 320 may map the new identifier generated in step S55, or the identifier of the single candidate knowledge entity found in step S53, to the first identifier as a second identifier, and may add the pair to the mapping table 134. Accordingly, the updated mapping table 134 may include the first identifier and the second identifier corresponding to the first identifier.

In step S59, an operation of obtaining the second identifier corresponding to the first identifier may be performed. For example, the knowledge entity selection unit 320 may obtain the second identifier corresponding to the first identifier from the mapping table 134 either when the lookup of the first identifier succeeded in step S52 (e.g., the examples of FIGS. 4A and 4B) or after the mapping table 134 was updated in step S56. The knowledge entity selection unit 320 may provide the obtained second identifier to the source instance generation unit 340.
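
Steps S51 to S59 can be summarized in one function. This is a minimal sketch, not the claimed implementation: the candidate search and the identifier generator are passed in as hypothetical callables (`search_candidates` is assumed to return only candidates that already satisfy the similarity threshold), `pending` stands in for the candidate knowledge entities 136 kept in the metadata repository, and every identifier other than the FIG. 4A pairs is invented for the example.

```python
from typing import Callable, Dict, List, Optional


def resolve_identifier(
    first_id: str,
    mapping_table: Dict[str, str],
    search_candidates: Callable[[str], List[str]],
    new_identifier: Callable[[str], str],
    pending: Dict[str, List[str]],
) -> Optional[str]:
    """Return the second identifier for first_id, or None if curation is needed."""
    if first_id not in mapping_table:              # S51/S52: lookup failed
        candidates = search_candidates(first_id)   # S53: search the knowledge base
        if not candidates:                         # S54: no candidate found
            second_id = new_identifier(first_id)   # S55: create a new identifier
        elif len(candidates) == 1:                 # S57: exactly one candidate
            second_id = candidates[0]
        else:
            pending[first_id] = candidates         # S58: store for the administrator
            return None
        mapping_table[first_id] = second_id        # S56: update the mapping table
    return mapping_table[first_id]                 # S59: obtain the second identifier


# Stub data for the example (hypothetical except for the FIG. 4A pairs).
mapping = {"wdata: 43": "addr: 0102"}
pending: Dict[str, List[str]] = {}
candidate_index = {
    "wdata: 30": ["addr: 0001"],
    "wdata: 77": ["addr: 0301", "addr: 0302"],
}


def search(first_id: str) -> List[str]:
    return candidate_index.get(first_id, [])


def make_new_id(first_id: str) -> str:
    return "addr: 9999"
```

Returning `None` for the ambiguous case mirrors the flowchart's return to step S51 for the next identifier while the candidates wait for curation.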

FIG. 6 is a block diagram illustrating an example of a knowledge application unit according to an exemplary embodiment of the present invention. As described above with reference to FIG. 2, the knowledge application unit 600 may refine the source instances 132 generated by the knowledge integration unit (e.g., 220 of FIG. 2), and may integrate the refined source instances into the knowledge base 140. As shown in FIG. 6, the knowledge application unit 600 may include a source instance post-processing unit 620 and an instance comparison unit 640. Hereinafter, FIG. 6 will be described with reference to FIG. 2.

The source instance post-processing unit 620 may post-process the source instances 132 based on the format of the knowledge instances included in the knowledge base 140. For example, in the example of FIG. 4B, in the source instance 45 corresponding to "Yi Sun-sin / birthday / 1545-04-28", the value "1545-04-28", which represents time information as the object of the triple, may conform to the format "YYYY-MM-DD" used to represent dates in the source domain. The source instance post-processing unit 620 may materialize the time information by post-processing "1545-04-28". For example, from "1545-04-28" the source instance post-processing unit 620 may generate "tm:1545-04-28", "tm:1545-04-28 year 1545", "tm:1545-04-28 month 04", and "tm:1545-04-28 day 28". As a non-limiting example, when the knowledge base 140 uses "MON DD, YYYY" as the format for representing dates in knowledge instances, the source instance post-processing unit 620 may post-process the source instance 45 by changing "1545-04-28" to "APR 28, 1545" based on the materialized time information. In some embodiments, the source instance post-processing unit 620 may post-process the source instances 132 by referring to a plurality of post-processing modules, including, for example, a post-processing module for converting date formats as in the example above. Herein, post-processed source instances may also be referred to as refined source instances.
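
The date post-processing example can be reproduced with the standard library. The sketch below assumes the source value is always a valid "YYYY-MM-DD" string; the function names `materialize_date` and `to_kb_date` and the explicit month table are illustrative choices (the table avoids locale-dependent month names).

```python
from datetime import date
from typing import List

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def materialize_date(value: str) -> List[str]:
    """Expand a 'YYYY-MM-DD' value into the time statements listed in the description."""
    d = date.fromisoformat(value)  # raises ValueError for malformed input
    node = f"tm:{value}"
    return [
        node,
        f"{node} year {d.year:04d}",
        f"{node} month {d.month:02d}",
        f"{node} day {d.day:02d}",
    ]


def to_kb_date(value: str) -> str:
    """Convert 'YYYY-MM-DD' to the 'MON DD, YYYY' format assumed for the knowledge base."""
    d = date.fromisoformat(value)
    return f"{MONTHS[d.month - 1]} {d.day:02d}, {d.year:04d}"


statements = materialize_date("1545-04-28")
kb_value = to_kb_date("1545-04-28")
```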

The instance comparison unit 640 may receive a post-processed (or refined) source instance from the source instance post-processing unit 620, and may compare the post-processed source instance with the knowledge instances included in the knowledge base 140. Based on the comparison result, the instance comparison unit 640 may determine whether to add the knowledge corresponding to the post-processed source instance to the knowledge base 140, and may selectively integrate the post-processed source instance into the knowledge base 140. In some embodiments, like the "similarity calculation unit" described in Korean Patent Application No. 10-2018-0151222, the instance comparison unit 640 may search the knowledge base 140 for knowledge instances similar to the post-processed source instance, and may determine whether to add the post-processed source instance based on the similarities of the retrieved knowledge instances. For example, the instance comparison unit 640 may ignore the post-processed source instance when a knowledge instance having a similarity greater than or equal to a predefined threshold is found in the knowledge base 140, whereas it may add the post-processed source instance to the knowledge base 140 when only knowledge instances having similarities below the threshold are found in the knowledge base 140.
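
The add-or-ignore decision can be illustrated as follows. The actual similarity measure is defined in the referenced application and is not reproduced here; `triple_similarity` below is a deliberately simple stand-in (the fraction of matching triple positions), and the knowledge base is modeled as a plain list. Both are assumptions for illustration only.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def triple_similarity(a: Triple, b: Triple) -> float:
    """Stand-in similarity: fraction of positions where the two triples agree."""
    return sum(x == y for x, y in zip(a, b)) / 3


def integrate_instance(instance: Triple, knowledge_base: List[Triple], threshold: float) -> bool:
    """Add the instance unless a sufficiently similar knowledge instance exists."""
    if any(triple_similarity(instance, known) >= threshold for known in knowledge_base):
        return False  # a similar knowledge instance exists: ignore the source instance
    knowledge_base.append(instance)
    return True


kb: List[Triple] = [("addr: 0102", "occupation", "addr: 0001")]
added_duplicate = integrate_instance(("addr: 0102", "occupation", "addr: 0001"), kb, threshold=0.9)
added_new = integrate_instance(("addr: 0102", "birthday", "APR 28, 1545"), kb, threshold=0.9)
```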

FIG. 7 is a block diagram illustrating an example of a data conversion system according to an exemplary embodiment of the present invention. As described above with reference to FIG. 1, the distributed system may include a plurality of data conversion systems, and the data conversion system 700 of FIG. 7, as one of the plurality of data conversion systems, may convert source data, that is, knowledge data included in the knowledge domain 150, based on an instruction provided by the data integration system 110'. As shown in FIG. 7, the data conversion system 700 may include a conversion engine management unit 720, a plurality of conversion modules 740, a conversion data store 760, and a plurality of conversion engines 780.

Each of the plurality of conversion modules 740 may define the conversion rules required for the knowledge domain 150 or for a group of knowledge data included in the knowledge domain 150. In some embodiments, the data conversion system 700 may perform conversion operations without being limited to a specific knowledge domain, and to this end may include a plurality of conversion modules 740 corresponding to a plurality of knowledge domains. In other embodiments, the data conversion system 700 may be limited to converting the knowledge data included in a single knowledge domain, and may include a plurality of conversion modules 740 respectively corresponding to groups of knowledge data included in that single knowledge domain. In some embodiments, the plurality of conversion modules 740 may be provided by the data integration system 110', and may be stored inside the data conversion system 700 by the conversion engine management unit 720.

The transformation engine manager 720 may receive an instruction from the data integration system 110' and may control the data transformation operation based on information included in the instruction. In some embodiments, the instruction from the data integration system 110' may include identification information (e.g., a URL) of the knowledge domain 150. Based on the identification information of the knowledge domain 150 included in the instruction, the transformation engine manager 720 may select at least one transformation module from among the plurality of transformation modules 740, provide the selected transformation module to at least one of the plurality of transformation engines 780, and trigger the transformation operation. Also, in some embodiments, the instruction from the data integration system 110' may include identifiers of source data that indicate the range of source data to be transformed among the source data included in the knowledge domain 150. Based on the identifiers of the source data, the transformation engine manager 720 may recognize the group of source data to which the source data belongs, select at least one transformation module from among the plurality of transformation modules 740 according to the recognized group, provide the selected transformation module to at least one of the plurality of transformation engines 780, and trigger the transformation operation. The transformation engine manager 720 may provide the transformed source data stored in the transformation data store 760 to the data integration system 110'. In addition, the transformation engine manager 720 may schedule the jobs of the plurality of transformation engines 780.
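The module selection described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `MODULES` registry, the module names, the example URL, and the assumption that a source data identifier has the form `<group>/<local id>` are all hypothetical, since the specification does not define how a group is recognized from an identifier.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical registry: (domain identification info, group) -> module name.
# A group of None denotes the domain-wide default module.
MODULES: Dict[Tuple[str, Optional[str]], str] = {
    ("http://example.org/domain-a", None): "domain_a_default",
    ("http://example.org/domain-a", "person"): "domain_a_person",
    ("http://example.org/domain-a", "place"): "domain_a_place",
}


def group_of(source_id: str) -> str:
    """Assumed convention: identifiers look like '<group>/<local id>'."""
    return source_id.split("/", 1)[0]


def select_modules(domain_id: str,
                   source_ids: Optional[List[str]] = None) -> List[str]:
    """Select modules by domain alone, or by the groups of the given identifiers."""
    default = MODULES[(domain_id, None)]
    if not source_ids:
        return [default]
    selected: List[str] = []
    for source_id in source_ids:
        module = MODULES.get((domain_id, group_of(source_id)), default)
        if module not in selected:  # keep first-seen order, no duplicates
            selected.append(module)
    return selected


DOMAIN = "http://example.org/domain-a"
by_domain = select_modules(DOMAIN)
by_group = select_modules(DOMAIN, ["person/1", "person/2", "place/9"])
```

Falling back to the domain-wide module when no group-specific module exists is a design choice of this sketch, not something the specification prescribes.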

Each of the plurality of transformation engines 780 may access the knowledge domain 150 and may transform the source data included in the knowledge domain 150 based on the transformation module provided from the transformation engine manager 720. The plurality of transformation engines 780 may perform transformation in parallel; to this end, the transformation engine manager 720 may control the plurality of transformation engines 780 so that different source data are transformed by different transformation engines. In some embodiments, the knowledge domain 150 may be a remote system (e.g., a server), and each of the plurality of transformation engines 780 may collect and transform source data by accessing the knowledge domain 150 over a network based on an identifier list provided from the transformation engine manager 720. Also, in some embodiments, the knowledge domain 150 may be local storage of the data transformation system 700 that stores source data collected from a remote system, and the transformation engine manager 720 may assign regions of the local storage to the plurality of transformation engines 780, respectively. The plurality of transformation engines 780 may store the transformed source data in the transformation data store 760.
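The parallel operation described above, in which each engine receives its own identifier list so that different source data are handled by different engines, might look like the following sketch. The thread pool, the round-robin `partition` function, and the `fetch` and `rule` callables are assumptions made for illustration; the specification does not say how engines are scheduled or how source data are fetched.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def partition(ids: List[str], n_engines: int) -> List[List[str]]:
    """Round-robin split so that no identifier is assigned to two engines."""
    return [ids[i::n_engines] for i in range(n_engines)]


def run_engine(ids: List[str],
               fetch: Callable[[str], str],
               rule: Callable[[str], str]) -> Dict[str, str]:
    """One hypothetical engine: collect each source item, then transform it."""
    return {source_id: rule(fetch(source_id)) for source_id in ids}


def transform_in_parallel(ids: List[str],
                          fetch: Callable[[str], str],
                          rule: Callable[[str], str],
                          n_engines: int = 4) -> Dict[str, str]:
    """Run the engines concurrently and merge their results into one store."""
    store: Dict[str, str] = {}  # stands in for the transformation data store
    with ThreadPoolExecutor(max_workers=n_engines) as pool:
        results = pool.map(lambda chunk: run_engine(chunk, fetch, rule),
                           partition(ids, n_engines))
        for partial in results:
            store.update(partial)
    return store


# Assumed stand-in for a knowledge domain: identifier -> raw source value.
fake_domain = {"a": "x", "b": "y", "c": "z"}
result = transform_in_parallel(list(fake_domain), fake_domain.__getitem__,
                               str.upper, n_engines=2)
```

Each engine returns its own partial result, and the merge happens in the calling thread, so the sketch avoids concurrent writes to shared state.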

Exemplary embodiments have been disclosed in the drawings and the specification as described above. Although the embodiments have been described using specific terms, those terms are used only to explain the technical idea of the present invention and are not used to limit their meaning or the scope of the present invention set forth in the claims. Therefore, those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the true scope of technical protection of the present invention should be determined by the technical idea of the appended claims.

Claims

1. A data integration system configured to access a plurality of remote data transformation systems in a distributed system for knowledge integration, the data integration system comprising:
a job selection unit configured to receive a request from an administrator and to decode the request;
a transformation management unit configured to instruct at least one data transformation system to transform source data in response to a transformation request from the administrator, and to receive the transformed source data from the at least one data transformation system; and
a data integration unit configured to access a metadata repository that stores metadata used for knowledge integration and, in response to an integration request from the administrator, to integrate the transformed source data into a knowledge base based on an inter-knowledge-domain mapping table stored in the metadata repository,
wherein the data integration unit generates a source instance based on a knowledge entity included in the mapping table and the transformed source data, extracts at least one knowledge entity from the knowledge base based on the knowledge entity, and integrates the source instance into the knowledge base based on a similarity between the extracted at least one knowledge entity and at least one knowledge entity included in the source data.

2. The data integration system of claim 1,
wherein the transformation request includes identification information of a knowledge domain that includes the source data and at least one identifier of the source data included in the knowledge domain.

3. The data integration system of claim 1,
wherein the data integration unit comprises:
a knowledge integration unit configured to generate the source instance based on at least one first identifier of at least one knowledge entity included in the mapping table and the transformed source data; and
a knowledge application unit configured to refine the source instance and to integrate the refined source instance into the knowledge base.

4. The data integration system of claim 3,
wherein the knowledge integration unit comprises:
a knowledge entity selection unit configured to search the mapping table for the at least one first identifier and to extract the at least one knowledge entity based on a result of the search; and
a source instance generation unit configured to generate the source instance from the transformed source data by replacing the at least one first identifier with at least one second identifier of the extracted at least one knowledge entity.

5. The data integration system of claim 4,
wherein the knowledge entity selection unit is configured to extract at least one candidate knowledge entity from the knowledge base based on the transformed source data when the at least one first identifier is not found in the mapping table.

6. The data integration system of claim 3,
wherein the knowledge application unit comprises:
a source instance post-processing unit configured to post-process the source instance based on types of knowledge instances included in the knowledge base; and
an instance comparison unit configured to selectively integrate the post-processed source instance into the knowledge base by comparing the post-processed source instance with the knowledge instances included in the knowledge base.

7. The data integration system of claim 1,
further comprising a data management unit configured to provide data generated by the transformation management unit and the data integration unit to the administrator in response to an inquiry request from the administrator, and to update the metadata in response to an update request from the administrator.

8. A distributed system for knowledge integration, comprising:
a plurality of data transformation systems; and
a data integration system configured to integrate transformed source data provided from the plurality of data transformation systems into a knowledge base based on an inter-knowledge-domain mapping table,
wherein each of the plurality of data transformation systems comprises:
a plurality of transformation engines configured to transform source data included in a knowledge domain based on a structure of knowledge instances included in the knowledge base, in response to an instruction provided from the data integration system; and
an engine management unit configured to select one transformation module from among a plurality of transformation modules based on the instruction and to provide the selected transformation module to at least one of the plurality of transformation engines,
wherein each of the plurality of transformation modules defines a transformation rule required for a knowledge domain or for a group of knowledge data included in the knowledge domain.

9. (Deleted)

10. The distributed system of claim 8,
wherein the instruction includes at least one of identification information of the knowledge domain, at least one identifier of the source data, and an index of a transformation module.

11. The distributed system of claim 8,
wherein each of the plurality of data transformation systems includes a transformation data store configured to store the transformed source data, and
wherein each of the plurality of transformation engines is configured to collect source data from the knowledge domain and to store the transformed source data in the transformation data store.