KR101553397B1

KR101553397B1 - Apparatus and method for matching large-scale biomedical ontology

Info

Publication number: KR101553397B1
Application number: KR1020130145615A
Authority: KR
Inventors: 이승룡 (Sungyoung Lee); 비랄 아민 무하마드 (Muhammad Bilal Amin)
Original assignee: 경희대학교 산학협력단 (University-Industry Cooperation Group of Kyung Hee University)
Priority date: 2013-11-27
Filing date: 2013-11-27
Publication date: 2015-09-15
Also published as: KR20150061457A

Abstract

A large-scale ontology matching apparatus according to the present invention includes: a preprocessing unit that classifies received candidate ontologies into one or more ontology subsets; a distributed processing unit that partitions the generated ontology subsets by applying a distribution algorithm, generates matching threads by applying a matching algorithm to the partitioned subsets, and delivers the matching threads to the individual cores of the participating nodes; and an aggregation unit that collects and combines the matching results produced by the matching operations performed on the individual cores according to the matching threads, and generates an ontology mapping from them.

Description

APPARATUS AND METHOD FOR MATCHING LARGE-SCALE BIOMEDICAL ONTOLOGY

The present invention relates to ontology matching in the large-scale biomedical domain, and more particularly to parallel and distributed processing of ontology matching.

Recently, Semantic Web technologies have been applied to information processing and information management in the biomedical field, bringing considerable benefits to biomedical systems. In particular, ontologies are widely used in biomedical systems for the standardization of biomedical information, knowledge sharing, and reusability. As a result, ontologies have been adopted in the biomedical field in forms such as the Gene Ontology (GO), the National Cancer Institute Thesaurus (NCI), the Foundational Model of Anatomy (FMA), and SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms). Various studies are under way to apply ontologies more effectively in the biomedical field, as well as to sustain the ontologies that have already been developed.

Biomedical ontologies are very complex because of their large scale, which hinders integration and interoperability. The OBO (Open Biomedical Ontologies) consortium is working to overcome these obstacles through an introducing strategy for ontology evolution. Biomedical ontologies contain overlapping information, and this overlapping information is exactly what is needed to integrate biomedical systems and make their information processing interoperable. The correspondence between different candidate ontologies is called a mapping or an alignment. Candidate ontologies are the ontologies submitted to the mapping process that establishes relationships between them, and the process of discovering mappings that establish relationships between different ontologies is called ontology matching.

The ontology matching process for discovering mappings between large-scale biomedical ontologies requires computationally intensive work with quadratic complexity. As described in "On Matching Large Life Science Ontologies in Parallel" (Data Integration in the Life Sciences, 2010), ontology matching is computed over the Cartesian product of the two candidate ontologies, a task that requires resource-based matching algorithms. The excessive computation caused by this quadratic complexity can introduce delays into the mapping process, and such delays make it difficult to produce ontology mappings for biomedical systems within the required processing time.

Anika Groß, Michael Hartung, Toralf Kirsten, Erhard Rahm. "On Matching Large Life Science Ontologies in Parallel". Data Integration in the Life Sciences, Lecture Notes in Computer Science, Vol. 6254, 2010, pp. 35-49.
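The quadratic cost described above is easy to see in a minimal Python sketch that matches two toy concept lists by exhaustive pairwise comparison over their Cartesian product. The concept labels and the case-insensitive equality test are illustrative assumptions, not the matching algorithm of the invention.

```python
from itertools import product


def naive_match(source, target, same):
    """Compare every source concept with every target concept.

    Returns the matched pairs and the number of comparisons performed,
    which is always len(source) * len(target) (the Cartesian product).
    """
    matches = []
    comparisons = 0
    for s, t in product(source, target):
        comparisons += 1
        if same(s, t):
            matches.append((s, t))
    return matches, comparisons


# Toy concept labels; a real matcher would use richer similarity measures.
source = ["heart", "lung", "liver"]
target = ["Heart", "Kidney", "Liver", "Brain"]
matches, comparisons = naive_match(source, target, lambda s, t: s.lower() == t.lower())
print(matches)      # [('heart', 'Heart'), ('liver', 'Liver')]
print(comparisons)  # 12
```

Because the number of comparisons is `len(source) * len(target)`, doubling both ontologies quadruples the work, which is what motivates spreading the pairs over many cores.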

An object of the present invention is to provide an ontology matching apparatus and method that, for large-scale biomedical ontologies, perform matching in parallel on multiple cores to improve the performance of the ontology matching process used for ontology mapping.

A large-scale ontology matching apparatus according to the present invention includes: a preprocessing unit that classifies received candidate ontologies into one or more ontology subsets; a distributed processing unit that partitions the generated ontology subsets by applying a distribution algorithm, generates matching threads by applying a matching algorithm to the partitioned subsets, and delivers the matching threads to the individual cores of the participating nodes; and an aggregation unit that collects and combines the matching results produced by the matching operations performed on the individual cores according to the matching threads, and generates an ontology mapping from them. The apparatus may further include an ontology storage unit that stores serialized ontology subsets and, when a candidate ontology identical to a previously received one is received again, provides the stored serialized ontology subsets to the preprocessing unit.

The preprocessing unit may serialize the generated ontology subsets into binary form to produce serialized ontology subsets, and may reconstruct ontology subsets by deserializing the serialized subsets received from the ontology storage unit. A matching thread generated by the distributed processing unit may include one or more matching requests, one or more matching jobs, and one or more matching tasks. Each matching request corresponds one-to-one to a participating node, and each matching job corresponds one-to-one to an individual core of the participating node that receives that matching request. The number of matching requests does not exceed the number of participating nodes, and the number of matching jobs does not exceed the total number of individual cores across all participating nodes. Each matching job runs independently of the other matching jobs and of the matching requests running on other participating nodes.
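As a sketch of this three-level structure and its one-to-one constraints, the Python classes below use hypothetical names (`MatchingRequest`, `MatchingJob`, `MatchingTask`, and the helper `validate`) that are not part of the patent. `validate` checks that each node receives at most one matching request and each core at most one matching job.

```python
from dataclasses import dataclass, field


@dataclass
class MatchingTask:
    source_concept: str
    target_concept: str


@dataclass
class MatchingJob:
    core_id: int  # the single core of the node that runs this job
    tasks: list[MatchingTask] = field(default_factory=list)


@dataclass
class MatchingRequest:
    node_id: str  # the single participating node that receives this request
    jobs: list[MatchingJob] = field(default_factory=list)


def validate(requests, cores_per_node):
    """Check the one-to-one constraints; cores_per_node maps node id -> core count."""
    seen_nodes = set()
    for req in requests:
        # At most one matching request per participating node, and only known nodes.
        if req.node_id in seen_nodes or req.node_id not in cores_per_node:
            return False
        seen_nodes.add(req.node_id)
        core_ids = [job.core_id for job in req.jobs]
        # At most one matching job per core of that node; this also keeps the
        # total number of jobs within the total number of cores of all nodes.
        if len(core_ids) != len(set(core_ids)):
            return False
        if any(not 0 <= c < cores_per_node[req.node_id] for c in core_ids):
            return False
    return True


cores = {"node-1": 2, "node-2": 4}
ok = [MatchingRequest("node-1", [MatchingJob(0, [MatchingTask("A", "a")]), MatchingJob(1)])]
bad = [MatchingRequest("node-1", [MatchingJob(0), MatchingJob(2)])]  # node-1 has no core 2
print(validate(ok, cores), validate(bad, cores))  # True False
```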

When candidate ontologies are received, the distributed processing unit determines the number of participating nodes and the number of individual cores in each participating node, applies these numbers to the distribution algorithm to set the distribution count, partitions the ontology subsets according to that count, and applies a matching algorithm to generate matching threads comprising matching requests, matching jobs, and matching tasks.
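Setting the distribution count can be sketched as follows, under the assumption (not spelled out in the text) that one matching request is created per participating node and one matching job per core, so the distribution count equals the total number of cores. `distribution_plan` and the node names are hypothetical.

```python
import os


def distribution_plan(cores_per_node):
    """Derive matching-request and matching-job counts from a node inventory.

    Assumption: one matching request per participating node and one matching
    job per core, so the distribution count is the total number of cores.
    Nodes reporting zero cores are left out.
    """
    plan = {node: cores for node, cores in cores_per_node.items() if cores > 0}
    return {
        "matching_requests": len(plan),
        "jobs_per_request": plan,
        "distribution_count": sum(plan.values()),
    }


# The local node can report its own core count; os.cpu_count() may return None.
local_cores = os.cpu_count() or 1
inventory = {"local": local_cores, "node-2": 4, "node-3": 8}  # hypothetical remote nodes
print(distribution_plan(inventory))
```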

In the large-scale ontology matching method according to the present invention, the received candidate ontologies are first classified into one or more ontology subsets. The generated ontology subsets are then partitioned by applying a distribution algorithm, and a matching algorithm is applied to the partitioned subsets to generate matching threads. Once generated, the matching threads are delivered to the individual cores of the participating nodes. Finally, the matching results produced by the matching operations performed on the individual cores according to the matching threads are collected and combined to generate an ontology mapping.
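The four steps of the method can be sketched on a single machine in Python, with a thread pool standing in for the cores of the participating nodes. All function names are hypothetical, the ontologies are toy `{concept_id: label}` dictionaries, and the exact-label comparison is a placeholder for a real matching algorithm.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def preprocess(ontology):
    """Step 1 (sketch): turn a {concept_id: label} dict into a sorted label subset."""
    return sorted(ontology.items())


def split(items, parts):
    """Step 2 (sketch): split items into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def match_chunk(pairs):
    """Step 3 (sketch): one core's work; here, case-insensitive exact label match."""
    return [(s_id, t_id) for (s_id, s_label), (t_id, t_label) in pairs
            if s_label.lower() == t_label.lower()]


def run_pipeline(source, target, workers=4, executor_cls=ThreadPoolExecutor):
    pairs = list(product(preprocess(source), preprocess(target)))
    chunks = split(pairs, workers)
    with executor_cls(max_workers=workers) as pool:
        partial_results = list(pool.map(match_chunk, chunks))
    # Step 4 (sketch): collect and merge per-core results into one mapping.
    return [pair for part in partial_results for pair in part]


src = {"S1": "Heart", "S2": "Lung"}
tgt = {"T1": "heart", "T2": "liver", "T3": "LUNG"}
print(run_pipeline(src, tgt, workers=2))  # [('S1', 'T1'), ('S2', 'T3')]
```

Threads keep the sketch portable, but CPython threads do not run pure-Python matching on several cores at once. Passing `executor_cls=ProcessPoolExecutor` (with the call placed under an `if __name__ == "__main__":` guard) is the usual way to use several cores of one machine; spanning multiple nodes would need a distributed task framework instead.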

The large-scale ontology matching method according to the present invention determines the number of participating nodes and the number of individual cores in each participating node, and sets the distribution count by applying these numbers to the distribution algorithm. It then partitions the ontology subsets according to the distribution count and applies a matching algorithm to generate matching threads comprising matching requests, matching jobs, and matching tasks.

By distributing the ontology matching computation across the individual cores of the participating nodes and processing it in parallel, the large-scale ontology matching apparatus and method according to the present invention can effectively reduce the computational resources and computation time required for matching.

FIG. 1 is a block diagram showing an embodiment of a biomedical ontology according to the present invention.
FIG. 2 is a block diagram showing an embodiment of a large-scale ontology matching apparatus according to the present invention.
FIG. 3 is a detailed view of the preprocessing unit of a large-scale ontology matching apparatus according to an embodiment of the present invention.
FIG. 4 is a detailed view of the distributed processing unit of a large-scale ontology matching apparatus according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating an embodiment of a matching thread delivered to an individual core in a large-scale ontology matching apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the data flow of a large-scale ontology matching apparatus according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a large-scale ontology matching method according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating how preprocessing is skipped in a large-scale ontology matching method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. The terms used in this specification were chosen in consideration of their functions in the embodiments, and their meanings may vary with the intention of the inventor or with convention. Therefore, terms used in the following embodiments follow their definitions where this specification defines them specifically, and otherwise should be construed in the sense generally understood by those skilled in the art.

FIG. 1 is a block diagram showing an embodiment of a biomedical ontology according to the present invention.

Referring to FIG. 1, a biomedical ontology in the present invention is formed of one or more components 10 whose mutual relationships are established. Through the ontology matching process, the relationship between each component 10 and its corresponding component 10 is established, and the matching results between the individual components 10 are collected to form an ontology mapping. The large-scale ontology matching apparatus uses the hardware of the participating nodes to carry out the computation of the large-scale ontology matching process efficiently. The distributed environment may contain one or more participating nodes, depending on what the available environment provides, and the participating nodes may have single-core or multi-core hardware. In a distributed environment of one or more nodes, the large-scale ontology matching apparatus assigns ontology subsets to the single-core or multi-core processor of each node and processes them using data parallelism.

Like general ontologies, biomedical ontologies belong to the group of Semantic Web ontologies. However, the two differ in size and evolution. General ontologies are smaller than biomedical ontologies and evolve more slowly, whereas biomedical ontologies evolve faster, driven by rapid change in biomedical domains and biomedical data, and are consequently much larger.

In a distributed environment, ontology matching is the process of establishing or linking correspondences (matches) between the concepts of a source ontology and a target ontology, and this process can extend the classification system of the ontologies. The source ontology is the ontology for which new relationships are to be established, and the target ontology is the ontology matched against the source ontology in the ontology matching process.

The total number of matching tasks between the source ontology and the target ontology is distributed evenly among the available computational resources (cores), as shown in Equation (1).

In Equation (1), MT_total denotes the total number of matching tasks, O_s the source ontology, O_t the target ontology, and MT_core the matching tasks of an individual core; m is the number of concepts in the source ontology and n the number of concepts in the target ontology. Demand for large-scale biomedical ontology matching can come from several sources, including biomedical experts and researchers, biomedical and bioinformatics systems, and even third-party healthcare information services running on cloud platforms.
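Equation (1) itself is not reproduced in this text. Assuming, from the definitions above, that MT_total = m × n and that each core receives MT_total divided by the number of cores, the per-core share can be sketched in Python as follows; giving one extra task to the first cores when the division leaves a remainder is an added assumption.

```python
def tasks_per_core(m, n, cores):
    """Per-core matching-task counts under the assumed reading of Equation (1).

    MT_total = m * n (source concepts x target concepts) is shared by `cores`;
    when it does not divide evenly, the first cores take one extra task each.
    """
    mt_total = m * n
    base, extra = divmod(mt_total, cores)
    return [base + 1 if c < extra else base for c in range(cores)]


shares = tasks_per_core(m=1000, n=3000, cores=16)
print(sum(shares), min(shares), max(shares))  # 3000000 187500 187500
print(tasks_per_core(3, 5, 4))                # [4, 4, 4, 3]
```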

FIG. 2 is a block diagram showing an embodiment of a large-scale ontology matching apparatus according to the present invention.

Referring to FIG. 2, the large-scale ontology matching apparatus 100 according to the present invention includes a preprocessing unit 110, an ontology storage unit 120, a distributed processing unit 130, and an aggregation unit 140.

The preprocessing unit 110 receives candidate ontologies comprising a source ontology and a target ontology. The candidate ontologies may be matched one-to-one; alternatively, one source ontology may be matched against two or more target ontologies, or two or more source ontologies may be matched against one target ontology.

The preprocessing unit 110 classifies the received candidate ontologies, including the source and target ontologies, into one or more subsets. An ontology can generally be organized into five subsets: Name, Label, Relationship, Axiom, and Property. However, ontologies are not limited to these five subsets; depending on the type and characteristics of the received candidate ontology, the preprocessing unit 110 may classify it into any number of subsets. The preprocessing unit 110 passes the classified ontology subsets to the ontology storage unit 120 and the distributed processing unit 130. It may pass the classified subsets directly to the distributed processing unit 130, or it may retrieve ontology subsets stored in the ontology storage unit 120 and pass them to the distributed processing unit 130.
In particular, when the preprocessing unit 110 receives a candidate ontology identical to one whose subsets were previously classified and stored in the ontology storage unit 120, it does not repeat the classification step; instead, it passes the serialized subsets stored in the ontology storage unit 120 to the distributed processing unit 130 without preprocessing, which reduces the computational resources consumed by preprocessing. The preprocessing unit 110 may also serialize ontology subsets before passing them to the ontology storage unit 120, and deserialize the serialized subsets received from the ontology storage unit 120 before passing them to the distributed processing unit 130.
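The classification step can be sketched in Python, assuming the candidate ontology arrives as (subject, predicate, object) triples. The predicate-to-subset routing table, including the `part_of` predicate and the `S:` identifiers, is a hypothetical example rather than a rule from the patent; unrouted predicates land in an extra `Other` subset, which the text permits since the number of subsets is not fixed at five.

```python
from collections import defaultdict

# Hypothetical routing of predicates to the five subsets named in the text.
SUBSET_OF_PREDICATE = {
    "rdfs:label": "Label",
    "skos:prefLabel": "Label",
    "rdfs:subClassOf": "Relationship",
    "part_of": "Relationship",
    "owl:equivalentClass": "Axiom",
    "owl:disjointWith": "Axiom",
    "rdfs:domain": "Property",
    "rdfs:range": "Property",
}


def classify(triples):
    """Group (subject, predicate, object) triples into ontology subsets.

    Each distinct subject is recorded once in the Name subset; predicates
    without a routing rule go to an extra 'Other' subset.
    """
    subsets = defaultdict(list)
    seen = set()
    for subj, pred, obj in triples:
        if subj not in seen:
            seen.add(subj)
            subsets["Name"].append(subj)
        subsets[SUBSET_OF_PREDICATE.get(pred, "Other")].append((subj, obj))
    return dict(subsets)


triples = [
    ("S:heart", "rdfs:label", "Heart"),
    ("S:heart", "part_of", "S:cardio"),
    ("S:cardio", "rdfs:label", "Cardiovascular system"),
]
print(classify(triples))
```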

The ontology storage unit 120 serializes and stores the ontology subsets received from the preprocessing unit 110. By serializing the received subsets in binary form, the ontology storage unit 120 uses storage space efficiently and improves data-processing efficiency. In addition, by returning, at the request of the preprocessing unit 110, ontology subsets that were previously serialized and stored in binary format, the ontology storage unit 120 lets the preprocessing unit 110 reduce or avoid unnecessary re-processing.
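The reuse path can be sketched with Python's standard `pickle` (binary serialization) and `hashlib` (to recognize an identical candidate ontology). `OntologyStore`, `load_subsets`, and `fake_preprocess` are hypothetical names, the store is kept in memory rather than on disk, and keying by a SHA-256 digest of the raw ontology bytes is an assumed way of detecting that the same ontology was received again.

```python
import hashlib
import pickle


class OntologyStore:
    """In-memory stand-in for the ontology storage unit (hypothetical API).

    Subsets are stored as pickled bytes keyed by a SHA-256 digest of the raw
    candidate ontology. Only unpickle data this system wrote itself: loading
    untrusted pickle data can execute arbitrary code.
    """

    def __init__(self):
        self._blobs = {}

    @staticmethod
    def key(raw_ontology: bytes) -> str:
        return hashlib.sha256(raw_ontology).hexdigest()

    def put(self, raw_ontology: bytes, subsets) -> None:
        self._blobs[self.key(raw_ontology)] = pickle.dumps(subsets)  # serialize to binary

    def get(self, raw_ontology: bytes):
        blob = self._blobs.get(self.key(raw_ontology))
        return None if blob is None else pickle.loads(blob)  # deserialize


def load_subsets(raw_ontology: bytes, store: OntologyStore, preprocess):
    """Reuse stored subsets for an identical ontology; otherwise preprocess and store."""
    cached = store.get(raw_ontology)
    if cached is not None:
        return cached
    subsets = preprocess(raw_ontology)
    store.put(raw_ontology, subsets)
    return subsets


calls = []


def fake_preprocess(raw):
    calls.append(raw)  # count how often real preprocessing runs
    return {"Label": raw.decode().split(",")}


store = OntologyStore()
first = load_subsets(b"heart,lung", store, fake_preprocess)
second = load_subsets(b"heart,lung", store, fake_preprocess)
print(first == second, len(calls))  # True 1
```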

The distributed processing unit 130 divides the ontology subsets received from the preprocessing unit 110 into two or more matching threads for data-parallel processing. Data parallelism requires a processing core for each piece of the candidate ontology, partitioned into two or more distributed ontologies, so that each core performs the matching operation on its own piece. To enable parallel processing, the distributed processing unit 130 partitions the ontology subsets according to the distribution algorithm. First, it determines the number of cores in the multi-core processor of each participating node. It then applies the participating nodes and the core count of each participating node to the distribution algorithm to divide the received ontology subsets into a predetermined number of distributed ontologies.

Based on the distribution algorithm, the distributed processing unit 130 then applies a matching library to the one or more partitioned ontology subsets and divides them into matching requests (hereinafter MR), matching jobs (hereinafter MJ), and matching tasks (hereinafter MT). Equation (2) gives the relationship among MR, MJ, and MT in the distribution algorithm applied by the distributed processing unit 130.

In Equation (2), MR_i denotes the matching request received by each node, with i running over the participating nodes used for parallel matching in the distributed environment; MR denotes the matching request received by the large-scale ontology matching apparatus 100; MJ_j denotes the matching job assigned to an individual core 20 of a participating node, with j running over the individual cores 20 of that node; and MT_k denotes a matching task assigned to one individual core 20 of a participating node to perform a matching operation, with k running over the matching tasks in the matching job assigned to that core. As in Equation (1), m is the number of concepts in the source ontology and n the number in the target ontology. A single matching task MT_k is drawn from the Cartesian product of the concepts of the source and target ontologies.

A matching request is a matching thread delivered to each participating node (or participating node's CPU) of the distributed environment. The distributed processing unit 130 first divides the ontology subsets into one or more matching requests, one for each participating node (or CPU), taking the number of participating nodes into account. The maximum number of matching requests is therefore the number of participating nodes (or CPUs); in other words, a matching request is the parallel-processing work assigned to an individual node. The distributed processing unit 130 then divides each matching request into matching jobs according to the number of cores in the participating node to which the request is assigned, and includes those jobs in the request. A matching job is thus the parallel-processing work assigned to an individual core 20: if the participating node assigned a matching request has four cores, the request may contain four matching jobs.
Next, the distributed processing unit 130 divides each matching job assigned to an individual core into matching tasks, the units that perform the actual matching, and includes those tasks in the matching job.

Matching requests, matching jobs, and matching tasks form three layers of abstraction over the entire matching process. Because the distribution algorithm for parallel processing must provide classification at these different levels, the distribution algorithm of the distributed processing unit 130 must keep track of every job being executed.

A matching task is the unit of the matching process, the smallest division of the overall matching process. For example, if in the matching job assigned to a core of a participating node the source ontology is {A, B, C, D} and the target ontology is {a, b, c, d}, then when the source ontology is compared with the target ontology, A=a may be the first MT, B=b the second, C=c the third, and D=d the fourth; if A and d can also be compared and matched, that comparison is another MT. In other words, a matching task is the smallest unit of matching inside an individual core of a participating node, and it matches the individual terms that make up the ontologies within that core.
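A matching task as the smallest unit can be sketched as a Python function that compares exactly one source term with one target term, using the {A, B, C, D} / {a, b, c, d} example above. The similarity measure (`difflib.SequenceMatcher` on lower-cased strings) and the 0.8 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import product


def matching_task(source_term, target_term, threshold=0.8):
    """One matching task: compare a single source term with a single target term."""
    score = SequenceMatcher(None, source_term.lower(), target_term.lower()).ratio()
    return (source_term, target_term, score) if score >= threshold else None


source = ["A", "B", "C", "D"]
target = ["a", "b", "c", "d"]
job = list(product(source, target))  # the tasks assigned to one core
results = [r for r in (matching_task(s, t) for s, t in job) if r is not None]
print(len(job), [(s, t) for s, t, _ in results])
# 16 [('A', 'a'), ('B', 'b'), ('C', 'c'), ('D', 'd')]
```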

A matching job is the unit assigned to an individual core 20 of a participating node to perform the matching process, and may be a collection of one or more matching tasks. A matching request is the collection of all matching jobs executed on one participating node (or its CPU).

For example, suppose four participating nodes take part in the matching process, each with four cores, and the received ontology subsets require 2000 matches in total. The distributed processing unit 130 divides the 2000 matches into four equal parts according to the number of participating nodes, producing four matching requests of 500 matches each. It then divides each matching request into four matching jobs according to the number of cores, so each matching job contains 500 / 4 = 125 matches, that is, 125 matching tasks. In total, the ontology subsets requiring 2000 matches are divided into 4 matching requests, each containing 4 matching jobs, each containing 125 matching tasks (4 × 4 × 125 = 2000).
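The worked example can be reproduced with a short Python sketch that splits a list of task placeholders first across nodes and then across each node's cores. `split`, `build_hierarchy`, and the node names are hypothetical, and the tasks are stand-in integers rather than real concept pairs.

```python
def split(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def build_hierarchy(tasks, cores_per_node):
    """Matching requests (one per node) -> matching jobs (one per core) -> tasks."""
    nodes = list(cores_per_node)
    requests = {}
    for node, request_tasks in zip(nodes, split(tasks, len(nodes))):
        requests[node] = split(request_tasks, cores_per_node[node])
    return requests


tasks = list(range(2000))  # stand-ins for 2000 (source, target) pairs
cores = {f"node-{i}": 4 for i in range(1, 5)}
hierarchy = build_hierarchy(tasks, cores)
print(len(hierarchy), [len(jobs) for jobs in hierarchy.values()])  # 4 [4, 4, 4, 4]
print({len(job) for jobs in hierarchy.values() for job in jobs})   # {125}
```

Splitting evenly by node mirrors the example above; if nodes had different core counts, weighting the first split by each node's cores would keep the per-core load equal.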

The numbers of matching requests, matching jobs, and matching tasks may be set by considering three things: the number of participating nodes, the number of cores in each participating node, and the scenario that yields the most efficient parallel processing. Each matching job is independent of the other matching jobs and of the other matching requests running remotely. The task count should also be kept equal across matching jobs when a matching request is subdivided. If it is, most cores finish their matching operations at about the same time, which keeps available cores from sitting idle. Matching requests, matching jobs, and matching tasks are described again with reference to FIG. 5.
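The example above divides evenly. When it does not, one straightforward way to keep the task counts of matching jobs as equal as possible is to spread the remainder one task at a time. The text does not state this method; it is an assumption. The function name below is hypothetical.

```python
def balanced_sizes(total_tasks, num_jobs):
    """Job sizes that differ by at most one task, so cores finish at similar times."""
    base, extra = divmod(total_tasks, num_jobs)
    # The first `extra` jobs each take one leftover task.
    return [base + 1 if i < extra else base for i in range(num_jobs)]


print(balanced_sizes(500, 4))  # [125, 125, 125, 125]
print(balanced_sizes(502, 4))  # [126, 126, 125, 125]
```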

Through the process above, the distributed processing unit 130 delivers the ontology subsets to the participating nodes as a predetermined number of matching requests. Each matching request is delivered to its corresponding participating node and contains one matching job for each core of that node. The matching jobs in a matching request are assigned one-to-one to the corresponding cores. Each core then performs matching processing by executing the matching tasks contained in its matching job. In this way, the large scale ontology matching apparatus 100 according to the present invention can process the matching in parallel on every core of the participating nodes.
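On a single participating node, the one-to-one mapping of matching jobs to cores can be sketched with one worker per job. This is a minimal sketch under two assumptions. The matching task is a hypothetical case-insensitive label comparison, not the patent's algorithm. `ThreadPoolExecutor` keeps the sketch portable; for CPU-bound pure-Python matching, `ProcessPoolExecutor` offers the same `map` interface and gives true multi-core parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(pair):
    """Hypothetical matching task: compare two labels case-insensitively."""
    src, tgt = pair
    return (src, tgt) if src.lower() == tgt.lower() else None


def run_job(tasks):
    """Run every matching task of one job on its core; keep only the matches."""
    return [hit for hit in map(run_task, tasks) if hit is not None]


def run_request_on_node(jobs):
    """Give each job its own worker, mirroring the one-to-one job/core mapping."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(run_job, jobs))  # one result list per job, in job order


jobs = [
    [("Heart", "heart"), ("Lung", "liver")],
    [("Kidney", "KIDNEY")],
]
print(run_request_on_node(jobs))
# [[('Heart', 'heart')], [('Kidney', 'KIDNEY')]]
```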

The collecting unit 140 receives matching results from the individual cores 20 of the participating nodes that received matching threads from the distributed processing unit 130. Each result is produced by performing matching operations according to the matching thread. The distributed processing unit 130 delivers a matching thread, comprising matching requests, matching jobs, and matching tasks, to each core of the participating nodes, and each core 20 performs matching operations based on the thread it receives. A participating node that receives a matching request assigns the matching jobs in that request to its cores 20. Each core 20 then executes the matching tasks in its assigned matching job, so that the matching of the received candidate ontologies is processed in parallel. The collecting unit 140 collects the matching results (operation results) generated by the ontology matching operations performed on the individual cores.

The collecting unit 140 may receive the matching results generated by the cores 20 directly from each core 20. Alternatively, it may receive them all at once from the participating node that contains those cores.

The collecting unit 140 sums the matching results received from the individual cores 20 to generate an ontology mapping, which is the set of individual matching results between the components of two or more candidate ontologies. The collecting unit 140 accumulates the matching results computed by the matching jobs on the cores 20. It then applies a bridge pattern to the accumulated results to generate a formal representation of the ontology mapping. A bridge pattern is a type of mapping file: a custom-defined format for the mapping file, specified by an individual user. For example, one user may want the mapping file in XML with a particular schema, which can differ from the mapping format another user defines. Every such custom format is called a bridge pattern. Through bridge patterns, the collecting unit 140 can deliver the ontology mapping in the format each user requires. It also stores each bridge pattern it uses so that other users can reuse it.
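One way to picture bridge patterns is as a registry of named output formatters that persists across users. This is a minimal sketch under several assumptions. The registry, the `tsv` and `xml` pattern names, the `(source, target, score)` tuple layout, and the `FMA:`/`NCI:` identifiers are all hypothetical. A real system would also persist the registry rather than keep it in memory.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical registry: bridge-pattern name -> function rendering correspondences.
BRIDGE_PATTERNS = {}


def bridge_pattern(name):
    """Register a renderer so later users can reuse the same output format."""
    def register(func):
        BRIDGE_PATTERNS[name] = func
        return func
    return register


@bridge_pattern("tsv")
def to_tsv(mapping):
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target", "score"])
    writer.writerows(mapping)
    return buf.getvalue()


@bridge_pattern("xml")
def to_xml(mapping):
    root = ET.Element("mapping")
    for src, tgt, score in mapping:
        ET.SubElement(root, "cell", source=src, target=tgt, score=str(score))
    return ET.tostring(root, encoding="unicode")


mapping = [("FMA:Heart", "NCI:Heart", 1.0)]
print(BRIDGE_PATTERNS["xml"](mapping))
```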

The summation process performed by the collecting unit 140 may differ depending on the number of participating nodes.

Equation 3 describes the three steps for generating the bridge ontology in a distributed environment in which two or more participating nodes are connected. The first step collects the results of every individual matching task in every matching job. The matching result of each matching task (that task's bridge ontology) is merged into the intermediate bridge ontology of its matching job. The second step sums the results of all matching jobs executed on one participating node. The intermediate bridge ontologies of all matching jobs running on a single node are summed, and the sum becomes the intermediate bridge ontology of that node. The third step sums all matching results of all matching requests executed on all participating nodes in the distributed environment. The result is the final bridge ontology, which is converted into the ontology mapping result, the output file.
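The three steps can be written compactly in hypothetical notation; the symbols below are not the patent's own. This restatement assumes that "summing" means taking the union of sets of correspondences. The notation is defined as follows:

- $N$ is the set of participating nodes.
- $J_n$ is the set of matching jobs in node $n$'s matching request.
- $T_{n,j}$ is the set of matching tasks in job $j$.
- $M_{n,j,t}$ is the result (bridge ontology) of task $t$.
- $B_{n,j}$, $B_n$, and $B$ are the intermediate bridge ontologies of a job and of a node, and the final bridge ontology, respectively.

```latex
\begin{aligned}
B_{n,j} &= \bigcup_{t \in T_{n,j}} M_{n,j,t} && \text{step 1: matching tasks} \to \text{matching job}\\
B_{n}   &= \bigcup_{j \in J_{n}} B_{n,j}     && \text{step 2: matching jobs} \to \text{node}\\
B       &= \bigcup_{n \in N} B_{n}           && \text{step 3: nodes} \to \text{final bridge ontology}
\end{aligned}
```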

Equation 4 describes the two steps for generating the bridge ontology when only one participating node is connected. With a single participating node (CPU), only the results of the matching jobs executed on the individual cores 20 of that node need to be summed. The first step merges the results of all matching tasks executed in a single matching job; the merged result is the intermediate bridge ontology of that matching job. The second step sums the results of all matching jobs executed on the participating node. The sum of the intermediate bridge ontologies of those matching jobs becomes the intermediate bridge ontology of the node.
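The two single-node steps can be written in hypothetical notation, again reading "summing" as the union of sets of correspondences (an assumption). Here $J$ is the set of matching jobs on the node, $T_j$ is the set of tasks in job $j$, and $M_{j,t}$ is the result of task $t$. $B_j$ and $B_{\text{node}}$ are the intermediate bridge ontologies of a job and of the node. With only one node there is no cross-node step.

```latex
B_{j} = \bigcup_{t \in T_{j}} M_{j,t}, \qquad
B_{\text{node}} = \bigcup_{j \in J} B_{j}
```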

The matching results computed by all participating nodes in the distributed environment are summed into a single ontology object called the bridge ontology. The collecting unit 140 then converts the collected bridge ontology into the ontology mapping result, a physical mapping file, and outputs it.
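The bottom-up merge (tasks to jobs, jobs to nodes, nodes to the bridge ontology) can be sketched as nested set unions. This is a minimal sketch under three assumptions. Results are hypothetical `(source, target)` pairs. "Summing" means set union. Node and job keys are free-form labels. The same function covers the single-node case, where the final step simply returns the node's result.

```python
from typing import Dict, List, Set, Tuple

Correspondence = Tuple[str, str]  # hypothetical (source entity, target entity) pair


def merge_results(
    results: Dict[str, Dict[str, List[Set[Correspondence]]]],
) -> Set[Correspondence]:
    """Merge per-task results bottom-up: tasks -> jobs -> nodes -> bridge ontology.

    `results[node][job]` is the list of per-task result sets for that job.
    With a single node the last step is a no-op, matching the two-step case.
    """
    node_bridges = []
    for jobs in results.values():
        job_bridges = [set().union(*task_sets) for task_sets in jobs.values()]  # step 1
        node_bridges.append(set().union(*job_bridges))                         # step 2
    return set().union(*node_bridges)                                           # step 3


results = {
    "node510": {"job511-1": [{("a", "A")}], "job512-1": [{("b", "B")}, set()]},
    "node520": {"job521-1": [{("a", "A"), ("c", "C")}]},
}
print(sorted(merge_results(results)))  # [('a', 'A'), ('b', 'B'), ('c', 'C')]
```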

FIG. 3 is a detailed view of the preprocessing unit of the large scale ontology matching apparatus according to an embodiment of the present invention.

Referring to FIG. 3, the preprocessing unit 110 of the large scale ontology matching apparatus 100 according to an embodiment of the present invention includes an ontology model unit 111, a serialization processing unit 112, and a deserialization processing unit 113.

The ontology model unit 111 receives candidate ontologies comprising a source ontology and a target ontology. The candidate ontologies may be matched one-to-one. Alternatively, one source ontology may be matched against two or more target ontologies, or two or more source ontologies against one target ontology.

The ontology model unit 111 classifies the received candidate ontologies, comprising the source and target ontologies, into one or more subsets. An ontology can generally be divided into five subsets: name, label, relationship, axiom, and property. An ontology is not limited to these five subsets, however. The ontology model unit 111 may classify a received candidate ontology into any number of subsets depending on its type and characteristics. It may also take into account the matching library, which contains the matching algorithms, when classifying the candidate ontologies into subsets. The ontology model unit 111 passes the classified ontology subsets to the distributed processing unit 130.
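The subset classification can be sketched as grouping statements by kind. This is a minimal sketch under three assumptions. A parsed ontology is represented as a hypothetical flat list of `(kind, subject, value)` statements. The five kinds named above are used as subset keys. Anything else is placed in an `other` subset, since the text allows partitions beyond the five.

```python
from collections import defaultdict

# The five subset kinds named in the text; other partitions are possible.
SUBSET_KINDS = ("name", "label", "relationship", "axiom", "property")


def split_into_subsets(statements):
    """Group (kind, subject, value) statements into per-kind ontology subsets.

    Statements of kinds outside SUBSET_KINDS are collected under "other".
    """
    subsets = defaultdict(list)
    for kind, subject, value in statements:
        key = kind if kind in SUBSET_KINDS else "other"
        subsets[key].append((subject, value))
    return dict(subsets)


onto = [  # hypothetical statements about one concept C1
    ("name", "C1", "Heart"),
    ("label", "C1", "cardiac organ"),
    ("relationship", "C1", "part_of:C0"),
    ("comment", "C1", "free-text note"),
]
print(sorted(split_into_subsets(onto)))
# ['label', 'name', 'other', 'relationship']
```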

The serialization processing unit 112 serializes the ontology subsets generated by the ontology model unit 111 into binary form. This makes efficient use of storage space and improves the efficiency of data transmission and processing. The serialization processing unit 112 may pass the serialized ontology subsets to the ontology storage unit 120 for storage. Storing the subsets in serialized form lets the ontology storage unit 120 use its storage space efficiently and improves data-processing efficiency.

The deserialization processing unit 113 reconstructs the serialized ontology subsets stored in the ontology storage unit 120 back into their original form. On request from the ontology model unit 111, it supplies the ontology subsets stored in binary form, so that the ontology model unit 111 can reduce or avoid redundant re-processing. Suppose the apparatus receives a candidate ontology identical to one whose subsets were previously classified and stored in the ontology storage unit 120. In that case, subset classification is not repeated. Instead, the deserialization processing unit 113 restores the stored serialized subsets directly and passes them to the distributed processing unit 130. Skipping preprocessing in this way reduces the computational resources it would otherwise consume.
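The serialize-once, deserialize-on-repeat behavior can be sketched as a content-addressed cache. This is a minimal sketch under three assumptions. Two candidate ontologies count as "identical" when their raw bytes have the same SHA-256 hash. `pickle` serves as the binary format. `preprocess` is a hypothetical stand-in for subset classification.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def ontology_key(raw_ontology: bytes) -> str:
    """Identify a candidate ontology by a hash of its raw content (an assumption)."""
    return hashlib.sha256(raw_ontology).hexdigest()


def load_or_preprocess(raw_ontology: bytes, store: Path, preprocess):
    """Return ontology subsets, skipping `preprocess` when a serialized copy exists."""
    store.mkdir(parents=True, exist_ok=True)
    cached = store / f"{ontology_key(raw_ontology)}.pkl"
    if cached.exists():
        # Deserialization path: rebuild the subsets from the binary store.
        # pickle is only safe for files this system wrote itself; never load untrusted data.
        return pickle.loads(cached.read_bytes())
    subsets = preprocess(raw_ontology)
    # Serialization path: persist the subsets in binary form for next time.
    cached.write_bytes(pickle.dumps(subsets))
    return subsets


calls = []


def fake_preprocess(raw):
    """Hypothetical preprocessing step that records how often it runs."""
    calls.append(raw)
    return {"name": [raw.decode()]}


with tempfile.TemporaryDirectory() as d:
    first = load_or_preprocess(b"<owl/>", Path(d), fake_preprocess)
    second = load_or_preprocess(b"<owl/>", Path(d), fake_preprocess)
print(first == second, len(calls))  # True 1
```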

The preprocessing unit 110 divides the received candidate ontologies into subsets to generate ontology subsets. This preprocessing also allows a smaller memory footprint at run time.

FIG. 4 is a detailed view of the distributed processing unit of the large scale ontology matching apparatus according to an embodiment of the present invention.

Referring to FIG. 4, the distributed processing unit 130 of the large scale ontology matching apparatus according to an embodiment of the present invention includes a parallel strategy unit 131, a distribution unit 132, and a parallel interface 133.

The parallel strategy unit 131 checks the number of participating nodes and the number of individual cores (multicore) in each participating node. A large number of participating nodes may take part in the distributed environment. Participating nodes may include any terminal or device capable of computation, such as common desktop and notebook personal computers, workstations, and server computers. The apparatus according to the present invention distributes the ontology matching for mapping the candidate ontologies across the individual cores of the participating nodes and processes it in parallel. The parallel strategy unit 131 therefore first checks the number of participating nodes and the number of cores in each node to determine the available computing resources. It then sets the optimal number of partitions based on the identified numbers of nodes and cores and on the distribution algorithm.

The distribution unit 132 partitions the ontology subsets according to the optimal number of partitions set by the parallel strategy unit 131. That number is based on the numbers of participating nodes and cores and on the distribution algorithm. The distribution unit 132 then applies a matching algorithm from the matching library to the partitioned subsets to generate matching threads. Each generated matching thread comprises matching requests, matching jobs, and matching tasks. The matching algorithms stored in the matching library compute matches between the individual components of the candidate ontologies in order to generate the ontology mapping between them. They may include any algorithm applicable to ontology matching, such as synonym-based matching, label-based matching, broader-term-based matching, and child-based matching.
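Three of the named algorithm families can be sketched as small predicate functions in a matching library. These are deliberately simplified versions, not the patent's algorithms. The normalization rules, the `known_matches` dictionary, and the 0.5 child-overlap threshold are all assumptions. Broader-term-based matching is omitted.

```python
def _normalize(text):
    """Lower-case, treat underscores as spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("_", " ").split())


def label_match(src_label, tgt_label):
    """Label-based matching: the normalized labels are identical."""
    return _normalize(src_label) == _normalize(tgt_label)


def synonym_match(src_synonyms, tgt_synonyms):
    """Synonym-based matching: the two synonym sets share a normalized term."""
    return bool({_normalize(s) for s in src_synonyms} & {_normalize(t) for t in tgt_synonyms})


def child_match(src_children, tgt_children, known_matches, threshold=0.5):
    """Child-based matching: enough child pairs are already known to match.

    `known_matches` maps a source child to the target child it matched earlier.
    """
    if not src_children or not tgt_children:
        return False
    targets = set(tgt_children)
    hits = sum(1 for child in src_children if known_matches.get(child) in targets)
    return hits / min(len(src_children), len(tgt_children)) >= threshold


# Hypothetical matching library keyed by algorithm name.
MATCHING_LIBRARY = {"label": label_match, "synonym": synonym_match, "child": child_match}

print(label_match("Left_Ventricle", "left  ventricle"))  # True
print(synonym_match({"Myocardium", "heart muscle"}, {"cardiac muscle", "Heart Muscle"}))  # True
print(child_match(["s1", "s2"], ["t1", "t2", "t3"], {"s1": "t1"}))  # True
```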

A matching request generated by the distribution unit 132 is a matching thread delivered to a corresponding participating node (or its CPU). The parallel strategy unit 131 first divides the ontology subsets into one or more matching requests, one per participating node (or CPU), according to the number of participating nodes. The maximum number of matching requests may equal the number of participating nodes (or CPUs). A matching request is thus the parallel-processing unit assigned to an individual node. The distribution unit 132 then divides each matching request into matching jobs according to the number of cores of the node to which the request is assigned, and includes those jobs in the request. A matching job is the parallel-processing unit assigned to an individual core: if the node assigned a matching request has four cores, that request may contain four matching jobs. Finally, the distribution unit 132 divides each matching job into matching tasks, the units that perform the actual matching, and includes them in the job.

Matching requests, matching jobs, and matching tasks are three layers of abstraction over the entire matching process. The distribution algorithm for parallel processing must provide classification at each of these levels. The distribution algorithm of the distribution unit 132 must therefore keep track of all running work.
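Keeping track of all running work across the three layers can be sketched with a hierarchical identifier per matching task. Completion can then be queried at the job or request level. This is a minimal sketch; the `TaskId` and `Tracker` names and the three status values are assumptions, not taken from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"


@dataclass(frozen=True)
class TaskId:
    """Hierarchical key: matching request (node), matching job (core), matching task."""
    request: int
    job: int
    task: int


class Tracker:
    """Tracks every matching task so progress can be read at any of the three layers."""

    def __init__(self):
        self.status = {}

    def register(self, tid):
        self.status[tid] = Status.PENDING

    def update(self, tid, status):
        if tid not in self.status:
            raise KeyError(f"unknown task {tid}")
        self.status[tid] = status

    def _all_done(self, selects):
        states = [s for t, s in self.status.items() if selects(t)]
        return bool(states) and all(s is Status.DONE for s in states)

    def job_done(self, request, job):
        return self._all_done(lambda t: t.request == request and t.job == job)

    def request_done(self, request):
        return self._all_done(lambda t: t.request == request)


tracker = Tracker()
for j in range(2):          # two matching jobs in matching request 0
    for t in range(3):      # three matching tasks per job
        tracker.register(TaskId(0, j, t))
for t in range(3):
    tracker.update(TaskId(0, 0, t), Status.DONE)
print(tracker.job_done(0, 0), tracker.job_done(0, 1), tracker.request_done(0))
# True False False
```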

A matching job is the unit assigned to an individual core of a participating node to perform the matching process, and may be a collection of one or more matching tasks. A matching request is the collection of all matching jobs executed on one participating node (or on the CPU of that participating node).

For example, suppose four participating nodes are present in the distributed environment, each with four cores, and the received ontology subsets require a total of 2000 matches. The distribution unit 132 divides the 2000 matches into four equal parts according to the number of participating nodes, producing four matching requests of 500 matches each. It then splits each matching request into four matching jobs according to the number of cores, so each matching job contains 500 / 4 = 125 matches, that is, 125 matching tasks. The ontology subsets requiring 2000 matches are thus divided into four matching requests, each containing four matching jobs of 125 matching tasks. The numbers of matching requests, matching jobs, and matching tasks may be set by considering the number of participating nodes, the number of their cores, and the scenario that yields the most efficient parallel processing. Each matching job is independent of the other matching jobs and of other matching requests running remotely.

The parallel interface 133 delivers the matching threads corresponding to each participating node and to each individual core of each participating node. Each matching request is delivered to its corresponding participating node and contains one matching job for each core of that node. The matching jobs in a matching request are assigned one-to-one to the corresponding cores. Each core then performs matching processing by executing the matching tasks contained in its matching job. In this way, the apparatus according to the present invention can process the matching in parallel on every core of the participating nodes that make up the distributed environment.

FIG. 5 is a diagram illustrating an example of the matching threads delivered to individual cores in the large scale ontology matching apparatus according to an embodiment of the present invention.

Referring to FIG. 5, the matching threads generated by the distributed processing unit of the apparatus may include matching requests, matching jobs, and matching tasks corresponding to each node and to each of its cores. For example, assume three participating nodes are connected in the distributed environment: a first node 510, a second node 520, and a third node 530. The first node 510 has two cores 511 and 512, the second node 520 has four cores 521, 522, 523, and 524, and the third node 530 has two cores 531 and 532. The apparatus generates two or more matching threads from the candidate ontologies (the source and target ontologies) through preprocessing and distributed processing. It then delivers them to the individual cores of each participating node.

A matching request is a matching thread delivered to a corresponding participating node (or its CPU). The distributed processing unit 130 first divides the ontology subsets into one or more matching requests, one per participating node (or CPU), according to the number of participating nodes. The maximum number of matching requests may equal the number of participating nodes (or CPUs). A matching request is thus the parallel-processing unit assigned to an individual node. The apparatus then divides each matching request into matching jobs according to the number of cores of the node to which the request is assigned, and includes those jobs in the request. A matching job is the parallel-processing unit assigned to an individual core: if the node assigned a matching request has four cores, that request may contain four matching jobs. Finally, the distributed processing unit 130 divides each matching job into matching tasks, the units that perform the actual matching, and includes them in the job.

FIG. 5 shows three nodes 510, 520, and 530 in total. The apparatus classifies the candidate ontologies into subsets and applies the distribution and matching algorithms to generate matching threads comprising three matching requests MR1, MR2, and MR3. The first matching request MR1 corresponds to the first node 510, the second matching request MR2 to the second node 520, and the third matching request MR3 to the third node 530. Each request contains one matching job per core of its node:

- MR1 contains two matching jobs, 511-1 and 512-1, for the two cores 511 and 512 of the first node 510.
- MR2 contains four matching jobs, 521-1, 522-1, 523-1, and 524-1, for the four cores 521, 522, 523, and 524 of the second node 520.
- MR3 contains two matching jobs, 531-1 and 532-1, for the two cores 531 and 532 of the third node 530.
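The FIG. 5 layout, where nodes have different core counts, can be sketched by building one matching request per node and one job per core, labeled with the figure's reference numerals. This is a minimal sketch, and the split rule is an assumption. The 2000-match example splits equally per node first, which here would leave MR2's four jobs with fewer tasks each. This sketch instead splits tasks in proportion to core count, so every job receives about the same number of tasks. The total of 2000 tasks is likewise illustrative.

```python
def build_requests(node_cores, total_tasks):
    """Build MR1..MRn for nodes with different core counts (FIG. 5 layout).

    `node_cores` maps a node reference numeral to its core numerals. Tasks are
    split per core (an assumption), so job sizes differ by at most one.
    """
    all_cores = [core for cores in node_cores.values() for core in cores]
    base, extra = divmod(total_tasks, len(all_cores))
    sizes = (base + 1 if i < extra else base for i in range(len(all_cores)))
    requests = {}
    for i, (node, cores) in enumerate(node_cores.items(), start=1):
        jobs = {f"{core}-1": next(sizes) for core in cores}  # one job per core
        requests[f"MR{i}"] = {"node": node, "jobs": jobs}
    return requests


fig5 = {510: [511, 512], 520: [521, 522, 523, 524], 530: [531, 532]}
plan = build_requests(fig5, total_tasks=2000)
print({mr: list(r["jobs"]) for mr, r in plan.items()})
# {'MR1': ['511-1', '512-1'], 'MR2': ['521-1', '522-1', '523-1', '524-1'], 'MR3': ['531-1', '532-1']}
```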

The first to third matching requests MR1, MR2, and MR3 are generated to correspond to the participating nodes 510, 520, and 530. The matching jobs they contain (511-1, 512-1, 521-1, 522-1, 523-1, 524-1, 531-1, and 532-1) correspond to the nodes' individual cores 511, 512, 521, 522, 523, 524, 531, and 532. The generated matching requests MR1, MR2, and MR3 are delivered to the first node 510, the second node 520, and the third node 530, respectively. A matching request may be delivered directly from the apparatus to each node. Alternatively, depending on the structure of the distributed environment, it may be delivered from the apparatus to one node and then passed sequentially along the connected nodes.

FIG. 6 is a diagram illustrating the data flow of the large scale ontology matching apparatus according to an embodiment of the present invention.

Referring to FIG. 6, in the data flow of the large scale ontology matching apparatus according to an embodiment of the present invention, the preprocessing unit 110 first receives candidate ontologies comprising a source ontology and a target ontology (S601). The candidate ontologies may be matched one-to-one. Alternatively, one source ontology may be matched against two or more target ontologies, or two or more source ontologies against one target ontology.

The preprocessing unit 110 then classifies the received candidate ontologies, comprising the source and target ontologies, into one or more subsets (S602). An ontology can generally be divided into five subsets: name, label, relationship, axiom, and property. An ontology is not limited to these five subsets, however. The preprocessing unit 110 may classify a received candidate ontology into any number of subsets depending on its type and characteristics. The preprocessing unit 110 then passes the classified ontology subsets to the distributed processing unit 130 (S603).

Next, the preprocessing unit 110 serializes the classified ontology subsets (S604). Serializing the ontology subsets into binary form improves both data-processing and storage efficiency. The preprocessing unit 110 passes the serialized ontology subsets to the ontology storage unit 120, which stores them (S605). Thereafter, when the same candidate ontology is received again, the corresponding ontology subsets stored in the ontology storage unit 120 are retrieved and used. Skipping the classification step in this way saves time and computational resources.
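
Steps S604 and S605 can be sketched in Python with the standard `pickle` module as the binary serializer and a dictionary standing in for the ontology storage unit 120. Both choices are assumptions; the patent does not name a serialization format. `pickle` should only be used on data the system itself produced, since unpickling untrusted bytes can execute arbitrary code.

```python
import pickle
from typing import Optional


class OntologyStore:
    """Hypothetical in-memory stand-in for the ontology storage unit 120."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def save(self, ontology_id: str, subsets: dict) -> int:
        """Serialize the subsets to bytes (S604), store them (S605), return the size."""
        blob = pickle.dumps(subsets, protocol=pickle.HIGHEST_PROTOCOL)
        self._blobs[ontology_id] = blob
        return len(blob)

    def load(self, ontology_id: str) -> Optional[dict]:
        """Return the stored subsets, or None if this ontology was never preprocessed."""
        blob = self._blobs.get(ontology_id)
        return None if blob is None else pickle.loads(blob)


store = OntologyStore()
subsets = {"names": ["Heart", "Lung"], "labels": [("Heart", "heart"), ("Lung", "lung")]}
size = store.save("source-ontology", subsets)
restored = store.load("source-ontology")
print(size, restored == subsets)
```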

When the ontology subsets are passed from the preprocessing unit 110 to the distributed processing unit 130 (S603), the distributed processing unit 130 checks the number of participating nodes 30 and the number of individual cores in each participating node 30 (S606) to determine the available computing resources. The distributed processing unit 130 then sets the optimal number of partitions, taking into account the confirmed numbers of participating nodes 30 and individual cores and the distribution algorithm.

Based on the optimal number of partitions so set, the distributed processing unit 130 divides the ontology subsets (S607) and generates matching threads by applying matching algorithms from the matching library to the divided ontology subsets (S608). A generated matching thread includes matching requests, matching jobs, and matching tasks. The matching algorithms stored in the matching library compute matches between the individual components of the candidate ontologies in order to generate an ontology mapping between them. They may include any algorithm applicable to ontology matching, such as synonym-based matching, label-based matching, broader-term-based matching, and child-based matching.
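
To make the idea of a matching library concrete, the following Python sketch registers two of the algorithm families listed above, label-based and synonym-based matching, under hypothetical names and applies them to toy concepts. The normalization rules are assumptions; broader-term-based and child-based matching would need the relation subset and are omitted here.

```python
def normalize(text: str) -> str:
    """Lower-case, treat underscores as spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("_", " ").split())


def terms(concept: dict) -> set[str]:
    """Normalized label plus any synonyms of a concept."""
    return {normalize(t) for t in [concept["label"], *concept.get("synonyms", [])]}


def label_match(source: dict, target: dict) -> bool:
    """Label-based matching: the normalized labels are identical."""
    return normalize(source["label"]) == normalize(target["label"])


def synonym_match(source: dict, target: dict) -> bool:
    """Synonym-based matching: the concepts share at least one label or synonym."""
    return not terms(source).isdisjoint(terms(target))


# Hypothetical matching library: algorithm name -> matcher function.
MATCHING_LIBRARY = {"label": label_match, "synonym": synonym_match}


def run_matchers(sources: list[dict], targets: list[dict], algorithms: list[str]):
    """Apply each selected algorithm to every source/target pair."""
    results = []
    for name in algorithms:
        matcher = MATCHING_LIBRARY[name]
        for s in sources:
            for t in targets:
                if matcher(s, t):
                    results.append((s["id"], t["id"], name))
    return results


sources = [
    {"id": "S1", "label": "Heart"},
    {"id": "S2", "label": "Myocardium", "synonyms": ["Heart muscle"]},
]
targets = [
    {"id": "T1", "label": "heart"},
    {"id": "T2", "label": "Heart_Muscle"},
]
print(run_matchers(sources, targets, ["label", "synonym"]))
```

The synonym matcher finds the S2/T2 correspondence that the label matcher misses, which is why a library of complementary algorithms is useful.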

The distributed processing unit 130 first divides the ontology subsets into one or more matching requests, one for each participating node 30, based on the number of participating nodes 30; the maximum number of matching requests is therefore the number of participating nodes 30. It then divides each matching request into matching jobs according to the number of cores in the participating node 30 to which that request is assigned, and includes those jobs in the request. Matching requests, matching jobs, and matching tasks thus form three layers of abstraction for the overall matching process. Because the distribution algorithm for parallel processing must partition the work at these different levels, the distribution algorithm of the distributor 132 must keep track of all running jobs. The numbers of matching requests, matching jobs, and matching tasks may be set based not only on the numbers of participating nodes 30 and their individual cores but also on the scenario that yields the most efficient parallel processing. Each matching job is independent of the other matching jobs and of the other matching requests running on remote nodes.
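
The two-level division described above (matching requests per node, then matching jobs per core) can be sketched in Python as follows. The even, contiguous split performed by the hypothetical `split_even` helper is an assumption standing in for the unspecified distribution algorithm.

```python
def split_even(items: list, parts: int) -> list[list]:
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def partition(elements: list, cores_per_node: list[int]) -> list[list[list]]:
    """First one matching request per node, then one matching job per core."""
    requests = split_even(elements, len(cores_per_node))
    return [split_even(request, cores) for request, cores in zip(requests, cores_per_node)]


plan = partition(list(range(10)), cores_per_node=[2, 4, 2])
for node_index, jobs in enumerate(plan, start=1):
    print(f"MR{node_index}:", jobs)
```

With ten elements and nodes of 2, 4, and 2 cores, the four-core node receives only three elements and one of its cores gets an empty job. This suggests why a practical distribution algorithm would likely weight requests by each node's core count rather than splitting evenly by node.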

The distributed processing unit 130 delivers each generated matching thread to its corresponding participating node 30 (S609). Each node that receives a matching thread from the distributed processing unit 130 assigns the matching jobs included in the thread's matching request to its individual cores. One matching request is delivered to exactly one participating node 30 and contains one matching job for each core of that node; the matching jobs in a request are assigned to the cores one-to-one.
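
On the receiving side, a participating node can be modelled in Python with `concurrent.futures.ThreadPoolExecutor`, using one worker per core so that each matching job in the request runs on its own worker. This is a local simulation under stated assumptions: the task representation is hypothetical, threads are not pinned to particular cores, and in CPython a `ProcessPoolExecutor` would be needed for true CPU parallelism of pure-Python matching code.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Execute every matching task in one matching job.

    Here a task is just a (source label, target label) pair checked for
    case-insensitive equality; a real task would carry a matching algorithm
    and a subdivided ontology subset.
    """
    return [(s, t) for s, t in job if s.lower() == t.lower()]


def run_request_on_node(jobs: list[list[tuple[str, str]]], cores: int) -> list[tuple[str, str]]:
    """Run one matching request on a node: one worker per core, one job per worker."""
    if len(jobs) != cores:
        raise ValueError("a matching request must carry exactly one matching job per core")
    with ThreadPoolExecutor(max_workers=cores) as pool:
        per_job = list(pool.map(run_job, jobs))  # results come back in job order
    return [pair for result in per_job for pair in result]


jobs = [[("Heart", "heart"), ("Lung", "Liver")], [("Kidney", "KIDNEY")]]
print(run_request_on_node(jobs, cores=2))
```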

The individual cores of the participating node 30, having been assigned the matching jobs in the matching thread by that node, perform matching operations based on the matching tasks included in those jobs (S610). Each matching job assigned to a core includes one or more matching tasks, and each matching task includes a matching algorithm for matching the components of the candidate ontologies together with a finely subdivided ontology subset. The participating node 30 then transmits the matching results produced on its individual cores to the aggregator 140 (S611). The aggregator 140 receives these results from every participating node that received a matching thread from the distributed processing unit 130 and sums them to generate an ontology mapping (S612). The ontology mapping is the set of individual matching results between the components of two or more candidate ontologies.
The aggregator 140 accumulates the matching results computed by the matching jobs on the individual cores of the participating nodes 30 and applies a bridge pattern to the accumulated results to generate a formal representation of the ontology mapping. A bridge pattern is a type of mapping file: a custom-defined format for mapping files, specified per user. Through the bridge pattern, the aggregator 140 can therefore deliver the ontology mapping in the format each user requests. The aggregator 140 also stores each bridge pattern it uses so that other users can reuse it.
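
The aggregation and bridge-pattern steps can be sketched in Python as below. Modelling a bridge pattern as a `str.format` template, removing duplicate correspondences when summing, and keeping patterns in a module-level dictionary are all assumptions made for illustration; the patent describes the bridge pattern only as a user-defined mapping-file format that is stored for reuse.

```python
# Hypothetical store of bridge patterns, kept so that other users can reuse them.
BRIDGE_PATTERNS: dict[str, str] = {}


def aggregate(node_results: list[list[tuple[str, str, str]]]) -> list[tuple[str, str, str]]:
    """Sum per-node matching results into one ontology mapping.

    Duplicate correspondences are dropped; first-seen order is kept.
    """
    return list(dict.fromkeys(c for results in node_results for c in results))


def render(mapping: list[tuple[str, str, str]], pattern_name: str, pattern=None) -> str:
    """Format the mapping with a user-defined bridge pattern (a str.format template here)."""
    if pattern is not None:
        BRIDGE_PATTERNS[pattern_name] = pattern  # store the pattern for other users
    template = BRIDGE_PATTERNS[pattern_name]
    return "\n".join(template.format(source=s, relation=r, target=t) for s, t, r in mapping)


node_results = [[("S1", "T1", "=")], [("S2", "T2", "="), ("S1", "T1", "=")]]
mapping = aggregate(node_results)
print(render(mapping, "tsv", "{source}\t{relation}\t{target}"))
print(render(mapping, "tsv"))  # a second user reuses the stored pattern
```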

FIG. 7 is a flowchart illustrating a large-scale ontology matching method according to an embodiment of the present invention.

Referring to FIG. 7, the large-scale ontology matching method according to an embodiment of the present invention first classifies the received candidate ontologies, including a source ontology and a target ontology, into one or more subsets (S701). An ontology can generally be organized into five subsets: names, labels, relations, axioms, and properties. The division is not limited to these five, however; depending on the type and characteristics of the received candidate ontology, it may be classified into any number of subsets.

Once the candidate ontologies have been classified into subsets, the number of participating nodes and the number of individual cores in each participating node are checked (S702), so that the large-scale ontology matching apparatus can determine the available computing resources. The apparatus then sets the optimal number of partitions, taking into account the confirmed numbers of participating nodes and individual cores and the distribution algorithm (S703).

With the optimal number of partitions set, the ontology subsets are divided accordingly, and matching threads are generated by applying matching algorithms from the matching library to the divided subsets (S704). A generated matching thread includes matching requests, matching jobs, and matching tasks. The matching algorithms stored in the matching library compute matches between the individual components of the candidate ontologies in order to generate an ontology mapping between them. The apparatus first divides the ontology subsets into one or more matching requests, one for each participating node, and then divides each matching request into matching jobs according to the number of cores in the node to which that request is assigned, including those jobs in the request.

The generated matching threads are delivered to their corresponding participating nodes (S705). The individual cores of each participating node, having been assigned the matching jobs in the matching thread, perform matching operations based on the matching tasks included in those jobs.

Once the individual cores of the participating nodes have performed the matching operations, their matching results are collected (S706) and summed to generate an ontology mapping (S707). The ontology mapping is the set of individual matching results between the components of two or more candidate ontologies. The apparatus accumulates the matching results computed by the matching jobs on the individual cores and applies a bridge pattern to the accumulated results to generate a formal representation of the ontology mapping.
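
An end-to-end sketch of steps S701 to S707 is shown below. It assumes Python, a label-only subset, one partition per core, and nodes simulated locally one after another. All function names are hypothetical, and the case-insensitive label comparison stands in for whatever algorithms the matching library actually provides.

```python
from concurrent.futures import ThreadPoolExecutor


def split_even(items: list, parts: int) -> list[list]:
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_job(job: list) -> list[tuple[str, str]]:
    """A matching job is a list of tasks; each task is (source concept, target labels)."""
    matches = []
    for (s_id, s_label), target_labels in job:
        key = s_label.lower()
        matches += [(s_id, t_id) for t_id, t_label in target_labels if t_label.lower() == key]
    return matches


def match_ontologies(source: dict[str, str], target: dict[str, str], cores_per_node: list[int]):
    # S701: classification; this sketch only uses the label subset (id -> label).
    target_labels = list(target.items())
    # S702-S704: one task per source concept, one request per node, one job per core.
    tasks = [(concept, target_labels) for concept in source.items()]
    requests = split_even(tasks, len(cores_per_node))
    plan = [split_even(request, cores) for request, cores in zip(requests, cores_per_node)]
    # S705: each node runs its jobs with one worker per core (simulated node by node).
    node_results = []
    for jobs, cores in zip(plan, cores_per_node):
        with ThreadPoolExecutor(max_workers=cores) as pool:
            node_results.append([m for result in pool.map(run_job, jobs) for m in result])
    # S706-S707: collect and sum the results into one ontology mapping.
    return sorted({m for result in node_results for m in result})


source = {"S1": "Heart", "S2": "Lung", "S3": "Kidney"}
target = {"T1": "heart", "T2": "KIDNEY", "T3": "Liver"}
print(match_ontologies(source, target, cores_per_node=[2, 1]))
```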

FIG. 8 is a flowchart illustrating how preprocessing is omitted in the large-scale ontology matching method according to an embodiment of the present invention.

Referring to FIG. 8, to make it possible to omit preprocessing in the large-scale ontology matching method according to an embodiment of the present invention, the received candidate ontology is first classified into subsets (S801). An ontology can generally be organized into five subsets: names, labels, relations, axioms, and properties.

The classified ontology subsets are then serialized (S802). Serializing the subsets into binary form uses storage space efficiently and improves the efficiency of data transmission and processing. The serialized ontology subsets are then stored (S803).

Thereafter, when a candidate ontology identical to a previously received one is received, the stored serialized ontology subsets are de-serialized back into their original form (S804). Because the stored subsets are simply restored by de-serialization, the classification step is not repeated, which reduces or avoids unnecessary re-processing and the computational resources it would consume.
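
The reuse path of FIG. 8 can be sketched in Python as a small cache. Detecting an "identical" candidate ontology by the SHA-256 hash of its text, and using `pickle` for serialization, are assumptions; the patent does not specify how identity is determined. The `SubsetCache` class and `toy_classify` function are hypothetical.

```python
import hashlib
import pickle


class SubsetCache:
    """Hypothetical cache that skips preprocessing for previously seen ontologies."""

    def __init__(self, classify) -> None:
        self._classify = classify           # the preprocessing (subset classification) step
        self._blobs: dict[str, bytes] = {}  # stand-in for the ontology storage unit
        self.preprocess_calls = 0

    def get(self, ontology_text: str) -> dict:
        key = hashlib.sha256(ontology_text.encode("utf-8")).hexdigest()
        blob = self._blobs.get(key)
        if blob is not None:
            return pickle.loads(blob)       # S804: de-serialize instead of re-classifying
        self.preprocess_calls += 1
        subsets = self._classify(ontology_text)                                      # S801
        self._blobs[key] = pickle.dumps(subsets, protocol=pickle.HIGHEST_PROTOCOL)   # S802-S803
        return subsets


def toy_classify(text: str) -> dict:
    """Toy classifier: every non-empty line is treated as an entity name."""
    return {"names": [line.strip() for line in text.splitlines() if line.strip()]}


cache = SubsetCache(toy_classify)
first = cache.get("Heart\nLung\n")
second = cache.get("Heart\nLung\n")  # same ontology: restored from the stored bytes
print(first == second, cache.preprocess_calls)
```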

The ontology subsets reconstructed by de-serialization are divided and processed in a distributed manner on the individual cores of the participating nodes (S805). First, the number of participating nodes and the number of individual cores in each participating node are checked, and the apparatus sets the optimal number of partitions in consideration of these numbers and the distribution algorithm. The ontology subsets are divided according to that number, and matching threads, each including matching requests, matching jobs, and matching tasks, are generated by applying matching algorithms from the matching library to the divided subsets. Each matching thread is delivered to its corresponding participating node, whose individual cores perform matching operations based on the matching tasks in their assigned matching jobs. The matching results produced on the individual cores are then collected and summed to generate the ontology mapping. Step S805 may be performed in the same manner as steps S703 to S707 of FIG. 7.

The present invention described above can be implemented as a computer program. The code and code segments constituting the program can be readily inferred by a computer programmer skilled in the art. The program can be stored in a computer-readable recording medium or information storage medium and read and executed by a computer to implement the method of the present invention. The recording medium includes all types of computer-readable recording media.

Although the present invention has been described in detail with reference to preferred embodiments, the invention is not limited to those embodiments, and various modifications can be made by those of ordinary skill in the art within the scope of the technical idea of the invention.

100: ontology matching apparatus
110: preprocessing unit
111: ontology model unit
112: serialization processing unit
113: deserialization processing unit
120: ontology storage unit
130: distributed processing unit
140: aggregator

Claims

A large-scale ontology matching apparatus comprising:
a preprocessor for generating ontology subsets by classifying received candidate ontologies into one or more ontology subsets;
a distribution processor for dividing the generated ontology subsets by applying a distribution algorithm, generating matching threads by applying a matching algorithm to the divided ontology subsets, and delivering the generated matching threads to individual cores of participating nodes; and
a collecting unit for generating an ontology mapping by collecting and summing the matching results generated by performing matching operations based on the matching threads in the individual cores.

The apparatus of claim 1,
wherein the preprocessor generates a serialized ontology subset by serializing the generated ontology subset in binary form.

The apparatus of claim 2,
further comprising an ontology storage unit that stores the serialized ontology subset and, when a candidate ontology identical to a previously received candidate ontology is received, provides the pre-stored serialized ontology subset to the preprocessor.

The apparatus of claim 3,
wherein the preprocessor generates the ontology subset by reconstructing, through de-serialization, the serialized ontology subset received from the ontology storage unit.

The apparatus of claim 1,
wherein the matching thread includes one or more matching requests corresponding one-to-one to the participating nodes, one or more matching jobs included in each matching request and corresponding one-to-one to the individual cores of the corresponding participating node, and one or more matching tasks for performing matching operations in the individual cores.

The apparatus of claim 5,
wherein the relationship among the matching request, the matching job, and the matching task is expressed by an equation in which MR_i denotes the matching request assigned to each participating node, MR denotes the set of the matching requests MR_i assigned to the participating nodes, MJ_j denotes a matching job assigned to an individual core, and MT_k denotes a matching task assigned to an individual core to perform a matching operation.

The apparatus of claim 5,
wherein each matching job is independent of other matching jobs and of matching requests operating at other participating nodes.

The apparatus of claim 1,
wherein, when the candidate ontology is received, the distribution processor checks the number of participating nodes available for matching and the number of individual cores included in each participating node, sets a number of partitions by applying the numbers of participating nodes and individual cores to the distribution algorithm, and generates a matching thread including a matching request, a matching job, and a matching task by dividing the ontology subset in consideration of the set number of partitions and applying the matching algorithm.

The apparatus of claim 1,
wherein, in a distributed environment including two or more participating nodes, the process in which the collecting unit collects and aggregates the matching results is expressed by three equations whose terms denote, respectively, the bridge ontology of a matching task; an intermediate bridge ontology that collects all results of all single matching tasks in all matching jobs; a central bridge ontology collected from the bridge ontologies created at all nodes; and the finally generated bridge ontology, which is converted into the ontology mapping result, that is, the output file.

The apparatus of claim 9,
wherein each computed matching result is combined into a single ontology object called a bridge ontology, and the collecting unit sums the bridge ontologies computed by all participating nodes in the distributed environment, converts the sum into a physical mapping file, and outputs the file as the ontology mapping result.

A large-scale ontology matching method using a large-scale ontology matching apparatus, the method comprising:
generating ontology subsets by classifying received candidate ontologies into one or more ontology subsets;
dividing the generated ontology subsets using a distribution algorithm, and generating matching threads by applying a matching algorithm to the divided ontology subsets;
delivering the generated matching threads to individual cores of participating nodes; and
generating an ontology mapping by collecting and summing the matching results generated by performing matching operations on the individual cores based on the matching threads.

The method of claim 11,
wherein classifying the received candidate ontologies into one or more ontology subsets includes generating a serialized ontology subset by serializing the generated ontology subset in binary form.

The method of claim 12,
further comprising storing the serialized ontology subset and, when a candidate ontology identical to a previously received candidate ontology is received, generating the ontology subset by reconstructing the pre-stored serialized ontology subset through de-serialization.

The method of claim 11,
wherein the matching thread includes one or more matching requests corresponding one-to-one to the participating nodes, one or more matching jobs included in each matching request and corresponding one-to-one to the individual cores of the corresponding participating node, and one or more matching tasks for performing matching operations in the individual cores.

The method of claim 14,
wherein the relationship among the matching request, the matching job, and the matching task is expressed by an equation in which MR_i denotes the matching request assigned to each participating node, MR denotes the set of the matching requests MR_i assigned to the participating nodes, MJ_j denotes a matching job assigned to an individual core, and MT_k denotes a matching task assigned to an individual core to perform a matching operation.

The method of claim 14,
wherein each matching job is independent of other matching jobs and of matching requests operating at other participating nodes.

The method of claim 11,
wherein generating the matching thread comprises:
checking the number of participating nodes in the distributed environment and the number of individual cores included in each participating node;
setting a number of partitions by applying the numbers of participating nodes and individual cores to the distribution algorithm; and
generating a matching thread including a matching request, a matching job, and a matching task by dividing the ontology subset in consideration of the set number of partitions and applying the matching algorithm.

The method of claim 11,
wherein generating the ontology mapping by collecting and summing the generated matching results is expressed by three equations whose terms include the bridge ontology of a matching task and a central bridge ontology collected from the bridge ontologies created at all nodes.

The method of claim 18,
wherein generating the ontology mapping by collecting and summing the matching results generated by performing the matching operations based on the matching thread in the individual cores comprises summing the bridge ontologies, which are the matching results computed by all participating nodes, converting the sum into a physical mapping file, and outputting the file as the ontology mapping result.