KR101940251B1 - Spatial knowledge extraction system implemented in a Hadoop MapReduce parallel, distributed computing environment

Info

Publication number: KR101940251B1
Application number: KR1020170018755A
Authority: KR
Inventors: 김인철; 이석준
Original assignee: 경기대학교 산학협력단
Priority date: 2016-11-07
Filing date: 2017-02-10
Publication date: 2019-01-18
Also published as: KR20180051330A

Abstract

A spatial knowledge extraction system based on Hadoop MapReduce is disclosed. For all spatial data distributed across the Hadoop Distributed File System (HDFS), the system builds an R-tree index made up of two layers, a global index and local indexes, and then uses that index to determine the spatial relations among the spatial data and extract spatial knowledge.

Description

Spatial knowledge extraction system based on Hadoop MapReduce

The present invention relates to qualitative spatial knowledge extraction and, more particularly, to a technique for extracting qualitative spatial knowledge using a quantitative spatial reasoning algorithm.

Spatial knowledge can be acquired by qualitative spatial reasoning or by quantitative spatial reasoning. Qualitative spatial reasoning applies inference rules to an initial qualitative spatial knowledge base, following spatial algebra theories such as RCC-8 (Region Connection Calculus) and CSD-9 (Cone-Shaped Directional relations), to derive new topological and directional relation knowledge. Quantitative spatial reasoning determines the qualitative spatial relation between two spatial objects by geometric computation on spatial data that describe each object's shape and position. It does not need an initial qualitative spatial knowledge base to obtain new qualitative spatial knowledge, but it does need precise numerical data. Unlike qualitative spatial reasoning, which suffers from the lack of an initial knowledge base, quantitative spatial reasoning can draw on web-scale open data that are already well established on the semantic web, such as OSM (OpenStreetMap), USGS (United States Geological Survey) data, and OS OpenData. Quantitative spatial reasoning can therefore serve as an alternative that addresses the missing initial knowledge base in qualitative spatial reasoning. Moreover, because numerical data carry uncertainty, the two approaches can be used in a complementary manner. However, all quantitative spatial reasoning systems developed to date were designed to run in a single-machine computing environment, which limits their performance when reasoning over web-scale open data.

Korean Registered Patent Publication No. 10-0685791 (published February 22, 2007)

A technical solution is disclosed for efficiently generating, from a large spatial data set, a qualitative spatial knowledge base that represents the spatial relations holding between any two spatial objects.

According to one aspect, a spatial knowledge extraction system based on Hadoop MapReduce may include a first index construction unit, a data selection unit, a second index construction unit, and a spatial knowledge extraction unit. The first index construction unit builds an R-tree index over all spatial data distributed on the Hadoop Distributed File System (HDFS): it groups and redistributes the data in units of the minimum bounding rectangles (MBRs) that will form the lowest-level nodes of the R-tree, builds a local index for each redistributed lowest-level node MBR, and builds a global index holding meta information about the local indexes. The data selection unit may select, from the spatial data belonging to the lowest-level node MBRs indexed by the first index construction unit, a subset of spatial data for spatial knowledge extraction. The second index construction unit builds an R-tree index over the selected subset in the same manner: it groups and redistributes the subset in units of the MBRs that will form the lowest-level nodes of the R-tree, builds a local index for each redistributed lowest-level node MBR, and builds a global index holding meta information about the local indexes. The spatial knowledge extraction unit may then extract spatial knowledge by determining the spatial relations among the selected spatial data on the basis of the R-tree index built by the second index construction unit.

The predetermined MBR unit may be the size of an HDFS block.

The data selection unit may use a range query to select the subset of spatial data from the spatial data belonging to the lowest-level node MBRs indexed by the first index construction unit.

The data selection unit checks whether the input range intersects the area of each lowest-level node MBR indexed by the first index construction unit, and may perform the range query by distributing only the MBRs whose areas intersect the input range to Map functions.
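The intersection pre-filter described above can be sketched as follows. This is a minimal illustration rather than the patented implementation; the `MBR` tuple layout, the function names `intersects` and `select_partitions`, and the sample partition IDs are assumptions introduced here.

```python
from typing import NamedTuple


class MBR(NamedTuple):
    """Axis-aligned rectangle; the field names are illustrative."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: MBR, b: MBR) -> bool:
    """True when the two rectangles share at least one point."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x and
            a.min_y <= b.max_y and b.min_y <= a.max_y)


def select_partitions(query: MBR, leaf_mbrs: dict[str, MBR]) -> list[str]:
    """Return the IDs of the leaf-node MBRs that would be handed to Map functions."""
    return [pid for pid, mbr in leaf_mbrs.items() if intersects(query, mbr)]


# Hypothetical leaf-node MBRs keyed by local index name.
leaves = {
    "part-00000": MBR(0, 0, 10, 10),
    "part-00001": MBR(10, 0, 20, 10),
    "part-00002": MBR(30, 30, 40, 40),
}
print(select_partitions(MBR(5, 5, 12, 8), leaves))
```

Only the first two partitions overlap the query rectangle, so the third would never be read during the range query.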

The spatial knowledge extraction unit may include a topological relation knowledge extraction unit, which extracts knowledge of the topological relations between reference data and the spatial data belonging to the lowest-level node MBRs indexed by the second index construction unit, and a directional relation knowledge extraction unit, which extracts knowledge of the directional relations between the reference data and those spatial data. Each spatial data item selected by the data selection unit may be chosen as the reference data once.

The topological relation knowledge extraction unit checks whether each lowest-level node MBR indexed by the second index construction unit intersects the lowest-level node MBR that contains the reference data and, according to the result, labels each MBR as intersecting or non-intersecting. For spatial data in an MBR labeled non-intersecting, the topological relation with the reference data is determined to be 'disjoint' and extracted as topological relation knowledge. For spatial data in an MBR labeled intersecting, the topological relation with the reference data may be determined through a model for topological relation determination and extracted as topological relation knowledge.

For the reference data and spatial data belonging to a lowest-level node MBR labeled intersecting, the topological relation knowledge extraction unit may determine the topological relation using DE-9IM (Dimensionally Extended nine-Intersection Model).
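The labeling step can be illustrated with a short sketch. It shows only the pruning logic: objects in non-intersecting leaf MBRs are emitted as 'disjoint' without any geometry test, and the rest are deferred to an exact test. The function names, the data layout, and the `placeholder_relation` callback are hypothetical; the callback merely marks where a DE-9IM evaluation would run and does not implement DE-9IM.

```python
Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def rects_intersect(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def topological_triples(ref_id, ref_leaf, leaves, members, exact_relation):
    """leaves: leaf id -> MBR; members: leaf id -> list of object ids."""
    triples = []
    for leaf_id, leaf_mbr in leaves.items():
        crossing = rects_intersect(leaves[ref_leaf], leaf_mbr)
        for obj_id in members[leaf_id]:
            if obj_id == ref_id:
                continue
            # A non-intersecting leaf needs no geometric computation at all.
            relation = exact_relation(ref_id, obj_id) if crossing else "disjoint"
            triples.append((ref_id, relation, obj_id))
    return triples


calls = []


def placeholder_relation(ref_id, obj_id):
    """Stand-in for a DE-9IM test; records which pairs would need one."""
    calls.append(obj_id)
    return "needs-DE-9IM"


leaves = {"mbr_1": (0, 0, 10, 10), "mbr_2": (8, 8, 20, 20), "mbr_3": (50, 50, 60, 60)}
members = {"mbr_1": ["a", "b"], "mbr_2": ["c"], "mbr_3": ["d"]}
result = topological_triples("a", "mbr_1", leaves, members, placeholder_relation)
print(result)
print(calls)
```

In this example the exact test is requested only for `b` and `c`; `d` lies in a leaf that does not touch the reference leaf and is labeled disjoint directly.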

The directional relation knowledge extraction unit models direction-angle regions around the center point of the lowest-level node MBR that contains the reference data, reads the spatial data belonging to each lowest-level node MBR using a Map function, and models the MBR and center point of each spatial data item read. It may then determine the directional relation between the reference data and each spatial data item according to the direction-angle region in which that item's MBR center point falls, and extract it as directional relation knowledge.

When the MBR center point of a modeled spatial data item does not fall in any of the direction-angle regions, the directional relation knowledge extraction unit may determine the directional relation by calculating the direction angle.
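The direction-angle calculation can be illustrated with a small sketch. It assumes eight 45-degree cones centered on the compass directions, plus the identical relation (O) when the two center points coincide, and it measures the angle from the center of the reference object's MBR. The region pre-modeling based on the leaf-node MBR is not reproduced here, and all names are hypothetical.

```python
import math

# Cone labels in counter-clockwise order, starting from the positive x axis.
DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def center(mbr):
    min_x, min_y, max_x, max_y = mbr
    return ((min_x + max_x) / 2.0, (min_y + max_y) / 2.0)


def direction(ref_mbr, obj_mbr):
    """Direction of the object relative to the reference, from MBR centers."""
    rx, ry = center(ref_mbr)
    ox, oy = center(obj_mbr)
    if (rx, ry) == (ox, oy):
        return "O"
    angle = math.degrees(math.atan2(oy - ry, ox - rx)) % 360.0
    # Shift by half a cone so that each 45-degree cone is centered on its axis.
    return DIRECTIONS[int(((angle + 22.5) % 360.0) // 45.0)]


ref = (0, 0, 2, 2)
print(direction(ref, (4, 0, 6, 2)))      # object to the east
print(direction(ref, (4, 4, 6, 6)))      # object to the north-east
print(direction(ref, (-4, -4, -2, -2)))  # object to the south-west
```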

Meanwhile, a spatial knowledge extraction method based on Hadoop MapReduce according to one aspect may include: a first index construction step of building an R-tree index over all spatial data distributed on the Hadoop Distributed File System (HDFS) by grouping and redistributing the data in units of the minimum bounding rectangles (MBRs) that will form the lowest-level nodes of the R-tree, building a local index for each redistributed lowest-level node MBR, and building a global index holding meta information about the local indexes; a data selection step of selecting, from the spatial data belonging to the lowest-level node MBRs indexed in the first index construction step, a subset of spatial data for spatial knowledge extraction; a second index construction step of building an R-tree index over the subset by grouping and redistributing it in units of the MBRs that will form the lowest-level nodes of the R-tree, building a local index for each redistributed lowest-level node MBR, and building a global index holding meta information about the local indexes; and a spatial knowledge extraction step of extracting spatial knowledge by determining the spatial relations among the subset on the basis of the R-tree index built in the second index construction step.

By using an R-tree index and range queries over distributed spatial data files on the Hadoop Distributed File System, the disclosed technique extracts a web-scale qualitative spatial knowledge base very efficiently.

The disclosed technique can be widely used in fields where spatial knowledge and spatial reasoning are essential, such as intelligent geographic information systems, knowledge management systems, question-answering systems, spatial databases, web information retrieval, and intelligent service robots.

FIG. 1 is a conceptual diagram for representing spatial knowledge according to an embodiment.
FIG. 2 is a reference diagram for explaining a distributed R-tree index of spatial objects according to an embodiment.
FIG. 3 shows the logical structure of a Hadoop MapReduce-based spatial knowledge extraction system according to an embodiment.
FIG. 4 shows an R-tree index construction process according to an embodiment.
FIG. 5 shows an example of random sample selection.
FIG. 6 shows an example of physical partitioning and local index construction.
FIG. 7 shows a MapReduce-based data selection process according to an embodiment.
FIG. 8 shows an example of the data selection process of FIG. 7.
FIG. 9 shows a topological relation extraction process according to an embodiment.
FIG. 10 shows an example of the topological relation extraction process of FIG. 9.
FIG. 11 shows a directional relation extraction process according to an embodiment.
FIG. 12 shows an example of the directional relation extraction process of FIG. 11.
FIG. 13 is a block diagram of a Hadoop MapReduce-based spatial knowledge extraction system according to an embodiment.

The foregoing and additional aspects of the present invention will become more apparent through the preferred embodiments described with reference to the accompanying drawings. The invention is described in detail below through these embodiments so that those skilled in the art can readily understand and reproduce it.

FIG. 1 is a conceptual diagram for representing spatial knowledge according to an embodiment. Before describing the qualitative spatial knowledge extraction method, the spatial knowledge representation scheme, that is, how the geometric data of spatial objects and spatial knowledge are represented, is described with reference to FIG. 1. FIG. 1 shows a spatial ontology that adds the nine directional relation descriptors defined in the CSD-9 (Cone-Shaped Directional relations) theory to the core ontology of GeoSPARQL. The spatial object is the top-level class that represents all spatial objects. The boundary and containment relations between two spatial objects can be expressed with seven topological properties: disjoint, touches, equals, overlaps, within, contains, and crosses. The directional relation between two spatial objects can be expressed with nine directional properties: north (N), north-east (NE), east (E), south-east (SE), south (S), south-west (SW), west (W), north-west (NW), and identical-to (O). The subclasses of the spatial object are Feature and Geometry. A feature denotes a specific real-world place such as a city, road, or building, and a geometry represents the geometric data of a feature, such as a Point, LineString, or Polygon. A feature is represented as a string literal, whereas the geometric data of a geometry are represented as a WKT (Well-Known Text) literal.

FIG. 2 is a reference diagram for explaining a distributed R-tree index of spatial objects according to an embodiment; it shows the distributed R-tree index over spatial objects that is created on the Hadoop Distributed File System (HDFS) for large-scale qualitative spatial knowledge extraction. The R-tree index is structured in two layers: a global index and local indexes. That is, the R-tree consists of one upper node (the top-level node) and a plurality of lower nodes (the lowest-level nodes) belonging to it. On each slave node 200, spatial data grouped in units of minimum bounding rectangles (MBRs), the lowest-level nodes of the R-tree index, are stored in local index files (part-xxxxx.rtree), and their meta information is stored on the master node 100 in a single global index file (_master.rtree). The meta information stored in the global index includes each local index's identifier (ID), boundary, local index file name, record count, and file size. In one embodiment, the spatial data stored on each slave node are allocated and stored in amounts equal to the HDFS block size for load balancing. That is, the lowest-level MBRs of the R-tree, namely the locally indexed MBRs, are created in units of the HDFS block size. Construction of the R-tree index is described later.
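The meta information listed above can be pictured as one record per local index. The sketch below is only an in-memory illustration: the on-disk layout of `_master.rtree` is not given in this text, and the class name, field names, and sample values are assumptions.

```python
from dataclasses import dataclass

Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass(frozen=True)
class LocalIndexMeta:
    """One global-index entry describing a single local index file."""
    index_id: int
    boundary: Rect
    file_name: str
    record_count: int
    size_bytes: int


def files_covering(entries: list[LocalIndexMeta], x: float, y: float) -> list[str]:
    """Names of the local index files whose boundary contains the point (x, y)."""
    return [e.file_name for e in entries
            if e.boundary[0] <= x <= e.boundary[2]
            and e.boundary[1] <= y <= e.boundary[3]]


# Illustrative values only.
global_index = [
    LocalIndexMeta(0, (0.0, 0.0, 10.0, 10.0), "part-00000.rtree", 1200, 64 * 1024 * 1024),
    LocalIndexMeta(1, (10.0, 0.0, 20.0, 10.0), "part-00001.rtree", 950, 61 * 1024 * 1024),
]
print(files_covering(global_index, 12.5, 4.0))
```

Because the global index is small, a lookup like this can decide which local index files to open before any slave node is contacted.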

FIG. 3 shows the logical structure of a Hadoop MapReduce-based spatial knowledge extraction system according to an embodiment. The system extracts web-scale qualitative spatial knowledge using a MapReduce-based quantitative spatial reasoning algorithm. To this end, it may include a first index construction unit 211, a data selection unit 220, a second index construction unit 212, and a spatial knowledge extraction unit 240. In one embodiment, the first index construction unit 211, the second index construction unit 212, and the spatial knowledge extraction unit 240 are provided on each slave node 200 and may be executed by the processor of the slave node 200. The data selection unit 220 may be provided on the master node 100 and executed by the processor of the master node 100.

The first index construction unit 211 builds a distributed R-tree index over all spatial data randomly distributed on HDFS. Through this job, spatial data scattered randomly across HDFS are grouped into lowest-level node MBRs of HDFS block size and redistributed to the slave nodes. Once built for the entire spatial data set, the R-tree index is retained for the data selection jobs that are repeated afterwards. The data selection unit 220 uses the R-tree index built by the first index construction unit 211 to select the range of spatial data for the knowledge extraction job. This job uses spatial queries such as range queries and k-nearest-neighbor queries.

The second index construction unit 212 builds a separate R-tree index over the spatial data selected by the data selection unit 220. Its construction method is the same as that of the first index construction unit 211. This second R-tree index construction is essential because the spatial knowledge extraction unit 240, described below, determines spatial relations by exploiting the locality of data redistributed according to the R-tree index. Unlike the R-tree index built by the first index construction unit 211, the index built by the second index construction unit 212 need not be kept once all spatial knowledge has been extracted, and it may therefore be discarded at that point.

The first index construction unit 211 and the second index construction unit 212 are in effect a single index construction unit. They are distinguished only because the first index construction unit 211 builds an R-tree index over all spatial data, whereas the second index construction unit 212 builds an R-tree index over the selected subset after the first index has been built. Hereinafter, the R-tree index built by the first index construction unit 211 is called the first R-tree index, and the one built by the second index construction unit 212 is called the second R-tree index.

Before spatial knowledge is extracted, the reference selection unit 230 selects one of the spatial data items chosen by the data selection unit 220 as the reference data, and every selected item is chosen as the reference data once. Each time reference data are selected, the spatial knowledge extraction unit 240 performs knowledge extraction against all the remaining spatial data selected by the data selection unit 220. As shown in FIG. 3, the spatial knowledge extraction unit 240 includes a topological relation knowledge extraction unit 241 and a directional relation knowledge extraction unit 242. The topological relation knowledge extraction unit 241 determines the topological relations between the reference data and all the remaining spatial data and extracts them as topological relation knowledge; the directional relation knowledge extraction unit 242 determines the directional relations between the reference data and all the remaining spatial data and extracts them as directional relation knowledge. Note that, as FIG. 3 also indicates, the first index construction, data selection, second index construction, and spatial knowledge extraction are MapReduce jobs, whereas reference data selection is not.

Each step of the MapReduce-based large-scale spatial knowledge extraction process is described in more detail below.

(1) R-tree index generation

As shown in FIG. 4, the first index construction unit 211 and the second index construction unit 212 each carry out index construction in three stages: partitioning, local indexing, and global indexing. First, the partitioning stage divides the input data (the entire spatial data set) into n partitions. The formula for determining n is given by Equation 1.

In Equation 1, B is the HDFS block size, S is the size of the input data (the entire spatial data set), and α is the overhead ratio that accounts for the input data replicated and stored on each node for fault tolerance. By default α is set to 0.2. Once n is determined, the boundaries of each partition must be set. The partition boundaries can be determined with the STR (Sort-Tile-Recursive) algorithm. However, because STR runs on a single processor, it cannot hold all of a large input in memory. A random sample of the input data is therefore drawn, and STR is run on the sample. Random sampling is carried out as a single MapReduce job. The input data are split and passed to Map functions. If an input item is not a point, it is converted to the center point of its MBR before being passed to the Map function. The Map function emits each input item with a probability of 1%. If the emitted result exceeds 100 MB, the result is fed back in as input and sampling is repeated until the result is 100 MB or smaller.
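Equation 1 itself is not reproduced in this text. One form consistent with the definitions of B, S, and α above is n = ⌈S(1 + α) / B⌉, that is, the input size inflated by the overhead ratio and divided into HDFS blocks. The sketch below computes n under that assumption; the formula and the function name `num_partitions` are assumptions, not a quotation of the original equation.

```python
import math


def num_partitions(input_size: int, block_size: int, overhead_ratio: float = 0.2) -> int:
    """Assumed form of Equation 1: n = ceil(S * (1 + alpha) / B)."""
    if input_size <= 0 or block_size <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(input_size * (1.0 + overhead_ratio) / block_size)


MIB = 1024 * 1024
# 1 GiB of input with 128 MiB blocks: 8 blocks of data, 9.6 after overhead.
print(num_partitions(1024 * MIB, 128 * MIB))
# An input smaller than one block still needs one partition.
print(num_partitions(100 * MIB, 128 * MIB))
```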

Once the sample has been selected, the parameter d (R-tree degree) of the STR algorithm is set to [expression not reproduced in the source text], and the STR algorithm is run on the sample. Because STR delimits partitions with MBRs, each partition takes the shape of an MBR, which becomes a lowest-level node MBR of the R-tree index. For later load balancing of the input data, the boundaries are set so that each partition holds about one HDFS block of data. Once the partition boundaries are determined, physical partitioning is performed: the data belonging to each partition are grouped and redistributed. Physical partitioning and the subsequent local indexing stage form a single MapReduce job.
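For readers unfamiliar with STR, the following is a minimal single-level sketch of Sort-Tile-Recursive packing over sample points. It is a generic illustration, not the patented procedure: the `capacity` argument is a hypothetical stand-in for the degree parameter d, whose defining expression is not reproduced in this text.

```python
import math


def str_partition(points, capacity):
    """Pack 2-D points into groups of at most `capacity` using STR tiling."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not points:
        return []
    leaf_count = math.ceil(len(points) / capacity)
    slice_count = math.ceil(math.sqrt(leaf_count))
    slice_size = slice_count * capacity
    by_x = sorted(points)  # sort by x (ties broken by y)
    leaves = []
    for i in range(0, len(by_x), slice_size):
        # Within each vertical slice, sort by y and cut into leaves.
        vertical_slice = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(vertical_slice), capacity):
            leaves.append(vertical_slice[j:j + capacity])
    return leaves


def mbr(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


sample = [(x, y) for x in range(4) for y in range(4)]
boundaries = [mbr(leaf) for leaf in str_partition(sample, capacity=4)]
print(boundaries)
```

On the 4 x 4 grid of sample points, the sketch yields four rectangular tiles, each of which would serve as the initial boundary of one partition.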

First, the input data are split and passed to the Map functions. Each Map function determines which partition (that is, which lowest-level node MBR) a received item belongs to and emits the partition and the item as a key-value pair for the Reduce function. After MapReduce shuffling, each Reduce function receives a partition as the key and the list of data belonging to it, and writes one local index file (part-xxxx.rtree) per partition. Each local index file holds the data belonging to the partition and the boundary of its MBR. Finally, the global indexing stage creates a global index file (_master.rtree) containing the meta information of all local indexes and stores it on the master node.

FIG. 5 shows an example of random sample selection. First, all input spatial data (geom_1 to geom_6) are split across the Map functions (data splitting) in order to extract representative sample data. Each Map function randomly selects data from its split at a predetermined ratio of 1% and writes the selected data (geom_1, geom_4, geom_6) as the result (data writing). If the result is larger than 100 MB, the result is fed back as input and the job is repeated until the result is 100 MB or less.
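The repeated sampling loop can be sketched in a few lines of Python. This is a single-process illustration, not a MapReduce job; the function name, the `size_of` callback, and the reading of 100 MB as 100 * 1024 * 1024 bytes are assumptions made for the example.

```python
import random

def sample_until_small(records, ratio=0.01, max_bytes=100 * 1024 * 1024,
                       size_of=len, seed=None):
    """Sample at `ratio` and re-sample the result until it fits within `max_bytes`."""
    rng = random.Random(seed)
    current = list(records)
    while True:
        # "Map" step: keep each record independently with probability `ratio`.
        current = [r for r in current if rng.random() < ratio]
        # Stop once the sampled output is small enough; otherwise feed it back in.
        if sum(size_of(r) for r in current) <= max_bytes:
            return current
```

Sampling always runs at least once, matching the description in which the size check is applied to the output of the first pass.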

FIG. 6 shows an example of physical partitioning and local index construction. First, when the initial MBRs and all spatial data (geom_1 to geom_6) to be grouped by MBR are input, the input data are split across the Map functions (data splitting). Each Map function checks which MBR area an input datum falls in and passes the datum, grouped with that MBR (data grouping), to the Reduce function. If a datum falls in no MBR, it is regarded as belonging to the nearest MBR. After shuffling, which uses the partition as the key and gathers its values into a list (MBR & data shuffling), each Reduce function receives a partition (mbr_1, mbr_2) and the list of data belonging to it (list_1, list_2), adjusts the partition boundary accordingly, and writes one binary local index file per partition together with its data (MBR & data writing).
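The Map, shuffle, and Reduce roles in physical partitioning can be sketched as plain Python functions. Several details are assumptions of this sketch and are not stated in the source: each datum is represented by its bounding box, membership is decided by the box center, partition ids are integers, and the "local index" is a dictionary rather than a binary file.

```python
from collections import defaultdict

def center(box):
    min_x, min_y, max_x, max_y = box
    return ((min_x + max_x) / 2, (min_y + max_y) / 2)

def point_rect_distance(pt, rect):
    """Distance from a point to a rectangle (0 if the point is inside)."""
    dx = max(rect[0] - pt[0], 0, pt[0] - rect[2])
    dy = max(rect[1] - pt[1], 0, pt[1] - rect[3])
    return (dx * dx + dy * dy) ** 0.5

def map_assign(geom_id, box, partitions):
    """Map: emit (partition id, datum); a datum outside every MBR goes to the nearest one."""
    c = center(box)
    pid = min(partitions, key=lambda p: point_rect_distance(c, partitions[p]))
    return pid, (geom_id, box)

def shuffle(pairs):
    """Shuffle: gather the values emitted for each partition key into a list."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_build(pid, members):
    """Reduce: adjust the partition boundary to cover its data and emit one local index."""
    boxes = [box for _, box in members]
    mbr = (min(b[0] for b in boxes), min(b[1] for b in boxes),
           max(b[2] for b in boxes), max(b[3] for b in boxes))
    return {"file": f"part-{pid:04d}.rtree", "mbr": mbr, "data": members}

def build_local_indexes(geoms, partitions):
    pairs = [map_assign(gid, box, partitions) for gid, box in geoms.items()]
    return [reduce_build(pid, members) for pid, members in sorted(shuffle(pairs).items())]
```

The boundary adjustment in `reduce_build` is what later allows a whole partition to be skipped safely when its MBR does not intersect a query.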

(2) Selection of work scope

Once the first R-tree index is built, the data selection unit 220 uses it to select the input data for the spatial knowledge extraction job. Data selection is carried out with spatial queries such as range queries and k-nearest-neighbor queries. Exploiting the locality of the R-tree index substantially reduces the data access and processing costs of a spatial query, which greatly benefits data selection. An example of data selection based on a range query is described below. FIG. 7 shows the MapReduce-based data selection process.

Referring to FIG. 7, the data selection job takes as input the lowest node MBRs of the R-tree index and a range that contains most of the spatial data of primary interest. Before the lowest node MBRs are distributed to the Map functions (MBR distribution), each one is checked for intersection with the input range in order to cut unnecessary data access costs. Lowest node MBRs that do not intersect the range are pruned in advance (MBR pruning) so that they are not distributed to the Map functions. The remaining lowest node MBRs are then distributed to the Map functions, and each Map function reads the spatial data belonging to its MBR (data reading). Finally, each spatial datum read is checked for actual membership in the range (range query) and, if it belongs, is written as output.

FIG. 8 shows an example of the data selection process described above. First, the job takes as input the MBRs of the R-tree index (mbr_1, mbr_2), the addresses of the local indexes in which their spatial data are stored (local address_1, local address_2), and the range. Before mbr_1 and mbr_2 are passed to the Map functions, each is checked for intersection with the range; mbr_1, whose area intersects the range, is distributed to a Map function, while mbr_2, whose area does not, is pruned and not distributed. In other words, data unrelated to the range are filtered out in advance, so unnecessary data access and processing are skipped. The Map function reads all the data belonging to mbr_1 into a list, checks whether each datum in the list actually falls within the range, and outputs geom_1 and geom_2, which do.
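The pruning and range-query steps can be sketched as follows. The sketch assumes that data are represented by axis-aligned bounding boxes and that "belonging to the range" means intersecting it; the function names and the in-memory `local_data` dictionary (standing in for the local index files) are hypothetical.

```python
def intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def prune(leaf_mbrs, query):
    """MBR pruning: keep only the lowest node MBRs that intersect the query range."""
    return {pid: mbr for pid, mbr in leaf_mbrs.items() if intersects(mbr, query)}

def map_range_query(members, query):
    """Map: check each datum read from one partition against the range."""
    return [gid for gid, box in members if intersects(box, query)]

def select_data(leaf_mbrs, local_data, query):
    selected = []
    for pid in prune(leaf_mbrs, query):   # pruned partitions are never read
        selected.extend(map_range_query(local_data[pid], query))
    return selected
```

With two partitions where only the first intersects the range, only that partition's data are examined, mirroring the mbr_1 / mbr_2 example of FIG. 8.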

(3) Extraction of topological relation knowledge

FIG. 9 shows the topological relation extraction process performed by the topological relation knowledge extraction unit 241 each time reference data is selected. The job takes as input the lowest node MBRs of the second R-tree index, the addresses of the spatial data belonging to them, and the reference data that serves as the basis for topological relation extraction. To cut unnecessary topological analysis costs, before the lowest node MBRs are distributed to the Map functions, each one is checked for intersection with the lowest node MBR to which the reference data belongs, and is given an intersection label or a non-intersection label according to the result (MBR labeling). The labeled lowest node MBRs are then distributed to the Map functions, and each Map function reads the spatial data belonging to its lowest node MBR (data reading). Before the topological relations between the reference data and the read spatial data are determined through topological relation modeling, the labels are inspected to separate the lowest node MBRs that require topological relation modeling from those whose topological relation can be determined without it (MBR classification). That is, lowest node MBRs with an intersection label are separated from those with a non-intersection label. The topological relation between the reference data and any spatial datum in a lowest node MBR with a non-intersection label is determined to be 'disjoint'. The topological relation between the reference data and spatial data in a lowest node MBR with an intersection label is determined through topological relation modeling, for which DE-9IM (Dimensionally Extended nine-Intersection Model) can be used.

FIG. 10 shows an example of the topological relation extraction process. First, the job takes as input the MBRs of the second R-tree index (mbr_1, mbr_2), the addresses where their spatial data are located (local address_1, local address_2), and the reference data (base geometry). Before mbr_1 and mbr_2 are passed to the Map functions, each is checked for intersection with the MBR of the reference data; mbr_1, which intersects, receives an intersection label and mbr_2, which does not, receives a non-intersection label (MBR labeling), and both are distributed to the Map functions (MBR distribution). The Map functions read all the data belonging to mbr_1 and mbr_2 into lists and classify mbr_1 as requiring DE-9IM modeling and mbr_2 as one whose topological relation can be determined without DE-9IM modeling (MBR classification). Once classified, the topological relation of each datum in mbr_1 is determined through DE-9IM modeling against the reference data, while each datum in mbr_2 skips DE-9IM modeling and is determined to be 'disjoint' from the reference data. Once the topological relations are determined, the topological relation knowledge is synthesized (knowledge synthesizing) and written as output (knowledge writing).
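The labeling shortcut can be illustrated with the sketch below. It is not a DE-9IM implementation: as a plainly named stand-in, `rect_relation` compares axis-aligned rectangles only, and the relation names, label strings, and function names are assumptions of this example. The point it shows is that data in a non-intersection partition are reported as 'disjoint' without any geometric modeling.

```python
def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def label_mbrs(leaf_mbrs, ref_box):
    """MBR labeling: mark each lowest node MBR relative to the reference MBR."""
    return {pid: ("intersect" if intersects(mbr, ref_box) else "non-intersect")
            for pid, mbr in leaf_mbrs.items()}

def rect_relation(a, b):
    """Stand-in for DE-9IM modeling: coarse relation of rectangle a to rectangle b."""
    if not intersects(a, b):
        return "disjoint"
    interiors_overlap = a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]
    if not interiors_overlap:
        return "touches"
    if a == b:
        return "equals"
    if a[0] <= b[0] and a[1] <= b[1] and b[2] <= a[2] and b[3] <= a[3]:
        return "contains"
    if b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]:
        return "within"
    return "overlaps"

def map_topology(label, members, ref_id, ref_box):
    """Map: skip modeling for non-intersection partitions, model the rest."""
    triples = []
    for gid, box in members:
        relation = "disjoint" if label == "non-intersect" else rect_relation(ref_box, box)
        triples.append((ref_id, relation, gid))
    return triples

def extract_topology(leaf_mbrs, local_data, ref_id, ref_box):
    labels = label_mbrs(leaf_mbrs, ref_box)
    triples = []
    for pid, label in labels.items():
        triples.extend(map_topology(label, local_data[pid], ref_id, ref_box))
    return labels, triples
```

The shortcut is sound only if every partition MBR fully covers its data, which is why the boundary adjustment during local index construction matters.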

(4) Extraction of direction relation knowledge

FIG. 11 shows the direction relation extraction process performed by the direction relation knowledge extraction unit 242 each time reference data is selected. The job takes as input the lowest node MBRs of the second R-tree index, the addresses of the spatial data belonging to them, and the reference data that serves as the basis for direction relation extraction. To use range queries, which cost less than computing direction angles directly, the MBR of the reference data itself is obtained, and the direction-angle areas formed within the top node MBR of the second R-tree index are modeled around its center point (direction area modeling). Here, the top node MBR is the MBR that encloses all the lowest node MBRs; in FIG. 11 it is the MBR enclosing mbr_1, mbr_2, mbr_3, and mbr_4.

Next, the lowest node MBRs are distributed to the Map functions (MBR distribution), and each Map function reads the spatial data belonging to its lowest node MBR (data reading). Before the range queries on the direction-angle areas are run, each spatial datum read is modeled as its own MBR and that MBR's center point (MBR & centroid modeling). The direction relation is then determined by which of the eight direction-angle areas contains the center point of the datum's MBR. If the center point lies on a boundary between direction-angle areas and therefore belongs to none of the eight, the direction angle is actually computed to determine the direction relation.

FIG. 12 shows an example of the direction relation extraction process. First, the job takes as input the lowest node MBRs of the second R-tree index (mbr_1, mbr_2), the addresses where their spatial data are located (local address_1, local address_2), and the reference data (base geometry). Before mbr_1 and mbr_2 are passed to the Map functions, the MBR of the reference data is obtained and the direction-angle areas formed within the top node MBR are modeled around its center point (direction area modeling); mbr_1 and mbr_2 are then distributed to the Map functions (MBR distribution). The Map functions read all the data belonging to mbr_1 and mbr_2 into lists (list_1, list_2). The data in each list are modeled one at a time as their MBR and its center point (MBR & centroid modeling), and range queries on the eight direction-angle areas determine the direction relation according to the area in which the center point falls. Once the direction relations are determined, the direction relation knowledge is synthesized (knowledge synthesizing) and written as output (knowledge writing).
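The centroid-based direction test can be sketched as below. The sketch makes several assumptions not fixed by the source: the eight direction-angle areas are modeled as 45-degree cones centered on the compass directions, region membership is tested with cross products instead of range queries over clipped areas, the direction labels are compass abbreviations, and coincident centers return `None`. The angle is computed only in the boundary case.

```python
import math

DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]
# Boundary rays between neighboring cones, at 22.5 + 45*k degrees.
RAYS = [(math.cos(math.radians(22.5 + 45 * k)), math.sin(math.radians(22.5 + 45 * k)))
        for k in range(8)]

def centroid(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

def direction(ref_box, box):
    """Direction of `box` as seen from `ref_box`, based on the two MBR centers."""
    rx, ry = centroid(ref_box)
    cx, cy = centroid(box)
    v = (cx - rx, cy - ry)
    if v == (0.0, 0.0):
        return None  # assumption: coincident centers have no direction
    for i, name in enumerate(DIRECTIONS):
        lower, upper = RAYS[i - 1], RAYS[i]
        # Strictly inside the cone: counterclockwise of `lower`, clockwise of `upper`.
        if cross(lower, v) > 0 and cross(v, upper) > 0:
            return name
    # Boundary case: the center lies on a cone boundary, so compute the angle itself.
    angle = math.degrees(math.atan2(v[1], v[0])) % 360
    return DIRECTIONS[int((angle + 22.5) // 45) % 8]
```

For a reference box at the origin, a box centered directly to its right is classified as "E" by the cone test alone, without any call to `atan2`.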

FIG. 13 is a block diagram of a Hadoop MapReduce-based spatial knowledge extraction system according to an embodiment. The master node 100, located at the front end of the spatial knowledge extraction system, comprises a web interface 110, an email notifier 120, and a download tool 130. First, through the web interface 110, the user designates a knowledge extraction area on a map (for example, a rectangular work area), sets the knowledge extraction parameters, and requests a MapReduce job. Selecting a knowledge extraction area is necessary to avoid extracting unneeded knowledge and the resulting growth (expansion) in the volume of extracted knowledge. While the requested quantitative reasoning job runs on the back-end slave nodes 200, the user is notified of its progress by the email notifier 120; when the job completes, the user can use the download tool 130 to retrieve the extracted knowledge stored at the back end as a text file in N-Triples format.
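As a small illustration of the N-Triples output mentioned above, the sketch below serializes one extracted relation as a single N-Triples line. The `http://example.org/` base IRI and the `geom/` and `relation/` path segments are hypothetical; the source does not specify the vocabulary actually used.

```python
def to_ntriple(subject, relation, obj, base="http://example.org/"):
    """Format one (subject, relation, object) fact as an N-Triples statement."""
    return (f"<{base}geom/{subject}> "
            f"<{base}relation/{relation}> "
            f"<{base}geom/{obj}> .")
```

Each statement occupies one line and ends with a period, so the extracted knowledge can be written as a plain text file, one relation per line.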

The back end of the spatial knowledge extraction system processes the job requested by the user in a distributed, parallel manner on the processors of the slave nodes 200 to extract spatial knowledge. The slave nodes 200 comprise an index construction unit 210, a data selection unit 220, and a spatial knowledge extraction unit 240. First, the index construction unit 210 builds, in advance, an R-tree index from spatial data published on the Semantic Web (USGS, OpenStreetMap, OS OpenData) that include geographic information such as cities, roads, and rivers, and groups and redistributes spatially close data based on the lowest node MBRs of the R-tree index. When a knowledge extraction job is subsequently requested, the data selector takes the rectangular knowledge extraction area entered by the user and performs a range query. For the selected data, the index builder again performs R-tree index construction and data redistribution, and the knowledge extraction job is then run with the user-set parameters to extract the knowledge.

The present invention has been described above with reference to its preferred embodiments. Those of ordinary skill in the art to which the invention pertains will understand that it may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the appended claims rather than by the foregoing description, and all differences within the scope of their equivalents should be construed as being included in the present invention.

100: master node 200: slave node
210: index construction unit 211: first index construction unit
212: second index construction unit 220: data selection unit
230: reference selection unit 240: spatial knowledge extraction unit
241: topological relation knowledge extraction unit 242: direction relation knowledge extraction unit

Claims

A Hadoop MapReduce-based spatial knowledge extraction system comprising:
a first index construction unit that builds an R-tree index over the entire spatial data distributed on the Hadoop Distributed File System (HDFS), wherein it groups and redistributes the entire spatial data in units of the minimum bounding rectangles (MBRs) constituting the lowest nodes of the R-tree, builds a local index for each redistributed lowest node MBR, and builds a global index containing meta information about the built local indexes;
a data selection unit that selects, through a range query, some spatial data for spatial knowledge extraction from among the spatial data belonging to the lowest node MBRs R-tree-indexed by the first index construction unit;
a second index construction unit that builds an R-tree index over the spatial data selected by the data selection unit, wherein it groups and redistributes the selected spatial data in units of the MBRs constituting the lowest nodes of the R-tree, builds a local index for each redistributed lowest node MBR, and builds a global index containing meta information about the built local indexes; and
a spatial knowledge extraction unit that extracts spatial knowledge by determining the spatial relations of the spatial data selected by the data selection unit on the basis of the R-tree index built by the second index construction unit,
wherein the data selection unit checks whether the input range intersects the area of each of the lowest node MBRs R-tree-indexed by the first index construction unit, and performs the range query by distributing to Map functions only the MBRs whose areas intersect the input range.

The system according to claim 1,
wherein the MBR unit has the HDFS block size.

delete

5. The system according to claim 1 or 2, wherein the spatial knowledge extraction unit comprises:
a topological relation knowledge extraction unit that extracts topological relation knowledge between reference data and the spatial data belonging to the lowest node MBRs R-tree-indexed by the second index construction unit; and
a direction relation knowledge extraction unit that extracts direction relation knowledge between the reference data and the spatial data belonging to the lowest node MBRs R-tree-indexed by the second index construction unit,
wherein each of the spatial data selected by the data selection unit is selected once as the reference data.

6. The system of claim 5,
wherein the topological relation knowledge extraction unit checks whether each of the lowest node MBRs R-tree-indexed by the second index construction unit intersects the lowest node MBR to which the reference data belongs, labels each lowest node MBR with an intersection label or a non-intersection label according to the result, determines the topological relation between the reference data and the spatial data belonging to a lowest node MBR with a non-intersection label to be 'disjoint' and extracts it as topological relation knowledge, and determines the topological relation between the reference data and the spatial data belonging to a lowest node MBR with an intersection label through modeling for topological relation determination and extracts it as topological relation knowledge.

The system according to claim 6,
wherein the topological relation knowledge extraction unit determines the topological relation between the reference data and the spatial data belonging to a lowest node MBR with an intersection label by using DE-9IM (Dimensionally Extended nine-Intersection Model).

8. The system of claim 5,
wherein the direction relation knowledge extraction unit models the direction-angle areas around the center point of the lowest node MBR to which the reference data belongs, reads the spatial data belonging to each lowest node MBR using a Map function, models the MBR of each read spatial datum and its center point, and extracts the direction relation between the reference data and the spatial datum according to the direction-angle area to which the MBR center point of the modeled spatial datum belongs.

9. The system of claim 8,
wherein the direction relation knowledge extraction unit computes the direction angle to determine the direction relation when the MBR center point of the modeled spatial data belongs to none of the direction-angle areas.

delete

A Hadoop MapReduce-based spatial knowledge extraction method performed by one or more processors, the method comprising:
a first index construction step in which a first index construction unit of the processor builds an R-tree index over the entire spatial data distributed on the Hadoop Distributed File System (HDFS), wherein it groups and redistributes the entire spatial data in units of the minimum bounding rectangles (MBRs) constituting the lowest nodes of the R-tree, builds a local index for each redistributed lowest node MBR, and builds a global index containing meta information about the built local indexes;
a data selection step in which a data selection unit of the processor selects, through a range query, some spatial data for spatial knowledge extraction from among the spatial data belonging to the lowest node MBRs R-tree-indexed in the first index construction step;
a second index construction step in which a second index construction unit of the processor builds an R-tree index over the selected spatial data, wherein it groups and redistributes the selected spatial data in units of the MBRs constituting the lowest nodes of the R-tree, builds a local index for each redistributed lowest node MBR, and builds a global index containing meta information about the built local indexes; and
a spatial knowledge extraction step in which a spatial knowledge extraction unit of the processor extracts spatial knowledge by determining the spatial relations of the selected spatial data on the basis of the R-tree index built in the second index construction step,
wherein, in the data selection step, it is checked whether the input range intersects the area of each of the lowest node MBRs R-tree-indexed in the first index construction step, and the range query is performed by distributing to Map functions only the MBRs whose areas intersect the input range.

13. The method of claim 12, wherein the spatial knowledge extraction step comprises:
a topological relation knowledge extraction step of extracting topological relation knowledge between reference data and the spatial data belonging to the lowest node MBRs R-tree-indexed in the second index construction step; and
a direction relation knowledge extraction step of extracting direction relation knowledge between the reference data and the spatial data belonging to the lowest node MBRs R-tree-indexed in the second index construction step,
wherein each of the spatial data selected in the data selection step is selected once as the reference data.

14. The method of claim 13,
wherein, in the topological relation knowledge extraction step, it is checked whether each of the lowest node MBRs R-tree-indexed in the second index construction step intersects the lowest node MBR to which the reference data belongs, each lowest node MBR is labeled with an intersection label or a non-intersection label according to the result, the topological relation between the reference data and the spatial data belonging to a lowest node MBR with a non-intersection label is determined to be 'disjoint' and extracted as topological relation knowledge, and the topological relation between the reference data and the spatial data belonging to a lowest node MBR with an intersection label is determined through modeling for topological relation determination and extracted as topological relation knowledge.

15. The method of claim 14,
wherein, in the topological relation knowledge extraction step, the topological relation between the reference data and the spatial data belonging to a lowest node MBR with an intersection label is determined by using DE-9IM (Dimensionally Extended nine-Intersection Model).

16. The method of claim 13,
wherein, in the direction relation knowledge extraction step, the direction-angle areas are modeled around the center point of the lowest node MBR to which the reference data belongs, the spatial data belonging to each lowest node MBR are read using a Map function, the MBR of each read spatial datum and its center point are modeled, and the direction relation between the reference data and the spatial datum is extracted according to the direction-angle area to which the MBR center point of the modeled spatial datum belongs.

17. The method of claim 16,
wherein, in the direction relation knowledge extraction step, the direction angle is computed to determine the direction relation when the MBR center point of the modeled spatial data belongs to none of the direction-angle areas.