KR102022488B1

KR102022488B1 - Method for storing and searching of massive spatial data using HBase

Info

Publication number: KR102022488B1
Application number: KR1020180024495A
Authority: KR
Inventors: 윤정식; 김성호
Original assignee: (주)이지스
Priority date: 2018-02-28
Filing date: 2018-02-28
Publication date: 2019-09-18

Abstract

The present invention relates to a method for storing and searching large-capacity spatial data using HBase, which allows users to rapidly search for desired spatial information. The method comprises: a spatial data storage step of storing spatial data including object information and attribute information; and a spatial data search step of retrieving the corresponding spatial information by calling a column family.

Description

Method for storing and searching of massive spatial data using HBase

The present invention relates to a technology that indexes the Earth-coordinate-system-based position coordinates of large-capacity spatial data into two-dimensional-plane-based coordinates and stores the data in HBase, a NoSQL database, and that uses those two-dimensional-plane-based indexing coordinates to retrieve desired spatial information from HBase more quickly.

In recent years, vast amounts of spatial data with diverse forms and attributes have been created every day on the Internet and in virtual spaces, and location-based messages from around the world are generated and consumed daily in enormous volumes.

Storing and managing such big data requires a database that can accommodate its defining characteristics: large volume, unstructured form, and real-time generation.

However, as data volumes have grown, the scale-up approach of maximizing the capability of a single node has become insufficient, and as data types have diversified, database structures based on the relational model, which is designed for structured data organized in rows and columns, have become ill-suited to storing the variety of data types. In addition, as data is generated ever faster, scale-up and hard-disk-based technologies have reached their limits.

In other words, the technical approach of relational-model databases is too slow to process the massive amount of data pouring in from around the world every day, is expensive when storage must be expanded, and can store data only in predefined storage forms (tables), so it cannot accommodate diverse and rapidly changing data.

Representative technologies for big data storage and management that address these problems include the distributed file system (DFS), NoSQL (Not Only SQL), and non-disk-based database management systems.

NoSQL refers to any DBMS or data store that uses neither the relational data model nor SQL. According to the CAP theorem, a data store can satisfy only two of consistency, availability, and partition tolerance; NoSQL systems give up part of consistency (C) or availability (A), which allows them to scale out in a distributed environment.

Representative examples of NoSQL include the key-value stores Dynamo and Membase, the column-oriented Bigtable, HBase, and Cassandra, and the document-oriented CouchDB and MongoDB.

However, these databases generally store and retrieve big data such as spatial information using text-sort-based indexing. With text-sort-based indexing, a search must scan all of the spatial information, which may amount to hundreds of terabytes, so retrieving spatial data takes a long time.

1. Korean Registered Patent No. 10-1712925 (Title: Method for building a database linking images with location information, method for positioning using the database, and electronic device performing the methods)
2. Korean Registered Patent No. 10-1585146 (Title: Distributed storage system that stores objects in a distributed manner based on the locations of a plurality of data nodes, location-based distributed storage method thereof, and computer-readable storage medium)

Accordingly, the present invention was devised in view of the above circumstances. Its technical object is to provide a method for storing and searching large-capacity spatial data using HBase that makes both storage and retrieval faster. For Earth-coordinate-system-based large-capacity spatial data, the position coordinates are converted into two-dimensional-plane-based indexing coordinates to generate a serial number, and the data is stored in a distributed manner in the HBase column family linked to that serial number. At search time, the Earth-coordinate-system-based spatial information is likewise converted into two-dimensional-plane-based indexing coordinates, and the spatial information linked to the corresponding indexing coordinate serial numbers is retrieved from HBase.

According to one aspect of the present invention for achieving the above object, there is provided a method for storing and searching large-capacity spatial data using HBase in a distributed server, the method comprising: a spatial data storage step of converting the Earth-coordinate-system-based position coordinates of large-capacity spatial data requested to be stored into two-dimensional-plane-based indexing coordinates, generating a spatial partition index serial number from the indexing coordinates, and storing the spatial data, including object information and attribute information, in the HBase column family matching the spatial partition index serial number; and a spatial data search step of converting the Earth-coordinate-system-based position coordinates of large-capacity spatial data requested to be searched into two-dimensional-plane-based indexing coordinates, generating a list of spatial partition index serial numbers for the search region defined by the indexing coordinates, and retrieving the corresponding spatial information by calling the HBase column families matching those serial numbers, wherein, in the spatial data storage step and the spatial data search step, the distributed server converts Earth-coordinate-system position coordinates into two-dimensional-plane-based indexing coordinates by setting the lower-left coordinate of the two-dimensional plane to (-180, -90) and the upper-right coordinate to (180, 90) so that the plane covers every longitude and latitude value of the globe, dividing the plane into a preset number of cells each having a fixed spatial extent, assigning the coordinate (0, 0) to the lower-left cell, and assigning per-cell coordinates such that the X-axis coordinate increases by 1 for each cell to the right and the Y-axis coordinate increases by 1 for each cell upward.

In addition, there is provided a method for storing and searching large-capacity spatial data using HBase, wherein the conversion of the Earth-coordinate-system position coordinates of the spatial data into two-dimensional-plane indexing coordinates is performed for each of a plurality of LOD levels, and the two-dimensional-plane indexing coordinates are set such that each lower level has twice as many cells as the level above it.

In addition, there is provided a method for storing and searching large-capacity spatial data, wherein, for spatial data having an LOD level, the spatial partition index serial number consists of the LOD level of the spatial data, the X-axis indexing coordinate value, and the Y-axis indexing coordinate value, and each HBase column family is given the same identifier as a spatial partition index serial number, so that the spatial data is stored in the column family whose identifier equals its spatial partition index serial number.

In addition, there is provided a method for storing and searching large-capacity spatial data, wherein the spatial data storage step extracts a representative point of the spatial data from its Earth-coordinate-system-based position coordinates and generates the spatial partition index serial number from the indexing coordinates of the extracted representative point, such that for point-type spatial data the point itself is the representative point, for single-line spatial data the center of the line is the representative point, for polyline spatial data the center of gravity of the area formed by the polyline is the representative point, and for surface-type spatial data the center of gravity of its interior area is the representative point.

In addition, there is provided a method for storing and searching large-capacity spatial data, wherein the spatial data storage step creates in HBase a column family matching the spatial partition index serial number of the representative point, and the cell-unit spatial information of the spatial data having that representative point is stored in the column family in file form.

In addition, there is provided a method for storing and searching large-capacity spatial data, wherein, for polyline or surface-type spatial data having a plurality of coordinate points, the spatial data search step extracts the lower-left indexing coordinate containing the boundary coordinate with the minimum X-axis value and the upper-right indexing coordinate containing the boundary coordinate with the maximum Y-axis value, uses these two indexing coordinates to set the minimum search region that contains the spatial region, and generates a spatial partition index serial number for each cell in the minimum search region, thereby generating the list of spatial partition index serial numbers for the spatial data requested to be searched.
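The search-region step above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `to_index` and `search_serials` are hypothetical, the cell sizes are derived from the 10 x 5 level-0 grid described in this document, the serial number uses the 2 + 8 + 8 digit layout described in this document, and the bounding box is taken from the minimum and maximum of both axes, which is one reading of the lower-left and upper-right indexing coordinates.

```python
import math
from typing import List, Sequence, Tuple

BASE_COLS, BASE_ROWS = 10, 5  # level-0 grid: 10 columns by 5 rows


def to_index(lon: float, lat: float, level: int) -> Tuple[int, int]:
    """Map a longitude/latitude pair to cell coordinates at the given LOD level."""
    cols, rows = BASE_COLS * 2 ** level, BASE_ROWS * 2 ** level
    ix = math.floor((lon + 180.0) / (360.0 / cols))
    iy = math.floor((lat + 90.0) / (180.0 / rows))
    # Assumption: clamp so that lon = 180 and lat = 90 fall in the last cell.
    return min(max(ix, 0), cols - 1), min(max(iy, 0), rows - 1)


def search_serials(points: Sequence[Tuple[float, float]], level: int) -> List[str]:
    """List the serial numbers of every cell in the minimum search region."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    x_min, y_min = to_index(min(lons), min(lats), level)  # lower-left cell
    x_max, y_max = to_index(max(lons), max(lats), level)  # upper-right cell
    return [
        f"{level:02d}{ix:08d}{iy:08d}"
        for iy in range(y_min, y_max + 1)
        for ix in range(x_min, x_max + 1)
    ]
```

For example, a polygon spanning longitudes 100 to 130 and latitudes 10 to 40 touches four level-0 cells under these assumptions, so `search_serials` returns four serial numbers.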

In addition, there is provided a method for storing and searching large-capacity spatial data, wherein, for spatial data having a plurality of spatial partition index serial numbers, the spatial data search step uses at least two distributed computers to search the regions of different spatial partition index serial numbers in a distributed manner, each distributed computer using the table split (Table Split) function provided by HBase to access the column families whose identifiers are the spatial partition index serial numbers assigned to it, and then using the table mapper (Table Mapper) and table reducer (Table Reducer) functions provided by HBase to search the positional relationships of objects in its assigned region.
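The division of work described above can be illustrated with a small Python sketch. It does not use HBase or its Table Split, Table Mapper, or Table Reducer functions; instead, a plain dictionary keyed by serial number stands in for the column families, threads stand in for the distributed computers, and the names `partition`, `search_partition`, and `distributed_search` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def partition(serials: List[str], n_workers: int) -> List[List[str]]:
    """Assign serial numbers to workers round-robin (stand-in for a table split)."""
    return [serials[i::n_workers] for i in range(n_workers)]


def search_partition(store: Dict[str, list], serials: List[str],
                     predicate: Callable[[object], bool]) -> list:
    """Map side: scan only the families assigned to this worker."""
    return [obj for s in serials for obj in store.get(s, []) if predicate(obj)]


def distributed_search(store: Dict[str, list], serials: List[str],
                       predicate: Callable[[object], bool],
                       n_workers: int = 2) -> list:
    """Run the per-partition searches in parallel, then merge the results."""
    parts = partition(serials, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(lambda p: search_partition(store, p, predicate), parts))
    merged: list = []  # reduce side: concatenate the partial results
    for chunk in partial:
        merged.extend(chunk)
    return merged
```

Each worker touches only the serial numbers assigned to it, which mirrors the idea that each distributed computer accesses only its own column families.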

In addition, there is provided a method for storing and searching large-capacity spatial data, wherein, in the spatial data search step, for each cell corresponding to the indexing coordinates of polyline or polygon spatial data, a distributed computer retrieves the object information that overlaps or is contained in the space bounded by the cell boundary and by the lines connecting the reference indexing coordinate point located in that cell to a first adjacent indexing coordinate point and a second adjacent indexing coordinate point, which adjoin the reference point in different directions.

According to the present invention, global-scale large-capacity data is spatially indexed and stored in HBase, a NoSQL database, through column family management, so that data queries can be processed more quickly.

In addition, because fast data access is the foundation for quickly analyzing large volumes of data and for deriving and using statistics from them, existing spatial information service technology can readily be applied to big data, cloud-based distributed service environments.

FIG. 1 is a diagram showing, in functionally separated form, the configuration of a spatial information storage and retrieval system using HBase to which the present invention is applied.
FIG. 2 is a diagram for explaining a method of storing spatial information in HBase according to the present invention.
FIG. 3 is a diagram for explaining a method of searching spatial information in HBase according to the present invention.
FIG. 4 is a diagram for explaining the representative point extraction method for each type of spatial data in FIG. 2.
FIG. 5 is a diagram for explaining the indexing structure that converts globe-based large-capacity spatial data into planar coordinates in FIG. 2.
FIG. 6 is a diagram for explaining the process of generating indexing coordinates for a representative point in FIG. 2, using an example in which surface-type spatial data (Z) is matched onto a two-dimensional plane.
FIG. 7 is a diagram for explaining the process in FIG. 2 of creating column families named by spatial partition index serial number in the HBase 200 table and storing data in them.
FIG. 8 is a diagram for explaining the process in FIG. 3 of generating the list of spatial partition index serial numbers for a spatial region defined by the multiple boundary coordinate values of surface-type spatial data.
FIG. 9 is a diagram for explaining the process in FIG. 3 of distributing a spatial region search in a cloud distributed environment.

Hereinafter, the present invention is described in more detail with reference to the accompanying drawings. Note that the same components are denoted by the same reference numerals throughout the drawings wherever possible. The terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings; rather, based on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent all of its technical ideas, so it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing.

FIG. 1 is a diagram showing, in functionally separated form, the configuration of a system for storing and searching large-capacity spatial data using HBase to which the present invention is applied.

As shown in FIG. 1, the system for storing and searching large-capacity spatial data using HBase comprises a distributed server 100, which includes a spatial data storage unit 110 and a spatial data search unit 120, and HBase 200.

Operating in a cloud-based distributed environment, the distributed server 100 indexes spatial information and stores it in HBase 200, a NoSQL database, so that related data is located physically close together, and it searches the storage files belonging to an index for the desired spatial information in parallel using multiple distributed computers.

The spatial data storage unit 110 converts the Earth-coordinate-system-based position coordinates of large-capacity spatial data requested to be stored into two-dimensional-plane-based indexing coordinates, and stores the spatial data, in file form and including object information and attribute information, in the HBase column family matching the spatial partition index serial number generated from the indexing coordinates. Because large-capacity spatial data is too large to manage all at once, the spatial data storage unit 110 divides the spatial data into a plurality of LOD (Level of Detail) levels according to precision and stores it by level.

The spatial data search unit 120 converts the Earth-coordinate-system-based position coordinates of large-capacity spatial data requested to be searched into two-dimensional-plane-based indexing coordinates, generates a list of spatial partition index serial numbers for the search region defined by the indexing coordinates, and retrieves the corresponding spatial information by calling the HBase column families matching those serial numbers.

In general, the object information that forms shapes in spatial data includes, for example, outline or cross-section information for buildings nationwide. In facility data, the cross section of one building is composed of spatial planes, and each plane consists of spatial information in the form of a closed polygon. That is, the information for one building consists of spatial information in the form of multiple polygons.

A column family consists of a plurality of files (HFILE) physically linked to HBase 200.

That is, in the present invention, spatial data belonging to the same spatial partition index serial number is stored in the same column family; because that data is physically stored in the same area, the storage and retrieval speed of spatial data improves.

Next, a method for storing and searching large-capacity spatial data using HBase according to a first embodiment of the present invention is described with reference to FIGS. 2 and 3.

First, the method of storing large-capacity spatial data using HBase is described with reference to FIG. 2.

The distributed server 100, more specifically the spatial data storage unit 110, extracts a representative point of the spatial region in response to a spatial data storage request (ST110). The spatial data storage unit 110 extracts the representative point in a different way depending on the shape of the spatial data to be stored.

FIG. 4 is a diagram for explaining the representative point extraction method for each type of spatial data. As shown in FIG. 4, spatial data can typically be classified into points (A), polylines (B), and polygons (C). Although not shown in FIG. 4, spatial data may also take the form of a single line, in which case the center point of the line is set as the representative point.

For point-type spatial data (X0, Y0) as in FIG. 4(A), the coordinates of the point are set as the representative point coordinates (X_C, Y_C).

For polyline spatial data as in FIG. 4(B) and polygon spatial data as in FIG. 4(C), the representative point coordinates (X_C, Y_C) are calculated according to Equation 1 below.

Here, N is the number of coordinate points and i is the order of the coordinate point.

That is, for a polyline (FIG. 4(B)) and a polygon (FIG. 4(C)), the center of gravity of the corresponding area is set as the representative point. For a polyline (FIG. 4(B)), the lines are first joined into a closed shape, the area is obtained through Equation 1, and the center point of that area is set as the representative point.
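Equation 1 itself is not reproduced in this text, so the following Python sketch substitutes the standard area-weighted (shoelace) centroid as one plausible way to compute the center of gravity of a closed shape. It should be read as an assumption rather than as the patent's exact formula. The function names and the `kind` labels are hypothetical, and the single-line case takes the midpoint of the two endpoints.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(points: List[Point]) -> Point:
    """Area-weighted centroid; the ring is closed by joining the last vertex to the first."""
    n = len(points)
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0.0:
        # Degenerate (collinear) input: fall back to the vertex mean (assumption).
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def representative_point(kind: str, points: List[Point]) -> Point:
    """Pick a representative point according to the geometry type."""
    if kind == "point":
        return points[0]
    if kind == "line":  # single line: midpoint of its two endpoints
        (x0, y0), (x1, y1) = points[0], points[-1]
        return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)
    if kind in ("polyline", "polygon"):  # polyline is treated as a closed ring
        return polygon_centroid(points)
    raise ValueError(f"unknown geometry kind: {kind}")
```

A 2 x 2 square with one corner at the origin yields the representative point (1.0, 1.0) under this sketch.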

Next, the spatial data storage unit 110 converts the Earth-coordinate-system-based position coordinates of the representative point extracted in step ST110 into two-dimensional-plane-based indexing coordinates (ST120).

The conversion of the Earth-coordinate-system position coordinates of the spatial data into two-dimensional-plane indexing coordinates is performed for each of N LOD levels, and the two-dimensional-plane indexing coordinates are set such that each lower level has twice as many cells as the level above it.

In addition, the spatial data storage unit 110 sets an LOD level for the spatial data, calculates the horizontal and vertical size of one cell at that level, and calculates the indexing coordinates of the representative point, taking (-180, -90) as the lower-left reference point of the cell grid.

FIG. 5 illustrates the indexing structure for converting globe-based large-capacity spatial data into planar coordinates.

Referring to FIG. 5, the global-scale space is basically divided into 10 columns horizontally and 5 rows vertically, giving 50 rectangular cells. Spatial data can be set for multiple LOD levels. That is, for the two-dimensional map information, the lower-left coordinate is set to (-180, -90) and the upper-right coordinate to (180, 90) so as to cover all longitude and latitude values of the globe; the space is divided into a preset number of cells each having a fixed spatial extent; the lower-left cell is assigned the coordinate (0, 0); and per-cell coordinates are generated such that the X-axis coordinate increases by 1 for each cell to the right and the Y-axis coordinate increases by 1 for each cell upward. In FIG. 5, level 0 has 50 cells running from (0, 0) at the lower left to (9, 4) at the upper right, and each lower level has twice as many cells as the level immediately above it both horizontally and vertically. That is, at level N the corner cells are (0, 0) at the lower left, (0, (5×2^N)-1) at the upper left, ((10×2^N)-1, 0) at the lower right, and ((10×2^N)-1, (5×2^N)-1) at the upper right, so the plane is divided into (10×2^N)×(5×2^N) cells.
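The grid layout described above can be checked with a short Python sketch. The function names `grid_shape` and `corner_cells` are hypothetical; the 10 x 5 base grid and the doubling per level come from the description.

```python
from typing import Dict, Tuple


def grid_shape(level: int, base_cols: int = 10, base_rows: int = 5) -> Tuple[int, int]:
    """Return (columns, rows) of the index grid at a given LOD level."""
    factor = 2 ** level  # each level doubles the cell count along both axes
    return base_cols * factor, base_rows * factor


def corner_cells(level: int) -> Dict[str, Tuple[int, int]]:
    """Return the cell coordinates of the four corner cells at a given level."""
    cols, rows = grid_shape(level)
    return {
        "lower_left": (0, 0),
        "upper_left": (0, rows - 1),
        "lower_right": (cols - 1, 0),
        "upper_right": (cols - 1, rows - 1),
    }
```

At level 0 this gives a 10 x 5 grid whose upper-right cell is (9, 4), matching the 50 cells of FIG. 5; at level 2 the grid is 40 x 20.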

FIG. 6 illustrates surface-type spatial data (Z) matched onto the two-dimensional plane; the process of generating indexing coordinates for a representative point is described with reference to it.

Referring to FIG. 6, the size (Δx, Δy) of an index region, that is, a cell, at level N is calculated as in Equation 2 below.

At level N, the representative point coordinates (X_C, Y_C) of the spatial data are then calculated as in Equation 3.
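Equations 2 and 3 are not reproduced in this text. The Python sketch below shows one form that is consistent with the surrounding description: the cell size is the 360 x 180 degree plane divided by the number of cells along each axis at level N, and the indexing coordinates are obtained by offsetting the representative point from the lower-left reference point (-180, -90) and dividing by the cell size. The function names, the use of the floor function, and the clamping at the upper edge are assumptions.

```python
import math
from typing import Tuple


def cell_size(level: int) -> Tuple[float, float]:
    """Cell width and height in degrees at a given LOD level (assumed form of Equation 2)."""
    cols = 10 * 2 ** level
    rows = 5 * 2 ** level
    return 360.0 / cols, 180.0 / rows


def to_index(lon: float, lat: float, level: int) -> Tuple[int, int]:
    """Indexing coordinates of a point at a given level (assumed form of Equation 3)."""
    dx, dy = cell_size(level)
    cols = 10 * 2 ** level
    rows = 5 * 2 ** level
    ix = math.floor((lon + 180.0) / dx)
    iy = math.floor((lat + 90.0) / dy)
    # Assumption: clamp so that lon = 180 and lat = 90 fall in the last cell.
    return min(max(ix, 0), cols - 1), min(max(iy, 0), rows - 1)
```

Under these assumptions each level-0 cell is 36 x 36 degrees, and a representative point at longitude 127.0, latitude 37.5 falls in cell (8, 3) at level 0 and cell (17, 7) at level 1.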

The spatial data storage unit 110 then generates a spatial partition index serial number for the representative point (ST130). The serial number has the form "level (2 digits) + X-axis indexing coordinate (8 digits) + Y-axis indexing coordinate (8 digits)", and unused digits in each field are filled with "0". For example, if the LOD level is 8 (08), the X-axis indexing coordinate is 234567 (00234567), and the Y-axis indexing coordinate is 98345 (00098345), the spatial partition index serial number "080023456700098345" is generated.
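The serial number layout above maps directly onto zero-padded string formatting. The following Python sketch uses hypothetical function names (`make_serial`, `parse_serial`) and adds range checks that the description does not mention.

```python
from typing import Tuple


def make_serial(level: int, ix: int, iy: int) -> str:
    """Build the 18-digit serial: level (2) + X index (8) + Y index (8), zero-padded."""
    if not (0 <= level <= 99 and 0 <= ix <= 99_999_999 and 0 <= iy <= 99_999_999):
        raise ValueError("value does not fit the 2 + 8 + 8 digit layout")
    return f"{level:02d}{ix:08d}{iy:08d}"


def parse_serial(serial: str) -> Tuple[int, int, int]:
    """Split an 18-digit serial back into (level, X index, Y index)."""
    if len(serial) != 18 or not serial.isdigit():
        raise ValueError("serial must be exactly 18 digits")
    return int(serial[:2]), int(serial[2:10]), int(serial[10:])
```

With the values from the example in the text, `make_serial(8, 234567, 98345)` produces "080023456700098345", and `parse_serial` recovers the three fields.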

The spatial data storage unit 110 then creates the column family table of HBase 200 that matches the spatial partition index serial number and stores the spatial data in it (ST140).

At this time, the spatial data storage unit 110 creates the column family table in the HBase 200 so that its identifier is identical to the spatial partition index serial number, and stores the spatial data in that column family table in HFILE form.

FIG. 7 illustrates the process of creating column families named by spatial partition index serial numbers in an HBase 200 table and storing data in them.

Referring to FIG. 7, for spatial data in the region from the lower-left cell (0,0) to the upper-right cell (9,4) at level 0, spatial partition index serial numbers are generated for level 0 (0L) through level 2 (2L). Column families (CF) whose identifiers are identical to the respective spatial partition index serial numbers for level 2 (2L) are created in the HBase 200. Each column family (CF) stores, as HFILEs, a plurality of pieces of cell information comprising object information, such as the shape (coordinate values, etc.) of the spatial data having that cell as its representative point, and attribute information, such as building name, address, and telephone number.

In other words, the spatial data storage unit 110 creates a column family based on the representative point of the spatial data, and each column family stores an HFILE for each cell belonging to the spatial area of the spatial data that has that representative point.
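The storage layout, in which data is grouped under an identifier equal to the spatial partition index serial number, can be mimicked with an ordinary in-memory dictionary. The sketch below is a stand-in for illustration only: it makes no HBase client calls, and the class name `InMemorySpatialStore` and the record fields are assumptions.

```python
from collections import defaultdict


class InMemorySpatialStore:
    """Toy stand-in for column families keyed by serial number."""

    def __init__(self):
        # serial number -> list of stored records
        self._families = defaultdict(list)

    def put(self, serial, object_info, attribute_info):
        """Store one record under the family named by the serial number."""
        self._families[serial].append(
            {"object": object_info, "attributes": attribute_info}
        )

    def get(self, serial):
        """Return a copy of all records stored under a serial number."""
        return list(self._families.get(serial, []))


if __name__ == "__main__":
    store = InMemorySpatialStore()
    store.put(
        "080023456700098345",
        {"shape": [(127.0, 37.5)]},
        {"name": "example building"},
    )
    print(store.get("080023456700098345"))
```

A lookup then needs only the serial number of the cell of interest, which is the property the retrieval steps rely on.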

Next, the method of searching large-volume spatial data using HBase is described with reference to FIG. 3.

First, when a search request for spatial data is received from outside, the spatial data search unit 120 extracts the boundary coordinate values of the requested spatial area (ST210).

Here, as shown in FIG. 4, the spatial data may take the form of a point, a line (polyline), or a polygon. For point-form spatial data as in FIG. 4(A), a single boundary coordinate value is extracted; for data covering an area as in FIG. 4(B) or (C), multiple boundary coordinate values, for example six as in FIG. 4(B) and (C), may be extracted.

The spatial data search unit 120 then converts each Earth-coordinate-system-based boundary coordinate value corresponding to the shape of the spatial data into a two-dimensional-plane-based indexing coordinate (ST220).

The spatial data search unit 120 then sets a search area using the indexing coordinates and generates a list of spatial partition index serial numbers from the indexing coordinates of that search area (ST230). For point-form spatial data with a single boundary coordinate value, the list contains the spatial partition index serial number of one cell; for spatial data with multiple boundary coordinate values, such as a polyline or polygon, the list consists of the spatial partition index serial numbers of the multiple cells involved.

FIG. 8 illustrates how a search area is set for a polygon made up of multiple boundary coordinate values and how the corresponding list of spatial partition index serial numbers is generated.

Referring to FIG. 8, with the LOD level set, the spatial data search unit 120 extracts the lower-left indexing coordinate (XL,YL), which contains the boundary coordinate value (X0,Y0) with the minimum X-axis coordinate, and the upper-right indexing coordinate (XR,YR), which contains the boundary coordinate value (X3,Y3) with the maximum Y-axis coordinate, and uses these two indexing coordinates to set a rectangular minimum search area (S). It then generates a list of spatial partition index serial numbers, one for each cell in the minimum search area (S). FIG. 8 shows an example in which, for the requested spatial area (Z), a minimum search area (S) consisting of 12 cells in total is set at level 8, and 12 spatial partition index serial numbers are generated accordingly.
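Generating the serial number list for a rectangular minimum search area can be sketched as follows. As a simplifying assumption, the sketch takes the ordinary axis-aligned bounding box of the boundary points' indexing coordinates (minimum and maximum over both axes) rather than reproducing the exact corner selection of FIG. 8; the function name `search_serials` and the sample coordinates are hypothetical.

```python
def search_serials(level, index_points):
    """List the serial numbers of every cell in the bounding rectangle.

    index_points: iterable of (x, y) indexing coordinates of the boundary
    points of the requested area, all at the same LOD level.
    """
    points = list(index_points)
    if not points:
        raise ValueError("at least one boundary point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_left, x_right = min(xs), max(xs)
    y_low, y_high = min(ys), max(ys)
    # Row by row, from the lower-left cell to the upper-right cell.
    return [
        f"{level:02d}{x:08d}{y:08d}"
        for y in range(y_low, y_high + 1)
        for x in range(x_left, x_right + 1)
    ]


if __name__ == "__main__":
    # Boundary points spanning 4 columns and 3 rows at level 8.
    serials = search_serials(8, [(100, 20), (103, 21), (101, 22)])
    print(len(serials))  # 12
```

A single point yields a one-element list, while the sample boundary points above span 4 columns and 3 rows and yield 12 serial numbers, the same count as in the FIG. 8 example.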

Next, the spatial data search unit 120 performs the search for the requested spatial data by accessing, in the HBase 200, the column families matching each spatial partition index serial number generated in step ST230 (ST240).

FIG. 9 illustrates the process of carrying out a distributed spatial-area search in a cloud distributed environment. As shown in FIG. 9, when a list of multiple spatial partition index serial numbers has been generated for a single search request, the distributed server 100 uses a plurality of distributed computers to search different spatial areas in a distributed manner.

More specifically, each distributed computer uses the Table Split function provided by the HBase 200 to access the column families whose identifiers are the spatial partition index serial numbers under which HFILEs exist, and then uses the Table Mapper and Table Reducer functions provided by the HBase 200 to search, within its assigned area, for the positional relationship with objects. The results found by the distributed computers are then merged to obtain the result information for the requested spatial data.

In other words, because the present invention selects and searches only the column families corresponding to the minimum area containing the requested spatial data, the target spatial data can be found more quickly than with a conventional text-sorted indexing scheme that searches all stored data.

In FIG. 9, the first distributed computer (HTable Split 1) searches the column family tables of the spatial partition index serial numbers corresponding to cells ①②③④, the second distributed computer (HTable Split 2) searches those corresponding to cells ⑤⑥⑦⑧, and the third distributed computer (HTable Split 3) searches those corresponding to cells ⑨⑩⑪⑫. Because the distribution capability of the cloud environment is used to search a large spatial area in parallel, search service speed can be improved.
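The division of 12 cells among three workers can be imitated locally with a thread pool. This is only an analogy under stated assumptions: threads stand in for the distributed computers, a plain dictionary stands in for the column families, and none of the HBase Table Split, Table Mapper, or Table Reducer functions is called. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(serials, n_workers):
    """Cut the serial number list into at most n_workers contiguous parts."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if not serials:
        return []
    size = -(-len(serials) // n_workers)  # ceiling division
    return [serials[i:i + size] for i in range(0, len(serials), size)]


def search_split(store, serials):
    """Collect every record stored under the given serial numbers."""
    hits = []
    for serial in serials:
        hits.extend(store.get(serial, []))
    return hits


def distributed_search(store, serials, n_workers=3):
    """Search each split in its own worker and merge the partial results."""
    splits = split_evenly(serials, n_workers)
    if not splits:
        return []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda part: search_split(store, part), splits))
    return [hit for partial in partials for hit in partial]


if __name__ == "__main__":
    cells = [f"cell{i}" for i in range(1, 13)]
    toy_store = {"cell1": ["a"], "cell6": ["b"], "cell12": ["c"]}
    print(split_evenly(cells, 3))
    print(distributed_search(toy_store, cells))
```

With 12 serial numbers and three workers, each worker receives four cells, mirroring the ①②③④, ⑤⑥⑦⑧, ⑨⑩⑪⑫ grouping described for FIG. 9.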

At this time, each distributed computer examines, for each cell, the relationship with the requested spatial area. As shown in FIG. 9, it examines the objects contained within the region bounded by the cell boundary and by the lines connecting the reference indexing coordinate value (X0,Y0) located in the cell to the first adjacent indexing coordinate value (X1,Y1) and the second adjacent indexing coordinate value (X5,Y5), which adjoin it in different directions. In the column family table for the cell in FIG. 9, object ⓐ is found to be Not Contained, object ⓑ to Overlap, and object ⓒ to be Contained; accordingly, object ⓑ (overlap) and object ⓒ (contained) are returned as the search results.
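The three outcomes (Not Contained, Overlap, Contain) can be illustrated with a simplified Python sketch. As an assumption, both the query region and each object are reduced to axis-aligned rectangles given as (xmin, ymin, xmax, ymax); the region in FIG. 9 is instead bounded by polygon edges and the cell boundary, which would require a full polygon intersection test. The function names and sample rectangles are hypothetical.

```python
def classify(obj, region):
    """Classify an object rectangle against a query region rectangle."""
    ox0, oy0, ox1, oy1 = obj
    rx0, ry0, rx1, ry1 = region
    # No shared area at all.
    if ox1 < rx0 or ox0 > rx1 or oy1 < ry0 or oy0 > ry1:
        return "not_contained"
    # Object lies entirely inside the region.
    if ox0 >= rx0 and ox1 <= rx1 and oy0 >= ry0 and oy1 <= ry1:
        return "contain"
    # Otherwise the object crosses the region boundary.
    return "overlap"


def matching_objects(objects, region):
    """Keep the names of objects that overlap or are contained in the region."""
    return [
        name
        for name, rect in objects.items()
        if classify(rect, region) in ("overlap", "contain")
    ]


if __name__ == "__main__":
    region = (0, 0, 10, 10)
    objects = {
        "a": (12, 12, 14, 14),  # outside the region
        "b": (8, 8, 12, 12),    # crosses the region boundary
        "c": (2, 2, 4, 4),      # inside the region
    }
    print(matching_objects(objects, region))
```

As in the FIG. 9 example, only the overlapping and contained objects are kept as search results.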

100: distributed server, 110: spatial data storage unit,
120: spatial data search unit, 200: HBase.

Claims

A method of storing and retrieving large-volume spatial data using HBase in a distributed server, the method comprising:
a spatial data storage step of converting Earth-coordinate-system-based position coordinates of large-volume spatial data requested for storage into two-dimensional-plane-based indexing coordinates, generating a spatial partition index serial number using the indexing coordinates, and storing the spatial data, including object information and attribute information, in an HBase column family matching the spatial partition index serial number; and
a spatial data retrieval step of converting Earth-coordinate-system-based position coordinates of large-volume spatial data requested for search into two-dimensional-plane-based indexing coordinates, generating a list of spatial partition index serial numbers for a search area defined by the indexing coordinates, and retrieving the corresponding spatial information by calling the HBase column families matching the spatial partition index serial numbers,
wherein the spatial data storage step and the spatial data retrieval step convert the Earth-coordinate-system position coordinates into the two-dimensional-plane-based indexing coordinates by, with the lower-left coordinate value set to (-180, -90) and the upper-right coordinate value set to (180, 90), dividing the space by a predetermined number to generate a plurality of cells each having a predetermined spatial area, setting the coordinate value of the cell located at the lower left to (0,0), and setting per-cell coordinate values such that the X-axis coordinate value increases by 1 moving rightward from that cell and the Y-axis coordinate value increases by 1 moving upward.

The method of claim 1,
wherein the conversion of the Earth-coordinate-system position coordinates of the spatial data into two-dimensional-plane indexing coordinates is performed for each of a plurality of LOD levels, and the two-dimensional-plane indexing coordinates of a lower level are set such that the number of cells doubles relative to the level above.

The method according to claim 1 or 2,
wherein, for spatial data having an LOD level,
the spatial partition index serial number includes the LOD level of the spatial data, an X-axis indexing coordinate value, and a Y-axis indexing coordinate value, and
an HBase column family is set to have the same identifier as the spatial partition index serial number, so that the spatial data is stored in the column family whose identifier equals its spatial partition index serial number.

The method of claim 1,
wherein, in the spatial data storage step, a representative point of the spatial data is extracted based on the Earth-coordinate-system-based position coordinate values of the spatial data, and the spatial partition index serial number is generated using the indexing coordinates of the extracted representative point, and wherein:
for point-form spatial data, the point itself is extracted as the representative point;
for spatial data in the form of a single line, the center of the line is extracted as the representative point;
for polyline-form spatial data, the center of gravity of the area formed by the polyline is extracted as the representative point; and
for face-form spatial data, the center of gravity of its interior area is extracted as the representative point.

The method of claim 4,
wherein the spatial data storage step creates, in HBase, a column family matching the spatial partition index serial number of the representative point, and stores in that column family the spatial information, in cell units, of the spatial data having that representative point.

The method of claim 1,
wherein, in the spatial data retrieval step, for polyline-form or face-form spatial data having a plurality of coordinate points, a lower-left indexing coordinate containing the boundary coordinate value with the minimum X-axis coordinate and an upper-right indexing coordinate containing the boundary coordinate value with the maximum Y-axis coordinate are extracted; a minimum search area containing the spatial area is set using the lower-left and upper-right indexing coordinates; and a spatial partition index serial number is generated for each cell in the minimum search area, thereby generating the list of spatial partition index serial numbers for the spatial data requested for search.

The method of claim 1,
wherein, in the spatial data retrieval step, spatial data having a plurality of spatial partition index serial numbers is searched in a distributed manner, with at least two distributed computers handling different spatial partition index serial numbers, and
each distributed computer uses the table split function provided by HBase to access the column families whose identifiers are the spatial partition index serial numbers assigned to it, and then performs the search using the table mapper and table reducer functions provided by HBase.

The method according to claim 6 or 7,
wherein, in the spatial data retrieval step, for each cell corresponding to the indexing coordinates of polyline-form or polygon-form spatial data, a distributed computer retrieves object information that overlaps or is contained in the space formed by the cell boundary and the lines connecting a reference indexing coordinate point located in the cell to a first adjacent indexing coordinate point and a second adjacent indexing coordinate point that adjoin the reference indexing coordinate point in different directions.