KR102006283B1

KR102006283B1 - Dataset loading method in m-tree using fastmap

Info

Publication number: KR102006283B1
Application number: KR1020190022540A
Authority: KR
Inventors: 노웅기
Original assignee: 가천대학교 산학협력단
Priority date: 2019-02-26
Filing date: 2019-02-26
Publication date: 2019-10-01

Abstract

The present invention relates to a method for loading a dataset into an M-tree using FastMap. The method comprises the steps of: mapping each object of a dataset in a metric space to a point in a k-dimensional Euclidean space; ordering the points in the k-dimensional Euclidean space into a one-dimensional sequence; dividing the points of the one-dimensional sequence into contiguous groups; and generating a leaf node for each contiguous group, whereby the dataset is bulk-loaded into the M-tree. Accordingly, a large dataset in which the similarity between data objects is not defined by an Lp distance between two points in Euclidean space can be loaded into the M-tree efficiently, and indexing and search efficiency improve over an M-tree built in the conventional way.

Description

Method for loading a dataset into an M-tree using FastMap {DATASET LOADING METHOD IN M-TREE USING FASTMAP}

The present invention relates to a method for loading a dataset into an M-tree using FastMap, and more particularly to a technique for loading a large number of database objects into an M-tree using FastMap.

The M-tree is an index structure for efficiently searching objects in a metric space, that is, a space in which a distance is defined between any two objects. Unlike the R-tree, which can be used only in Euclidean space, the M-tree has the advantage that it can be used in Euclidean space as well as in a general metric space.

In particular, research has shown that, for the same dataset, the M-tree offers better search performance than the R-tree.

In addition, to overcome the cost and inefficiency of inserting a large number of data objects into an M-tree index one at a time, bulk loading algorithms that load a large number of data objects at once have been proposed.

Meanwhile, there is a need for an algorithm that bulk-loads, into an M-tree with its excellent search performance, data for which the similarity between data objects is difficult to define as an Lp distance between two points in Euclidean space. Examples include unstructured multimedia data such as speech, images, and text, which are large and of irregular size, and points of interest (POIs) on a road network.

Korean Patent Application Publication No. 10-2012-0096894 (Database search method, navigation device, and method of generating an index structure)

The technical problem to be solved by the present invention is to provide a method of bulk-loading, into an M-tree with high search efficiency, a dataset for which the similarity between data objects is difficult to define as an Lp distance between two points in Euclidean space, thereby improving index creation and search performance for the dataset in a database.

A method for loading a dataset into an M-tree using FastMap according to an embodiment of the present invention is performed by a processing apparatus that loads data objects into an M-tree, and comprises: mapping each object of a dataset in a metric space to a point in a k-dimensional Euclidean space; ordering the points in the k-dimensional Euclidean space into a one-dimensional sequence; dividing the points of the one-dimensional sequence into contiguous groups; and generating a leaf node for each contiguous group, whereby the dataset is loaded into the M-tree.

The dataset in the metric space is a dataset in which the similarity between data objects is not defined as an Lp distance between two points in Euclidean space.

The dataset in the metric space is unstructured multimedia data or points of interest (POIs) of a road network.

The step of mapping the dataset in the metric space to points in the k-dimensional Euclidean space is performed such that the distances between the objects in the metric space are preserved in the k-dimensional Euclidean space.

The step of mapping the dataset in the metric space to points in the k-dimensional Euclidean space is performed according to Equation 1 below.

[Equation 1]

$$\mathrm{stress} = \sqrt{\frac{\sum_{i,j}\left(\hat{d}_{ij} - d_{ij}\right)^{2}}{\sum_{i,j} d_{ij}^{2}}}$$

(Here $d_{ij}$ is the distance between two objects $O_i$ and $O_j$ in the metric space, $\hat{d}_{ij}$ is the distance between the two objects $\hat{O}_i$ and $\hat{O}_j$ after mapping into the Euclidean space, and i and j are natural numbers from 1 to N.)

The step of mapping the dataset in the metric space to points in the k-dimensional Euclidean space is performed by mapping the plurality of objects constituting the dataset onto the Euclidean space in bulk (bulk loading).

The step of ordering the points in the k-dimensional Euclidean space into a one-dimensional sequence is performed using a space-filling curve.

The step of ordering the points in the k-dimensional Euclidean space into a one-dimensional sequence uses, as the space-filling curve, any one of the Z curve (Morton curve), the Hilbert curve, and the Gray-code curve.

The step of dividing the points of the one-dimensional sequence into contiguous groups is performed using a full-page method, in which the consecutive points of the one-dimensional sequence are divided sequentially into groups of uniform size such that every group contains the maximum number of objects.

The step of dividing the points of the one-dimensional sequence into contiguous groups is performed using a heuristic method, in which the consecutive points of the one-dimensional sequence are divided sequentially into groups, the distance between points in a group is compared with the number of points in the group to determine how many points to include, and the result is used to form each group within a range in which the size of the group's region is not expanded.

In the heuristic method, the distance between points in the group is computed as the distance between the points in the actual metric space, or is replaced by the distance between the two points in the k-dimensional Euclidean space.

The step of dividing the points of the one-dimensional sequence into contiguous groups is performed using a rigorous method, in which the consecutive points of the one-dimensional sequence are divided sequentially into groups by generating candidate groups, computing the radius of the region of each candidate group and the number of points in each candidate group, and forming the groups from the results.

In the rigorous method, the radius of a candidate group's region is computed as the radius in the actual metric space, or is replaced by the distance between two points in the k-dimensional Euclidean space.

The step of generating a leaf node for each contiguous group generates a leaf node containing the objects in the metric space that correspond to the points in the group produced by the step of dividing the points of the one-dimensional sequence into contiguous groups.

The step of generating a leaf node for each contiguous group computes the representative object of the leaf node using the points in the k-dimensional Euclidean space.

After the step of generating a leaf node for each contiguous group, a step of generating non-leaf entries for the leaf nodes generated for the groups is performed, and, depending on the number of non-leaf entries, one of the following steps is further performed: dividing the non-leaf entries into entry groups; or storing all the entries in a single root node.

The step of dividing the non-leaf entries into entry groups is performed using the full-page method, the heuristic method, or the rigorous method.

The entries generated in the step of dividing the non-leaf entries into entry groups are stored in a buffer, and when the buffer is full, the step of storing all the entries in a single root node is performed so that the entries are written to a memory disk.

A computer-readable recording medium according to an embodiment of the present invention is characterized in that it carries out the above-described method for loading a dataset into an M-tree using FastMap.

With these features, the method for loading a dataset into an M-tree using FastMap according to an embodiment of the present invention loads a large dataset into the M-tree using FastMap and a space-filling curve. A dataset in which the similarity between data objects is not defined as an Lp distance between two points in Euclidean space can therefore be bulk-loaded into the M-tree efficiently, and its indexing and search efficiency improve over the search performance of a conventional M-tree.

FIG. 1 is a flowchart showing the flow of a method for loading a dataset into an M-tree using FastMap according to an embodiment of the present invention.
FIG. 2 is a diagram showing several examples of the process of dividing the one-dimensional sequence into groups in the method according to an embodiment of the present invention.
FIG. 3 is a flowchart showing the process of dividing the one-dimensional sequence into groups using the heuristic method in the method according to an embodiment of the present invention.
FIG. 4 is a flowchart showing the process of dividing the one-dimensional sequence into groups using the rigorous method in the method according to an embodiment of the present invention.
FIG. 5 is a flowchart showing the process of generating leaf nodes for the groups in the method according to an embodiment of the present invention.
FIG. 6 is a diagram showing an example of the process of generating leaf nodes for the groups in the method according to an embodiment of the present invention.
FIG. 7 is a diagram showing an example of the non-leaf entries produced when non-leaf entries are generated for the leaf nodes in the method according to an embodiment of the present invention.
FIG. 8 is a graph comparing the data search speed obtained with the method according to an embodiment of the present invention against that obtained with sequential insertion of the data.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can easily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and like reference numerals denote like parts throughout the specification.

For the method for loading a dataset into an M-tree using FastMap according to an embodiment of the present invention, FIG. 1 is a flowchart of the overall flow; FIG. 2 shows several examples of dividing the one-dimensional sequence into groups; FIG. 3 is a flowchart of dividing the one-dimensional sequence into groups using the heuristic method; FIG. 4 is a flowchart of dividing the one-dimensional sequence into groups using the rigorous method; FIG. 5 is a flowchart of generating leaf nodes for the groups; FIG. 6 shows an example of generating leaf nodes for the groups; FIG. 7 shows an example of the non-leaf entries generated for the leaf nodes; and FIG. 8 is a graph comparing the data search speed of the method with that of sequential insertion.

The method for loading a dataset into an M-tree using FastMap according to an embodiment of the present invention concerns loading, into an M-tree serving as a database structure, a dataset in which the similarity between objects is not defined as an Lp distance between two points in Euclidean space, such as unstructured multimedia data or the POIs of a road network. The method may also be applied to a dataset in which the similarity between objects is defined as an Lp distance between two points in Euclidean space, and is not limited in this respect.

This embodiment is described with reference to loading into an M-tree a dataset in which the similarity between objects is not defined as an Lp distance between two points in Euclidean space.

In addition, the method according to an embodiment of the present invention may be performed on a computing device that includes a database, on a server, or on a network that includes them, without limitation.

In the following description, the algorithm according to this embodiment is described as being executed by a processing apparatus, that is, a processor, that loads data objects into an M-tree in a computer device, server, or network; in other words, the description assumes a computer-readable recording medium that stores the algorithm according to this embodiment. The entity that executes the algorithm is not limited to this.

The processing apparatus that executes the algorithm according to this embodiment may also be connected to a separate storage device in order to read data stored there or to store processing results there. The form, capacity, and location of the storage device are not limited.

In this embodiment, the method for loading a dataset into an M-tree using FastMap comprises: mapping the dataset to points in a k-dimensional Euclidean space (S100); ordering the k-dimensional points to generate a one-dimensional sequence (S200); dividing the one-dimensional sequence to generate contiguous groups (S300); generating a leaf node for each group (S400); generating non-leaf entries for the leaf nodes generated for the groups (S500); comparing the number of non-leaf entries with a set value M (Q100); when the number of non-leaf entries is greater than M, dividing the non-leaf entries into entry groups (S610) and generating non-leaf nodes and non-leaf entries for the entry groups (S620); and, when the number of non-leaf entries is smaller than M, storing all the entries in a single root node (S700).
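The control flow of steps S300 to S700 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `chunk` and `build_levels` are hypothetical, fixed-size chunking stands in for whichever grouping method is chosen, a node is represented only by the list of indices of its children, and the case where the entry count equals M is assumed to go to the root.

```python
from typing import List


def chunk(items: List[int], M: int) -> List[List[int]]:
    """Split items into consecutive groups of at most M (stand-in for S300/S610)."""
    return [items[i:i + M] for i in range(0, len(items), M)]


def build_levels(ordered_ids: List[int], M: int) -> List[List[List[int]]]:
    """Return the tree level by level: leaf level first, root level last."""
    level = chunk(ordered_ids, M)            # S300 + S400: leaf nodes
    levels = [level]
    entries = list(range(len(level)))        # S500: one non-leaf entry per leaf
    while len(entries) > M:                  # Q100: too many entries for one node
        level = chunk(entries, M)            # S610 + S620: non-leaf nodes
        levels.append(level)
        entries = list(range(len(level)))    # one entry per newly created node
    levels.append([entries])                 # S700: all entries in a single root
    return levels
```

With ten objects and M = 3, this yields four leaf nodes, two non-leaf nodes, and a root holding two entries.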

First, in the step of mapping the dataset to points in a k-dimensional Euclidean space (S100), each object in a dataset D in a metric space is mapped into the k-dimensional Euclidean space such that the relative distances between objects are preserved as far as possible.

In this step (S100), k is any natural number greater than or equal to 1. This embodiment is described using the example of mapping the objects of the dataset in the metric space onto a one-dimensional Euclidean space, but the present invention is not limited to this.

Here, the dataset D in the metric space contains objects O_j (j = 1, ..., N, where N is a natural number).

In this step (S100), the objects of the dataset are mapped onto the Euclidean space using the FastMap technique. Rather than loading the objects of the dataset one at a time, a plurality of objects are mapped onto the Euclidean space at once, that is, they are bulk-loaded.
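The text does not spell out the projection itself. The sketch below follows the commonly published FastMap procedure (pick a far-apart pivot pair, project every object onto the line through the pivots with the cosine law, then repeat on the residual distances). The function name `fastmap`, the single pivot-refinement pass, and the list-of-lists output are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def fastmap(objs: Sequence[T], dist: Callable[[T, T], float], k: int) -> List[List[float]]:
    """Map each object to a k-dimensional point using only pairwise distances."""
    n = len(objs)
    coords = [[0.0] * k for _ in range(n)]

    def resid2(i: int, j: int, dim: int) -> float:
        # Squared distance left over after removing the first `dim` coordinates.
        d2 = dist(objs[i], objs[j]) ** 2
        for c in range(dim):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)

    for dim in range(k):
        # Pivot heuristic (assumption: one refinement pass): farthest from
        # object 0, then farthest from that object.
        b = max(range(n), key=lambda i: resid2(0, i, dim))
        a = max(range(n), key=lambda i: resid2(b, i, dim))
        dab2 = resid2(a, b, dim)
        if dab2 == 0.0:
            break  # no residual distance left; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection onto the line through pivots a and b.
            coords[i][dim] = (resid2(a, i, dim) + dab2 - resid2(b, i, dim)) / (2 * dab)
    return coords
```

For example, `fastmap([0, 3, 7, 10], lambda x, y: abs(x - y), 1)` recovers the positions 0, 3, 7, and 10 on a single axis, because the input distances are already exactly one-dimensional.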

In one example, when the objects of the dataset in the metric space are mapped onto the Euclidean space by FastMap, they are mapped into the k-dimensional Euclidean space so as to minimize the stress value given by the stress function of Equation 1 below, so that the distances between objects in the metric space are also preserved in the Euclidean space.

[Equation 1]

$$\mathrm{stress} = \sqrt{\frac{\sum_{i,j}\left(\hat{d}_{ij} - d_{ij}\right)^{2}}{\sum_{i,j} d_{ij}^{2}}}$$

In Equation 1, $d_{ij}$ is the distance between two objects $O_i$ and $O_j$ in the metric space, $\hat{d}_{ij}$ is the distance between the two objects $\hat{O}_i$ and $\hat{O}_j$ after mapping into the Euclidean space, and i and j are natural numbers from 1 to N.
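Assuming Equation 1 is the stress measure used in the FastMap literature (the square root of the summed squared differences between mapped and original distances, divided by the summed squared original distances), it can be evaluated as in the sketch below. The function name `stress` and the index-based distance callback are hypothetical.

```python
import math
from typing import Callable, Sequence


def stress(orig_dist: Callable[[int, int], float],
           mapped: Sequence[Sequence[float]]) -> float:
    """Stress of a mapping: 0.0 means every pairwise distance is preserved."""
    n = len(mapped)
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(n):
            d = orig_dist(i, j)                  # distance in the metric space
            d_hat = math.dist(mapped[i], mapped[j])  # distance after mapping
            num += (d_hat - d) ** 2
            den += d ** 2
    if den == 0.0:
        return 0.0  # all original distances are zero; nothing to distort
    return math.sqrt(num / den)
```

A mapping that halves every distance has stress 0.5, which gives a feel for the scale of the measure.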

Next, in the step of ordering the k-dimensional points to generate a one-dimensional sequence (S200), the points mapped into the k-dimensional Euclidean space in the preceding step (S100) are arranged into a one-dimensional sequence.

Here, a space-filling curve is used to arrange the points of the k-dimensional Euclidean space into the one-dimensional sequence.

In one example, the space-filling curve visits in turn every point mapped onto the grid of the Euclidean space, that is, every mapped object. Each object is assigned an order number according to the order of the visits, and the objects are arranged into a one-dimensional sequence according to that order number.

The Z curve (Morton curve), the Hilbert curve, or the Gray-code curve may be used as the space-filling curve. In this embodiment, which fast-loads the data objects, the Hilbert curve is used to arrange the objects of the k-dimensional Euclidean space into the one-dimensional sequence.

The one-dimensional sequence generated in this step (S200) is such that data objects mapped to neighboring points in the k-dimensional Euclidean space are placed at neighboring positions in the one-dimensional sequence.
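A minimal sketch of Hilbert ordering, restricted to k = 2 for brevity. It uses the well-known iterative cell-to-index conversion for a Hilbert curve on an n-by-n grid, where n is a power of two. The names `hilbert_index` and `hilbert_order`, the default grid size, and the min-max quantization of coordinates to grid cells are illustrative assumptions, not details taken from the text.

```python
from typing import List, Sequence, Tuple


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along the Hilbert curve; n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                  # rotate/flip the quadrant
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(points: Sequence[Tuple[float, float]], n: int = 16) -> List[int]:
    """Return the indices of `points` sorted by Hilbert position."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def cell(v: float, lo: float, hi: float) -> int:
        # Quantize a coordinate to one of n grid cells.
        if hi == lo:
            return 0
        return min(n - 1, int((v - lo) / (hi - lo) * n))

    keys = [hilbert_index(n, cell(p[0], min(xs), max(xs)),
                          cell(p[1], min(ys), max(ys))) for p in points]
    return sorted(range(len(points)), key=lambda i: keys[i])
```

Consecutive positions on the curve are always adjacent grid cells, which is what keeps neighbors in the sequence close in space; the converse does not always hold, which motivates the grouping refinements described later.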

Next, in the step of dividing the one-dimensional sequence to generate contiguous groups (S300), the points ordered in the one-dimensional sequence generated in the preceding step (S200) are divided into groups.

Each group divided from the one-dimensional sequence contains at least one, and preferably a plurality, of consecutive points of the sequence, and each group may be assigned a group number.

In one example, if the first group is generated so as to contain the first and second points of the one-dimensional sequence, the second group, which follows the first, is generated so as to start from the third point of the sequence.

This step (S300) may be performed such that each group divided from the one-dimensional sequence contains between μM and M consecutive points of the sequence. Here, μ (0.0 < μ ≤ 1.0) is the minimum storage utilization, and M is the maximum number of objects in an M-tree leaf node or the maximum number of entries in an M-tree non-leaf node.

In one example, this step (S300) scans the consecutive points of the one-dimensional sequence starting from the first point and places them in a group, then continues scanning from the point that follows the points already grouped and places those in a new group. The scanning and division are repeated until the number of points in the sequence not yet assigned to a group is μM or fewer.

This step (S300) may be performed using any one of the full-page (FULLPAGE) method, the heuristic (HEURISTIC) method, and the rigorous (RIGOROUS) method.

First, an example of performing the step of dividing the points of the one-dimensional sequence into groups (S300) with the full-page method is described. The full-page method places the points of the one-dimensional sequence into groups uniformly, M at a time, so that every group generated from the sequence contains M points.

When the one-dimensional sequence is divided into groups with the full-page method, the space utilization of the M-tree can be maximized.
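The full-page division amounts to fixed-size chunking of the ordered sequence, as in the sketch below. The function name `full_page_groups` is hypothetical, and the handling of a final short group when N is not a multiple of M is an assumption, since the text only describes groups of exactly M points.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def full_page_groups(sequence: Sequence[T], M: int) -> List[List[T]]:
    """Cut the ordered sequence into consecutive groups of M points each.

    Assumption: leftover points (fewer than M) form one final, smaller group.
    """
    if M < 1:
        raise ValueError("M must be at least 1")
    return [list(sequence[i:i + M]) for i in range(0, len(sequence), M)]
```

Every group except possibly the last is completely full, which is why this variant gives the highest node utilization.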

However, simply dividing the points of the one-dimensional sequence into groups of M points with the full-page method has a drawback. Two points that are consecutive in the one-dimensional sequence may correspond to two dataset objects that lie far apart in the k-dimensional Euclidean space or in the actual metric space. If a group is generated that contains two such distant objects, the size of the region for that group increases, and search performance in the M-tree may degrade significantly.

This is explained in more detail by comparison with reference to the example of FIG. 2. The objects that are consecutive in the one-dimensional sequence are placed along the Hilbert curve in the actual k-dimensional Euclidean space as shown in FIG. 2(a). When the objects from O_1 to O_x, which are consecutive in the one-dimensional sequence, are placed in one group, the minimal region of the group is formed as a circle.

By contrast, as shown in FIG. 2(b), when the objects from O_1 to O_{x+1} of the one-dimensional sequence are placed in one group, the objects O_x and O_{x+1} are consecutive in the one-dimensional sequence but lie far apart in the actual Euclidean space. Although this group contains only one more object, O_{x+1}, than the example of FIG. 2(a), the minimal region containing its objects is much larger than the minimal region formed in FIG. 2(a). As noted above, such an increase in the size of a group's region lowers object search performance.
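The effect described for FIG. 2 can be reproduced numerically. The sketch below measures a group's region by a covering radius around a representative member; choosing as representative the member whose farthest group-mate is nearest is an illustrative assumption, as are the function name `covering_radius` and the sample coordinates.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def covering_radius(points: Sequence[Point]) -> Tuple[Point, float]:
    """Pick the member that minimizes the distance to its farthest group-mate.

    Returns that representative and the resulting covering radius.
    """
    best = min(points, key=lambda c: max(math.dist(c, p) for p in points))
    return best, max(math.dist(best, p) for p in points)


# A compact group, then the same group with one distant object appended.
compact: List[Point] = [(0, 0), (1, 0), (0, 1), (1, 1)]
_, r_small = covering_radius(compact)
_, r_large = covering_radius(compact + [(9, 1)])
```

Here `r_small` is about 1.41 while `r_large` is 8.0: a single extra object multiplies the radius of the region several times over, so far more queries will intersect that node.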

Accordingly, the heuristic method and the rigorous method limit the number of points included in a group, so that when the objects corresponding to the points of a group divided from the one-dimensional sequence are laid out in the actual metric space or in the Euclidean space, the minimal region containing those objects stays within a size that does not degrade object search efficiency.

An embodiment that divides the points of the one-dimensional sequence into groups with the heuristic method is described with reference to FIG. 3. First, the μM points from the first point through the μM-th point of the one-dimensional sequence are taken as the first group (S210).

Next, d/n is calculated from the μM points in the first group generated in step S210 (S220). Here, d is the distance between the first point and the last (μM-th) point of the group, and n is the number of points in the group.

Then the point that follows the last point of the group generated in step S210 is added to the group, and d'/n' is calculated (S230). The added point is the one consecutive to the μM-th point, the last point already in the group from step S210, that is, the (μM+1)-th point of the one-dimensional sequence. d' is the distance between the first point of the group and the point newly added in this step (S230), and n' is the number of points in the group after the addition.

In the comparison step (Q200), if d'/n' is greater than d/n, the flow follows the "Yes" arrow and a group is formed from (n'-1) points (S241). If d'/n' is not greater than d/n, the flow follows the "No" arrow and step S230 is performed again (S242).

The comparison step (Q200) compares the maximum inter-point distance in the group divided by the number of points (d/n and d'/n'), that is, it compares group region sizes. It determines whether the region of the group extended in step S230 has grown relative to the region of the group generated in the initial step (S210).

In one example, if d'/n' is greater than d/n, adding the point in step S230 has made the group's region larger than that of the group generated in the initial step (S210). Because a larger region can lower the search efficiency for the objects in the group, the group is limited to the number of points for which the region of the initial group (S210) is not enlarged. Therefore, when the comparison step (Q200) finds that the added point enlarged the region, the group is formed without the last-added n'-th point, that is, with (n'-1) points (S241).

If, on the other hand, d'/n' is not greater than d/n in the comparison step (Q200), that is, a point was added but the group's region did not grow, the step of adding a point to the group (S230) is repeated (S242). Repeating this step lets the group take in as many points as possible without enlarging its region. The largest possible number of consecutive points of the one-dimensional sequence is thus placed in the group without degrading search performance, and the sequence is divided into groups efficiently.
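The heuristic steps S210 to S242 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of `math.dist` for the k-dimensional Euclidean distance, and the cap of `max_size` (M) points per group are assumptions made for the example. The baseline ratio stays fixed at the d/n of the initial μM-point group, as the description states.

```python
import math

def heuristic_group(points, start, min_size, max_size):
    """Return the exclusive end index of the group that begins at `start`.

    min_size plays the role of muM and max_size the role of M (assumed cap).
    """
    end = min(start + min_size, len(points))             # S210: first muM points
    n = end - start
    base = math.dist(points[start], points[end - 1]) / n  # S220: d / n
    while end < len(points) and (end - start) < max_size:
        d_new = math.dist(points[start], points[end])    # S230: d' to the added point
        n_new = end - start + 1                           # S230: n'
        if d_new / n_new > base:                          # Q200: region grew
            break                                         # S241: keep n' - 1 points
        end += 1                                          # S242: accept, try next point
    return end

def partition(points, min_size, max_size):
    """Apply the heuristic repeatedly until every point belongs to a group."""
    groups, start = [], 0
    while start < len(points):
        end = heuristic_group(points, start, min_size, max_size)
        groups.append(points[start:end])
        start = end
    return groups

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 1), (1, 0), (5, 5), (5, 6), (6, 6), (6, 5)]
    for g in partition(pts, min_size=2, max_size=6):
        print(g)
```

With two well-separated clusters, the jump from the first cluster to the second raises d'/n' above the baseline, so the group is closed just before the jump.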

An embodiment that divides the points of the one-dimensional sequence into groups with the rigorous method is described with reference to FIG. 4. First, candidate groups containing between μM and M consecutive points each are generated from the points of the one-dimensional sequence (S310).

The candidate groups generated in this step (S310) are not laid end to end along the one-dimensional sequence. Instead, the consecutive points from the first point through the μM-th point form a first candidate group, the consecutive points from the first point through the M-th point form a second candidate group, and so on, so that the points of the one-dimensional sequence yield several candidate groups containing different numbers of points.

Then, for each candidate group generated in step S310, a value defined in terms of r and n is calculated (S320). Here, r is the radius of the candidate group's region and n is the number of points in the candidate group.

The values calculated in step S320 for the candidate groups of step S310 are then compared, and the candidate group with the smallest value is selected (S330).
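The candidate selection of steps S310 to S330 can be sketched as below. The exact expression evaluated in S320 is not reproduced in this text, so the sketch takes the scoring function as a parameter and uses r/n only as an assumed stand-in. The names `region_radius` and `rigorous_group` are hypothetical, and the radius is computed in the Euclidean space around the group point whose farthest neighbor is nearest, which is also an assumption.

```python
import math

def region_radius(group):
    """Smallest radius of a circle centered on a group point that covers the group."""
    return min(max(math.dist(c, p) for p in group) for c in group)

def rigorous_group(points, start, min_size, max_size,
                   score=lambda r, n: r / n):   # assumed score, not from the source
    """Return the exclusive end index of the best candidate group from `start`."""
    last = min(start + max_size, len(points))
    first = min(start + min_size, last)
    best_end, best_score = first, math.inf
    for end in range(first, last + 1):           # S310: candidates of muM..M points
        group = points[start:end]
        s = score(region_radius(group), len(group))  # S320: value from r and n
        if s < best_score:                       # S330: keep the smallest value
            best_end, best_score = end, s
    return best_end

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 1), (1, 0), (5, 5), (5, 6)]
    print(rigorous_group(pts, 0, min_size=2, max_size=6))
```

Every candidate requires its own radius computation, which is why the description later prefers Euclidean distances over metric-space distances for this method.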

In one example, the distance d between two points of a group in the heuristic method, or the radius r of a group region in the rigorous method, may be obtained either from the distance between the actual dataset objects in the metric space that correspond to the two points, or by substituting the distance between the two points in the k-dimensional Euclidean space. The invention is not limited to either approach.

Using the actual object-to-object distance in the metric space yields an exact distance between the two points. However, when this approach is applied to the radius calculation of the rigorous method, the region radius must be calculated for every candidate group generated from the points of the one-dimensional sequence, so computing the object distances in the actual metric space can take a long time.

Substituting the distance in the k-dimensional Euclidean space is somewhat less accurate than using the distance in the actual metric space, but it saves the time needed to compute actual object distances, so the radius of a group region is obtained quickly. In this embodiment, the distance d between two points and the radius r of a candidate group region are, as the baseline, calculated by substituting the distance between two points in the k-dimensional Euclidean space.

When the step of generating groups from the one-dimensional sequence (S300) is performed with the heuristic method of FIG. 3 or the rigorous method of FIG. 4, one group is first formed from some of the points of the one-dimensional sequence. The same method is then applied repeatedly to the points that follow the last point of the group just formed, so that the points of the one-dimensional sequence are divided into groups.

That is, although not shown in the drawings, when step S300 is performed by the method of FIG. 3 or FIG. 4, that method is repeated until every point of the one-dimensional sequence has been assigned to a group.

Next, the step of generating a leaf node for each group (S400) is described with reference again to FIG. 1. In this step (S400), a leaf node is generated for each group produced in step S300.

A leaf node is a circular region centered on a representative object Op with radius r(Op). The smaller the region, the fewer unnecessary node accesses occur, and the better the search performance within the leaf node.

One leaf node is generated per group. A leaf node contains the metric-space objects that correspond to the points (at least one) of a group split from the one-dimensional sequence, and it is generated by the steps of FIG. 5.

Specifically, the points of a group generated from the one-dimensional sequence are points that were mapped into the k-dimensional Euclidean space, so the leaf node is built from the dataset objects that existed in the metric space before they were mapped into the Euclidean space.

Because steps S100 and S200 generate the one-dimensional sequence so that the relative distances between dataset objects in the metric space are preserved in the Euclidean space, the points of each group produced in step S300, that is, the points of a group split from the one-dimensional sequence ordered in the Euclidean space, can correspond to objects that are adjacent in the metric space.

Leaf node generation for a group (S400) is described in detail with reference to FIGS. 5 and 6. First, a representative object Op is searched for among the objects corresponding to the points of the group (S410).

In this step (S410), the parent object Op that represents all objects of the leaf node is searched for among the objects corresponding to the points of the group. Because a metric space has no notion of the center of a region, Op is found by computing the distances d() between all objects in the leaf node.

However, computing the inter-object distances d() in the metric space requires O(M²) distance calculations for the M points in a group. In one embodiment of this step (S410), the inter-object distances are therefore calculated in the k-dimensional space, where a distance calculation is relatively cheaper than in the metric space. These distances are used to find the representative of the leaf node in the Euclidean space, and the representative object Op in the metric space is then identified from it.

Performing the inter-object distance calculations in the k-dimensional Euclidean space when searching for the representative object Op among the objects of a leaf node thus reduces the time and cost of computing inter-object distances in the metric space.

In step S410, the representative object is the object for which the maximum, max{d()}, of the distances d() between two objects O_i and O_j of the leaf node is smallest, as expressed by Equation 2 below. Applying the idea of Equation 2 to the points mapped into the k-dimensional Euclidean space, the point whose maximum distance to the other points is smallest is taken as the representative.

[Equation 2]

Op = argmin_{O_i} max_{O_j} { d(O_i, O_j) }

Calculating the representative object of the leaf node with Equation 2 (S410) selects a representative for the region formed in the k-dimensional Euclidean space. As FIG. 6 illustrates, a region containing the same objects can become larger when the representative is poorly chosen. Selecting the representative in this way prevents that enlargement and the resulting loss of search performance.

Next, the radius r(Op) of the leaf node region is calculated from the representative object Op in the metric space obtained in step S410 (S420). This step (S420) is performed with Equation 3 below.

[Equation 3]

r(Op) = max_i { d(O_i, Op) }

In Equation 3, d(O_i, Op) is the distance between each object O_i of the leaf node and the representative object Op of the leaf node in the metric space, and max_i{} returns the largest of the d() values over i. By Equation 3, the largest distance between the representative object and any object of the leaf node in the metric space is taken as the radius of the leaf node region.
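Steps S410 and S420 can be illustrated with a short sketch. It picks the representative by the min-max rule on the mapped Euclidean points and then measures the radius with the original metric, as the description explains. The function names, the Hamming distance used as the metric, and the one-dimensional coordinates in the example are all assumptions chosen for illustration.

```python
import math

def representative_index(points):
    """S410: index of the point whose farthest neighbor is nearest (min-max rule)."""
    return min(range(len(points)),
               key=lambda i: max(math.dist(points[i], q) for q in points))

def leaf_radius(objects, rep, metric):
    """S420: largest metric-space distance from the representative to any object."""
    return max(metric(o, rep) for o in objects)

def hamming(a, b):
    """Example metric on equal-length strings (assumed for this illustration)."""
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    objects = ["aaaa", "aaab", "aabb", "abbb"]       # objects in the metric space
    points = [(0.0,), (1.0,), (2.0,), (3.0,)]        # hypothetical mapped coordinates
    i = representative_index(points)
    print(objects[i], leaf_radius(objects, objects[i], hamming))
```

The search for the representative uses only cheap Euclidean distances, while the metric is evaluated once per object when the radius is computed.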

Finally, in the step of generating the leaf node for the group (S430), the leaf node is generated using the representative object and the region radius calculated in steps S410 and S420.

The leaf node generation step (S400), performed according to the example of FIG. 5, is carried out for every group produced in the step of generating contiguous groups from the one-dimensional sequence (S300).

Returning to FIG. 1, a non-leaf entry is then generated for each leaf node produced in step S400 (S500).

A non-leaf entry contains f(Or), the feature value of the routing object Or of the leaf node; r(Or), the radius centered on Or; ptr(T(Or)), a pointer to the sub-tree T(Or) of the entry; and d(Or, Op), the distance between Or and its parent object, that is, the representative object Op of the leaf node. In one example it has the structure shown in FIG. 7.
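The four fields of a non-leaf entry can be written down as a small record type. This is only a sketch of the layout listed above; the class and field names are hypothetical, and FIG. 7 may arrange or encode the fields differently.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class NonLeafEntry:
    feature: Any           # f(Or): feature value of the routing object Or
    radius: float          # r(Or): covering radius centered on Or
    subtree: Any           # ptr(T(Or)): reference to the sub-tree T(Or)
    dist_to_parent: float  # d(Or, Op): distance from Or to its parent object Op

if __name__ == "__main__":
    leaf = ["aaaa", "aaab", "aabb"]   # stand-in for a leaf node's objects
    entry = NonLeafEntry(feature="aaab", radius=1.0, subtree=leaf, dist_to_parent=0.0)
    print(entry)
```

Storing the distance to the parent alongside the radius is what lets an M-tree search discard sub-trees using the triangle inequality without recomputing distances.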

Next, in the step that checks whether the number of non-leaf entries exceeds M (Q100), the number of non-leaf entries generated in step S500 is compared with M, the maximum number of objects in a group.

If the comparison step (Q100) finds that the number of non-leaf entries is greater than M, the maximum number of objects in a group, the flow follows the "Yes" arrow and the non-leaf entries are divided into entry groups (S610). In one example, this step (S610) may use any one of the full-page, heuristic, and rigorous methods already described for step S300.

In an example where step S610 uses the rigorous method, the distance d between entries, which is needed when the non-leaf entries are divided, may be replaced by the distance in the k-dimensional Euclidean space.

More specifically, when step S610 uses the rigorous method, the non-leaf entries are divided taking into account not only the distance between entries but also the radius of each entry's region.

The distance between any two entries Ei and Ej in the k-dimensional Euclidean space can be derived from Equation 4 below.

[Equation 4]

In Equation 4, Ei.Or is the routing object Or of entry Ei, Ej.Or is the routing object Or of entry Ej, r(Ei.Or) is the radius of the routing object Or of entry Ei, and r(Ej.Or) is the radius of the routing object Or of entry Ej.

For each entry group produced in step S610, a non-leaf node and a non-leaf entry are generated (S620).

In one example, after steps S610 and S620 have reduced the number of non-leaf entries, the comparison step (Q100) is performed again to determine whether the number still exceeds M. Depending on the result, steps S610 and S620 are repeated or the next step (S700) is performed.

When, after steps S610 and S620 have been repeated, the comparison step (Q100) finds that the number of non-leaf entries does not exceed M, for example because dividing the entries into entry groups has brought the count below M, the flow follows the "No" arrow and all remaining entries are stored in a single root node (S700).

Because all entries are stored in one root node (S700), entries are not loaded into the M-tree, that is, written to the memory disk, one at a time as they are generated. Instead, all generated entries are loaded into the M-tree at once. This reduces the time and cost of repeated disk accesses and reduces wear on the disk.

In one example, the entries generated in steps S610 and S620 may be kept in a buffer, and step S700 may be performed when the buffer becomes full.
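The loop formed by Q100, S610, S620, and S700 can be sketched as a bottom-up build. The sketch below uses the full-page split for S610 because it is the simplest of the three methods; `make_entry` is a hypothetical callback standing in for the non-leaf entry construction of S620, and the function names are assumptions.

```python
def full_page_split(entries, capacity):
    """S610 (full-page variant): cut the entry list into runs of `capacity`."""
    return [entries[i:i + capacity] for i in range(0, len(entries), capacity)]

def build_upper_levels(entries, capacity, make_entry):
    """Repeat S610/S620 while more than `capacity` (M) entries remain (Q100)."""
    levels = []
    while len(entries) > capacity:                  # Q100: more than M entries?
        nodes = full_page_split(entries, capacity)  # S610: entry groups
        levels.append(nodes)                        # one non-leaf node per group
        entries = [make_entry(node) for node in nodes]  # S620: entry per node
    root = entries                                  # S700: remaining entries form the root
    return root, levels

if __name__ == "__main__":
    root, levels = build_upper_levels(list(range(10)), capacity=3, make_entry=tuple)
    print(len(root), [len(level) for level in levels])
```

Each pass shrinks the entry count by roughly a factor of M, so the number of passes grows logarithmically with the number of leaf nodes.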

According to the method of loading a dataset into an M-tree using FastMap described above with reference to FIGS. 1 to 7, a large dataset whose distances are not defined in Euclidean space is loaded into the M-tree as follows: the dataset objects in the metric space are mapped into the k-dimensional Euclidean space so that the inter-object distances are preserved, a one-dimensional sequence is generated, and the sequence is divided into groups from which leaf nodes are created. Because inter-object distances are replaced by distances in the k-dimensional Euclidean space, the complexity of the inter-object distance calculations is O(N). This is markedly lower than the distance calculation complexity of the existing bulk-loading method, which reduces the processing time and processing cost of loading a large dataset into the M-tree.

In addition, when a large dataset is loaded into the M-tree with the method of this embodiment, the data is grouped sequentially before being loaded, so it is written more efficiently than when data is written to disk by random access.

Among the steps of this embodiment, the first step (S100) and the second step (S200) can be performed independently for each object, so they can be run as concurrent threads on multiple cores, which effectively shortens the time needed to load a large volume of data into the M-tree.

Steps S300, S400, and S500 can likewise be processed in parallel by assigning the groups split from the one-dimensional sequence to separate threads, which further shortens the time needed to load a large volume of data into the M-tree.
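Because each group is processed independently, the per-group work of S400 can be handed to a thread pool. The sketch below is an assumption-laden illustration: `summarize_group` is a hypothetical stand-in that computes a representative and radius in the Euclidean space only, and the worker count is arbitrary. In CPython, threads may not speed up CPU-bound work, so a process pool could be substituted for the thread pool.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def summarize_group(group):
    """Per-group work: min-max representative and covering radius (Euclidean)."""
    rep = min(group, key=lambda c: max(math.dist(c, p) for p in group))
    return rep, max(math.dist(rep, p) for p in group)

def summarize_groups(groups, workers=4):
    """Process each group on its own worker; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(summarize_group, groups))

if __name__ == "__main__":
    groups = [[(0, 0), (1, 0), (2, 0)], [(5, 5), (5, 7)]]
    for rep, radius in summarize_groups(groups):
        print(rep, radius)
```

`Executor.map` returns results in input order, so the leaf summaries stay aligned with the order of the groups along the one-dimensional sequence.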

FIG. 8 compares the execution time, the number of distance calculations, and the number of disk accesses as a function of data size when a large dataset is loaded into the M-tree with an embodiment of the invention and with the prior art. As the performance graphs show, simulation confirms that, as the data size grows from 4K to 1M, FastLoad according to the embodiment (FastLoad-F, FastLoad-H, FastLoad-R) outperforms conventional simple insertion and simple BulkLoad in data loading time (FIG. 8(a)), number of distance calculations (FIG. 8(b)), and number of disk accesses (FIG. 8(c)).

In FIG. 8(a) to (c), FastLoad-F uses the full-page method, FastLoad-H uses the heuristic method, and FastLoad-R uses the rigorous method. The simulation for FIG. 8 was run in a two-dimensional Euclidean space with a disk page size of 4 KB.

Although embodiments of the present invention have been described in detail above, the scope of the invention is not limited to them. Various modifications and improvements made by those skilled in the art using the basic concept of the invention as defined in the following claims also fall within the scope of the invention.

Claims

A method of loading a dataset into an M-tree using FastMap, performed by a processing unit that loads data objects into the M-tree, the method comprising:
mapping a dataset in a metric space to points in a k-dimensional Euclidean space;
arranging the points in the k-dimensional Euclidean space into a one-dimensional sequence;
dividing the points of the one-dimensional sequence into contiguous groups; and
generating a leaf node for each contiguous group,
whereby the dataset is loaded into the M-tree.

The method of claim 1,
wherein the dataset in the metric space is a dataset in which the similarity between data objects is not defined by the Lp distance between two points in Euclidean space.

The method of claim 2,
wherein the dataset in the metric space is unstructured multimedia data or points of interest (POI) of a road network.

The method of claim 1,
wherein the step of mapping the dataset in the metric space to points in the k-dimensional Euclidean space maps the data so that the inter-object distances in the metric space are preserved in the k-dimensional Euclidean space.

The method of claim 4,
wherein the step of mapping the dataset in the metric space to points in the k-dimensional Euclidean space is performed according to Equation 1 below.
[Equation 1]

(Here, d_ij is the distance between two objects O_i and O_j in the metric space, the corresponding mapped distance is the distance between the same two objects after they are mapped into the Euclidean space, and i and j are natural numbers from 1 to N.)

The method of claim 1,
wherein the step of mapping the dataset in the metric space to points in the k-dimensional Euclidean space maps the plurality of objects constituting the dataset into the Euclidean space by bulk loading.

The method of claim 1,
wherein the step of arranging the points in the k-dimensional Euclidean space into a one-dimensional sequence is performed using a space-filling curve.

The method of claim 7,
wherein the step of arranging the points in the k-dimensional Euclidean space into a one-dimensional sequence uses, as the space-filling curve, any one of a Z curve (Morton curve), a Hilbert curve, and a Gray-code curve.

The method of claim 1,
wherein, in the step of dividing the points of the one-dimensional sequence into contiguous groups, the consecutive points of the one-dimensional sequence are divided into groups in order,
using the full-page method, which forms every group uniformly so that each contains the maximum number of objects.

The method of claim 1,
wherein, in the step of dividing the points of the one-dimensional sequence into contiguous groups, the consecutive points of the one-dimensional sequence are divided into groups in order,
using the heuristic method, which compares values computed from the distance between points in a group and the number of points in the group to determine the number of points to include, and forms the group within a range in which the region of the group does not expand.

The method of claim 10,
wherein, in the heuristic method, the distance between points in the group is calculated as the distance between the points in the actual metric space or is replaced by the distance between the two points in the k-dimensional Euclidean space.

The method of claim 1,
wherein, in the step of dividing the points of the one-dimensional sequence into contiguous groups, the consecutive points of the one-dimensional sequence are divided into groups in order,
using the rigorous method, which generates candidate groups, calculates the radius of each candidate group's region and the number of points in each candidate group, and forms a group using the results.

The method of claim 12,
wherein, in the rigorous method, the radius of the candidate group region is calculated as a radius in the actual metric space or is replaced by the distance between two points in the k-dimensional Euclidean space.

The method of claim 1,
wherein the step of generating a leaf node for each contiguous group generates a leaf node containing the metric-space objects that correspond to the points of a group produced in the step of dividing the points of the one-dimensional sequence into contiguous groups.

The method of claim 14,
wherein the step of generating a leaf node for each contiguous group calculates the representative object of the leaf node using the points in the k-dimensional Euclidean space.

The method of claim 1,
wherein, after the step of generating a leaf node for each contiguous group, the method further performs any one of:
generating a non-leaf entry for the leaf node generated for each group;
dividing the non-leaf entries into entry groups according to the number of non-leaf entries; and
storing all entries in one root node.

The method of claim 16,
wherein the step of dividing the non-leaf entries into entry groups is performed by the full-page method, the heuristic method, or the rigorous method.

The method of claim 16,
wherein the entries generated in the step of dividing the non-leaf entries into entry groups are stored in a buffer, and
the step of storing all entries in one root node is performed when the buffer is full, thereby writing the entries to a memory disk.

A computer-readable recording medium on which is recorded a program for executing, on a computer, the method of loading a dataset into an M-tree using FastMap according to any one of claims 1 to 18.