KR20020004301A

KR20020004301A - Method of Nearest Query Processing using the Spherical Pyramid-Technique

Info

Publication number: KR20020004301A
Application number: KR1020000038050A
Authority: KR
Inventors: 이동호; 김형주
Original assignee: 이동호; 김형주
Priority date: 2000-07-04
Filing date: 2000-07-04
Publication date: 2002-01-16

Abstract

PURPOSE: A method for processing nearest-neighbor queries is provided that indexes high-dimensional data efficiently and efficiently processes the spherical queries used in similarity search. CONSTITUTION: The nearest-neighbor query processing method comprises dividing a d-dimensional data space into 2d spherical pyramids (11); dividing each spherical pyramid into spherical slices (13); calculating the minimum distance between a query point and each spherical pyramid and inserting the pyramids into a priority queue in ascending order of distance; extracting the first element of the priority queue; when the extracted element is a spherical pyramid, calculating the minimum distance between the query point and each spherical slice in that pyramid and inserting the slices into the queue; and, when the extracted element is an object, returning the object as a result of the nearest-neighbor query.

Description

Method of Nearest Query Processing using the Spherical Pyramid-Technique

The present invention relates to a method for processing nearest-neighbor queries using the spherical pyramid technique, which indexes high-dimensional data by partitioning a d-dimensional space into 2d spherical pyramids, in order to retrieve the objects nearest to a query.

Algorithms for processing nearest-neighbor and k-nearest-neighbor queries have been studied extensively because they are needed in fields such as GIS applications, pattern recognition, document retrieval, and learning theory. These algorithms were developed to find the objects most similar to a query object among point objects or arbitrary spatial objects in a d-dimensional vector space, and most are built on a particular spatial data structure. Examples include algorithms based on the k-d tree, the quad-tree, and the R-tree; most of them can be adapted to other spatial data structures with modest modification.

Roussopoulos, Kelley, and Vincent proposed an algorithm that retrieves the k objects most similar to a query point (query object) with a branch-and-bound search over an R-tree, using the mindist and minmaxdist metrics. The basic idea is to traverse the R-tree depth-first while keeping candidate objects in an Active Branch List. The algorithm starts with k already fixed, so if the user later wants the (k+1)-th object, the algorithm must be restarted from the beginning (N. Roussopoulos, S. Kelley, F. Vincent. "Nearest Neighbor Queries". Proc. ACM SIGMOD, San Jose, CA, pages 71-79, 1995.).

To remove this drawback, Hjaltason and Samet proposed an incremental nearest-neighbor query processing algorithm that does not fix k in advance but produces query results one at a time as the user requests them. In spatial databases, users frequently issue "distance browsing" requests, in which the objects closest to a query object are obtained one by one in order of distance. The incremental nearest-neighbor algorithm was designed for such applications. It uses a priority queue so that the object closest to the query object is always at the front of the queue; whenever the user asks for another result, the element at the front is returned, so the nearest objects are reported in order. The user can therefore obtain as many objects as desired without restarting the algorithm. That is, after k objects have been retrieved, the (k+1)-th object is obtained simply by returning the next object remaining in the priority queue.

Hjaltason and Samet showed that, even when it is used to process k-nearest-neighbor queries, the incremental nearest-neighbor algorithm is more efficient than the existing k-nearest-neighbor query processing algorithm on the R*-tree (Gisli R. Hjaltason, Hanan Samet. "Distance Browsing in Spatial Databases". ACM Transactions on Database Systems, 24(2), pages 265-318, 1999.).

However, this algorithm is also inefficient for high-dimensional data. The cause is not the algorithm itself but the fact that the R-tree, the underlying spatial data structure, is inefficient in high-dimensional spaces.

Similarity search has emerged as one of the important search techniques in database systems. It converts each data item into a point in a high-dimensional space and indexes and searches these points with a multidimensional index structure. Its most common application is content-based multimedia information retrieval, which extracts feature data that characterizes the content of multimedia data and searches on that basis; the extracted feature data is generally represented as high-dimensional vectors (S. Berchtold, D. A. Keim, and H.-P. Kriegel. "The X-tree: An Indexing Structure for High-Dimensional Data". Proc. 22nd Int. Conf. on Very Large Data Bases, pages 28-39, September 1996; S. Berchtold, D. Keim, H.-P. Kriegel, and T. Seidl. "Fast Nearest Neighbor Search in High-Dimensional Spaces". Proc. 14th Int. Conf. on Data Engineering, Orlando, 1998.).

In these applications, k-nearest-neighbor queries and spherical range queries are used to retrieve, from the objects stored in the database, those most similar to a query object (D. A. White and R. Jain. "Similarity Indexing with the SS-tree". Proc. 12th Int. Conf. on Data Engineering, pages 516-523, 1996.).

A k-nearest-neighbor query retrieves the k objects most similar to a query object, and a spherical range query retrieves all objects that lie within a given similarity tolerance of the query object. Efficient similarity search requires both an index structure that indexes high-dimensional data efficiently and algorithms that process these similarity queries efficiently on that index structure.

However, with the prior art described above, index structures that are efficient for low- or medium-dimensional data fail to provide efficient indexing for high-dimensional data.

The present invention is intended to overcome these limitations of the prior art. It presents the spherical pyramid technique as a method that indexes high-dimensional data efficiently and efficiently processes the spherical queries widely used in similarity search.

Furthermore, the present invention presents an algorithm for k-nearest-neighbor query processing that uses the spherical pyramid technique to retrieve the k objects most similar to a query object.

The method is shown to be more efficient than the incremental k-nearest-neighbor query processing methods implemented on the R*-tree and the X-tree in terms of the number of page accesses, CPU time, and total response time.

FIG. 1 illustrates the space partitioning method of the spherical pyramid technique.

FIG. 2 illustrates how the spherical pyramid number of a point is determined and the distance of a point within a pyramid.

FIG. 3 illustrates spherical pyramids containing ten objects in a two-dimensional space.

FIG. 4 illustrates the minimum distance between a query point and an adjacent spherical pyramid.

FIGS. 5A, 5B, and 5C illustrate the minimum distance (MINDIST) between a query point and a spherical slice.

FIG. 6 shows Algorithm 1 for processing incremental nearest-neighbor queries.

FIGS. 7A, 7B, and 7C are graphs of the number of page accesses, CPU time, and total response time, respectively, as a function of database dimensionality.

FIGS. 8A, 8B, and 8C are graphs of the number of page accesses, CPU time, and total response time, respectively, as a function of database size.

FIGS. 9A, 9B, and 9C are graphs of the number of page accesses, CPU time, and total response time, respectively, for an embodiment using real data.

*** Explanation of reference numerals for the main parts of the drawings ***

10. Center point of the data space

11. Spherical pyramid

12. (d-1)-dimensional spherical surface

13. Spherical slice

31. Nearest point

41. Query point

The nearest-neighbor query processing method using the spherical pyramid technique according to the present invention comprises: a first step of dividing a d-dimensional data space into 2d spherical pyramids; a second step of dividing each spherical pyramid into spherical slices; a third step of calculating the minimum distance between a query point and each spherical pyramid and inserting the pyramids into a priority queue in ascending order of distance; and a fourth step of extracting the first element of the priority queue and, if the extracted element is a spherical pyramid, calculating the minimum distance between the query point and each spherical slice in that pyramid and inserting the slices into the queue; if the extracted element is a spherical slice, calculating the distance between the query point and each object in that slice and inserting the objects into the queue; and if the extracted element is an object, returning the object as a result of the nearest-neighbor query.

In the third and fourth steps, specially defined formulas may be applied to calculate the distance between the query point and each spherical pyramid or spherical slice.

The present invention is described in detail below with reference to equations and drawings. These equations and drawings are illustrative only, and the invention is not limited to them.

1. Spherical Pyramid Technique

The spherical pyramid technique transforms a d-dimensional point into a one-dimensional value and stores and accesses these values with an efficient one-dimensional index structure such as the B+-tree. The transformation takes two steps.

In the first step, the d-dimensional data space is divided into 2d spherical pyramids. That is, the space is partitioned into 2d spherical pyramids that share the center point of the data space (0.5, 0.5, ..., 0.5) as their apex and each have a (d-1)-dimensional spherical surface as their base. In the second step, each spherical pyramid is divided into several spherical slices (bounding slices) centered on its apex. Each spherical slice corresponds to one page of the B+-tree.

FIGS. 1 and 2 illustrate the space partitioning of the spherical pyramid technique in a two-dimensional space. FIG. 1 shows one spherical pyramid (11) divided into four spherical slices (13). First, the two-dimensional space is divided into four spherical pyramids (11), each of which has the center point (10) of the space as its apex and a curve (12) as its base. The same construction extends to d dimensions: because a d-dimensional sphere has 2d (d-1)-dimensional spherical surfaces, 2d spherical pyramids are obtained, and the base is a (d-1)-dimensional spherical surface rather than a curve. In the second step, each spherical pyramid is divided into several spherical slices (13) centered on its apex.

Equations 1 and 2 correspond to the first and second steps of the space partitioning strategy of the spherical pyramid technique.

Equation 1 defines the spherical pyramid sp_i that contains a d-dimensional point v.

Given a d-dimensional point v, the distance d_v of v is defined by Equation 2.

That is, as shown in FIG. 2, Equation 1 first determines the number of the spherical pyramid to which a point v belongs, and Equation 2 then determines the position of v within that pyramid.

Finally, Equation 3 uses Equations 1 and 2 to convert the d-dimensional point v into a one-dimensional value.

That is, given a d-dimensional point v, where sp_i is the spherical pyramid containing v obtained from Equation 1 and d_v is the distance of v obtained from Equation 2, the spherical pyramid value spv_v of v is defined by Equation 3.

For example, given the two-dimensional point v = (0.4, 0.8), j_max is 1 and v_1 (= 0.8) is greater than 0.5, so by Equation 1 the point v belongs to sp_(1+2), that is, sp_3. By Equation 2, the distance from the center to v is d_v = sqrt((0.5 - 0.4)^2 + (0.5 - 0.8)^2), approximately 0.32. The spherical pyramid value of v then follows from Equation 3.
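The worked example can be checked with a short Python sketch. Equations 1 and 2 are not reproduced in this text, so the rule below is an assumption inferred from the example: j_max is the coordinate that deviates most from 0.5, the point maps to sp_(j_max) when that coordinate is below 0.5 and to sp_(j_max + d) otherwise, and d_v is the Euclidean distance from the center. The function names are hypothetical.

```python
import math

def spherical_pyramid_number(v):
    """Return the index i of the spherical pyramid sp_i assumed to contain v."""
    d = len(v)
    # j_max: the dimension whose coordinate deviates most from the center 0.5
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Assumption: below 0.5 maps to sp_(j_max), otherwise to sp_(j_max + d)
    return j_max if v[j_max] < 0.5 else j_max + d

def distance_from_center(v):
    """Euclidean distance d_v from the center (0.5, ..., 0.5) to v."""
    return math.sqrt(sum((0.5 - x) ** 2 for x in v))

v = (0.4, 0.8)
print(spherical_pyramid_number(v))         # 3, i.e. sp_(1+2)
print(round(distance_from_center(v), 3))   # 0.316
```

Ties between equally deviating coordinates are broken here by taking the lowest index, which is a choice of this sketch and not something the text specifies.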

Building the index with the spherical pyramid technique is straightforward. Given a d-dimensional point v, its spherical pyramid value spv_v is computed, and v is inserted into the B+-tree with spv_v as the key.

The point v and its spherical pyramid value spv_v are stored in the corresponding data page of the B+-tree. Updates and deletions are likewise performed through the B+-tree.
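As a rough illustration of keyed insertion, deletion, and range access, the sketch below uses a sorted Python list in place of a B+-tree. It is not the index described here: the scalar key of Equation 3 is not reproduced in this text, so the sketch substitutes the pair (pyramid number, d_v) as the sort key, and the pyramid rule is the one inferred from the worked example. `SortedIndex` and `sp_key` are hypothetical names.

```python
import bisect
import math

def sp_key(v):
    """Assumed stand-in key: (pyramid number i, distance d_v from the center)."""
    d = len(v)
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    i = j_max if v[j_max] < 0.5 else j_max + d
    d_v = math.sqrt(sum((0.5 - x) ** 2 for x in v))
    return (i, d_v)

class SortedIndex:
    """Sorted list of (key, point) pairs standing in for a B+-tree."""

    def __init__(self):
        self._entries = []

    def insert(self, v):
        bisect.insort(self._entries, (sp_key(v), tuple(v)))

    def delete(self, v):
        entry = (sp_key(v), tuple(v))
        pos = bisect.bisect_left(self._entries, entry)
        if pos < len(self._entries) and self._entries[pos] == entry:
            del self._entries[pos]
            return True
        return False

    def range_scan(self, i, lo, hi):
        """Points in pyramid i whose distance d_v satisfies lo <= d_v <= hi."""
        start = bisect.bisect_left(self._entries, ((i, lo),))
        result = []
        for key, point in self._entries[start:]:
            if key > (i, hi):
                break
            result.append(point)
        return result

index = SortedIndex()
for point in [(0.4, 0.8), (0.5, 0.9), (0.1, 0.5)]:
    index.insert(point)
print(index.range_scan(3, 0.0, 0.35))   # [(0.4, 0.8)]
```

A range scan over one pyramid and one distance interval corresponds to reading the pages of consecutive spherical slices, which is what makes a one-dimensional ordered index sufficient.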

FIG. 3 shows an example of spherical pyramids in a two-dimensional space. Assume that each spherical slice holds exactly one object. In FIG. 3, the query point q lies in the spherical slice (bounding slice) BS_4. Like most nearest-neighbor query processing algorithms, the algorithm on the spherical pyramid technique (SPY-TEC) first examines the objects in the data page that contains the query point. For an efficient search, however, the minimum distance between the query point and each spherical pyramid and the minimum distance between the query point and each spherical slice must be measured. Ordering the search by these distances reduces the number of pages visited unnecessarily.

2. Minimum Distance between the Query Point and a Spherical Pyramid

Equation 4 is the lemma for the minimum distance between the query point and a spherical pyramid. To keep the explanation simple, only the case in which the number of the spherical pyramid containing the query point is smaller than the dimension d is described. The case in which it is larger than d can be handled in a similar way.

That is, given a query point q = [q_0, q_1, ..., q_(d-1)] that lies in the spherical pyramid sp_j (j < d), the minimum distance MINDIST(q, sp_i) between the query point and a spherical pyramid sp_i is defined by Equation 4 (Lemma 1).

In analytic geometry, given a point [q_0, q_1, ..., q_(d-1)] and a plane k_0 x_0 + k_1 x_1 + ... + k_(d-1) x_(d-1) + C = 0, the minimum distance between the point and the plane is the length of the perpendicular from the point to the plane: |k_0 q_0 + k_1 q_1 + ... + k_(d-1) q_(d-1) + C| / sqrt(k_0^2 + k_1^2 + ... + k_(d-1)^2).
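The point-to-plane distance is the standard perpendicular-distance formula, and a minimal Python sketch of it follows. The function name and the sample point are illustrative and do not come from the source.

```python
import math

def point_plane_distance(q, k, c):
    """Perpendicular distance from point q to the plane sum(k_n * x_n) + c = 0."""
    numerator = abs(sum(kn * qn for kn, qn in zip(k, q)) + c)
    denominator = math.sqrt(sum(kn * kn for kn in k))
    return numerator / denominator

# Distance from (0.2, 0.6) to the line x_0 - x_1 = 0, approximately 0.283
print(point_plane_distance((0.2, 0.6), (1, -1), 0))
```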

Using this distance formula, Equation 4 can be proved separately for the cases i > d and i < d.

First, if i = j, sp_i is the spherical pyramid that contains the query point, so MINDIST(q, sp_i) = 0. If |i - j| = d, sp_i is the spherical pyramid opposite sp_j, the pyramid containing the query point. The minimum distance from the query point to sp_i is then the distance to the apex of sp_i, that is, the distance between the center point of the space and the query point, so MINDIST(q, sp_i) = d_q. For the remaining cases, because the data space is the unit space, the coefficients k_n and the constant C in the distance formula each take one of the values -1, 0, or 1.

If i < d, one side face of the spherical pyramid sp_i adjacent to the query point has the equation "k_j x_j + k_i x_i = 0" (42), as in the two-dimensional example of FIG. 4. This extends consistently to spaces of higher dimension: the face is a straight line in two dimensions and a (d-1)-dimensional plane in d dimensions. In the equation of this plane, every coefficient other than k_i and k_j is 0. Here k_j = 1 and, because i < d, k_i = -1. The minimum distance between the query point and the adjacent spherical pyramid sp_i is therefore |q_j - q_i| / sqrt(2).

Finally, if i > d, one side face of the spherical pyramid sp_i adjacent to the query point has the equation "k_j x_j + k_i x_i - 1 = 0" (43).

Here k_j = 1 and, because i > d, k_i = 1. Substituting these coefficients and C = -1 into the point-to-plane distance formula, the distance between the query point and the adjacent spherical pyramid sp_i is |q_j + q_i - 1| / sqrt(2), where for i > d the coordinate index i is read as i - d.
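The four cases of the proof can be collected into one function. The sketch below is an interpretation and not a transcription of Equation 4, which is not reproduced in this text: it assumes the query point lies in sp_j with j < d, takes d_q as the Euclidean distance from the center, and for pyramids with i >= d derives the distance from the face equation x_j + x_(i-d) - 1 = 0. The function name is hypothetical.

```python
import math

def mindist_to_pyramid(q, j, i):
    """Lower bound on the distance from q (inside sp_j, j < d) to pyramid sp_i."""
    d = len(q)
    if i == j:                      # the pyramid containing the query point
        return 0.0
    if abs(i - j) == d:             # the opposite pyramid: distance to the apex
        return math.sqrt(sum((0.5 - x) ** 2 for x in q))
    if i < d:                       # adjacent pyramid, face x_j - x_i = 0
        return abs(q[j] - q[i]) / math.sqrt(2)
    # adjacent pyramid with i >= d, assumed face x_j + x_(i-d) - 1 = 0
    return abs(q[j] + q[i - d] - 1) / math.sqrt(2)

q = (0.2, 0.6)                      # lies in sp_0 in two dimensions
for i in range(4):
    print(i, round(mindist_to_pyramid(q, 0, i), 3))
```

For this query point the pyramid above the center (sp_3) is closer than the one below it (sp_1), which matches the position of q above the horizontal midline.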

3. Minimum Distance between the Query Point and a Spherical Slice

Measuring the minimum distance between the query point and a spherical slice is more involved than measuring the distance to a spherical pyramid. Three cases are distinguished: slices in the spherical pyramid that contains the query point, slices in the opposite spherical pyramid, and slices in an adjacent spherical pyramid.

Given a query point lying in the spherical pyramid sp_j, the minimum distance MINDIST(q, BS_l) between the query point and a spherical slice BS_l in the spherical pyramid sp_i is given by Equations 5a to 5c (Lemma 2).

First, when the spherical slice lies in the spherical pyramid that contains the query point (i = j), the minimum distance between the query point and the slice is given by Equation 5a.

Second, when the spherical slice lies in the spherical pyramid opposite the query point (|i - j| = d), the minimum distance is given by Equation 5b.

Here, the first auxiliary quantity is the distance from the query point to the nearest side face of a spherical pyramid, and the second is one angle of the right triangle formed by d_q and that distance.

Third, when the spherical slice lies in a spherical pyramid adjacent to the query point, the minimum distance is given by Equation 5c.

Here, the additional quantity is the length of the base of the right triangle formed by d_q and the distance from the query point to the side face. min(BS_l) is the smallest d_v among the points in the spherical slice BS_l, and max(BS_l) is the largest. Using min(BS_l) and max(BS_l), Equations 5a to 5c can be proved as follows.

First (Equation 5a): (1) if the query point lies inside BS_l, MINDIST(q, BS_l) must be less than or equal to the distance from the query point to any point in BS_l, so it is 0. (2) If d_q > max(BS_l), MINDIST(q, BS_l) is the difference between d_q and max(BS_l), the d_v of the point in BS_l farthest from the center. (3) If d_q < min(BS_l), MINDIST(q, BS_l) is the difference between min(BS_l), the d_v of the point in BS_l nearest the center, and d_q. FIG. 5A shows an example of this case, i = j, in two-dimensional space.
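The three sub-cases for a slice in the query point's own pyramid reduce to an interval check on d_q. A minimal sketch follows, assuming that "the query point lies inside BS_l" is equivalent to min(BS_l) <= d_q <= max(BS_l) when i = j; the parameter names are hypothetical.

```python
def mindist_same_pyramid(d_q, slice_min, slice_max):
    """MINDIST to a slice [slice_min, slice_max] in the query point's own pyramid."""
    if d_q > slice_max:     # query point lies beyond the slice's outer boundary
        return d_q - slice_max
    if d_q < slice_min:     # query point lies inside the slice's inner boundary
        return slice_min - d_q
    return 0.0              # query point lies within the slice

print(mindist_same_pyramid(0.15, 0.1, 0.2))   # 0.0
```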

Second (Equation 5b): if |i - j| = d, sp_i is the spherical pyramid opposite the one containing the query point. In this case the minimum distance between the query point and the slice is the length of the third side of the triangle whose other two sides are d_q and min(BS_l), which can be obtained with the law of cosines. First, take the distance from the query point to the nearest side face of an adjacent spherical pyramid; the angle formed by d_q and this distance is then determined. Since the angle at the apex of a spherical pyramid is also fixed, MINDIST(q, BS_l) is obtained by the law of cosines. FIG. 5B shows an example of this case, |i - j| = d.

Third (Equation 5c): here sp_i is a spherical pyramid adjacent to the query point. Consider the length of the base of the right triangle formed by d_q and the perpendicular distance from the query point q to sp_i. If this base length lies between min(BS_l) and max(BS_l), the minimum distance between BS_l and q is, for the same reason as in the first case, the perpendicular distance from q to sp_i. If the base length is greater than max(BS_l), MINDIST(q, BS_l) is the length of the hypotenuse of the right triangle whose legs are the perpendicular distance and |base length - max(BS_l)|. If the base length is less than min(BS_l), then for the same reason MINDIST(q, BS_l) is the length of the hypotenuse of the right triangle whose legs are the perpendicular distance and |base length - min(BS_l)|.
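The adjacent-pyramid case can be sketched directly from the right-triangle description. Equation 5c is not reproduced in this text, so the names `delta` (perpendicular distance from q to the face of sp_i) and `base` (distance from the center to the foot of that perpendicular) are hypothetical labels for the quantities described in words above.

```python
import math

def mindist_adjacent_slice(d_q, delta, slice_min, slice_max):
    """MINDIST to a slice [slice_min, slice_max] in an adjacent pyramid.

    d_q:   distance from the center of the space to the query point
    delta: perpendicular distance from the query point to the pyramid's face
    """
    # Base of the right triangle with hypotenuse d_q and leg delta
    base = math.sqrt(max(d_q * d_q - delta * delta, 0.0))
    if base > slice_max:
        return math.hypot(delta, base - slice_max)
    if base < slice_min:
        return math.hypot(delta, slice_min - base)
    return delta            # the foot of the perpendicular falls inside the slice

print(round(mindist_adjacent_slice(0.5, 0.3, 0.1, 0.2), 3))   # 0.361
```

The `max(..., 0.0)` guard only protects against a tiny negative value from floating-point rounding when delta equals d_q.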

4. Distance between the Query Point and an Object

The distance DIST between the query point q = [q_0, q_1, ..., q_(d-1)] and an object p = [p_0, p_1, ..., p_(d-1)] is the distance between two points, given by Equation 6.
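A minimal sketch of the point-to-point distance, assuming the Euclidean metric (Equation 6 is not reproduced in this text, and the function name is hypothetical):

```python
import math

def dist(q, p):
    """Euclidean distance between query point q and object p."""
    return math.sqrt(sum((qn - pn) ** 2 for qn, pn in zip(q, p)))

print(dist((0.0, 0.0), (3.0, 4.0)))   # 5.0
```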

Table 1 lists, for the query point q of FIG. 3, the minimum distances to each spherical pyramid (SP) and spherical slice (BS) and the distances to the objects (OBJ), calculated with Equations 4 to 6.

5. Nearest-Neighbor Query Processing

Using Equations 4 to 5c above, the algorithm for processing incremental nearest-neighbor queries is shown in FIG. 6.

Lines 1 to 4 of FIG. 6 (Algorithm 1) use Equation 4 (Lemma 1) to calculate the minimum distance between the query point and each spherical pyramid and insert the pyramids into the priority queue. Lines 6 to 21 then repeatedly extract the first element of the queue, until the queue is empty, and process it according to its type. If the extracted element is a spherical pyramid, the minimum distance between the query point and each spherical slice in that pyramid is calculated and the slices are inserted into the queue. If the extracted element is a spherical slice, the distance between the query point and each object in that slice is calculated and the objects are inserted into the queue. If the extracted element is an object, it is returned as a result of the nearest-neighbor query. Because the element with the smallest distance is always at the front of the priority queue, the returned object is the one closest to the query point.

The following shows the contents of the priority queue while Algorithm 1 of FIG. 6 runs on the example of FIG. 3.

1. Enqueue SP0 to SP3; [SP1,0], [SP2,4], [SP0,21], [SP3,33]

2. Dequeue SP1, enqueue BS3, BS4, BS5; [BS4,0], [BS5,2], [SP2,4], [BS3,14], [SP0,21], [SP3,33]

3. Dequeue BS4, enqueue e; [BS5,2], [SP2,4], [BS3,14], [e,19], [SP0,21], [SP3,33]

4. Dequeue BS5, enqueue f; [SP2,4], [f,12], [BS3,14], [e,19], [SP0,21], [SP3,33]

5. Dequeue SP2, enqueue BS6, BS7; [BS7,4], [BS6,8], [f,12], [BS3,14], [e,19], [SP0,21], [SP3,33]

6. Dequeue BS7, enqueue h; [h,6], [BS6,8], [f,12], [BS3,14], [e,19], [SP0,21], [SP3,33]

7. Dequeue h, report h as 1st nearest neighbor

Algorithm 1 of FIG. 6 begins by measuring the distance between the query point and each of the spherical pyramids sp₀ through sp₃ and inserting the pyramids into the queue. Because the priority queue of Algorithm 1 is always sorted in ascending order with distance as the key, the spherical pyramid sp₁, which contains the query point, is at the front of the queue. Next, line 7 of Algorithm 1 extracts the first element of the queue; since it is the spherical pyramid sp₁, the distances from the query point to the spherical segments BS₃, BS₄, and BS₅ belonging to sp₁ are measured and the segments are inserted into the queue. Then BS₄ is extracted; since it is a spherical segment, all objects in BS₄ are inserted into the queue. Only one object per segment is assumed here, so object e is inserted. The algorithm continues in this way until the object closest to the query point reaches the front of the queue (h in this example). By continuing to use the priority queue, the incremental nearest-neighbor algorithm can report as many nearest objects as the user wants, one after another in order of distance.

The conventional k-nearest-neighbor query can be handled simply by controlling, in the while-loop of Algorithm 1, the number of objects returned as nearest neighbors.
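One way to picture this is to treat the incremental search as a stream of results and stop reading after k of them. The sketch below assumes that view; `ranked` is only a hypothetical stand-in for Algorithm 1 (it ranks a flat set of objects rather than traversing pyramids and segments), `k_nearest` is an illustrative name, and the distances are made-up values, not data from the experiments.

```python
import heapq
from itertools import islice

def ranked(distances):
    """Stand-in for Algorithm 1: yield (object, distance), nearest first."""
    heap = [(d, name) for name, d in distances.items()]
    heapq.heapify(heap)
    while heap:
        d, name = heapq.heappop(heap)
        yield name, d

def k_nearest(neighbors, k):
    """Stop the incremental search once k objects have been reported."""
    return list(islice(neighbors, k))

# Hypothetical distances, for illustration only.
demo = {"a": 7.0, "b": 2.5, "c": 4.0, "d": 9.0}
print(k_nearest(ranked(demo), 2))  # [('b', 2.5), ('c', 4.0)]
```

Because the cut-off sits outside the search loop, the same incremental routine serves both the k-nearest-neighbor case and the open-ended case where the user keeps asking for the next neighbor.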

Comparative Example

To demonstrate the efficiency of nearest-neighbor query processing with the spherical pyramid technique, comparative experiments against the R*-tree and the X-tree are presented as comparative examples.

For a fair comparison, the incremental nearest-neighbor algorithm was implemented on the R*-tree and the X-tree and modified to process k-nearest-neighbor queries. All experiments were run on a Sun SPARC 20 workstation with 128 MB of main memory and 10 GB of secondary storage. The block size was 4096 bytes for every index structure, and block utilization was fixed at 65%.

Example 1 (comparative example using synthetic data)

The synthetic data sets contain 20,000 to 100,000 uniformly distributed points, with a dimensionality of 4, 8, 12, 16, 20, or 24.

The first experiment varied the dimensionality of the data and measured the number of page accesses, the CPU time, and the total query response time needed to find the 10 nearest objects (10-NN). One hundred randomly selected query points were used, and every result is the average over the 100 queries.

FIGS. 7A to 7C show the results of the first experiment. Overall, as the dimensionality of the data increases, nearest-neighbor query processing with the spherical pyramid technique becomes more efficient relative to the R*-tree and the X-tree. FIG. 7A shows the number of page accesses. At 12 or more dimensions the R*-tree must access almost all of its internal and leaf nodes. The X-tree accesses fewer pages than the R*-tree but, like the R*-tree, ends up accessing most of its pages as the dimensionality grows. With the spherical pyramid technique the number of data page accesses also grows linearly with the dimensionality, but it is always lower than for the R*-tree or the X-tree; at 24 dimensions the improvement is about 34% over the R*-tree and 31% over the X-tree. Because a nearest-neighbor algorithm compares and sorts nodes using a specific distance function (for example, the min-max distance of the R*-tree), it consumes more CPU time than other kinds of query processing such as joins [3].

FIG. 7B shows the CPU time. Although measuring the minimum distance between each spherical segment and the query point may look complicated, each case can be identified with simple comparison operations and the minimum distance computed easily. In the R*-tree, by contrast, regions overlap more and more as the dimensionality rises, so a great deal of CPU time is spent comparing and computing distances. The spherical pyramid technique partitions the space without overlap by construction and therefore carries no such overhead. At 24 dimensions, CPU time improves by about 38% over the R*-tree and 35% over the X-tree.

Finally, FIG. 7C shows the total query response time, which follows much the same pattern as the page accesses and the CPU time; at 24 dimensions the improvement is about 45% over the R*-tree and 37% over the X-tree.

Example 2 (comparative example using real data)

This experiment measured how performance changes as the database size varies, with the dimensionality fixed at 16. FIGS. 8A to 8C show the results, which closely resemble those of the first experiment. For 100,000 16-dimensional points, nearest-neighbor query processing with the spherical pyramid technique was about 47% faster than the R*-tree and 35% faster than the X-tree.

This example used 100,000 16-dimensional Fourier points describing the regions of real CAD objects. The query objects were 100 16-dimensional Fourier points selected at random from the real data, and every result is the average over the 100 queries.

The number of nearest objects k was increased from 1 to 10, and the number of page accesses, the CPU time, and the total response time required for query processing were measured.

FIGS. 9A to 9C show the results of this example.

FIG. 9A shows the number of page accesses for the X-tree, the R*-tree, and nearest-neighbor query processing with the spherical pyramid technique as k varies.

For k = 1, the nearest-neighbor algorithm based on the spherical pyramid technique found the object closest to the query object (that is, the query object itself) by accessing only one data page, whereas the X-tree had to access 13.45 pages and the R*-tree 16.43 pages to answer the query.

For k of 3 or more, the X-tree and the R*-tree had to access almost all data pages. The spherical pyramid technique also accessed more data pages for k of 3 or more than for k = 1 or 2, but always fewer than the X-tree or the R*-tree.

CPU time and total query response time follow a trend similar to the number of page accesses. For k = 10, the nearest-neighbor algorithm based on the spherical pyramid technique is about 67% more efficient than the R*-tree and 60% more efficient than the X-tree in total query response time.

In other words, compared with the example using synthetic data, the nearest-neighbor algorithm based on the spherical pyramid technique performs better on the real data set than on the uniformly distributed data sets.

As described above, the present invention provides an incremental nearest-neighbor query processing method based on the existing spherical pyramid technique. The examples show that this method (algorithm) is more efficient than the incremental nearest-neighbor algorithms of the X-tree and the R*-tree. The structure of the spherical pyramid technique is simpler than that of index structures based on the R*-tree, and because it can use a B+-tree unchanged, it retains the B+-tree's fast insertion, search, and deletion.

In addition, the spherical pyramid technique can easily be implemented with a B+-tree on any kind of database system in current use, so existing concurrency control and recovery mechanisms can be used as they are.

With the nearest-neighbor query processing method using the spherical pyramid technique according to the present invention, high-dimensional data can be indexed efficiently and the sphere-shaped queries commonly used in similarity search can be processed more efficiently. Furthermore, the k-nearest-neighbor query, which retrieves the k objects most similar to a query point (query object), can be processed more efficiently.

The comparative examples above show that the nearest-neighbor query processing method using the spherical pyramid technique according to the present invention is more efficient than the incremental k-nearest-neighbor methods implemented on the R*-tree and the X-tree in terms of page accesses, CPU time, and total response time.

Claims

A first step of dividing the d-dimensional data space into 2d spherical pyramids;

A second step of dividing each spherical pyramid into spherical segments;

A third step of calculating the minimum distance between a query point and each spherical pyramid and inserting the pyramids into a priority queue in ascending order of distance;

A fourth step of extracting the first element of the priority queue and, if the extracted element is a spherical pyramid, calculating the minimum distance between the query point and each spherical segment in that pyramid and inserting the segments into the priority queue; if the extracted element is a spherical segment, calculating the distance between the query point and each object in that segment and inserting the objects into the priority queue; and if the extracted element is an object, returning the object as a result of the nearest query. The nearest query processing method using the spherical pyramid technique comprises the above four steps.

The method of claim 1,

In the third step, the minimum distance MINDIST(q, sp_i) between the query point q and the spherical pyramid sp_i, where sp_j (j < d) is the spherical pyramid to which the query point belongs,

The nearest query processing method using the spherical pyramid technique, which is defined as follows.

The method according to claim 1 or 2,

In the fourth step, the minimum distance MINDIST(q, BS_l) between the query point q and a spherical segment BS_l in the spherical pyramid sp_i, where sp_j is the spherical pyramid to which the query point belongs, is as follows:

If the spherical segment is in the spherical pyramid to which the query point belongs (i = j),

Same as

If the spherical segment is in the spherical pyramid opposite the one containing the query point (|i - j| = d), given the distance from the query point to the nearest side of the pyramid and one angle of the right triangle formed with d_q,

Same as

If the spherical segment is in a spherical pyramid adjacent to the one containing the query point, given the length of the base of the right triangle formed with d_q,