KR20030013619A

KR20030013619A - Data of insert and search method of parallel high-dimensional index structure

Info

Publication number: KR20030013619A
Application number: KR1020010047715A
Authority: KR
Inventors: 박춘서; 김창수; 김경배; 신범주; 유재수; 송석일; 신재룡
Original assignee: 한국전자통신연구원
Priority date: 2001-08-08
Filing date: 2001-08-08
Publication date: 2003-02-15

Abstract

PURPOSE: A method for inserting and searching data in a parallel high-dimensional index structure is provided. The method searches a high-dimensional index efficiently by modifying a high-dimensional index structure that exploits the parallelism of a SAN (Storage Area Network), thereby increasing the fan-out, reducing the height of the tree, and maximizing input/output parallelism during the range search performed in a similarity search. CONSTITUTION: To perform a partial K-nearest neighbor query simultaneously on the main server and all sub servers, when a K-nearest neighbor query is entered, all servers access the root node (1300). It is determined whether the accessed root node is a non-terminal node (1301). If it is a non-terminal node, every server calculates the similarity between the query and each entry and sorts the entries in a list in order of similarity (1302). All servers then access, in parallel, the child node of the first entry stored in the list and set the current node as the root node (1303), and the process returns to step 1301 to determine whether that root node is a non-terminal node.

Description

Data of insert and search method of parallel high-dimensional index structure

The present invention relates to a method for inserting and searching data in a parallel high-dimensional index structure. More particularly, it relates to a method for inserting and searching data in a parallel high-dimensional index structure so that large volumes of high-dimensional data can be stored efficiently in a SAN (Storage Area Network) environment, and to a computer-readable recording medium on which a program for realizing the method is recorded.

As data in data transmission network environments has become high-dimensional, various methods for inserting and searching data in high-dimensional index structures have been proposed to accommodate it.

However, the high-dimensional index structures proposed to date suffer a marked drop in search performance because the fan-out decreases as the number of dimensions increases. In particular, parallel multidimensional index structures that exploit a parallel environment have had difficulty accommodating large volumes of high-dimensional data, such as images, video, and CAD data, in a SAN environment. In addition, they do not consider the K-nearest neighbor query, an important query type in similarity search, so there is a growing need for an index structure that can search a high-dimensional index efficiently.

Accordingly, the present invention has been proposed to meet this need. Its object is to provide a method for inserting and searching data in a parallel high-dimensional index structure in a data transmission network (SAN) environment or the like. The method modifies a high-dimensional index structure that exploits the parallelism of a SAN or similar architecture so as to increase the fan-out and reduce the height of the tree, and it maximizes input/output parallelism during range search in similarity search as well, so that a high-dimensional index can be searched efficiently. A further object is to provide a computer-readable recording medium on which a program for realizing the method is recorded.

FIG. 1 is a diagram illustrating the concept of a data transmission network (SAN) to which the present invention is applied.

FIG. 2 is a configuration diagram of one embodiment of a data transmission network system to which the present invention is applied.

FIG. 3 is a detailed diagram of the tree structure in each disk group shown in FIG. 2.

FIG. 4 is a diagram of one embodiment illustrating the insertion process for the parallel high-dimensional index structure in the system shown in FIG. 2.

FIGS. 5 and 6 are diagrams of one embodiment illustrating the node split process within a disk during the insertion process shown in FIG. 4.

FIG. 7 is a flowchart of one embodiment of the data insertion method for a parallel high-dimensional index structure in a data transmission network (SAN) environment according to the present invention.

FIG. 8 is a diagram of one embodiment illustrating the range query search process in a disk group during data search of a parallel high-dimensional index structure in a data transmission network (SAN) environment according to the present invention.

FIG. 9 is a diagram of another embodiment illustrating the range query search process in a disk group during data search of a parallel high-dimensional index structure in a data transmission network (SAN) environment according to the present invention.

FIG. 10 is a flowchart of a first embodiment of the data search method for a parallel high-dimensional index structure in a data transmission network (SAN) environment according to the present invention.

FIG. 11 is a flowchart of a second embodiment of the data search method for a parallel high-dimensional index structure in a data transmission network (SAN) environment according to the present invention.

FIG. 12 is a flowchart of a third embodiment of the data search method for a parallel high-dimensional index structure in a data transmission network (SAN) environment according to the present invention.

FIG. 13 is a flowchart of a fourth embodiment of the data search method for a parallel high-dimensional index structure in a data transmission network (SAN) environment according to the present invention.

<Description of reference numerals for the main parts of the drawings>

200: main server 201: first sub server

202: second sub server 203: SAN (Storage Area Network)

204: first disk storage group 205: second disk storage group

206: third disk storage group

To achieve the above object, the present invention provides a data insertion method for a parallel high-dimensional index structure, applied to a parallel high-dimensional index system in a data transmission network (SAN) environment or the like, the method comprising: a first step in which a main server assigns incoming new entries to the disk groups and forwards each new entry to the server determined accordingly, chosen from among itself and at least one sub server; a second step in which the at least one sub server and the main server select, by tree traversal, a node into which the received new entry is to be inserted, and determine whether the selected node has free space for the entry; a third step in which, if the determination in the second step is that the node has free space, the main server and the at least one sub server insert the new entry into the free space of the page of the target node; and a fourth step in which, if the determination in the second step is that the node has no free space, the main server and the at least one sub server allocate a page on a disk over which the pages of the level where the split occurred are distributed, then insert the new entry into the node and reflect the change of the minimum bounding region (MBR) in the parent node.
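The second through fourth steps of the insertion method can be illustrated with a minimal single-server sketch. Everything named here (`Leaf`, `Parent`, `enlargement`, `insert`) is hypothetical and not taken from the patent. The sketch assumes a two-level tree, picks the target node by least MBR enlargement, and splits along the widest dimension; the patent does not specify these policies, and the assignment of entries to disk groups and the placement of the new page on a particular disk are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    entries: list = field(default_factory=list)

    def mbr(self):
        """Minimum bounding region as (low corner, high corner)."""
        return (tuple(map(min, zip(*self.entries))),
                tuple(map(max, zip(*self.entries))))


@dataclass
class Parent:
    leaf_capacity: int
    leaves: list = field(default_factory=list)
    mbrs: list = field(default_factory=list)  # one MBR per leaf


def enlargement(mbr, p):
    """Total per-dimension growth needed for mbr to cover point p."""
    low, high = mbr
    return sum(max(l - x, 0.0) + max(x - h, 0.0)
               for l, h, x in zip(low, high, p))


def insert(parent, p):
    if not parent.leaves:  # very first entry: create the first page
        parent.leaves.append(Leaf([p]))
        parent.mbrs.append((p, p))
        return
    # Second step: select a node (assumed policy: least MBR enlargement).
    i = min(range(len(parent.leaves)),
            key=lambda j: enlargement(parent.mbrs[j], p))
    leaf = parent.leaves[i]
    if len(leaf.entries) < parent.leaf_capacity:
        leaf.entries.append(p)  # third step: the page has free space
    else:
        # Fourth step: allocate a new page and split (assumed policy:
        # sort along the widest dimension and cut in the middle).
        pts = leaf.entries + [p]
        low = tuple(map(min, zip(*pts)))
        high = tuple(map(max, zip(*pts)))
        d = max(range(len(p)), key=lambda k: high[k] - low[k])
        pts.sort(key=lambda q: q[d])
        half = len(pts) // 2
        leaf.entries = pts[:half]
        new_leaf = Leaf(pts[half:])
        parent.leaves.append(new_leaf)
        parent.mbrs.append(new_leaf.mbr())
    parent.mbrs[i] = leaf.mbr()  # reflect the MBR change in the parent
```

With `Parent(leaf_capacity=2)`, inserting a third point into a full leaf triggers the split branch, after which the parent holds two leaves and two MBRs.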

The present invention also provides a data search method for a parallel high-dimensional index structure, applied to a parallel high-dimensional index system in a data transmission network (SAN) environment, the method comprising: a first step in which a main server and at least one sub server pass root node information to an input/output scheduler and determine whether the input/output scheduler holds any information to manage; a second step in which, if the determination in the first step is that the input/output scheduler holds no information, the at least one sub server delivers its range query results to the main server, and the main server combines the range query results received from the at least one sub server with the range query results it found itself and delivers the combined query results to the client; and a third step in which, if the determination in the first step is that the input/output scheduler holds information, the main server and the at least one sub server access multiple nodes in parallel from the input/output scheduler and perform the range query search according to whether the current node is a terminal node.
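The scheduler-driven range search can be sketched as a loop that runs while the scheduler still holds nodes. The names `mbr_min_dist`, `range_query`, and `merged_range_query`, the dictionary node layout, and the use of Euclidean distance as the similarity measure are all assumptions made for illustration. The parallel disk reads are simulated by processing one batch of nodes per iteration.

```python
from collections import deque
from math import dist


def mbr_min_dist(mbr, q):
    """Smallest Euclidean distance from query point q to the box mbr."""
    low, high = mbr
    nearest = tuple(min(max(x, l), h) for l, h, x in zip(low, high, q))
    return dist(nearest, q)


def range_query(root, q, radius):
    """Range search on one server's tree; returns points within radius of q."""
    scheduler = deque([root])  # stands in for the input/output scheduler
    results = []
    while scheduler:  # does the scheduler still hold information?
        batch = list(scheduler)  # nodes that would be read in parallel
        scheduler.clear()
        for node in batch:
            if node["leaf"]:  # terminal node: test the stored objects
                results += [p for p in node["points"] if dist(p, q) <= radius]
            else:  # non-terminal node: schedule only overlapping children
                scheduler.extend(child for mbr, child in node["children"]
                                 if mbr_min_dist(mbr, q) <= radius)
    return results


def merged_range_query(server_roots, q, radius):
    """Main server: combine its own result with those of the sub servers."""
    merged = []
    for root in server_roots:
        merged += range_query(root, q, radius)
    return merged
```

A node is a dictionary with a boolean `"leaf"` key plus either `"points"` (a list of coordinate tuples) or `"children"` (a list of `(mbr, child)` pairs); `server_roots` holds one root per server.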

Further, the present invention provides a data search method for a parallel high-dimensional index structure, applied to a parallel high-dimensional index system in a data transmission network (SAN) environment, the method comprising: a first step in which a main server and at least one sub server access a root node and determine whether the accessed root node is a non-terminal node; a second step in which, if the determination in the first step is that the accessed root node is a non-terminal node, the main server and the at least one sub server calculate the similarity between the query and each entry, sort the calculated similarities in a list in order of similarity, access in parallel the child node of the first entry stored in the list, and set the current node as the root node; a third step in which, if the determination in the first step is that the accessed root node is not a non-terminal node, the main server and the at least one sub server calculate the similarity between the query and each object, extract from the node only those values less than or equal to the K-nearest distance, and store them in a result set; a fourth step of determining whether the number of results stored in the result set is greater than or equal to K (K being a natural number) and whether the K-th result is less than or equal to the similarity value of the first entry in the list; a fifth step in which, if the determination in the fourth step is that the number of stored results is greater than or equal to K and the K-th result is less than or equal to the similarity value of the first entry in the list, the at least one sub server delivers its K results to the main server, and the main server combines the K results delivered from the at least one sub server with the results it found itself, sorts them in order of similarity, selects the K final results, and returns them to the client; and a sixth step in which, if the determination in the fourth step is that the number of stored results is less than K or the K-th result is greater than the similarity value of the first entry in the list, the main server and the at least one sub server access in parallel the child node of the first entry stored in the list and set the current node as the root node.
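The six steps above resemble a best-first K-nearest search run independently on every server, followed by a merge on the main server. The sketch below is one possible reading, not the patented implementation: `local_knn`, `parallel_knn`, the dictionary node layout, and the use of Euclidean distance as the similarity value are assumptions, and a binary heap stands in for the similarity-ordered list.

```python
import heapq
from math import dist


def mbr_min_dist(mbr, q):
    """Smallest Euclidean distance from query point q to the box mbr."""
    low, high = mbr
    nearest = tuple(min(max(x, l), h) for l, h, x in zip(low, high, q))
    return dist(nearest, q)


def local_knn(root, q, k):
    """Best-first search on one server; returns up to k sorted (distance, point)."""
    counter = 0  # tie-breaker so the heap never compares node dictionaries
    frontier = [(0.0, counter, root)]  # the similarity-ordered list
    results = []
    while frontier:
        bound, _, node = heapq.heappop(frontier)
        # Stop test: k results held and the k-th is no farther than the
        # closest entry still waiting in the list.
        if len(results) >= k and results[k - 1][0] <= bound:
            break
        if node["leaf"]:  # terminal node: keep only the k nearest objects
            found = [(dist(p, q), p) for p in node["points"]]
            results = sorted(results + found)[:k]
        else:  # non-terminal node: add its entries to the list
            for mbr, child in node["children"]:
                counter += 1
                heapq.heappush(frontier, (mbr_min_dist(mbr, q), counter, child))
    return results


def parallel_knn(server_roots, q, k):
    """Main server: merge each server's k candidates and keep the k best."""
    candidates = []
    for root in server_roots:
        candidates += local_knn(root, q, k)
    return [p for _, p in sorted(candidates)[:k]]
```

Each server returns at most `k` candidates, so the main server sorts at most `k` times the number of servers before selecting the final `k`.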

The present invention also provides a data search method for a parallel high-dimensional index structure, applied to a parallel high-dimensional index system in a data transmission network (SAN) environment, the method comprising: a first step in which the main server accesses the root node for an incoming K-nearest neighbor query and determines whether the accessed root node is a non-terminal node; a second step in which, if the determination in the first step is that the accessed root node is a non-terminal node, the main server calculates the similarity between the query and each entry, sorts the similarities in a list in order of similarity, accesses in parallel the child node of the first entry stored in the list, and sets the current node as the root node; a third step in which, if the determination in the first step is that the accessed root node is not a non-terminal node, the main server calculates the similarity between the query and each object, extracts from the node only those values less than or equal to the K-nearest distance, stores them in a result set, and determines whether the number of stored results is greater than or equal to K; a fourth step in which, if the determination in the third step is that the number of results stored in the result set is greater than or equal to K, the main server calculates the similarity between the K-th result object and the query, converts the query into a range query, and forwards the converted range query to at least one sub server; a fifth step in which the at least one sub server that received the converted range query executes it with maximized parallelism and delivers the search results to the main server, and the main server combines the received search results with its own results, sorts them in order of similarity to obtain the final K results, and returns the final K results to the client; and a sixth step in which, if the determination in the third step is that the number of results stored in the result set is less than K, the main server accesses in parallel the child node of the first entry stored in the list and sets the current node as the root node.
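The key idea in this variant is that the main server's K-th local distance becomes the radius of a range query for the sub servers. The sketch below isolates that conversion and deliberately replaces each server's tree with a flat list of points; `knn_via_range` and its parameters are hypothetical names, Euclidean distance is an assumed similarity measure, and the infinite-radius fallback for a main server holding fewer than K objects is an assumption not stated in the source.

```python
from math import dist, inf


def knn_via_range(main_points, sub_partitions, q, k):
    """K-nearest search where the main server's k-th distance bounds the subs."""
    # Phase 1: the main server finds its own k nearest objects.
    local = sorted((dist(p, q), p) for p in main_points)[:k]
    # Convert to a range query: radius = distance of the k-th local result.
    # (Assumption: with fewer than k local objects, no bound is available.)
    radius = local[-1][0] if len(local) >= k else inf
    # Phase 2: each sub server answers the range query on its partition.
    candidates = list(local)
    for part in sub_partitions:
        candidates += [(dist(p, q), p) for p in part if dist(p, q) <= radius]
    # Phase 3: the main server merges, sorts by similarity, and keeps k.
    return [p for _, p in sorted(candidates)[:k]]
```

Because the main server already holds K objects within the radius, any object outside it cannot belong to the global K nearest, so the sub servers can safely discard it.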

The present invention also provides a data search method for a parallel high-dimensional index structure, applied to a parallel high-dimensional index system in a data transmission network (SAN) environment, the method comprising: a first step in which a main server and at least one sub server access the root node for a K-nearest neighbor query and determine whether the root node is a non-terminal node; a second step in which, if the determination in the first step is that the root node is a non-terminal node, the main server and the at least one sub server calculate the similarity between the query and each entry, sort the similarities in a list in order of similarity, access in parallel the child node of the first entry stored in the list, and set the current node as the root node; a third step in which, if the determination in the first step is that the root node is not a non-terminal node, the main server and the at least one sub server calculate the similarity between the query and each object, extract from the node only those values less than or equal to the K-nearest distance, store them in a result set, and determine whether the number of stored results is greater than or equal to K; a fourth step in which, if the determination in the third step is that the number of results is less than K, the main server and the at least one sub server access in parallel the child node of the first entry stored in the list and set the current node as the root node; a fifth step in which, if the determination in the third step is that the number of results is greater than or equal to K, the main server and the at least one sub server each calculate the similarity between the K-th result object and the query and convert the query into a range query, and the at least one sub server delivers its converted range query to the main server; a sixth step in which the main server, having received the converted range queries, selects the range query with the smallest range from among the received range queries and the one it converted itself and forwards it to the at least one sub server, and the at least one sub server that received it executes the range query and delivers the results to the main server; and a seventh step in which the main server, having received the range query results from the at least one sub server, combines them with the results it found itself, selects K results, and returns the selected K results to the client.
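In this variant every server proposes a range, and the main server keeps the tightest one. The sketch below shows only that selection and the final merge, again with flat point lists standing in for each server's tree; `kth_distance` and `knn_min_radius` are hypothetical names, and Euclidean distance is an assumed similarity measure.

```python
from math import dist, inf


def kth_distance(points, q, k):
    """Distance of the k-th nearest point to q, or inf if fewer than k exist."""
    d = sorted(dist(p, q) for p in points)
    return d[k - 1] if len(d) >= k else inf


def knn_min_radius(partitions, q, k):
    """partitions[0] plays the main server; the rest play the sub servers."""
    # Fifth step: each server converts its local k-th result into a radius.
    # Sixth step: the main server selects the smallest radius of all.
    radius = min(kth_distance(part, q, k) for part in partitions)
    # Every server then runs the range query with the selected radius.
    hits = [(dist(p, q), p)
            for part in partitions for p in part if dist(p, q) <= radius]
    # Seventh step: merge, order by similarity, and return k results.
    return [p for _, p in sorted(hits)[:k]]
```

The smallest radius is safe because the server that proposed it already holds K objects inside it, so the global K nearest objects cannot lie outside.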

Meanwhile, to insert data into a parallel high-dimensional index structure, the present invention provides a computer-readable recording medium on which is recorded a program for realizing, in a parallel high-dimensional index system having a processor: a first function in which a main server assigns incoming new entries to the disk groups and forwards each new entry to the server determined accordingly, chosen from among itself and at least one sub server; a second function in which the at least one sub server and the main server select, by tree traversal, a node into which the received new entry is to be inserted, and determine whether the selected node has free space for the entry; a third function in which, if the determination of the second function is that the node has free space, the main server and the at least one sub server insert the new entry into the free space of the page of the target node; and a fourth function in which, if the determination of the second function is that the node has no free space, the main server and the at least one sub server allocate a page on a disk over which the pages of the level where the split occurred are distributed, then insert the new entry into the node and reflect the change of the minimum bounding region (MBR) in the parent node.

Also, to search data in a parallel high-dimensional index structure, the present invention provides a computer-readable recording medium on which is recorded a program for realizing, in a parallel high-dimensional index system having a processor: a first function in which a main server and at least one sub server pass root node information to an input/output scheduler and determine whether the input/output scheduler holds any information to manage; a second function in which, if the determination of the first function is that the input/output scheduler holds no information, the at least one sub server delivers its range query results to the main server, and the main server combines the range query results received from the at least one sub server with the range query results it found itself and delivers the combined query results to the client; and a third function in which, if the determination of the first function is that the input/output scheduler holds information, the main server and the at least one sub server access multiple nodes in parallel from the input/output scheduler and perform the range query search according to whether the current node is a terminal node.

Also, to search data in a parallel high-dimensional index structure, the present invention provides a computer-readable recording medium on which is recorded a program for realizing, in a parallel high-dimensional index system having a processor: a first function in which a main server and at least one sub server access a root node and determine whether the accessed root node is a non-terminal node; a second function in which, if the determination of the first function is that the accessed root node is a non-terminal node, the main server and the at least one sub server calculate the similarity between the query and each entry, sort the calculated similarities in a list in order of similarity, access in parallel the child node of the first entry stored in the list, and set the current node as the root node; a third function in which, if the determination of the first function is that the accessed root node is not a non-terminal node, the main server and the at least one sub server calculate the similarity between the query and each object, extract from the node only those values less than or equal to the K-nearest distance, and store them in a result set; a fourth function of determining whether the number of results stored in the result set is greater than or equal to K (K being a natural number) and whether the K-th result is less than or equal to the similarity value of the first entry in the list; a fifth function in which, if the determination of the fourth function is that the number of stored results is greater than or equal to K and the K-th result is less than or equal to the similarity value of the first entry in the list, the at least one sub server delivers its K results to the main server, and the main server combines the K results delivered from the at least one sub server with the results it found itself, sorts them in order of similarity, selects the K final results, and returns them to the client; and a sixth function in which, if the determination of the fourth function is that the number of stored results is less than K or the K-th result is greater than the similarity value of the first entry in the list, the main server and the at least one sub server access in parallel the child node of the first entry stored in the list and set the current node as the root node.

Also, to search data in a parallel high-dimensional index structure, the present invention provides a computer-readable recording medium on which is recorded a program for realizing, in a parallel high-dimensional index system having a processor: a first function in which the main server accesses the root node for an incoming K-nearest neighbor query and determines whether the accessed root node is a non-terminal node; a second function in which, if the determination of the first function is that the accessed root node is a non-terminal node, the main server calculates the similarity between the query and each entry, sorts the similarities in a list in order of similarity, accesses in parallel the child node of the first entry stored in the list, and sets the current node as the root node; a third function in which, if the determination of the first function is that the accessed root node is not a non-terminal node, the main server calculates the similarity between the query and each object, extracts from the node only those values less than or equal to the K-nearest distance, stores them in a result set, and determines whether the number of stored results is greater than or equal to K; a fourth function in which, if the determination of the third function is that the number of results stored in the result set is greater than or equal to K, the main server calculates the similarity between the K-th result object and the query, converts the query into a range query, and forwards the converted range query to at least one sub server; a fifth function in which the at least one sub server that received the converted range query executes it with maximized parallelism and delivers the search results to the main server, and the main server combines the received search results with its own results, sorts them in order of similarity to obtain the final K results, and returns the final K results to the client; and a sixth function in which, if the determination of the third function is that the number of results stored in the result set is less than K, the main server accesses in parallel the child node of the first entry stored in the list and sets the current node as the root node.

The present invention also provides a computer-readable recording medium storing a program that, in a parallel high-dimensional index system equipped with a processor, realizes the following functions for searching data in a parallel high-dimensional index structure: a first function in which the main server and at least one sub-server access the root node in response to a K-nearest-neighbor query and determine whether the root node is a non-terminal node; a second function in which, if the first function determines that the root node is a non-terminal node, the main server and the at least one sub-server compute the similarity between the query and each entry, sort the entries into a list in order of similarity, access the child node of the first entry stored in the list in parallel, and assign the accessed node as the root node; a third function in which, if the first function determines that the root node is not a non-terminal node, the main server and the at least one sub-server compute the similarity between the query and each object, extract from the node only the values less than or equal to the K-nearest distance, store them in a result set, and determine whether the number of stored results is greater than or equal to K; a fourth function in which, if the third function determines that the number of results is less than K, the main server and the at least one sub-server access the child node of the first entry stored in the list in parallel and assign the accessed node as the root node; a fifth function in which, if the third function determines that the number of results is greater than or equal to K, the main server and the at least one sub-server each compute the similarity between the K-th result object and the query and convert it into a range query, and the at least one sub-server forwards its converted range query to the main server; a sixth function in which the main server, having received the converted range queries, selects the range query with the smallest range from among the received range queries and its own converted range query and forwards it to the at least one sub-server, and the at least one sub-server executes the received range query and delivers the result to the main server; and a seventh function in which the main server, having received the range query results from the at least one sub-server, merges them with its own results, selects K results, and returns the selected K results to the client.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

First, before describing data insertion and search in the parallel high-dimensional index structure according to the present invention, the SAN (Storage Area Network) environment to which the invention is applied is described.

A SAN is a technology that brings the advantages of Fibre Channel (FC), namely high-speed transfer, long-distance connectivity, and multi-protocol support, to a network, so that data can be exchanged as quickly as a host computer exchanges data with storage devices over SCSI (Small Computer System Interface). The Storage Networking Industry Association (SNIA), the body that works on SAN standardization, defines it as "a high-speed network that can transfer large volumes of data between separately connected storage devices regardless of the type of host computer."

The SAN emerged because the volume of data to be managed is growing exponentially, while the devices that store it are tied to different servers. Those servers are connected by a LAN (Local Area Network) or WAN (Wide Area Network), which shows its limits when large volumes of data must be exchanged at high speed.

The SAN appeared as one way to overcome these limits. Its advantages are improved scalability, higher data transfer rates and longer transfer distances, lower cost through the sharing of data and equipment, and data backup that does not depend on a server.

FIG. 1 illustrates the concept of the SAN to which the present invention is applied.

As shown in FIG. 1, a SAN is a very high-speed network that lets large volumes of data be exchanged between remotely distributed storage devices (104, 105) regardless of which host server (100, 101, 102) they are attached to. In a SAN environment, the storage devices (104, 105) are freed from a master-slave relationship with the host servers (100, 101, 102) and are shared by multiple servers, which enables shared use of disk and tape equipment, replication, clustering for high availability, and so on.

FIG. 2 is a block diagram of one embodiment of an insertion and search system for the parallel high-dimensional index structure in the SAN environment to which the present invention is applied.

As shown in FIG. 2, the system includes a main server (200), a first sub-server (201), a second sub-server (202), a SAN (203), a first disk storage group (204), a second disk storage group (205), and a third disk storage group (206).

The main server (200) performs index construction operations and searches, and consolidates the search results received from the first sub-server (201) and the second sub-server (202). The first sub-server (201) and the second sub-server (202) perform index construction operations and searches and deliver their search results to the main server (200).

The index structure to which the present invention is applied has the form nP-n×mD, a hybrid of the nP-nD and 1P-nD structures (where n and m are natural numbers greater than or equal to 1, P denotes a server, and D denotes a storage device). Overall it exploits parallelism across servers, and each server in turn exploits input/output parallelism.

This index structure assumes that, in a SAN environment, the number of disks taking part in image retrieval is much larger than the number of servers. Each server manages one disk storage group, and each disk group holds its own separate index tree.

FIG. 3 shows one embodiment of the tree structure within each disk group shown in FIG. 2. A node in the figure is a logical unit of the index structure, and the figure includes a terminal node (310, 314, 318) and a non-terminal node (303). A page (304) is the physical unit of disk input/output.

The terminal node (310, 314, 318) is stored split across different disks (300, 301, 302) in the group. Each entry of a node contains the minimum bounding region (hereinafter "MBR") of the child node it points to, together with pointers to the pages, distributed over the disks, that make up that child node.

That is, as shown in FIG. 3, the first entry (304) of the root node (303) points to node 1 (310, 314, 318). Node 1 (310, 314, 318) consists of the first page (310) of disk A (300), the first page (314) of disk B (301), and the first page (318) of disk C (302). The first entry of the root node (303) stores pointers to these pages and the MBR that encloses the objects stored in them.
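The entry layout described here can be sketched as a small data structure. This is a minimal illustration, not the patent's implementation: the class names `PagePointer`, `MBR`, and `Entry`, their fields, and the MBR coordinates in the example are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PagePointer:
    """Hypothetical pointer to one physical page: disk label plus page number."""
    disk: str
    page_no: int


@dataclass(frozen=True)
class MBR:
    """Minimum bounding region given by its low and high corners."""
    low: Tuple[float, ...]
    high: Tuple[float, ...]

    def contains(self, point: Tuple[float, ...]) -> bool:
        return all(l <= x <= h for l, x, h in zip(self.low, point, self.high))


@dataclass
class Entry:
    """One node entry: MBR of the child node plus pointers to the child's pages."""
    mbr: MBR
    pages: List[PagePointer]

    def disks(self) -> List[str]:
        # Disks that have to be read (in parallel) to load the child node.
        return sorted({p.disk for p in self.pages})


# First entry of the root node in FIG. 3: node 1 is the first page of disks A, B, C.
# The MBR coordinates are made up for the example.
first_entry = Entry(
    mbr=MBR(low=(0.0, 0.0), high=(1.0, 1.0)),
    pages=[PagePointer("A", 1), PagePointer("B", 1), PagePointer("C", 1)],
)
print(first_entry.disks())
```

Keeping the page pointers inside the parent entry is what lets a server issue all page reads for a child node at once, without first visiting the child.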

Because this index structure stores a single node split across pages on several disks, it achieves a declustering effect. In addition, the size of one node of the index tree becomes (number of disks in the group × page size), so the height of the index tree is reduced. The node is large, but this is not a problem because all of its pages can be read in parallel within a single input/output time. In general, in index structures based on data partitioning, the fan-out of a node shrinks as the dimensionality of the data grows, so the tree becomes taller, which in turn increases the overlap between regions in a chain effect.
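The effect of node size on fan-out and tree height can be checked with a little arithmetic. The sketch below is illustrative only: the 4096-byte page, the 520-byte entry (a 64-dimensional MBR stored as two 4-byte floats per dimension plus 8 bytes of pointer data), and the one million objects are assumptions, and the entry size is held fixed even though a real entry would carry one pointer per page.

```python
def node_fanout(disks_in_group: int, page_size: int, entry_size: int) -> int:
    """Entries per node when a node spans one page on each disk of the group."""
    node_size = disks_in_group * page_size
    return node_size // entry_size


def tree_height(num_objects: int, fanout: int) -> int:
    """Smallest height h such that fanout**h >= num_objects."""
    height, capacity = 1, fanout
    while capacity < num_objects:
        capacity *= fanout
        height += 1
    return height


PAGE_SIZE = 4096          # assumed page size in bytes
ENTRY_SIZE = 64 * 2 * 4 + 8   # assumed entry size for 64 dimensions: 520 bytes

for disks in (1, 5):
    fanout = node_fanout(disks, PAGE_SIZE, ENTRY_SIZE)
    print(disks, fanout, tree_height(1_000_000, fanout))
```

Under these assumptions a single-page node holds 7 entries and needs a tree of height 8, while a node spread over 5 disks holds 39 entries and needs height 4.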

In the proposed structure, however, the fan-out can be adjusted through the number of disks, so the tree height can be kept low. Moreover, as dimensionality increases, the number of nodes that must be accessed during a search grows, so the higher the dimensionality, the more pages take part in a query.

It is therefore more effective to exploit the uniform-active-disk (unispread) property as fully as possible than the minimum-active-disk (Minimum Load) property, which is otherwise an important property of parallel high-dimensional index structures. Because reading one node necessarily means reading every disk that holds one of its pages, the unispread property can be maximized, which is effective when processing range queries and K-nearest-neighbor queries.

The following describes in detail how an entry is inserted into the proposed index structure and how a node is split and allocated when free space runs out after insertion.

FIG. 4 illustrates one embodiment of the data insertion process for the parallel high-dimensional index structure in the system shown in FIG. 2.

As shown in FIG. 4, when the entries (a, b, c, d, e, f, g, h) (400) are to be inserted into the disk groups in turn, the main server (401) assigns entry a to the first disk group (404), entry b to the second disk group (405), and entry c to the third disk group (406), in order. The main server (401) preferably uses the round-robin method to assign the entries to be inserted. The reason is that, as dimensionality increases, other declustering methods lose much of their effectiveness and differ little from round-robin, so the round-robin method, which is cheap to compute and easy to implement, is used.
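Round-robin assignment of entries to disk groups can be sketched in a few lines. The function name `assign_round_robin` and the group labels are assumptions for this illustration; the text itself only states where entries a, b, and c go, and the placement of d through h follows from continuing the same rotation.

```python
from itertools import cycle
from typing import Dict, Iterable, List


def assign_round_robin(entries: Iterable[str], groups: List[str]) -> Dict[str, List[str]]:
    """Hand out entries to disk groups in rotating order."""
    assignment: Dict[str, List[str]] = {g: [] for g in groups}
    for entry, group in zip(entries, cycle(groups)):
        assignment[group].append(entry)
    return assignment


# Entries a..h over three disk groups, as in FIG. 4 (labels are hypothetical).
result = assign_round_robin("abcdefgh", ["group1", "group2", "group3"])
print(result)
```

No distance or MBR computation is needed to pick the target group, which is the low cost the text refers to.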

FIGS. 5 and 6 illustrate one embodiment of node splitting on the disks during the entry insertion process shown in FIG. 4: when free space runs out after insertion, the node is split and the entries are allocated.

That is, as shown in FIG. 5, when node 2 (505) is full on every disk and an attempt to insert a new entry causes an overflow, the node is allocated from disk D (503) and disk E (504), because these two disks hold the fewest pages. Then, as shown in FIG. 6, node 2 (505) is split into node 2 (608, 610) and node 3 (605, 606, 607, 609). In the allocation, node 2 (608, 610) is first assigned to disk D (603) and disk E (604) in turn, and node 3 (605, 606, 607, 609) is then assigned in turn starting from disk A (600).

Assigning the split nodes to different disks in this way allows several nodes to be read in parallel during a range query, the most basic query type for a high-dimensional index structure.
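Choosing the disks with the fewest pages for a newly split node can be sketched as follows. The function `allocate_pages`, the tie-break by disk label, and the page counts in the example are assumptions for illustration; the figures may place pages differently when several disks hold the same number of pages.

```python
from typing import Dict, List


def allocate_pages(page_counts: Dict[str, int], pages_needed: int) -> List[str]:
    """Pick the disks holding the fewest pages (ties broken by label) and
    record one newly allocated page on each of them."""
    candidates = sorted(page_counts, key=lambda d: (page_counts[d], d))
    chosen = candidates[:pages_needed]
    for disk in chosen:
        page_counts[disk] += 1
    return chosen


# Hypothetical page counts: disks D and E hold fewer pages than A, B, C.
counts = {"A": 3, "B": 3, "C": 3, "D": 2, "E": 2}
print(allocate_pages(counts, 2))   # pages for one half of the split
print(allocate_pages(counts, 4))   # pages for the other half
```

With these counts the first call picks D and E; once the counts are level, the second call starts again from disk A.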

FIG. 7 is a flowchart of one embodiment of the insertion method for the parallel high-dimensional index structure on a SAN according to the present invention.

As shown in FIG. 7, the main server (401) first determines, according to the declustering method, the server into which each new entry is to be inserted and forwards the entry to that server (700).

The main server (401) and each sub-server (402, 403) then select the most suitable node for the received entry by traversing the tree (701). Once a suitable node has been selected, they determine whether the node into which the entry is to be inserted has free space (702).

If step 702 determines that the node has free space, the main server (401) and each sub-server (402, 403) determine whether a page of that node has free space (705).

If step 705 determines that the node has free space but its pages do not, the main server (401) and each sub-server (402, 403) obtain a page from a disk on which the current node has no page yet and insert the entry there (706). If a page has free space, they insert the new entry, reflect the MBR change in the parent node, and end the procedure (707).

If, on the other hand, step 702 determines that the node has no free space, the main server (401) and each sub-server (402, 403) call the node split function to split the node and allocate a new node (703). They then insert the new entry into the allocated node and reflect the MBR change in the parent node (704). That is, pages are allocated first on the disk that holds the fewest pages of the level at which the split occurred; the new entry is inserted into the node, the MBR change is reflected in the parent node, and the procedure ends.
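The decisions of steps 702, 705, 706, and 707 can be sketched for a single node as follows. This is a simplified model under stated assumptions: the `Node` class, its fields, and the string return values are hypothetical, tree traversal (701) and the MBR update are omitted, and the split itself (703, 704) is only signalled, not performed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    disks: List[str]                 # all disks of the group, in order
    page_capacity: int               # entries that fit into one page
    pages: Dict[str, list] = field(default_factory=dict)  # disk -> entries

    def is_full(self) -> bool:
        # Step 702: the node is full once it has a full page on every disk.
        return (len(self.pages) == len(self.disks)
                and all(len(p) >= self.page_capacity for p in self.pages.values()))


def insert(node: Node, entry) -> str:
    """Return which branch of the flowchart handled the entry."""
    if node.is_full():
        return "split"                      # steps 703/704 would run here
    for page in node.pages.values():        # step 705: any page with room?
        if len(page) < node.page_capacity:
            page.append(entry)
            return "inserted"               # step 707
    # Step 706: take a page from a disk this node does not use yet.
    free_disk = next(d for d in node.disks if d not in node.pages)
    node.pages[free_disk] = [entry]
    return "new_page"


node = Node(disks=["A", "B"], page_capacity=2)
print([insert(node, e) for e in range(5)])
```

With two disks and two entries per page, the fifth insertion is the first one that finds the node full and would trigger a split.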

FIG. 8 illustrates one embodiment of range query processing within a disk group during a data search of the parallel high-dimensional index structure in a SAN environment according to the present invention.

FIG. 8 shows the case in which, when query processing must access several nodes, the main server (401) and each sub-server (402, 403) access the nodes one at a time in order. It is described for the case where entries 2, 4, and 7 of the root node fall within the range query. This range search method applies identically to every disk group configured in the SAN environment. In FIG. 8, the disk group is assumed to be disk group 1 (404), and the server managing disk group 1 (404) is assumed to be the main server (401).

First, the main server (401) managing disk group 1 (404) accesses the root node, reads entry 1 (805) through entry 7 (811), and selects entry 2 (806), entry 4 (808), and entry 7 (811), which fall within the range. The selected entries point to child node 2 (812, 816, 818), child node 4 (813, 819), and child node 7 (814, 815, 817), respectively. Because the nodes are accessed one at a time, child node 2 (812, 816, 818) is accessed first.

Child node 2 (812, 816, 818) is stored distributed over disk A (800), disk D (803), and disk E (804), so accessing child node 2 means accessing disk A (800), disk D (803), and disk E (804) simultaneously.

Next, child node 4 (813, 819) is stored distributed over disk A (800) and disk E (804), so accessing it means accessing disk A (800) and disk E (804) simultaneously. Child node 7 (814, 815, 817) is stored distributed over disk B (801), disk C (802), and disk D (803), so accessing it means accessing disk B (801), disk C (802), and disk D (803) simultaneously.

Data is thus read by accessing child node 2 (812, 816, 818), child node 4 (813, 819), and child node 7 (814, 815, 817) in that order. The total number of disk accesses is the sum of the root node access and the disk accesses for the terminal nodes. The root node (R) takes 1 access and terminal nodes 2, 4, and 7 take 3 accesses, so the total number of disk accesses is 4.

The number of disk accesses can be reduced by combining the page locations of terminal nodes 2, 4, and 7.

FIG. 9 illustrates another embodiment of range query processing within a disk group during a data search of the parallel high-dimensional index structure in a SAN environment according to the present invention.

FIG. 9 shows a range query processed by accessing the pages of several nodes simultaneously, again for the case where entries 2, 4, and 7 of the root node fall within the range query. This range search method applies identically to every disk group configured in the SAN environment.

Here, the disk group is assumed to be disk group 1 (404), and the server managing disk group 1 (404) is assumed to be the main server (401).

First, the main server (401) managing disk group 1 (404) accesses the root node and reads entry 1 (905) through entry 7 (911). Having selected entry 2 (906), entry 4 (908), and entry 7 (911), which fall within the range, it collects the page pointer information of all the selected entries and draws up a disk input/output plan. Here, the pointers (A3, D1, E1) of entry 2 denote the third page (912) of disk A (900), the first page (916) of disk D (903), and the first page (918) of disk E (904), respectively.

The disk input/output plan reduces the number of disk accesses: pages that can be read at the same time are read in a single input/output regardless of the node they belong to, so that pages of several nodes residing on different disks are accessed simultaneously and read in parallel at once. The pages of the nodes to be read are scheduled so that as many pages as possible are read in one input/output. A3 (912), B3 (914), C3 (915), D4 (917), and E1 (918) all reside on different disks, so they can be handled in a single input/output. Next, A5 (913), D1 (916), and E2 (919) are read together. The terminal nodes therefore take 2 accesses, and the total including the root node is 3, fewer than with the first method.
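The difference between node-at-a-time access and a disk input/output plan can be sketched by grouping page pointers into rounds that touch each disk at most once. The function `plan_io` and the greedy grouping are assumptions for illustration, and the assignment of pages A5, E2 to node 4 and B3, C3, D4 to node 7 is inferred from the disk lists given for FIG. 8. Which page of a disk lands in which round is arbitrary, so the rounds may group pages differently from the text while the round count stays the same.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Page = Tuple[str, int]   # (disk label, page number)


def plan_io(node_pages: Dict[int, List[Page]]) -> List[List[Page]]:
    """Group page pointers into rounds with at most one page per disk per round."""
    queues = defaultdict(list)
    for pages in node_pages.values():
        for disk, page_no in pages:
            queues[disk].append(page_no)
    depth = max((len(q) for q in queues.values()), default=0)
    return [
        sorted((disk, q[i]) for disk, q in queues.items() if i < len(q))
        for i in range(depth)
    ]


# Page pointers of child nodes 2, 4 and 7 (node-to-page mapping inferred).
fig9 = {
    2: [("A", 3), ("D", 1), ("E", 1)],
    4: [("A", 5), ("E", 2)],
    7: [("B", 3), ("C", 3), ("D", 4)],
}
rounds = plan_io(fig9)
node_at_a_time = 1 + len(fig9)   # root + one access per node
planned = 1 + len(rounds)        # root + one access per round
print(node_at_a_time, planned)
```

For this example the sketch yields 4 accesses when nodes are read one at a time and 3 with the plan, matching the counts in the text.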

FIG. 10 is a flowchart of a first embodiment of the data search method for the parallel high-dimensional index structure in a SAN environment according to the present invention.

As shown in FIG. 10, every server (401, 402, 403) passes the root node information to the input/output scheduler (1000) and determines whether the input/output scheduler holds any information to process (1001).

If step 1001 determines that the input/output scheduler holds information, every server (401, 402, 403) accesses several nodes in parallel as directed by the input/output scheduler (1003) and determines whether the current node is a terminal node (1004).

If step 1004 determines that the current node is not a terminal node, every server (401, 402, 403) selects the entries that fall within the range, passes the information about those entries to the input/output scheduler (1006), and returns to step 1001 to repeat the loop.

If step 1004 determines that the current node is a terminal node, every server (401, 402, 403) computes the similarity between the query and the objects in the node, stores the results in the result set (1005), and returns to step 1001 to repeat the loop.

If step 1001 determines that the input/output scheduler holds no more information, each sub-server (402, 403) delivers its range query results to the main server (401). The main server (401) merges the range query results found by the sub-servers (402, 403) with its own range query results, returns the final result to the client, and ends the procedure (1002).
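The loop of FIG. 10 on one server can be sketched with a queue standing in for the input/output scheduler. This is a simplified, in-memory illustration: nodes are plain dictionaries, similarity is modelled as Euclidean distance, the scheduler hands out one node at a time instead of issuing parallel page reads, and the names `mindist` and `range_search` are assumptions.

```python
import math
from collections import deque


def mindist(q, low, high):
    """Smallest Euclidean distance from point q to the box [low, high]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, low, high)))


def range_search(root, query, radius):
    scheduler = deque([root])                    # step 1000
    results = []
    while scheduler:                             # step 1001
        node = scheduler.popleft()               # step 1003
        if node["leaf"]:                         # step 1004
            results.extend(o for o in node["objects"]
                           if math.dist(query, o) <= radius)   # step 1005
        else:
            for low, high, child in node["entries"]:
                if mindist(query, low, high) <= radius:
                    scheduler.append(child)      # step 1006
    return results


# Hypothetical two-level tree held by one server.
leaf1 = {"leaf": True, "objects": [(1.0, 1.0), (2.0, 2.0)]}
leaf2 = {"leaf": True, "objects": [(8.0, 8.0), (9.0, 9.0)]}
root = {"leaf": False, "entries": [((1.0, 1.0), (2.0, 2.0), leaf1),
                                   ((8.0, 8.0), (9.0, 9.0), leaf2)]}
print(range_search(root, (1.5, 1.5), 1.0))
```

In step 1002 the main server would concatenate the lists returned by every server; here only the second leaf is pruned, because its MBR lies outside the query radius.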

FIG. 11 is a flowchart of a second embodiment of the data search method for the parallel high-dimensional index structure in a SAN environment according to the present invention.

FIG. 11 shows a search method in which every server (401, 402, 403) executes the K-nearest-neighbor query independently. First, every server (401, 402, 403) accesses the root node (1100) and determines whether the accessed root node is a non-terminal node (1101).

If step 1101 determines that the accessed root node is a non-terminal node, every server (401, 402, 403) computes the similarity between the query and each entry and sorts the entries into a list in order of similarity (1102). Every server (401, 402, 403) then accesses the child node of the first entry stored in the list in parallel, assigns the accessed node as the root node (1103), and returns to step 1101 to repeat the loop.

If step 1101 determines that the root node is not a non-terminal node, every server (401, 402, 403) computes the similarity between the query and each object, extracts from the node only the values less than or equal to the K-nearest distance, and stores them in the result set (1104). Every server (401, 402, 403) then determines whether the number of stored results is greater than or equal to K and the K-th result is less than or equal to the similarity value of the first entry in the list (1105).

If step 1105 determines that the number of stored results is greater than or equal to K and the K-th result is less than or equal to the similarity value of the first entry in the list, each sub-server (402, 403) delivers its K results to the main server (401) (1106). The main server (401) merges the K results delivered by each sub-server (402, 403) with its own results, sorts them in order of similarity, selects K results, and returns them to the client (1107).

If step 1105 determines that the number of stored results is less than K, or that the K-th result is greater than the similarity value of the first entry in the list, the procedure proceeds to step 1103: every server (401, 402, 403) accesses the child node of the first entry stored in the list in parallel, assigns the accessed node as the root node, and returns to step 1101 to repeat the loop.
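The independent search of FIG. 11 can be sketched as a best-first traversal per server followed by a merge on the main server. This is an illustration under assumptions: similarity is modelled as Euclidean distance (smaller is more similar), the sorted entry list is kept as a heap, nodes are dictionaries, and `knn_search` and `merge_knn` are hypothetical names.

```python
import heapq
import math


def mindist(q, low, high):
    """Smallest Euclidean distance from point q to the box [low, high]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, low, high)))


def knn_search(root, query, k):
    """Local K-nearest search on one server; returns sorted (distance, object)."""
    frontier = [(0.0, 0, root)]      # (lower bound, tie-breaker, node)
    counter = 1
    results = []
    while frontier:
        bound, _, node = heapq.heappop(frontier)
        # Step 1105: stop once K results exist and the K-th is no farther
        # than the best remaining entry.
        if len(results) >= k and results[k - 1][0] <= bound:
            break
        if node["leaf"]:             # step 1104
            results.extend((math.dist(query, o), o) for o in node["objects"])
            results.sort()
            del results[k:]
        else:                        # steps 1102/1103
            for low, high, child in node["entries"]:
                heapq.heappush(frontier, (mindist(query, low, high), counter, child))
                counter += 1
    return results


def merge_knn(per_server, k):
    """Steps 1106/1107: main server merges all local results and keeps K."""
    return sorted(r for results in per_server for r in results)[:k]


# Two hypothetical servers, each with its own tree.
leaf1 = {"leaf": True, "objects": [(1.0, 1.0), (3.0, 3.0)]}
leaf2 = {"leaf": True, "objects": [(8.0, 8.0), (9.0, 9.0)]}
server1 = {"leaf": False, "entries": [((1.0, 1.0), (3.0, 3.0), leaf1),
                                      ((8.0, 8.0), (9.0, 9.0), leaf2)]}
server2 = {"leaf": True, "objects": [(1.4, 1.4), (5.0, 5.0)]}

query, k = (1.5, 1.5), 2
merged = merge_knn([knn_search(s, query, k) for s in (server1, server2)], k)
print([obj for _, obj in merged])
```

Each server returns up to K candidates, so the main server sorts at most K results per server to produce the final answer.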

FIG. 12 is a flowchart of a third embodiment of the data search method for the parallel high-dimensional index structure in a SAN environment according to the present invention.

FIG. 12 combines a range query with a K-nearest-neighbor query and can achieve greater input/output parallelism than the search method shown in FIG. 11.

First, when a K-nearest-neighbor query arrives, the main server (401) accesses the root node (1200) and determines whether the accessed root node is a non-terminal node (1201).

If step 1201 determines that the accessed root node is a non-terminal node, the main server (401) computes the similarity between the query and each entry and sorts the entries into a list in order of similarity (1202). The main server (401) then accesses the child node of the first entry stored in the list in parallel, assigns the accessed node as the root node (1203), and returns to step 1201 to repeat the loop.

Meanwhile, if the determination in step 1201 is that the root node is not a non-terminal node (that is, it is a terminal node), the main server (401) calculates the similarity between the query and each object in the node, extracts only the objects whose value is less than or equal to the K-nearest distance, stores them in the result set (1204), and determines whether the number of stored results is greater than or equal to K (1205).

If the determination in step 1205 is that the number of results is greater than or equal to K, the main server (401) calculates the similarity between the K-th result object and the query, converts the query into a range query, and delivers the converted range query to all sub-servers (402, 403) (1206). In other words, the main server (401) performs a partial K-nearest neighbor search in order to obtain the distance between the K-th object and the query. The sub-servers (402, 403) that receive the converted range, together with the main server (401), then execute the range query of the proposed method (FIG. 10) to maximize parallelism, and each sub-server (402, 403) delivers the results it finds to the main server (401) (1207).

The main server (401), having received the search results from each sub-server (402, 403), combines them with the results obtained at the main server (401), sorts them by similarity to obtain the final K results, returns those K results to the client, and ends the loop (1208).

Meanwhile, if the determination in step 1205 is that the number of results is less than K, the main server (401) proceeds to step 1203, accesses in parallel the child node of the first entry stored in the list, assigns the accessed node as the root node, and returns to step 1201, which determines whether the root node is a non-terminal node, to repeat the loop.
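The flow of FIG. 12 can be illustrated with a simplified Python sketch. It is not the patent's implementation and makes several assumptions that are not in the source: each server's data is modeled as in-memory point lists, the main server's terminal nodes are given as `primary_leaves` and visited in order of distance to their minimum bounding rectangle rather than by descending a tree, similarity is Euclidean distance, and the sub-server range queries run sequentially instead of in parallel. All function names are hypothetical.

```python
import math

def mbr_min_dist(leaf, query):
    """Distance from the query to the leaf's minimum bounding rectangle."""
    lo = [min(c) for c in zip(*leaf)]
    hi = [max(c) for c in zip(*leaf)]
    nearest = tuple(min(max(q, l), h) for q, l, h in zip(query, lo, hi))
    return math.dist(query, nearest)

def seed_radius(primary_leaves, query, k):
    """Partial K-NN on the main server: distance of its K-th candidate."""
    candidates = []
    for leaf in sorted(primary_leaves, key=lambda lf: mbr_min_dist(lf, query)):
        candidates.extend(math.dist(p, query) for p in leaf)
        if len(candidates) >= k:          # the check of step 1205
            return sorted(candidates)[k - 1]
    raise ValueError("main server holds fewer than K objects")

def range_query(points, query, radius):
    """Return (distance, point) pairs for the points within the radius."""
    return [(math.dist(p, query), p) for p in points
            if math.dist(p, query) <= radius]

def hybrid_knn(primary_leaves, secondary_partitions, query, k):
    radius = seed_radius(primary_leaves, query, k)       # step 1206
    primary_points = [p for leaf in primary_leaves for p in leaf]
    hits = range_query(primary_points, query, radius)    # step 1207
    for partition in secondary_partitions:               # parallel in the patent
        hits.extend(range_query(partition, query, radius))
    return [p for _, p in sorted(hits)[:k]]              # step 1208
```

Because the radius is the distance to an object that the main server actually holds, at least K objects lie inside it, so merging the range query results and keeping the K closest yields the exact K nearest neighbors under these assumptions.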

FIG. 13 is a flowchart of a fourth embodiment of the data search method for a parallel high-dimensional index structure in a storage area network (SAN) environment according to the present invention.

That is, in FIG. 13 the main server (401) and all sub-servers (402, 403) perform a partial K-nearest neighbor query simultaneously. When a K-nearest neighbor query arrives, all servers (401, 402, 403) access the root node (1300) and determine whether the accessed root node is a non-terminal node (1301).

If the determination in step 1301 is that the root node is a non-terminal node, all servers (401, 402, 403) calculate the similarity between the query and each entry and place the entries in a list sorted by similarity (1302). All servers (401, 402, 403) then access, in parallel, the child node of the first entry stored in the list, assign the accessed node as the root node (1303), and return to step 1301, which determines whether the root node is a non-terminal node, to repeat the loop.

Meanwhile, if the determination in step 1301 is that the root node is not a non-terminal node (that is, it is a terminal node), all servers (401, 402, 403) calculate the similarity between the query and each object in the node, extract only the objects whose value is less than or equal to the K-nearest distance, store them in the result set (1304), and determine whether the number of stored results is greater than or equal to K (1305).

If the determination in step 1305 is that the number of results is greater than or equal to K, all servers (401, 402, 403) calculate the similarity between the K-th result object and the query and convert the query into a range query, and each sub-server (402, 403) delivers its converted range query to the main server (401) (1306). The main server (401) then selects the range query with the smallest range from among the range queries delivered by the sub-servers (402, 403) and its own range query, and delivers it to all sub-servers (402, 403) (1307). All sub-servers (402, 403) execute the range query they received, and each sub-server (402, 403) delivers its result to the main server (401) (1308). The main server (401) combines the results received from all sub-servers (402, 403) with the results it found itself, selects K results, returns the selected K results to the client, and ends the loop (1309).
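The exchange of range queries in FIG. 13 can be sketched as follows. This is a hedged Python illustration, not the patent's implementation: each server's disk group is modeled as an in-memory list of points, the tree descent of steps 1300 to 1305 is replaced by a direct local K-th distance computation, a thread pool stands in for the separate servers, and a server holding fewer than K objects is assumed to report an unbounded radius. The names `local_radius`, `range_query`, and `parallel_partial_knn` are hypothetical.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor

def local_radius(partition, query, k):
    """Partial K-NN on one server: distance of its local K-th nearest object."""
    nearest = heapq.nsmallest(k, (math.dist(p, query) for p in partition))
    # Assumption: a server with fewer than K objects cannot bound the range.
    return nearest[-1] if len(nearest) == k else math.inf

def range_query(partition, query, radius):
    """Return (distance, point) pairs for the objects within the radius."""
    hits = []
    for p in partition:
        d = math.dist(p, query)
        if d <= radius:
            hits.append((d, p))
    return hits

def parallel_partial_knn(partitions, query, k):
    """partitions[0] plays the main server; the rest play the sub-servers."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # Steps 1300-1306: every server derives its own range at the same time.
        radii = list(pool.map(lambda part: local_radius(part, query, k),
                              partitions))
        # Step 1307: the main server keeps the smallest range.
        radius = min(radii)
        # Step 1308: every server runs the selected range query.
        per_server = list(pool.map(lambda part: range_query(part, query, radius),
                                   partitions))
    # Step 1309: the main server merges the results and keeps the K closest.
    merged = sorted(hit for hits in per_server for hit in hits)
    return [p for _, p in merged[:k]]
```

Choosing the smallest of the local radii keeps the range query as narrow as possible while still covering at least K objects, since the server that reported that radius holds K objects within it.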

The method of the present invention as described above may be implemented as a program and stored on a computer-readable recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.).

The present invention described above is not limited by the foregoing embodiments and the accompanying drawings; it will be apparent to those of ordinary skill in the art to which the present invention pertains that various substitutions, modifications, and changes are possible without departing from the technical spirit of the present invention.

As described above, the present invention divides the disk groups by the number of servers so that each server manages its own disk group, thereby maximizing input/output parallelism and processor parallelism at the same time, and changes the node structure to increase fan-out, so that data with high-dimensional features can be indexed efficiently. In addition, it maximizes input/output parallelism for range queries and applies this to K-nearest neighbor queries, so that data can be retrieved more quickly.

Claims

In a data insertion method for a parallel high-dimensional index structure, applied to a parallel high-dimensional index system,

A first step in which a primary server assigns new entries to the respective disk groups and forwards each new entry to the server so determined from among at least one secondary server and the primary server itself;

A second step in which the at least one secondary server and the primary server select a node through tree traversal for insertion of the received new entry, and determine whether the selected node has free space for inserting an entry;

A third step in which, when the second step determines that there is free space in the node, the primary server and the at least one secondary server insert the new entry into the free page space of the node into which it is to be inserted; and

A fourth step in which, when the second step determines that there is no free space in the node, the primary server and the at least one secondary server allocate a page on the disk over which the pages of the level where the split currently occurs are distributed, insert the new entry into the node, and reflect the change in the minimum bounding region in the node

A data insertion method for a parallel high-dimensional index structure, comprising the above steps.

The method of claim 1,

The third step,

A fifth step of determining whether there is free space in pages of the node to be inserted by the primary server and the at least one secondary server when there is free space in the node;

A sixth step in which the main server and at least one sub-server insert a new entry and reflect the change of the minimum boundary area in the parent node when there is free space in the page as a result of the determination in the fifth step; And

A seventh step in which, when the fifth step determines that there is no free space in the page, the primary server and the at least one secondary server obtain a page from a disk on which no page is allocated to the current node and insert the new entry into it

In a data search method for a parallel high-dimensional index structure, applied to a parallel high-dimensional index system,

A first step in which a primary server and at least one secondary server transmit root node information to an input/output scheduler;

A second step of determining whether there is information managed by the input/output scheduler to which the information was transmitted;

A third step in which, when there is information managed by the input/output scheduler, the primary server and the at least one secondary server access several nodes in parallel and perform a range query search according to whether the current node is a terminal node; and

A fourth step in which, when there is no information managed by the input/output scheduler, the at least one secondary server transmits its range query result to the primary server, and the primary server combines the range query results received from the at least one secondary server with its own range query result and outputs them

A data search method for a parallel high-dimensional index structure, comprising the above steps.

4. The method of claim 3, wherein the third step is repeated until there is no information managed by the scheduler as a result of the determination of the second step.

The method according to claim 3 or 4,

The third step,

A fifth step in which the primary server and the at least one secondary server access several nodes in parallel from the input/output scheduler and determine whether the current node is a terminal node;

A sixth step in which, when the fifth step determines that the current node is not a terminal node, the primary server and the at least one secondary server select the entries included in the query range and deliver information on the selected entries to the input/output scheduler; and

A seventh step in which, when the fifth step determines that the current node is a terminal node, the primary server and the at least one secondary server calculate the similarity between the query and the objects included in the node and store the calculated results

A data search method for a parallel high-dimensional index structure, comprising the above steps.

A first step in which the primary server or at least one secondary server accesses the root node to determine whether the accessed root node is a non-terminal node;

A second step in which, when the first step determines that the accessed root node is a non-terminal node, the primary server and at least one secondary server calculate the similarity between the query and each entry, sort the entries in a list by similarity, access in parallel the child node of the first entry stored in the list, and assign the accessed node as the root node;

A third step in which, when the first step determines that the accessed root node is not a non-terminal node, the primary server and at least one secondary server calculate the similarity between the query and each object, extract only the values less than or equal to the K-nearest distance from the node, and store them in a result set;

A fourth step of determining whether the number of results stored in the result set is greater than or equal to K (K being a natural number) and whether the K-th result is less than or equal to the similarity value of the first entry in the list;

A fifth step in which, when the fourth step determines that the number of stored results is greater than or equal to K and the K-th result is less than or equal to the similarity value of the first entry in the list, the at least one secondary server returns its K results to the primary server, and the primary server combines the K results received from the at least one secondary server with the results it found itself, sorts them by similarity, selects the final K results, and returns them to the client; and

A sixth step in which, when the fourth step determines that the number of stored results is less than K or that the K-th result is not less than or equal to the similarity value of the first entry in the list, the primary server and at least one secondary server access in parallel the child node of the first entry stored in the list and assign the accessed node as the root node

A data search method for a parallel high-dimensional index structure, comprising the above steps.

A first step in which the main server accesses the root node in response to an input K-nearest neighbor query and determines whether the accessed root node is a non-terminal node;

A second step in which, when the first step determines that the accessed root node is a non-terminal node, the main server calculates the similarity between the query and each entry, sorts the entries in a list by similarity, accesses in parallel the child node of the first entry stored in the list, and assigns the accessed node as the root node;

A third step in which, when the first step determines that the accessed root node is not a non-terminal node, the main server calculates the similarity between the query and each object, extracts only the values less than or equal to the K-nearest distance from the node, stores them in a result set, and determines whether the number of stored results is greater than or equal to K;

A fourth step in which, when the third step determines that the number of results stored in the result set is greater than or equal to K, the main server calculates the similarity between the K-th result object and the query, converts the query into a range query, and delivers the converted range query to at least one sub-server;

A fifth step in which the at least one sub-server that received the converted range query performs the range query, maximizing parallelism, and delivers its search results to the main server, and the main server combines the received search results with its own results, sorts them by similarity to obtain the final K results, and returns the final K results to the client; and

A sixth step in which, when the third step determines that the number of results stored in the result set is less than K, the main server accesses in parallel the child node of the first entry stored in the list and assigns the accessed node as the root node

A data search method for a parallel high-dimensional index structure, comprising the above steps.

A first step in which the primary server and at least one secondary server access the root node for a K-nearest neighbor query and determine whether the root node is a non-terminal node;

A second step in which, when the first step determines that the root node is a non-terminal node, the primary server and at least one secondary server calculate the similarity between the query and each entry, sort the entries in a list by similarity, access in parallel the child node of the first entry stored in the list, and assign the accessed node as the root node;

A third step in which, when the first step determines that the root node is not a non-terminal node, the primary server and at least one secondary server calculate the similarity between the query and each object, extract only the values less than or equal to the K-nearest distance from the node, store them in a result set, and determine whether the number of stored results is greater than or equal to K;

A fourth step in which, when the third step determines that the number of results is less than K, the primary server and at least one secondary server access in parallel the child node of the first entry stored in the list and assign the accessed node as the root node;

A fifth step in which, when the third step determines that the number of results is greater than or equal to K, the primary server and at least one secondary server each calculate the similarity between the K-th result object and the query and convert the query into a range query, and the at least one secondary server sends its converted range query to the primary server;

A sixth step in which the primary server, having received the converted range queries, selects the range query with the smallest range from among the received range queries and the range query it converted itself and delivers the selected range query to the at least one secondary server, and the at least one secondary server that received it performs the range query and delivers the result to the primary server; and

A seventh step in which the primary server, having received the range query results from the at least one secondary server, combines them with the results it found itself, selects K results, and returns the selected K results to the client

A data search method for a parallel high-dimensional index structure, comprising the above steps.

In a parallel high-dimensional index system having a processor, for inserting data of a parallel high-dimensional index structure,

A first function in which a primary server assigns new entries to the respective disk groups and forwards each new entry to the server so determined from among at least one secondary server and the primary server itself;

A second function in which the at least one secondary server and the primary server select a node through tree traversal for insertion of the new entry, and determine whether the selected node has free space for inserting an entry;

A third function in which, when the second function determines that there is free space in the node, the primary server and the at least one secondary server insert the new entry into the free page space of the node into which it is to be inserted; and

A fourth function in which, when the second function determines that there is no free space in the node, the primary server and the at least one secondary server allocate a page on the disk over which the pages of the level where the split currently occurs are distributed, insert the new entry into the node, and reflect the change in the minimum bounding region in the node

A computer-readable recording medium having recorded thereon a program for realizing the above functions.

In order to search the data of the parallel high dimensional index structure, in a parallel high dimensional index system with a processor,

A first function for the primary server and the at least one secondary server to transmit root node information to the input / output scheduler, and to determine whether there is information managed by the transferred input / output scheduler;

A second function in which, when the first function determines that there is no information managed by the input/output scheduler, the at least one secondary server transmits its range query result to the primary server, and the primary server combines the range query results received from the at least one secondary server with the range query result it found itself and delivers them to the client; and

A third function in which, when the first function determines that there is information managed by the input/output scheduler, the primary server and the at least one secondary server access several nodes in parallel from the input/output scheduler and perform a range query search according to whether the current node is a terminal node

A first function of the primary server and at least one secondary server accessing the root node to determine whether the accessed root node is a non-terminal node;

As a result of the determination of the first function, when the accessed root node is a non-terminal node, the primary server and at least one secondary server calculate the similarity between the query and the entry, and sort each calculated similarity in the order of similarity in the list. And a second function of accessing the child nodes of the first entry stored in the list in parallel and allocating the current node as the root node;

As a result of the determination of the first function, when the approaching root node is not a non-terminal node, the primary server and at least one secondary server calculate the similarity between the query and the object so that only a value less than or equal to the K-nearest distance from the node is calculated. A third function of extracting and storing the result set;

A fourth function of determining whether the number of results stored in the result set is greater than or equal to K (K being a natural number) and whether the K-th result is less than or equal to the similarity value of the first entry in the list;

A fifth function in which, when the fourth function determines that the number of stored results is greater than or equal to K and the K-th result is less than or equal to the similarity value of the first entry in the list, the at least one secondary server returns its K results to the primary server, and the primary server combines the K results received from the at least one secondary server with the results it found itself, sorts them by similarity, selects the final K results, and returns them to the client; and

A sixth function in which, when the fourth function determines that the number of stored results is less than K or that the K-th result is not less than or equal to the similarity value of the first entry in the list, the primary server and at least one secondary server access in parallel the child node of the first entry stored in the list and assign the accessed node as the root node.

A first function in which the main server accesses the root node in response to an input K-nearest neighbor query and determines whether the accessed root node is a non-terminal node;

A second function in which, when the first function determines that the accessed root node is a non-terminal node, the main server calculates the similarity between the query and each entry, sorts the entries in a list by similarity, accesses in parallel the child node of the first entry stored in the list, and assigns the accessed node as the root node;

A third function in which, when the first function determines that the accessed root node is not a non-terminal node, the main server calculates the similarity between the query and each object, extracts only the values less than or equal to the K-nearest distance from the node, stores them in a result set, and determines whether the number of stored results is greater than or equal to K;

A fourth function in which, when the third function determines that the number of results stored in the result set is greater than or equal to K, the main server calculates the similarity between the K-th result object and the query, converts the query into a range query, and delivers the converted range query to at least one sub-server;

A fifth function in which the at least one sub-server that received the converted range query performs the range query, maximizing parallelism, and delivers its search results to the main server, and the main server combines the received search results with its own results, sorts them by similarity to obtain the final K results, and returns the final K results to the client; and

A sixth function in which, when the third function determines that the number of results stored in the result set is less than K, the main server accesses in parallel the child node of the first entry stored in the list and assigns the accessed node as the root node

In order to search the data of the parallel high dimensional index structure, in a parallel high dimensional index system having a processor,

A first function in which the primary server and at least one secondary server access the root node for a K-nearest neighbor query and determine whether the root node is a non-terminal node;

A second function in which, when the first function determines that the root node is a non-terminal node, the primary server and at least one secondary server calculate the similarity between the query and each entry, sort the entries in a list by similarity, access in parallel the child node of the first entry stored in the list, and assign the accessed node as the root node;

A third function in which, when the first function determines that the root node is not a non-terminal node, the primary server and at least one secondary server calculate the similarity between the query and each object, extract only the values less than or equal to the K-nearest distance from the node, store them in a result set, and determine whether the number of stored results is greater than or equal to K;

A fourth function in which, when the third function determines that the number of results is less than K, the primary server and at least one secondary server access in parallel the child node of the first entry stored in the list and assign the accessed node as the root node;

A fifth function in which, when the third function determines that the number of results is greater than or equal to K, the primary server and at least one secondary server each calculate the similarity between the K-th result object and the query and convert the query into a range query, and the at least one secondary server sends its converted range query to the primary server;

A sixth function in which the primary server, having received the converted range queries, selects the range query with the smallest range from among the received range queries and the range query it converted itself and delivers the selected range query to the at least one secondary server, and the at least one secondary server that received it performs the range query and delivers the result to the primary server; and

A seventh function in which the primary server, having received the range query results from the at least one secondary server, combines them with the results it found itself, selects K results, and returns the selected K results to the client.