KR20030006638A

KR20030006638A - Apparatus And Method of Cell-based Indexing of High-dimensional Data

Info

Publication number: KR20030006638A
Application number: KR1020010042482A
Authority: KR
Inventors: 박수준; 장재우; 김현진; 박성희; 장명길; 박상규; 한성근
Original assignee: 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)
Priority date: 2001-07-13
Filing date: 2001-07-13
Publication date: 2003-01-23
Also published as: KR100446639B1

Abstract

PURPOSE: A cell-based high-dimensional data indexing system and method are provided that index high-dimensional data on a per-cell basis, preventing the loss of search efficiency that otherwise occurs when searching high-dimensional data. CONSTITUTION: The method comprises the following steps. First, an N-dimensional feature vector is extracted from a multimedia object by a feature vector extractor (801). A signature generation module then generates a signature for the feature vector and a distance signature based on the distance between the cell center and the object (802). The feature vector signature and the distance signature are concatenated into a single signature (803), which is stored in a signature database (804). The feature vector itself is also stored in a feature vector database (805). A user can then search the stored feature vectors using various queries, such as point queries, range queries, and k-nearest-neighbor queries (806).

Description


The present invention relates to a cell-based high-dimensional data indexing apparatus and method, and to a computer-readable recording medium storing a program that implements the method.

In general, feature vectors extracted from multimedia objects such as images and video are used for content-based retrieval of those objects, and many high-dimensional indexing techniques have been proposed to store such vectors efficiently. To search objects scattered across the data space, existing high-dimensional indexing techniques use the minimum bounding rectangle (MBR), a bounding box around a group of nearby objects, as the unit of search. As dimensionality increases, however, the overlap between MBRs grows and search performance degrades exponentially, a problem known as the curse of dimensionality, so an improvement is needed.

In other words, existing high-dimensional indexing techniques use the MBR enclosing a group of objects to compute similarity distances between objects. As dimensionality increases, the overlap between MBRs rises sharply and degrades search performance. Consequently, most existing techniques perform well on low-dimensional data (10 dimensions or fewer), but their search performance drops exponentially as the number of dimensions grows.

The present invention was devised to solve these problems. Its object is to provide a cell-based high-dimensional data indexing apparatus and method that index high-dimensional data on a per-cell basis so that retrieval performance does not degrade for high-dimensional data, and a computer-readable recording medium storing a program that implements the method.

FIG. 1 is a block diagram of an embodiment of a cell-based high-dimensional data indexing system to which the present invention is applied.

FIG. 2 illustrates the definitions of the new minimum distance (MINDIST) and maximum distance (MAXDIST) according to the present invention.

FIG. 3 is a diagram of an embodiment that converts an N-dimensional vector into a signature according to the present invention.

FIG. 4 is a diagram of an embodiment of the storage layout for signatures and vectors according to the present invention.

FIG. 5 is a diagram of an embodiment of signature and vector search according to the present invention.

FIG. 6 illustrates the processing of a k-nearest-neighbor query according to the present invention.

FIG. 7 illustrates the processing of a range query, which retrieves all objects within a given range, according to the present invention.

FIG. 8 is a flowchart of an embodiment of the cell-based high-dimensional data indexing method according to the present invention.

* Description of reference numerals for the main parts of the drawings

11: Image
12: Object store
13: Image database
14: Feature vector extractor
15: N-dimensional feature vector
16: Cell-based indexing device
17: User
101: Signature generation module
102: Storage module
103: Concurrency control module
104: Search module
105: Signature database
106: Feature vector database

To achieve this object, the present invention provides a cell-based high-dimensional data indexing apparatus comprising: signature generation means which, using an N-dimensional feature vector extracted from an object, determines the cell to which the vector belongs, generates a feature vector signature identifying that cell and a distance signature encoding the distance from the center of that cell to the feature vector, and generates a merged signature by concatenating the feature vector signature and the distance signature; storage means for storing the N-dimensional feature vector and the generated merged signature in correspondence with each other; and concurrency control means for supporting multiple users by means of locking when the N-dimensional feature vector and the merged signature are stored and retrieved.

The present invention also provides a high-dimensional data indexing method for use in a cell-based high-dimensional data indexing apparatus, comprising: a first step of extracting an N-dimensional feature vector from a multimedia object using a feature vector extractor; a second step of generating, from the extracted N-dimensional feature vector, a feature vector signature and a distance signature based on the distance from the cell center to the object; a third step of concatenating the generated feature vector signature and distance signature into a single merged signature; and a fourth step of storing the generated merged signature and the feature vector in correspondence with each other.

The present invention further provides a computer-readable recording medium storing a program that causes a high-dimensional data indexing apparatus having a processor to perform: a first function of extracting an N-dimensional feature vector from a multimedia object using a feature vector extractor; a second function of generating, from the extracted N-dimensional feature vector, a feature vector signature and a distance signature based on the distance from the cell center to the object; a third function of concatenating the generated feature vector signature and distance signature into a single merged signature; and a fourth function of storing the generated merged signature and the feature vector in correspondence with each other.

Conventional indexing techniques for high-dimensional retrieval have two drawbacks: the MBRs, each a set of nearby objects, overlap one another, and the distance between an object and an MBR is defined inefficiently. The present invention divides the high-dimensional data space into cells and represents each cell as a signature, which eliminates overlap between cells, and it uses a new distance measure between an object and its cell center, which strengthens filtering and thereby maximizes retrieval efficiency for high-dimensional data.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of an embodiment of a cell-based high-dimensional data indexing system to which the present invention is applied.

As shown in FIG. 1, the system comprises an object store 12 that stores an image 11 in an image database 13; a feature vector extractor 14 that extracts a feature vector from the image 11; and a cell-based high-dimensional data indexing device 16 that indexes high-dimensional data using the identifier (ID) supplied by the object store 12 and the N-dimensional feature vector 15 extracted by the feature vector extractor 14.

In more detail, the cell-based indexing device 16 of FIG. 1 comprises: a signature generation module 101 which, for a given N-dimensional feature vector 15, determines the cell to which the vector belongs, converts that cell into a signature expressed as a bit pattern of 1s and 0s, and computes the distance from the center of that cell to the vector; a storage module 102 which stores the N-dimensional feature vector 15 in the feature vector database 106 and stores the signature and distance value obtained above in the signature database 105; a concurrency control module 103 which supports multiple users by means of locking when feature vectors and signatures are stored and retrieved; and a search module 104 which, for a user query, uses the signature database 105 to filter out entries that cannot satisfy the query and then, for the entries that survive filtering, retrieves from the feature vector database 106 and outputs the feature vectors and IDs of the objects that satisfy the query.
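The text only states that the concurrency control module 103 uses "locking" to support multiple users; it does not specify a protocol. As a minimal sketch, assuming a readers-writer discipline (many concurrent searches, but exclusive inserts, deletes, and updates), a lock like the following could guard the signature and feature vector databases. The class `ReadWriteLock` is hypothetical and not part of the patent.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class ReadWriteLock:
    """Many concurrent readers (searches) or one writer (insert/delete/update)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0      # number of active readers
        self._writer = False   # True while a writer holds the lock

    @contextmanager
    def read(self) -> Iterator[None]:
        with self._cond:
            while self._writer:            # readers wait only for an active writer
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()  # wake writers waiting for readers to drain

    @contextmanager
    def write(self) -> Iterator[None]:
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()      # wake both waiting readers and writers
```

A search would wrap its sequential scan of the signature file in `with lock.read():`, and an insert or delete would use `with lock.write():`. This simple version lets a steady stream of readers delay writers indefinitely; a production lock would typically add writer preference.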

FIG. 2 illustrates the definitions of the new minimum distance (MINDIST) and maximum distance (MAXDIST) according to the present invention.

The present invention defines and uses new distance measures to perform cell-based filtering: the minimum distance (MINDIST) 202 and the maximum distance (MAXDIST) 201 shown in FIG. 2.

Before an object is stored, the distance between its cell center and the object (RADIUS) is computed and stored. Using this value, the minimum and maximum distances between a stored object and a query object Q supplied by the user are defined by Equations 1 and 2 below.

The minimum distance (MINDIST) 202 of Equation 1 is the closest possible distance between the query object and an object stored in the cell. It is obtained by subtracting the precomputed cell-center-to-object distance (RADIUS) from the distance between the query object and the cell center (CENTERDIST):

$$\mathrm{MINDIST} = \mathrm{CENTERDIST} - \mathrm{RADIUS} \qquad (1)$$

The maximum distance (MAXDIST) 201 of Equation 2 is the farthest possible distance between the query object and the object stored in the cell. It is obtained by adding RADIUS to CENTERDIST:

$$\mathrm{MAXDIST} = \mathrm{CENTERDIST} + \mathrm{RADIUS} \qquad (2)$$

By the triangle inequality, the actual distance between Q and the stored object always lies between MINDIST and MAXDIST.
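Equations 1 and 2 can be sketched directly in code. The patent does not name a distance metric, so Euclidean (L2) distance is assumed here, and the function names are hypothetical. The sketch also clamps MINDIST at zero: Equation 1 goes negative when Q is closer to the cell center than RADIUS, and a negative lower bound prunes nothing that zero would not.

```python
import math
from typing import Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance (assumed metric; the text does not specify one)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mindist_maxdist(query: Sequence[float], cell_center: Sequence[float],
                    radius: float) -> Tuple[float, float]:
    """Lower and upper bounds on the distance from `query` to one stored object.

    `radius` is the precomputed distance from the object's cell center to the
    object (RADIUS in Equations 1 and 2).
    """
    centerdist = euclidean(query, cell_center)   # CENTERDIST
    mindist = max(0.0, centerdist - radius)      # Equation 1, clamped at zero
    maxdist = centerdist + radius                # Equation 2
    return mindist, maxdist
```

For example, with Q at the origin, a cell center at (3, 4), and RADIUS = 1, CENTERDIST is 5, so any object stored in that cell is at least 4 and at most 6 away from Q.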

Using the minimum and maximum distances computed for a query object, the desired objects can be found quickly by filtering, without accessing every object stored in the database. In other words, the new minimum and maximum distances describe the region occupied by the objects stored in a cell more tightly, which strengthens the filtering effect.

FIG. 3 is a diagram of an embodiment that converts an N-dimensional vector into a signature according to the present invention.

In cell-based filtering, the data space is partitioned into cells, and each cell is represented by a signature to make efficient use of main memory. Here, a cell is one of the parts produced by dividing the data space into intervals, and a signature is the representation of a cell as a bit pattern of 1s and 0s.

The feature vector of an object in the high-dimensional space is converted into, and stored as, the signature of the cell that contains the object. To strengthen filtering, the distance from the cell center to the object is also computed, converted into a signature, and stored. FIG. 3 shows how these signatures are generated. Equation 3 below is used to convert an N-dimensional feature vector into a signature.

where b is the number of signature bits allocated to each dimension of the feature vector,

F is the feature vector, each component of which is at least 0 and less than 1.0, and

s is the generated signature.
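Equation 3 itself is not reproduced in this text. Assuming it is the uniform quantization implied by the definitions above, each component $F_i \in [0, 1)$ maps to the b-bit cell index $s_i = \lfloor F_i \cdot 2^b \rfloor$, and the N indices together name the cell. The sketch below also derives the cell center from which CENTERDIST and RADIUS are measured, and packs the indices into one integer. All function names are hypothetical.

```python
from typing import List, Sequence


def vector_signature(feature: Sequence[float], b: int) -> List[int]:
    """Map each component in [0, 1) to a b-bit cell index (assumed form of Equation 3)."""
    cells_per_dim = 1 << b
    sig = []
    for f in feature:
        if not 0.0 <= f < 1.0:
            raise ValueError(f"component {f} is outside [0, 1)")
        sig.append(int(f * cells_per_dim))  # int() floors because f >= 0
    return sig


def cell_center(sig: Sequence[int], b: int) -> List[float]:
    """Center of the cell named by `sig`: the midpoint of each dimension's interval."""
    width = 1.0 / (1 << b)
    return [(s + 0.5) * width for s in sig]


def pack_signature(sig: Sequence[int], b: int) -> int:
    """Concatenate the N b-bit indices into one N*b-bit integer, dimension 0 first."""
    packed = 0
    for s in sig:
        packed = (packed << b) | s
    return packed
```

In the two-dimensional, 2-bit setting used for FIG. 6, the query (0.4, 0.2) falls in cell [1, 0], whose packed bit pattern is `0100` and whose center is (0.375, 0.125).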

According to Equation 3, the signature of an N-dimensional feature vector occupies N × b bits in total. The distance from the cell center to the object is additionally encoded as a 1-byte signature. The feature vector signature and the distance signature generated in this way are concatenated into a single signature and stored in the signature file.
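The text fixes the distance signature at one byte but does not say how the distance is scaled into it. A minimal sketch, assuming the distance is normalized by the largest distance possible inside a cell (half the cell diagonal, $\tfrac{1}{2} \cdot 2^{-b} \sqrt{N}$) and quantized to 256 levels, and assuming the cell quantization $s_i = \lfloor F_i \cdot 2^b \rfloor$. Decoding returns the upper edge of the quantization step so that MINDIST and MAXDIST computed from the decoded value remain valid bounds. The names `encode` and `decode_radius_upper` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _cell(feature: Sequence[float], b: int) -> Tuple[List[int], List[float]]:
    """Cell indices and cell center under the assumed uniform quantization."""
    n_cells = 1 << b
    sig = [int(f * n_cells) for f in feature]
    center = [(s + 0.5) / n_cells for s in sig]
    return sig, center


def _half_diagonal(n_dims: int, b: int) -> float:
    """Largest possible distance from a cell center to a point in that cell."""
    return 0.5 / (1 << b) * math.sqrt(n_dims)


def encode(feature: Sequence[float], b: int) -> Tuple[int, int]:
    """Return (merged signature as an int, total bit length N*b + 8)."""
    sig, center = _cell(feature, b)
    radius = math.dist(feature, center)                          # RADIUS
    dist_code = min(255, int(radius / _half_diagonal(len(feature), b) * 256))
    packed = 0
    for s in sig:                                                # N*b-bit vector signature
        packed = (packed << b) | s
    merged = (packed << 8) | dist_code                           # append 1-byte distance signature
    return merged, len(feature) * b + 8


def decode_radius_upper(dist_code: int, n_dims: int, b: int) -> float:
    """Upper edge of the quantization step, so it is never below the true RADIUS."""
    return (dist_code + 1) / 256 * _half_diagonal(n_dims, b)
```

As a size check, with N = 64 and b = 4 the merged signature is 64 × 4 + 8 = 264 bits (33 bytes), compared with 256 bytes for the same vector stored as 32-bit floats.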

FIG. 4 is a diagram of an embodiment of the storage layout for signatures and vectors according to the present invention.

The signature generation module concatenates the feature vector signature 401 and the distance signature 402 derived from the N-dimensional feature vector 41 into a merged signature 403, which the storage module appends sequentially to the signature file 42.

If an operation such as deleting or updating an object has occurred, however, the storage module consults the reference file 44 to find the location of an empty record in the current signature file 42 and stores the signature at that location.

After the signature is stored, the feature vector of the actual object is stored in the data file 43 at the same position as the signature. Storing the signature and the feature vector at the same position reduces the additional work needed during search, because a matching signature record directly identifies its data record.
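The slot-aligned layout of FIG. 4 can be sketched in memory as follows. Python lists stand in for the signature file 42 and data file 43, and a stack of freed slot numbers stands in for the reference file 44; the class and method names are hypothetical. An update is modeled as a delete followed by an insert, since the text groups the two operations together.

```python
from typing import Iterator, List, Optional, Sequence, Tuple


class SignatureStore:
    """In-memory stand-in for the signature file, data file, and reference file."""

    def __init__(self) -> None:
        self.signature_file: List[Optional[int]] = []            # slot -> merged signature
        self.data_file: List[Optional[Tuple[float, ...]]] = []   # slot -> feature vector
        self.reference_file: List[int] = []                      # slots freed by delete/update

    def insert(self, signature: int, vector: Sequence[float]) -> int:
        """Store signature and vector at the same slot, reusing a freed slot if any."""
        if self.reference_file:
            slot = self.reference_file.pop()
            self.signature_file[slot] = signature
            self.data_file[slot] = tuple(vector)
        else:
            slot = len(self.signature_file)
            self.signature_file.append(signature)
            self.data_file.append(tuple(vector))
        return slot

    def delete(self, slot: int) -> None:
        """Mark a slot empty in both files and remember it for reuse."""
        if self.signature_file[slot] is None:
            raise KeyError(f"slot {slot} is already empty")
        self.signature_file[slot] = None
        self.data_file[slot] = None
        self.reference_file.append(slot)

    def update(self, slot: int, signature: int, vector: Sequence[float]) -> int:
        """Delete followed by insert; the freed slot is reused immediately."""
        self.delete(slot)
        return self.insert(signature, vector)

    def scan(self) -> Iterator[Tuple[int, int]]:
        """Sequential scan of live signature records, yielding (slot, signature)."""
        for slot, sig in enumerate(self.signature_file):
            if sig is not None:
                yield slot, sig
```

On disk, the same idea would place record i at offset i times the fixed record size in each file, so a slot number found while scanning signatures locates the feature vector without any extra lookup.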

FIG. 5 is a diagram of an embodiment of signature and vector search according to the present invention.

When a user query is issued to retrieve stored objects, the signature generation module derives a query signature 501 from the user query vector 51. The signature file 52 is scanned sequentially using the user query vector 51 and the query signature 501, and the candidate cell list 54 obtained from this scan is used to retrieve feature vectors from the data file 53.

During this scan, filtering is performed with the newly defined minimum and maximum distances. The desired result is obtained by accessing only the signature records that survive filtering and their corresponding data records. Using the new minimum and maximum distances therefore avoids unnecessary data access and speeds up search.

The feature vector search method according to the present invention is described in more detail below with reference to FIGS. 6 and 7.

FIG. 6 illustrates the processing of a k-nearest-neighbor query according to the present invention.

A k-nearest-neighbor query retrieves the k objects most similar to the user's query. The example in FIG. 6 assumes two dimensions and a 2-bit signature per dimension.

Given the user query Q = (0.4, 0.2), all signatures stored in the signature file (A, B, C, D, E) are first scanned sequentially to obtain candidate cells. That is, while scanning the signatures in order, filtering is performed by comparing the k-th smallest maximum distance (MAXDIST) found so far with the minimum distance (MINDIST) of the cell currently being examined.

In more detail, the k-th smallest maximum distance obtained while scanning the signatures in FIG. 6 is the one between the query point Q and cell D. Cells whose minimum distance exceeds this value (B, C, E) are therefore excluded, and cells whose minimum distance is smaller (A, D) are selected as candidates. For these candidate cells, the data file is accessed, the distance between each object's feature vector and Q is computed, and the closest object, D (0.6, 0.4), is returned as the final result.

To restate: given the user query Q = (0.4, 0.2) with k = 1, the signature file is scanned sequentially and the shortest maximum distance from Q is retained (here, that of the cell containing D in FIG. 6). Cells whose minimum distance is longer than this value are filtered out and no longer considered (the cells containing B, C, and E in FIG. 6). Only the cells containing A and D remain, which are drawn boxed in the figure; only the two corresponding entries in the data file are examined, and the nearest neighbor (1-NN) among them, object D (0.6, 0.4), is the final result.
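A minimal in-memory sketch of this k-NN filter-and-refine procedure, assuming Euclidean distance, assuming Equation 3 is the uniform quantization $s_i = \lfloor F_i \cdot 2^b \rfloor$, and keeping exact (unquantized) RADIUS values for brevity. During the sequential scan it keeps the k smallest MAXDIST values in a heap; it then keeps only records whose MINDIST does not exceed the k-th smallest MAXDIST and computes exact distances for those candidates only. Pruning against the final threshold, rather than the running one as the text describes, is a simplification that can only prune more. The names `build_index` and `knn_query` are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def build_index(vectors: Sequence[Vector], b: int) -> List[Tuple[List[float], float]]:
    """Per stored object: (cell center, RADIUS). Stands in for the signature file."""
    n_cells = 1 << b
    index = []
    for v in vectors:
        center = [(int(f * n_cells) + 0.5) / n_cells for f in v]  # assumed Equation 3
        index.append((center, math.dist(v, center)))
    return index


def knn_query(query: Vector, index: List[Tuple[List[float], float]],
              vectors: Sequence[Vector], k: int) -> Tuple[List[int], List[int]]:
    """Return (slots of the k nearest objects, slots that survived filtering)."""
    if k <= 0:
        return [], []
    smallest_max: List[float] = []  # negated values: a max-heap of the k smallest MAXDISTs
    mindists: List[float] = []
    for center, radius in index:    # sequential scan of the signature records
        centerdist = math.dist(query, center)
        mindists.append(max(0.0, centerdist - radius))  # Equation 1
        maxdist = centerdist + radius                   # Equation 2
        if len(smallest_max) < k:
            heapq.heappush(smallest_max, -maxdist)
        elif maxdist < -smallest_max[0]:
            heapq.heapreplace(smallest_max, -maxdist)
    # With fewer than k records nothing can be pruned.
    threshold = -smallest_max[0] if len(smallest_max) == k else math.inf
    candidates = [slot for slot, m in enumerate(mindists) if m <= threshold]
    # Refinement: read feature vectors only for candidates and rank by exact distance.
    ranked = sorted(candidates, key=lambda slot: math.dist(query, vectors[slot]))
    return ranked[:k], candidates
```

The filter is safe because at least k objects lie within the k-th smallest MAXDIST, so any record whose MINDIST exceeds it is farther than all of them. With a hypothetical data set of (0.1, 0.1), (0.6, 0.4), (0.9, 0.9), (0.3, 0.8), b = 2, Q = (0.4, 0.2) and k = 1, only slots 0 and 1 survive filtering, and slot 1, (0.6, 0.4), is returned.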

FIG. 7 illustrates the processing of a range query, which retrieves all objects within a given range, according to the present invention.

To process a range query, the signature file is first scanned sequentially to find candidate cells. Cells whose minimum distance from the query Q exceeds the given distance (radius) are excluded from the candidate list; in FIG. 7, the minimum distances of objects A and E exceed the radius, so they are excluded. The data file is then accessed for the remaining candidate cells, and the objects that actually lie within the given radius, B and C, are returned as the result.

As shown in FIG. 7, given the user query Q = (0.4, 0.2) with radius = 0.3, the signature file is scanned sequentially; only cells within the radius of Q are considered, and cells outside it are filtered out (e.g., the cells containing A and E). Only the cells containing B, C, and D remain, which are drawn boxed in the figure; only the three corresponding entries in the data file are checked against the range, and the final result consists of two objects, (0.1, 0.8) and (0.7, 0.85).
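The range query follows the same filter-and-refine pattern with a fixed threshold instead of a running one. A minimal sketch, assuming Euclidean distance, assuming Equation 3 is the uniform quantization $s_i = \lfloor F_i \cdot 2^b \rfloor$, and storing exact RADIUS values; `build_index` and `range_query` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def build_index(vectors: Sequence[Vector], b: int) -> List[Tuple[List[float], float]]:
    """Per stored object: (cell center, RADIUS). Stands in for the signature file."""
    n_cells = 1 << b
    index = []
    for v in vectors:
        center = [(int(f * n_cells) + 0.5) / n_cells for f in v]  # assumed Equation 3
        index.append((center, math.dist(v, center)))
    return index


def range_query(query: Vector, radius: float, index: List[Tuple[List[float], float]],
                vectors: Sequence[Vector]) -> Tuple[List[int], List[int]]:
    """Return (slots within `radius` of `query`, slots that survived filtering)."""
    candidates = []
    for slot, (center, obj_radius) in enumerate(index):    # sequential signature scan
        mindist = max(0.0, math.dist(query, center) - obj_radius)  # Equation 1
        if mindist <= radius:        # records with MINDIST > radius are filtered out
            candidates.append(slot)
    # Refinement: exact distance check on the data records of the candidates only.
    hits = [slot for slot in candidates if math.dist(query, vectors[slot]) <= radius]
    return hits, candidates
```

With the hypothetical data set (0.1, 0.1), (0.6, 0.4), (0.9, 0.9), (0.3, 0.8), b = 2, Q = (0.4, 0.2) and radius 0.3, slots 0 and 1 survive filtering, and only slot 1 (at distance about 0.283) lies within the radius.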

FIG. 8 is a flowchart of an embodiment of the cell-based high-dimensional data indexing method according to the present invention.

First, an N-dimensional feature vector is extracted from a multimedia object by the feature vector extractor (801). From the extracted N-dimensional feature vector, the signature generation module of the present invention then generates a signature for the feature vector and a distance signature based on the distance from the cell center to the object (802).

The generated feature vector signature and distance signature are then concatenated into a single merged signature (803), which the storage module stores in the signature database (signature DB) (804). The feature vector of the multimedia object is also stored by the storage module in the feature vector database (feature vector DB) (805).

The user can then search the feature vectors of stored objects through the search module using various query types, namely point queries, range queries, and k-nearest-neighbor queries (806).
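The point query is the simplest of the three. Claims 3 and 10 describe it as finding the signature that matches the query and outputting the corresponding feature vector; because several vectors can share one cell, a signature match only selects candidates, and the stored vector must still be compared exactly. A minimal sketch, assuming a point query asks for stored vectors identical to the query and assuming Equation 3 is the uniform quantization $s_i = \lfloor F_i \cdot 2^b \rfloor$; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def vector_signature(feature: Sequence[float], b: int) -> Tuple[int, ...]:
    """b-bit cell index per dimension (assumed form of Equation 3)."""
    n_cells = 1 << b
    return tuple(int(f * n_cells) for f in feature)


def point_query(query: Sequence[float], signatures: Sequence[Tuple[int, ...]],
                vectors: Sequence[Sequence[float]], b: int) -> List[int]:
    """Slots whose stored feature vector equals `query` exactly."""
    qsig = vector_signature(query, b)
    # Filter: only records in the query's cell can hold an identical vector.
    candidates = [slot for slot, sig in enumerate(signatures) if sig == qsig]
    # Refine: several vectors may share a cell, so compare the actual vectors.
    return [slot for slot in candidates if tuple(vectors[slot]) == tuple(query)]
```

For example, (0.1, 0.1) and (0.12, 0.2) share cell (0, 0) when b = 2, so a query for (0.12, 0.2) reads both data records but returns only the second.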

The method of the present invention described above can be implemented as a program and stored in computer-readable form on a recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.).

The present invention described above is not limited to the foregoing embodiments and the accompanying drawings; it will be apparent to those of ordinary skill in the art to which the invention pertains that various substitutions, modifications, and changes are possible without departing from the technical spirit of the invention.

As described above, the present invention divides the data space into cells represented by signatures when storing high-dimensional data, which prevents overlap between cells, and it performs more effective filtering using the distance between the cell center and each stored object, which reduces the loss of performance when retrieving high-dimensional data.

In addition, the present invention improves search efficiency by offering a variety of search methods from which the user can choose.

Claims

A cell-based high-dimensional data indexing device, comprising:

signature generation means which, using an N-dimensional feature vector extracted from an object, determines the cell to which the N-dimensional feature vector belongs, generates a feature vector signature representing that cell and a distance signature encoding the distance from the center of that cell to the N-dimensional feature vector, and generates a merged signature by combining the feature vector signature and the distance signature;

storage means for storing the N-dimensional feature vector and the generated merged signature in correspondence with each other; and

concurrency control means for supporting a plurality of users by means of locking when the N-dimensional feature vector and the merged signature are stored and retrieved.

The device of claim 1, further comprising:

search means which, for a user's query, uses the stored signatures to filter out entries that cannot satisfy the query, and finds and outputs the stored feature vectors of the entries that are not filtered out.

The device of claim 2, wherein

the search means searches for a signature that exactly matches the user's query and outputs the corresponding feature vector.

The device of claim 2, wherein

the search means searches for a predetermined number of objects most similar to the user's query and outputs their feature vectors.

The device of claim 2, wherein

the search means searches for all objects within a given range of the user's query and outputs their feature vectors.

The device according to any one of claims 1 to 5, wherein

the cell is a portion formed by dividing the data space into intervals, and the signature represents the cell as a binary bit pattern.

The device according to any one of claims 1 to 5, wherein the storage means comprises:

feature vector storage means for storing the N-dimensional feature vector; and

signature storage means for storing the generated merged signature.

A high-dimensional data indexing method for use in a cell-based high-dimensional data indexing apparatus, comprising:

a first step of extracting an N-dimensional feature vector from a multimedia object using a feature vector extractor;

a second step of generating, from the extracted N-dimensional feature vector, a feature vector signature and a distance signature based on the distance from the cell center to the object;

a third step of concatenating the generated feature vector signature and distance signature into a single merged signature; and

a fourth step of storing the generated merged signature and the feature vector in correspondence with each other.

The method of claim 8, further comprising:

a fifth step of performing a search for a user's query using the stored signatures and outputting the feature vectors corresponding to the search result.

The method of claim 9, wherein the fifth step comprises:

a sixth step of sequentially scanning the signatures stored in the fourth step to find a signature that matches the user's query; and

a seventh step of outputting the feature vector corresponding to the matching signature found in the sixth step.

The method of claim 9, wherein the fifth step comprises:

a sixth step of sequentially scanning the signatures stored in the fourth step to obtain candidate cells for the user's query;

a seventh step of finding, among the candidate cells, the cell whose maximum distance from the user query has a predetermined rank;

an eighth step of filtering out cells whose minimum distance is longer than the maximum distance of the cell found in the seventh step;

a ninth step of finding the nearest cell among the cells remaining after the filtering of the eighth step; and

a tenth step of outputting the feature vector corresponding to the cell found in the ninth step.

The method of claim 9, wherein the fifth step comprises:

a sixth step of sequentially scanning the signatures stored in the fourth step, searching for cells within a predetermined range of the user query, and filtering out cells farther away than the predetermined range;

a seventh step of finding the cells within the predetermined range among the cells remaining after the filtering of the sixth step; and

an eighth step of outputting the feature vectors corresponding to the cells of the seventh step.

The method according to any one of claims 8 to 12, wherein

the cell is a portion formed by dividing the data space into intervals, and the signature represents the cell as a binary bit pattern.

A computer-readable recording medium storing a program that causes a high-dimensional data indexing apparatus having a processor to perform:

a first function of extracting an N-dimensional feature vector from a multimedia object using a feature vector extractor;

a second function of generating, from the extracted N-dimensional feature vector, a feature vector signature and a distance signature based on the distance from the cell center to the object;

a third function of concatenating the generated feature vector signature and distance signature into a single merged signature; and

a fourth function of storing the generated merged signature and the feature vector in correspondence with each other.

The computer-readable recording medium of claim 14, wherein the program further causes the apparatus to perform:

a fifth function of performing a search for a user's query using the stored signatures and outputting the feature vectors corresponding to the search result.