KR102102307B1 - Method for searching storage device in database management system with multiple storage devices

Info

Publication number: KR102102307B1
Application number: KR1020190152126A
Authority: KR
Inventors: 이고르 체르냐크; 민대홍; 한혁; 진성일
Original assignee: 주식회사 리얼타임테크
Priority date: 2019-11-25
Filing date: 2019-11-25
Publication date: 2020-04-20
Also published as: WO2021107210A1

Abstract

The present invention relates to a technology that uses a bloom filter in a database management system storing data in multiple storages to search more quickly, on a vector basis, for the storage in which search target data is stored. According to the present invention, a vectorization-based method for searching for storage using a bloom filter in a database management system having multiple storages comprises: a first step of sequentially assigning N index keys corresponding to requested data to a hash vector; a second step of calculating a hash function result value for each index key and inserting it into the assigned hash vector cell; a third step of generating a test vector for each index key; a fourth step of generating a bit vector; a fifth step of updating the hash vector; and a sixth step of determining that the data is stored in the corresponding storage.

Description

METHOD FOR SEARCHING STORAGE DEVICE IN DATABASE MANAGEMENT SYSTEM WITH MULTIPLE STORAGE DEVICES (vectorization-based storage search method using a bloom filter in a database management system having multiple storages)

The present invention relates to a technology that, in a database management system storing data in multiple storages such as DRAM, NVM, or DISK, uses a bloom filter to search more quickly, on a vector basis, for the storage in which search target data is stored.

A database management system (DBMS) is a set of software tools that allows multiple users to access data in a database. More specifically, a DBMS is implemented on a database server, systematically handles the requests of multiple users or programs, and responds appropriately so that the data can be used.

When a query is input from outside, the DBMS performs functions such as selecting, inserting, updating, and deleting data in the database according to that query. Here, a query describes a request concerning data stored in a database table, that is, which operation is to be performed on the data, and is expressed in a language such as SQL (Structured Query Language).

As the volume of data keeps growing, a DBMS generally manages data using an index. In the database field, an index is a data structure that speeds up searches on a table.

Recently, hybrid DBMSs equipped with storages of differing performance have been proposed and put into operation. The storages provided in such a hybrid DBMS include DRAM, NVM, and DISK.

DRAM offers fast reads and writes, but it remains relatively expensive even though its price has dropped considerably.

NVM writes as fast as DRAM, but its read performance is lower than its write performance. NVM is also more expensive than DISK.

DISK is inexpensive and provides large storage capacity, but has the lowest read and write performance.

Accordingly, a hybrid DBMS operates its storage by keeping frequently accessed HOT data in DRAM, moderately accessed WARM data in NVM, and COLD data that has gone unaccessed the longest on DISK. In other words, the data of a single table may be stored in different storages according to its characteristics.

Because of this, determining which storage holds a given piece of data at search time has a major impact on the data retrieval performance of a hybrid DBMS.

A bloom filter is currently used as a method for determining the storage in which data is stored.

A bloom filter bit check applies multiple hash functions to the index key corresponding to the search target data, computes the result of every hash function, and then searches storage by checking the bit values of the bloom filter, that is, the bloom filter check values, on the basis of each hash function result.

In other words, the conventional method of determining the storage in which data is stored using a bloom filter first performs a hash count operation that computes all hash function result values for one index key, and then checks the bit values in the bloom filter that correspond to all of those hash function result values.
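The conventional check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the hash family (`hash_positions`, salted SHA-256), the filter size `M`, and the key names are all assumptions introduced for the example.

```python
import hashlib


def hash_positions(key: str, k: int, m: int) -> list[int]:
    """Hypothetical family of k hash functions: salted SHA-256 reduced modulo m."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


def conventional_check(bits: list[int], key: str, k: int) -> bool:
    """Compute all k hash results first, then inspect every corresponding bit."""
    positions = hash_positions(key, k, len(bits))  # hash count operation
    return all(bits[p] == 1 for p in positions)    # bit value check


def check_keys_one_by_one(bits: list[int], keys: list[str], k: int) -> dict[str, bool]:
    """Keys are handled strictly in sequence, so each key waits for the previous one."""
    return {key: conventional_check(bits, key, k) for key in keys}


# Usage: a tiny filter that contains a single key.
M, K = 64, 3
bits = [0] * M
for p in hash_positions("order-17", K, M):
    bits[p] = 1
result = check_keys_one_by_one(bits, ["order-17", "order-99"], K)
```

Because every key pays for all `k` hash computations before any bit is inspected, a key that would fail on its first bit still costs the full amount of work.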

However, when multiple index keys must be checked, the other index keys have to wait while the bloom filter check is performed on one index key, which degrades the data retrieval performance of the hybrid DBMS.

1. Korean Registered Patent No. 1775107 (Title: Hybrid database and method for managing tables in a hybrid database)

The present invention was created in view of the above circumstances. Its technical object is to provide a vectorization-based storage search method using a bloom filter in a database management system having multiple storages, in which a SIMD gather instruction is applied to the bloom filter to obtain bloom filter check values for multiple index keys simultaneously, so that the storage holding the search target data can be found more quickly.

According to one aspect of the present invention for achieving the above object, there is provided a vectorization-based storage search method using a bloom filter in a database management system having multiple storages, the system comprising a plurality of storages, each of which stores data and index key information for that data and is equipped with a bloom filter corresponding to the index keys, and a data processing means that searches for the storage holding query-requested data and performs the corresponding query processing, the method comprising: a first step in which the data processing means sequentially assigns N index keys corresponding to the query-requested data to a hash vector having N cells; a second step of calculating a hash function result value for each index key and inserting it into the assigned hash vector cell; a third step of generating, for each index key, a test vector consisting of a preset number of bloom filter check values gathered at random from the bloom filter on the basis of that key's hash function result value; a fourth step of determining, in each index key's test vector, the target bit corresponding to the hash function result value, and generating a bit vector consisting of the bloom filter check values of the target bits of the index keys; a fifth step of, when the bit vector contains a target bit having an invalid value, removing the corresponding index key from the hash vector and updating the hash vector by inserting the next index key in order into the cell that had been assigned to the removed index key; and a sixth step of repeating the operations from the second step onward, in which a hash function result value is calculated for each index key assigned to the hash vector and inserted into the assigned hash vector cell, while applying the preset hash functions to each index key in sequence, and determining, for each index key whose bloom filter check has been completed for all preset hash function result values, that the corresponding data is stored in the corresponding storage.
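The six steps can be illustrated with a scalar simulation of the lanes. This is only a sketch under stated assumptions: `nth_hash` is a hypothetical hash family, a plain Python loop stands in for the SIMD gather, the test vector is reduced to its single target bit, and the cells simply keep running when fewer keys are waiting than cells are free.

```python
import hashlib


def nth_hash(i: int, key: str, m: int) -> int:
    """Hypothetical i-th hash function: SHA-256 salted with i, reduced modulo m."""
    digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


def vectorized_probe(bits: list[int], keys: list[str], k: int, lanes: int = 4) -> list[str]:
    """Return the keys for which all k bloom filter bits are 1 (assumes k >= 1)."""
    m = len(bits)
    waiting = list(keys)         # index keys not yet assigned to a cell
    cell_key = [None] * lanes    # hash vector: one index key per cell
    cell_done = [0] * lanes      # hash functions already passed, per cell
    stored = []

    # Step 1: assign the first keys to the cells in order.
    for c in range(lanes):
        if waiting:
            cell_key[c] = waiting.pop(0)

    while any(key is not None for key in cell_key):
        active = [c for c in range(lanes) if cell_key[c] is not None]
        # Step 2: one hash function result per occupied cell.
        results = {c: nth_hash(cell_done[c], cell_key[c], m) for c in active}
        # Steps 3-4: read the target bit of every cell (stand-in for a SIMD gather).
        bit_vector = {c: bits[results[c]] for c in active}
        for c in active:
            if bit_vector[c] == 0:
                finished = True                   # Step 5: invalid bit, drop the key
            else:
                cell_done[c] += 1
                finished = cell_done[c] == k      # Step 6: every hash function passed
                if finished:
                    stored.append(cell_key[c])
            if finished:                          # refill the cell with the next key
                cell_key[c] = waiting.pop(0) if waiting else None
                cell_done[c] = 0
    return stored


# Usage: a 256-bit filter that contains "a", "b" and "c".
M, K = 256, 3
bits = [0] * M
for key in ("a", "b", "c"):
    for i in range(K):
        bits[nth_hash(i, key, M)] = 1
hits = vectorized_probe(bits, ["a", "x", "b", "y", "c", "z"], K)
```

Each cell advances through its own hash functions independently, so a key rejected on its first bit frees its cell immediately while the neighbouring cells continue with their second or third hash function.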

The method is further characterized in that, in the second step, the data processing means generates a hash function control vector consisting of the sequence number of the hash function used to calculate each index key's hash function result value, and, in the sixth step, when the hash function sequence number of an index key in the hash function control vector equals the number of hash functions preset for that index key, the data processing means removes that index key from the hash vector and updates the hash vector by inserting the next index key in order into the cell of the removed index key.

The method is further characterized in that, in the third step, the data processing means generates an offset vector having the same values as the hash function result values of the index keys and, on the basis of each index key's offset value, extracts a preset number of bloom filter check values from the bits of the bloom filter located at multiples of the offset value to generate the test vector, and, in the fourth step, the data processing means determines the first bit of each index key's test vector to be the target bit of that index key.
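One possible reading of the test vector and target bit is sketched below. The helper names, the 32-bit filter, and the wrap-around modulo the filter size are assumptions made for the illustration; the offsets {5, 8, 3, 2} are the example values used later in the description of FIG. 5.

```python
def build_test_vector(bits: list[int], offset: int, width: int) -> list[int]:
    """Collect `width` check values at offset, 2*offset, 3*offset, ...

    Positions wrap modulo the filter size; the wrap-around is an assumption.
    """
    m = len(bits)
    return [bits[(offset * i) % m] for i in range(1, width + 1)]


def target_bit(test_vector: list[int]) -> int:
    """The first entry of the test vector is taken as the target bit."""
    return test_vector[0]


# Usage: a 32-bit filter with bits 5 and 8 set.
bits = [0] * 32
bits[5] = 1
bits[8] = 1
offset_vector = [5, 8, 3, 2]
test_vectors = [build_test_vector(bits, off, 4) for off in offset_vector]
bit_vector = [target_bit(tv) for tv in test_vectors]  # [1, 1, 0, 0]
```

Only the first entry of each test vector decides the outcome for its key; the remaining entries are gathered because the instruction fetches a fixed number of values per lane.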

The method is further characterized in that, in the fifth or sixth step, when index keys have been removed from the hash vector, the data processing means determines whether the number of waiting index keys is equal to or greater than the number of index keys removed from the hash vector, and, if the number of waiting index keys is less than the number of removed index keys, performs the conventional bloom filter check operation, in which the hash function result value of each remaining index key is applied to the bloom filter to extract the bloom filter check value.
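The refill-or-fallback decision can be expressed in a few lines. The function name, the return shape, and the cell numbering are hypothetical; only the comparison between waiting keys and freed cells comes from the text.

```python
def refill_or_fallback(waiting: list[str], removed_cells: list[int]) -> tuple[str, dict[int, str]]:
    """Decide how to continue after index keys were removed from the hash vector."""
    if len(waiting) >= len(removed_cells):
        # Enough waiting keys: put the next keys, in order, into the freed cells.
        assignment = {cell: waiting.pop(0) for cell in removed_cells}
        return "vectorized", assignment
    # Too few waiting keys: the remaining keys go through the conventional check.
    return "conventional", {}


# Usage: two freed cells and two waiting keys keep the vectorized path going.
queue = ["V5", "V6"]
mode, assignment = refill_or_fallback(queue, [1, 3])

# One waiting key for two freed cells switches to the conventional check.
short_queue = ["V7"]
fallback_mode, fallback_assignment = refill_or_fallback(short_queue, [0, 2])
```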

The method is further characterized in that, in the fifth step, the data processing means generates a mask set by ANDing the bit vector with a zero vector consisting of bloom filter invalid values, and determines that a target bit having the same value as the zero vector in the mask set has an invalid value.
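A rough sketch of the mask set follows. It rests on an explicit assumption: since a plain bitwise AND with an all-zero vector would be zero in every lane, the sketch models the mask-generating operation as a lane-wise equality comparison against the zero vector, which is how SIMD compare instructions commonly produce masks. The function name is hypothetical.

```python
def invalid_lane_mask(bit_vector: list[int]) -> list[bool]:
    """Mark the lanes whose target bit equals the invalid value 0."""
    zero_vector = [0] * len(bit_vector)
    # Assumption: a lane-wise equality test stands in for the mask operation.
    return [b == z for b, z in zip(bit_vector, zero_vector)]


# Usage: the third lane holds an invalid target bit.
mask_set = invalid_lane_mask([1, 1, 0, 1])  # [False, False, True, False]
lanes_to_remove = [i for i, invalid in enumerate(mask_set) if invalid]  # [2]
```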

The method is further characterized in that the data processing means performs the bloom filter check for each storage using a SIMD (Single Instruction Multiple Data) gather instruction, which computes multiple values simultaneously with a single instruction.

According to the present invention, bloom filter checks are performed on multiple index keys simultaneously using the SIMD method, and the bloom filter check for an index key is stopped as soon as one of its hash function result values yields an invalid value, so that whether the data is stored in a given storage can be determined more quickly using the bloom filter.

FIG. 1 is a diagram showing the schematic configuration of a database management system having multiple storages to which the present invention is applied.
FIG. 2 is a conceptual diagram of the bloom filter check process using the SIMD method performed by the data processing means 100 shown in FIG. 1.
FIG. 3 is a flowchart illustrating a storage search method using a bloom filter in a database management system having multiple storages according to the present invention.
FIG. 4 is a flowchart illustrating the process (ST300) of simultaneously performing the bloom filter check shown in FIG. 3.
FIG. 5 is a diagram conceptually illustrating the bloom filter check process of FIG. 4.

The embodiments described herein and the configurations shown in the drawings are merely preferred embodiments of the present invention and do not represent all of its technical ideas; the scope of the present invention should therefore not be construed as limited by the embodiments and drawings described in the text. That is, because the embodiments can be modified in various ways and take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical ideas. Furthermore, the objects or effects presented in the present invention do not mean that a particular embodiment must include all of them or only such effects, so the scope of the present invention should not be understood as limited thereby.

Unless otherwise defined, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries should be interpreted as consistent with their meaning in the context of the related art, and should not be interpreted as having ideal or excessively formal meanings unless explicitly so defined in the present invention.

FIG. 1 is a diagram showing the schematic configuration of a database management system having multiple storages to which the present invention is applied.

Referring to FIG. 1, a database management system having multiple storages to which the present invention is applied includes a data processing means 100 and a plurality of storages 200.

The storage 200 may consist of heterogeneous storages, including a DRAM (dynamic random access memory) 210, an NVM (non-volatile memory) 220, and a DISK 230, or of multiple storages of the same type.

Here, the DRAM 210 stores HOT data, including newly inserted data, whose access time lies within a certain period of the present; the NVM 220 stores WARM data whose access time is more than a certain period in the past; and the DISK 230 stores COLD data whose access time exceeds a certain period in the past. If a further type of storage is provided, data satisfying a preset access-time condition may be moved to and stored in that storage.
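The placement rule can be sketched as a simple classification by access age. The two thresholds below are hypothetical values chosen for the example; the text only says that each tier corresponds to "a certain period".

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=1)    # hypothetical threshold for HOT data
WARM_WINDOW = timedelta(days=30)  # hypothetical threshold for WARM data


def choose_storage(last_access: datetime, now: datetime) -> str:
    """Pick the storage tier from the time elapsed since the last access."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "DRAM"  # HOT: accessed recently, or just inserted
    if age <= WARM_WINDOW:
        return "NVM"   # WARM: accessed some time ago
    return "DISK"      # COLD: not accessed for a long time


# Usage with a fixed reference time.
now = datetime(2020, 1, 31)
tiers = [
    choose_storage(datetime(2020, 1, 31), now),
    choose_storage(datetime(2020, 1, 15), now),
    choose_storage(datetime(2019, 1, 1), now),
]
```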

Each storage 200 basically includes an index data store 201 and a bloom filter 202, and the DRAM 210, where newly inserted data is stored, additionally includes an index object store 203.

The index data store 201 stores the actual index data, that is, the index keys, for the data stored in each storage 200.

The index object store 203 stores an index object made up of storage information, including the number of storage devices currently in use and an index point for each storage.

Meanwhile, the data processing means 100 in FIG. 1 analyzes an input query, performs transaction processing such as insertion, deletion, modification, and retrieval on the query-requested data, and returns the transaction result as the query processing result.

For queries that involve previously stored data, such as insertion, deletion, modification, and retrieval, the data processing means 100 first searches for the storage in which the data is stored. It then performs the series of transaction operations on the query-requested data in the storage 200 determined, on the basis of that search, to hold the data.

The data processing means 100 uses the bloom filter 202 provided in each storage 200 to check whether the data is stored in that storage 200. In doing so, it uses the SIMD method to obtain the bloom filter check results for multiple index keys simultaneously. SIMD (Single Instruction Multiple Data) is a class of parallel processing in which a single instruction computes multiple values at once; it was first used in the vector supercomputers of the early 1970s, such as the CDC Star-100 and the Texas Instruments ASC, which operated on vectors of data with a single instruction. SIMD is widely used in vector processors and is especially common in multimedia applications such as video game consoles and graphics cards.

FIG. 2 is a conceptual diagram of the bloom filter check process using the SIMD method performed by the data processing means 100 shown in FIG. 1.

Referring to FIG. 2, multiple index keys (A, B, C, D) are assigned to the respective cells S of a hash vector 10 of a given size having multiple cells, and the bloom filter check values (20: a[A], a[B], a[C], a[D]) of the index keys are extracted from the bloom filter 202 simultaneously on the basis of the hash function result values corresponding to the index keys set in the cells S of the hash vector 10.

Here, the bloom filter 202 is a probabilistic data structure used to test whether an element is a member of a set, and is represented as a bit string of m bits. Within the m-bit string, the bits corresponding to the hash values of each element, derived by k different hash functions, are set to a valid value, for example "1" (shown in black in the figure), and the remaining bits are set to an invalid value, for example "0".
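The structure just described, an m-bit string whose bits are set by k hash functions, can be written as a small class. This is a generic bloom filter sketch; the class name, the salted SHA-256 hash family, and the parameter values are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """m-bit bloom filter with k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # every bit starts at the invalid value 0

    def _positions(self, element: str) -> list[int]:
        # Hypothetical hash family: SHA-256 salted with the function number.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{element}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, element: str) -> None:
        for p in self._positions(element):
            self.bits[p] = 1  # valid value

    def might_contain(self, element: str) -> bool:
        return all(self.bits[p] == 1 for p in self._positions(element))


# Usage: insert one element and look it up again.
bf = BloomFilter(m=128, k=3)
bf.add("A")
```

A lookup that returns `False` is definitive, while `True` only means the element may be present, which is why the storage is still searched afterwards.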

To apply a vector-based SIMD instruction operating on multiple bits, more specifically a SIMD gather instruction, to the bloom filter 202, the data processing means 100 generates for each index key a hash function result vector consisting of the bit string required by the SIMD instruction, and uses it to extract from the bloom filter the check value corresponding to the desired hash function result value.

When the bloom filter check values 20 corresponding to the multiple hash function result values preset for an index key are all valid values (for example, "1"), the data processing means 100 determines that the data corresponding to that index key is stored in that storage.

In particular, the data processing means 100 performs the simultaneous extraction of bloom filter check values 20 for a set of multiple index keys one hash function at a time, in sequence. If the set contains an index key whose bloom filter check value is an invalid value (for example, "0"), the data processing means 100 removes that index key from the hash vector 10 and assigns the next index key in order to the cell S from which the key was removed.

That is, for an index key that yields an invalid bloom filter result, the bloom filter results for all of the hash functions preset for that key are not checked; the bloom filter check is carried out only up to the first hash function result value that yields an invalid value and then stops.

Therefore, in the present invention, bloom filter checks are performed on multiple index keys simultaneously using the SIMD method, and the bloom filter check for an index key is stopped as soon as one of its hash function result values yields an invalid value, so that whether the data is stored in a given storage can be determined more quickly using the bloom filter.

Next, the vectorization-based storage search method using a bloom filter in a database management system having multiple storages according to the present invention, configured as described above, is described with reference to FIG. 3.

Referring to FIG. 3, when a query requesting a storage search is received from outside (ST100), the data processing means 100 generates an index key for each item of query-requested data (ST200). The index keys are generated according to the index key generation algorithm associated with the index data store 201 of the storage 200.

The data processing means 100 applies the SIMD method to the bloom filter 202 of each storage (210, 220, 230) and gathers the bloom filter check values corresponding to multiple index keys simultaneously (ST300). To apply the SIMD gather instruction to the bloom filter 202, the data processing means 100 converts the index key information into a vector structure and applies it to the bloom filter 202 of each storage, gathering the bloom filter check values for multiple index keys at once.

The data processing means 100 then searches the corresponding storage for the query-requested data corresponding to each index key whose bloom filter check has been fully completed (ST400).

That is, when the bit values of the bloom filter 202 corresponding to the results of all of the hash functions preset for an index key are all valid values ("1"), the data processing means 100 determines that the data is stored in that storage.

Here, the data processing means 100 carries out the bloom filter check on the hash function result values of multiple index keys one hash function at a time, and for an index key whose check value is the invalid value ("0") it stops the bloom filter check for the remaining hash functions.

Accordingly, when the bloom filter check value for the last hash function result value preset for an index key is the valid value ("1"), the data processing means 100 can determine that the query-requested data corresponding to that index key is stored in that storage.
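The outcome of ST300 and ST400 for a single index key can be sketched as a per-storage decision. The 16-bit filters, the probed positions, and the assumption that all three filters share the same size and hash functions are hypothetical choices for the example.

```python
def storages_holding(filters: dict[str, list[int]], positions: list[int]) -> list[str]:
    """Return the storages whose bloom filter has every probed bit set to 1."""
    return [
        name
        for name, bits in filters.items()
        if all(bits[p] == 1 for p in positions)
    ]


# Hypothetical 16-bit filters for the three storages of FIG. 1.
filters = {"DRAM": [0] * 16, "NVM": [0] * 16, "DISK": [0] * 16}
key_positions = [2, 7, 11]  # assumed hash function results for one index key
for p in key_positions:
    filters["NVM"][p] = 1   # the key was inserted into the NVM filter
filters["DISK"][2] = 1      # a partial match that must be rejected

candidates = storages_holding(filters, key_positions)  # ["NVM"]
```

Only the storages returned here need to be searched for the actual data, which is the saving the bloom filter check is meant to deliver.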

이어, 도3에 도시된 블룸필터 검사를 동시 수행하는 과정(ST300)을 도4 및 도5를 참조하여 보다 상세히 설명한다. 도4는 도3에 도시된 블룸필터 검사를 동시 수행하는 과정(ST300)을 설명하기 위한 흐름도이도, 도5은 도4의 블룸필터 검사 과정을 개념화하여 예시한 도면이다.Next, the process of simultaneously performing the bloom filter inspection shown in FIG. 3 (ST300) will be described in more detail with reference to FIGS. 4 and 5. FIG. 4 is a flowchart for explaining a process (ST300) of simultaneously performing the bloom filter inspection shown in FIG. 3, and FIG. 5 is a diagram conceptually illustrating the bloom filter inspection process of FIG.

도3의 ST200 단계에서 질의 요청 데이터에 대해 Q개의 인덱스 키가 생성된 상태에서, 데이터 처리수단(100)은 SIMD 수집 명령을 수행하기 위해 다수의 셀로 이루어지는 해시벡터 크기를 설정하고, 각 셀에 대해 인덱스 키를 할당한다(ST311). 도5에는 해시벡터 크기는 4개 셀(S)로 이루어지고, 각 셀에는 인덱스 키(V1,V2,V3,V4)가 각각 할당된 상태가 예시되어 있다. In step ST200 of FIG. 3, in the state in which Q index keys are generated for the query request data, the data processing means 100 sets a hash vector size composed of a plurality of cells to perform a SIMD collection command, and for each cell The index key is allocated (ST311). 5, the hash vector size is composed of four cells (S), and each cell has an index key (V1, V2, V3, V4) assigned to each.

Next, the data processing means 100 calculates a hash function result value for each index key and inserts it into the assigned hash vector cell (ST312). The types and number of hash functions to apply to each index key are set in advance, and the hash function result values are calculated by applying the hash functions set for the index key in sequence. "Count hash1" in FIG. 5 illustrates the state in which the first hash function result values (5, 8, 3, 2), obtained by applying the first hash function (hash1) to the index keys (V1, V2, V3, V4), are inserted into the hash vector cells assigned to those index keys.

In addition, the data processing means 100 generates a hash function control vector corresponding to the sequence number of the hash function applied to each index key (ST313). The hash function control vector is set to the same size as the hash vector, and the sequence number of the applied hash function is inserted at the position corresponding to the cell to which each index key is assigned. For example, in FIG. 5, with the first hash function (hash1) applied to all of the index keys (V1, V2, V3, V4), the hash function control vector is generated as {1,1,1,1}, in which every cell is "1".
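
As a rough illustration of steps ST311 to ST313, the following Python sketch lays out the hash vector, the hash function result values, and the hash function control vector for the FIG. 5 example. The lookup table `HASH1` is a hypothetical stand-in for hash1, chosen only so that the values match the figure; the variable names are likewise assumptions and do not appear in the specification.

```python
VECTOR_SIZE = 4  # number of hash vector cells, as in FIG. 5

# Hypothetical stand-in for hash1: a lookup table that reproduces FIG. 5.
HASH1 = {"V1": 5, "V2": 8, "V3": 3, "V4": 2}

keys = [f"V{i}" for i in range(1, 10)]  # V1 .. V9

# ST311: assign the first keys to the cells; the rest wait their turn.
hash_vector = keys[:VECTOR_SIZE]
waiting_keys = keys[VECTOR_SIZE:]

# ST312: one hash function result value per cell.
hash_results = [HASH1[k] for k in hash_vector]

# ST313: sequence number of the hash function applied in each cell.
control_vector = [1] * VECTOR_SIZE
```

On real hardware the three lists would be SIMD registers of equal width; plain lists are used here only to make the layout visible.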

Next, the data processing means 100 generates an offset vector by dividing the number of bits corresponding to the hash function result value of each index key by the hash function result value (ST314). For example, when the hash function result value consists of 32 bits, the first hash function result values (5, 8, 3, 2) of the index keys (V1, V2, V3, V4) in FIG. 5 are divided by 32, producing the offset vector {5,8,3,2}, which is identical to the first hash function result values.

In addition, the data processing means 100 generates a test vector for each index key, consisting of bloom filter check values gathered from the corresponding bloom filter 202 using the offset vector (ST315). For example, when the offset for the index key (V1) in FIG. 5 is "5" and the hash function result value consists of 32 bits, 32 bloom filter check values are extracted from the bits of the bloom filter 202 at multiples of 5 and arranged in order to form the test vector for the index key (V1). That is, the bloom filter check values of the 5th, 10th, 15th, 20th, … bits of the bloom filter 202 become the test vector for the index key (V1).
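
The gather described for ST315 can be sketched in Python as follows. This is only an illustration under assumptions: bit positions are treated as 0-based list indices, positions past the end of the filter wrap around, and the function name `build_test_vector` is hypothetical.

```python
from typing import List, Sequence


def build_test_vector(bloom_bits: Sequence[int], offset: int,
                      width: int = 32) -> List[int]:
    """Collect `width` check values at multiples of `offset` (ST315)."""
    n = len(bloom_bits)
    # Positions offset, 2*offset, 3*offset, ...; wrapping is an assumption.
    return [bloom_bits[(offset * i) % n] for i in range(1, width + 1)]


# Illustrative 512-bit filter with only bits 5, 8 and 2 set.
bloom_bits = [0] * 512
for pos in (5, 8, 2):
    bloom_bits[pos] = 1

test_vector_v1 = build_test_vector(bloom_bits, offset=5)
```

The first element of each test vector is the bit at the hash function result value itself, which is why the following step can take the first bit as the target bit.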

In addition, the data processing means 100 determines the target bit of each index key in the test vector (ST316). The data processing means 100 sets the target bit by dividing the total number of bits of the test vector by the number of bits of the hash function result value. Since the total number of bits of the test vector, "32", equals the number of bits of the hash function result value, "32", the first bit of the test vector is set as the target bit. The target bit is therefore always the bit of the bloom filter 202 that corresponds to the hash function result value calculated for the index key. That is, in FIG. 5 the target bit for the index key (V1) is the 5th bit of the bloom filter 202.

Thereafter, the data processing means 100 generates a bit vector consisting of the bloom filter check values of the target bits of the index keys (ST317). That is, as shown under "gather" in FIG. 5, the bit vector for the index keys (V1, V2, V3, V4) may be generated as "1,1,0,1", the bloom filter check values of the bits of the bloom filter 202 corresponding to the hash function result values (5, 8, 3, 2) of the respective index keys.
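
A minimal Python sketch of ST317 for the FIG. 5 values follows. The filter contents are invented so that the gathered values come out as in the figure, and the list comprehension merely emulates what a SIMD gather would do in one instruction.

```python
# Illustrative filter: bits 5, 8 and 2 are set, bit 3 is not.
bloom_bits = [0] * 32
for pos in (5, 8, 2):
    bloom_bits[pos] = 1

hash_results = [5, 8, 3, 2]  # hash1 results for V1..V4 in FIG. 5

# ST317: gather the target bit of every cell into the bit vector.
bit_vector = [bloom_bits[pos] for pos in hash_results]
```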

Next, the data processing means 100 determines whether the bit vector contains a target bit having the invalid value (ST318). The data processing means 100 ANDs the bit vector with a zero vector consisting only of the invalid value "0", extracts a mask set in which each position whose result equals the zero vector has the value "1", corresponding to "TRUE", and determines whether the mask set contains a target bit judged "TRUE". As shown under "Compare" in FIG. 5, the mask set result may appear as "o,o,x,o", from which the data processing means 100 can determine that the target bit of the index key in the third cell of the hash vector has the invalid value.
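
The comparison in ST318 might be emulated as below. The sketch uses a plain element-wise equality test against the zero vector, which is how SIMD compare instructions typically produce a mask; this is an assumption about the intent of the "Compare" step, not the specification's literal wording.

```python
bit_vector = [1, 1, 0, 1]  # gathered in ST317 (FIG. 5)
zero_vector = [0] * len(bit_vector)

# True ("x" in FIG. 5) marks a cell whose target bit is invalid.
mask = [b == z for b, z in zip(bit_vector, zero_vector)]
failed_cells = [i for i, is_invalid in enumerate(mask) if is_invalid]
```

Here `failed_cells` holds the 0-based index 2, that is, the third cell of the hash vector.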

When a target bit having the invalid value exists in step ST318, the data processing means 100 determines whether enough next-in-line index keys are waiting to fill the hash vector (ST319). Specifically, the data processing means 100 determines whether the number of waiting index keys is equal to or greater than the number of target bits having the invalid value. In FIG. 5, there is one target bit having the invalid value and there are five waiting index keys (Remained keys: V5, V6, V7, V8, V9), so the data processing means 100 determines that there are enough waiting index keys to replace the index key whose target bit is invalid.

When there are enough waiting index keys in step ST319, the data processing means 100 removes from the hash vector the index key corresponding to the target bit having the invalid value, and inserts the next-in-line index key from the waiting index keys into the hash vector cell of the removed index key (ST320). As shown under "Exclude failed key" in FIG. 5, the index key (V3) in the cell corresponding to the invalid target bit is removed from the hash vector, and, as shown under "Put next key", the next-in-line index key (V5) from the waiting index keys (Remained keys) is assigned to the cell that had been assigned to the index key (V3). That is, as shown under "Count hash" in FIG. 5, the hash vector is updated so that "V1,V2,V5,V4" are assigned to its cells.
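
Steps ST319 and ST320 can be pictured with the short Python sketch below. The function name `refill` and its signature are hypothetical; the key labels follow FIG. 5.

```python
from typing import List, Sequence


def refill(hash_vector: List[str], failed_cells: Sequence[int],
           waiting_keys: List[str]) -> bool:
    """Replace failed keys in place; return False if too few keys wait."""
    # ST319: are there at least as many waiting keys as failed cells?
    if len(waiting_keys) < len(failed_cells):
        return False
    # ST320: put the next waiting key into each freed cell.
    for cell in failed_cells:
        hash_vector[cell] = waiting_keys.pop(0)
    return True


hash_vector = ["V1", "V2", "V3", "V4"]
waiting_keys = ["V5", "V6", "V7", "V8", "V9"]
ok = refill(hash_vector, failed_cells=[2], waiting_keys=waiting_keys)
```

Replacing the failed key in its own cell keeps every other key in place, so the cells that passed can move on to their next hash function without being reshuffled.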

Steps ST319 and ST320 may be performed in the reverse order. That is, the data processing means 100 may remove the index key from the hash vector and then determine whether the number of waiting index keys is equal to or greater than the number of index keys removed from the hash vector.

Meanwhile, when no target bit having the invalid value exists in step ST318, the data processing means 100 checks whether any hash function remains to be applied to each index key. If a hash function remains, step ST312 is performed: each index key is applied to the next hash function in sequence to calculate a hash function result value, which is inserted into the corresponding hash vector cell. In step ST313, the data processing means 100 then updates the hash function control vector to reflect the sequence number of the next hash function applied to each index key. For example, when the previous hash function control vector is "1,1,1,1" and the next hash function in sequence is applied to the current index keys, the hash function control vector is updated to "2,2,2,2".

In addition, when no target bit having the invalid value exists in step ST318 and no hash function remains to be applied to an index key, the data processing means 100 determines that the query request data corresponding to that index key is stored in the corresponding storage, and step ST400 of FIG. 3 is performed. If no hash function remains for only some of the index keys in the current hash vector, those index keys are removed from the hash vector. The data processing means 100 may then determine whether the number of waiting index keys is equal to or greater than the number of index keys removed from the hash vector.

Meanwhile, after performing step ST319, the data processing means 100 checks whether any hash function remains to be applied to each index key of the updated hash vector and, if so, performs step ST312, in which each index key is applied to its next hash function in sequence to calculate a hash function result value that is inserted into the corresponding hash vector cell. That is, with the hash vector updated to "V1,V2,V5,V4" as shown under "Count hash" in FIG. 5, the sequence number of the hash function to be applied to each index key is checked, and step ST313 is performed to update the hash function control vector consisting of those sequence numbers. In FIG. 5, the next hash function "hash2" is applied to the index keys (V1, V2, V4) of the hash vector, while the first hash function "hash1" is applied to the index key (V5), so the hash function control vector is updated to "2,2,1,2".
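
The control vector update in this example can be written out as a small Python sketch. The helper name `update_control_vector` is hypothetical; it assumes that a newly inserted key always starts at hash function 1 and every surviving key advances by one.

```python
from typing import List, Sequence


def update_control_vector(control_vector: Sequence[int],
                          replaced_cells: Sequence[int]) -> List[int]:
    """Advance surviving cells to the next hash function; reset new keys."""
    replaced = set(replaced_cells)
    return [1 if i in replaced else seq + 1
            for i, seq in enumerate(control_vector)]


# FIG. 5: V3 in the third cell (index 2) was replaced by V5.
control_vector = update_control_vector([1, 1, 1, 1], replaced_cells=[2])
```

With no replaced cells the same helper yields "2,2,2,2", which matches the case where every target bit was valid.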

Meanwhile, when there are not enough waiting index keys in step ST319, the data processing means 100 performs a conventional bloom filter check operation, which extracts the bloom filter check value by applying the hash function result value to the bloom filter for each remaining index key (ST321). For example, since the hash vector size is set to "4", the conventional bloom filter check operation is performed on the remaining index keys when at most "3" index keys remain, including those currently assigned to the hash vector.
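
To show how the steps ST311 to ST321 fit together, the following Python sketch emulates the whole loop with plain lists standing in for SIMD registers. It is an illustration under assumptions, not the patented implementation: the function names, the integer keys, the two hash functions, and the use of a modulo to map hash values onto the filter are all hypothetical, and the target bit is read directly instead of through a test vector.

```python
from typing import Callable, List, Sequence

HashFn = Callable[[int], int]


def scalar_check(bloom: Sequence[int], key: int,
                 hash_fns: Sequence[HashFn], start: int = 0) -> bool:
    """Conventional per-key check, resuming at hash function `start`."""
    return all(bloom[h(key) % len(bloom)] == 1 for h in hash_fns[start:])


def vectorized_search(bloom: Sequence[int], keys: Sequence[int],
                      hash_fns: Sequence[HashFn], width: int = 4) -> List[int]:
    """Return keys passing every hash check (needs width >= 1, >= 1 hash)."""
    waiting = list(keys)
    cells, waiting = waiting[:width], waiting[width:]
    passed = [0] * len(cells)  # hash functions already passed per cell
    found: List[int] = []
    while len(cells) == width:
        # One lock-step round over all cells: hash, gather, compare.
        bits = [bloom[hash_fns[passed[i]](cells[i]) % len(bloom)]
                for i in range(width)]
        freed = []  # cells whose key failed or finished
        for i, bit in enumerate(bits):
            if bit == 0:
                freed.append(i)
            else:
                passed[i] += 1
                if passed[i] == len(hash_fns):
                    found.append(cells[i])
                    freed.append(i)
        if len(waiting) < len(freed):
            # Too few waiting keys (ST319): keep only unfinished cells.
            keep = [i for i in range(width) if i not in freed]
            cells = [cells[i] for i in keep]
            passed = [passed[i] for i in keep]
            break
        for i in freed:  # ST320: refill freed cells with waiting keys
            cells[i] = waiting.pop(0)
            passed[i] = 0
    # ST321: conventional check for whatever is left.
    leftovers = list(zip(cells, passed)) + [(k, 0) for k in waiting]
    found += [k for k, start in leftovers
              if scalar_check(bloom, k, hash_fns, start)]
    return found


# Hypothetical hash functions and a filter populated with keys 1, 2 and 3.
hash_fns = [lambda k: k * 7 + 1, lambda k: k * 13 + 5]
bloom = [0] * 64
for k in (1, 2, 3):
    for h in hash_fns:
        bloom[h(k) % 64] = 1
result = vectorized_search(bloom, list(range(1, 10)), hash_fns)
```

The per-cell counter `passed` plays the role of the hash function control vector, and the final list comprehension is the conventional fallback for the keys that can no longer fill a full vector.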

100: data processing means, 200: storage,
201: index data store, 202: bloom filter,
203: index object store, 210: DRAM,
220: NVM, 230: DISK.

Claims

A vectorization-based storage search method using a bloom filter in a database management system having a plurality of storages, the system comprising a plurality of storages, each storing data and index key information for the data and each provided with a bloom filter corresponding to the index keys, and data processing means for searching for the storage in which requested data is stored and performing query processing, the method comprising:
a first step in which the data processing means sequentially assigns N index keys corresponding to the requested data to a hash vector having N cells;
a second step of calculating a hash function result value for each index key and inserting it into the assigned hash vector cell;
a third step of generating, for each index key, a test vector consisting of a preset number of bloom filter check values randomly collected from the bloom filter based on the hash function result value of that index key;
a fourth step of determining, in the test vector of each index key, a target bit corresponding to the hash function result value, and generating a bit vector consisting of the bloom filter check values of the target bits of the index keys;
a fifth step of, when a target bit having an invalid value exists in the bit vector, removing the corresponding index key from the hash vector and updating the hash vector by inserting the next-in-line index key into the cell that was assigned to the removed index key; and
a sixth step of repeating the operations from the second step onward, in which a hash function result value is calculated for each index key assigned to the hash vector and inserted into the assigned hash vector cell, while sequentially applying the preset hash functions to each index key to calculate the result values, and determining, based on an index key for which the bloom filter check has been completed for all hash function result values, that the corresponding data is stored in the corresponding storage.

The method according to claim 1, wherein
in the second step, the data processing means generates a hash function control vector consisting of the sequence numbers of the hash functions used to calculate the hash function result value for each index key, and
in the sixth step, when the hash function sequence number for an index key in the hash function control vector equals the preset number of hash functions, the data processing means removes that index key from the hash vector and updates the hash vector by inserting the next-in-line index key into the position of the removed index key.

The method according to claim 1, wherein
in the third step, the data processing means generates an offset vector having the same values as the hash function result values of the index keys and, based on the offset value of each index key, generates the test vector by extracting from the bloom filter a preset number of bloom filter check values of the bits corresponding to multiples of the offset value, and
in the fourth step, the data processing means determines the first bit of the test vector of each index key as the target bit of that index key.

The method according to claim 1 or 2, wherein,
when an index key is removed from the hash vector in the fifth or sixth step, the data processing means determines whether the number of waiting index keys is equal to or greater than the number of index keys removed from the hash vector, and, when the number of waiting index keys is less than the number of index keys removed from the hash vector, performs a conventional bloom filter check operation that extracts a bloom filter check value by applying the hash function result value to the bloom filter for each remaining index key.

The method according to claim 1, wherein
in the fifth step, the data processing means generates a mask set by ANDing the bit vector with a zero vector consisting of bloom filter invalid values, and determines that a target bit having the same value as the zero vector in the mask set has an invalid value.

The method according to claim 1, wherein
the data processing means performs the bloom filter check processing for each storage using a single instruction multiple data (SIMD) gather instruction, which computes multiple values simultaneously with a single instruction.