WO2020153522A1

WO2020153522A1 - Hybrid indexing device in heterogeneous storage-based database management system

Info

Publication number: WO2020153522A1
Application number: PCT/KR2019/001244
Authority: WO
Inventors: 이고르체르냐크; 한혁; 진성일; 민대홍
Original assignee: ㈜리얼타임테크
Priority date: 2019-01-25
Filing date: 2019-01-30
Publication date: 2020-07-30
Also published as: KR20200092710A

Abstract

The present invention relates to a database management system that stores data in heterogeneous storages such as DRAM, NVM, and disk. Each storage is provided with a search filter that reflects the data it holds. The search filters are used to select the storage that holds the requested data before the data retrieval is performed, which enables faster access to the required data.

Description

Hybrid indexing device in heterogeneous storage-based database management system

The present invention relates to a database management system that stores data in different storages, such as DRAM, NVM, and DISK. Index data for the stored data is kept in each storage, which makes index data management more efficient. In addition, each storage is provided with a search filter, so that a data search is performed only in the storage that holds the requested data. This allows faster access to the required data.

A database management system (DBMS) is a set of software tools that allows multiple users to access data in a database. More specifically, a DBMS runs on a database server. It systematically handles the requests of many users or programs and responds to them appropriately so that the data can be used.

When a query is input from outside, the DBMS selects, inserts, updates, or deletes data in the database according to that query. Here, a query is a request concerning data stored in a table of the database, that is, a description of the operation to be performed on the data. It is expressed in a language such as Structured Query Language (SQL).

In addition, as the amount of data grows, a DBMS generally maintains indexes. An index is a data structure that speeds up searches on a table in the database.

Recently, hybrid DBMSs that combine storages with different performance characteristics have been proposed and put into operation. The storages used in such a hybrid DBMS include DRAM, NVM, and DISK.

DRAM is fast for both reads and writes, but it remains relatively expensive despite significant price drops.

NVM writes as fast as DRAM, but its read performance is poor compared with its write performance. NVM is also more expensive than DISK.

DISK offers low cost and large capacity, but it has the lowest read and write performance. To balance these trade-offs, a hybrid DBMS places data according to how recently it was accessed:

- HOT data is stored in DRAM.
- WARM data is stored in NVM.
- COLD data, which has gone unaccessed for the longest time, is stored on DISK.

However, a hybrid DBMS divides data into tiers. If such a system stores and operates all of its indexes in a single storage, that storage's memory usage can increase significantly. A large index can also occupy a very large share of its space, which makes this a very inefficient way to operate.

In addition, a T-tree index suited to an in-memory DBMS does not match the performance of a B-tree, which performs well in a disk-based DBMS. Consequently, if a hybrid DBMS uses a single data structure suited to only one type of storage, overall performance may deteriorate.

Accordingly, the present invention was devised in consideration of the above circumstances. Its technical objective is to provide a hybrid indexing device in a heterogeneous storage-based database management system in which each storage keeps its indexes in a data structure suited to that storage. This enables more efficient index data management.

Another technical objective of the present invention is to provide a hybrid indexing device in a heterogeneous storage-based database management system in which each storage is equipped with a search filter corresponding to its stored data. The storage holding the requested data is selected through the search filters before the data search is performed. This reduces unnecessary read operations and allows data to be found more quickly.

According to one aspect of the present invention for achieving the above objectives, there is provided a hybrid indexing device in a heterogeneous storage-based database management system. The device comprises a plurality of storages and data processing means.

The storages include a dynamic random access memory (DRAM), a non-volatile memory (NVM), and a DISK. Each storage includes an index data storage, which stores the index data corresponding to the data stored in that storage, and a search filter, which has a bit string in which the bits at positions corresponding to the stored data are set to a valid value.

For an external data insertion request, the data processing means stores the data in the DRAM and generates new index data for it in the index data storage of the DRAM. It also applies the data to a predetermined hash function and sets the DRAM search filter bit at the position corresponding to the hash value to the valid value.

For an external data change request, including data modification or deletion, the data processing means applies the data to be changed to the predetermined hash function to calculate a hash value. It determines which storage holds that data from the bit values at the corresponding position in each storage's search filter. It then retrieves the data from that storage, performs the change, and updates the index data in the index data storage to reflect the change.

In addition, data is stored by the time of its most recent access:

- The DRAM stores HOT data, including newly inserted data, whose most recent access falls within a certain period from the present.
- The NVM stores WARM data whose most recent access is older than that period.
- The DISK stores COLD data whose most recent access exceeds a further, longer period.

The data processing means performs data tiering, which moves data between storages based on the access time of the data stored in each storage. For each data movement, it updates the index data storage and the search filter bit string of the affected storages.

In addition, during data tiering, the amount of data stored in any storage may reach or exceed a preset threshold. In that case, the data processing means checks the amount of data stored in the other storages and additionally moves data to a storage that still has room below its preset threshold. The data to be moved is selected by access time, according to the tier of the destination storage.

In addition, each storage uses a different data structure, chosen for its read/write characteristics and space utilization. The data processing means stores data in the data structure of the storage in which the data is placed.

In addition, when a preset condition is met, the data processing means regenerates the search filter bit string of each storage from the data currently stored in that storage.

In addition, the data processing means regenerates the search filter bit string of a storage when data has been deleted from that storage more than a predetermined number of times.

The present invention solves the problem of storing a large amount of index data in expensive storage. It also improves search performance: instead of checking every storage unconditionally for each query, it uses the Bloom filters to select and search only the storage that holds the requested data.

In addition, the present invention periodically regenerates each storage's search filter bit string from the data the storage currently holds. This minimizes errors in which deleted data is judged to still exist in that storage.

FIG. 1 is a view showing the schematic configuration of a hybrid indexing device in a heterogeneous storage-based database management system according to the first embodiment.

FIG. 2 is a view for explaining the structure of the search filter 202 shown in FIG. 1.

FIG. 3 is a view showing the internal configuration of the data processing means 100 shown in FIG. 1, divided by function.

FIG. 4 is a view for explaining the operation of the hybrid indexing device in the heterogeneous storage-based database management system shown in FIG. 1.

FIG. 5 is a view for explaining the data tiering operation (ST100) shown in FIG. 4 in more detail.

FIG. 6 is a diagram for explaining the data management operation ST200 shown in FIG. 4 in more detail.

The configurations shown in the embodiments and drawings of the present invention are merely preferred embodiments. They do not represent the entire technical spirit of the present invention, so the scope of the present invention should not be construed as limited to the embodiments and drawings described herein. Because the embodiments can be changed in many ways and take various forms, the scope of the present invention should be understood to include equivalents that realize its technical ideas. In addition, the objectives and effects presented here do not mean that a specific embodiment must achieve all of them, or only them. The scope of the present invention should therefore not be understood as limited by them.

Unless otherwise defined, all terms used herein have the meanings commonly understood by a person skilled in the art to which the present invention pertains. Terms defined in commonly used dictionaries should be interpreted consistently with their meaning in the context of the related technology. They should not be given an idealized or overly formal meaning unless the present invention explicitly defines them that way.

FIG. 1 is a diagram showing the schematic configuration of a hybrid indexing device in a heterogeneous storage-based database management system according to a first embodiment of the present invention.

Referring to FIG. 1, in the heterogeneous storage-based database management system according to the present invention, the hybrid indexing device includes a data processing means 100 and different types of storage 200.

The storage 200 includes a dynamic random access memory (DRAM) 210, a non-volatile memory (NVM) 220, and a DISK 230.

Here, data is placed by the time of its most recent access:

- The DRAM 210 stores HOT data, including newly inserted data, whose most recent access falls within a certain period from the present.
- The NVM 220 stores WARM data whose most recent access is older than that period.
- The DISK 230 stores COLD data whose most recent access exceeds a further period.

If another type of storage is provided, data that satisfies a preset access-time criterion for that storage may likewise be moved to and stored in it.

Each storage 200 uses a different data structure, chosen for its read/write characteristics and space utilization. The DRAM 210 and the NVM 220 store data in a T-Tree structure, and the DISK 230 stores data in a B-Tree structure.

Each storage 200 basically consists of an index data storage 201 and a search filter 202. The DRAM 210, in which newly inserted data is initially stored, is additionally provided with an index object storage 203.

The index data storage 201 stores the actual index data for the data held in each storage 200. The index data of the DRAM 210 and the NVM 220 is stored in a T-Tree structure, and the index data of the DISK 230 is stored in a B-Tree structure. The T-Tree is an optimal data structure for accessing data in a main-memory DBMS, and the B-Tree is the most suitable data structure for a disk-based DBMS.

The search filter 202 may be a Bloom filter, which is a probabilistic data structure used to test whether an element is a member of a set.

As shown in FIG. 2, a Bloom filter can be represented as a bit string of m bits. Each element is passed through k different hash functions. The bits at the positions given by the resulting hash values are set to a valid value (shown in black), for example "1", and all other bits remain "0".
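The choice of m and k governs how often the filter wrongly reports a match. The text does not give this relation. The following is the standard textbook estimate, which assumes that the hash positions are independent and uniformly distributed. The symbol n (the number of elements inserted) is introduced here for illustration.

```latex
% m = filter size in bits, k = number of hash functions, n = elements inserted
P(\text{a given bit is still } 0) = \left(1 - \tfrac{1}{m}\right)^{kn} \approx e^{-kn/m}

p_{\mathrm{fp}} \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\mathrm{opt}} = \frac{m}{n}\ln 2
```

For example, with m/n = 10 bits per element and k = 7, the estimate gives p_fp ≈ (1 - e^{-0.7})^7 ≈ 0.008, or roughly 0.8% false positives.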

That is, to check whether a piece of data is stored in a storage 200, the k hash functions are applied to that data to obtain k bit positions. If all bits at those positions in the Bloom filter bit string are "1", the data is judged to belong to the set. This judgment may be a false positive. If any of those bits is "0", the data definitely does not belong to the set.
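The membership test described above can be sketched in Python as follows. The class and method names (`BloomFilter`, `add`, `might_contain`) are hypothetical. The k hash functions are simulated by double hashing a single SHA-256 digest, which is a common substitute and not a scheme specified in the text.

```python
import hashlib


class BloomFilter:
    """Minimal m-bit Bloom filter using k hash positions per element (illustrative sketch)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # the m-bit string; every bit starts at "0"

    def _positions(self, item: str) -> list:
        # Assumption: k hash functions simulated by double hashing one SHA-256
        # digest, h_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1  # set each hashed position to the valid value "1"

    def might_contain(self, item: str) -> bool:
        # All k bits "1": probably a member (false positives are possible).
        # Any bit "0": definitely not a member (no false negatives).
        return all(self.bits[p] == 1 for p in self._positions(item))


bf = BloomFilter(m=1024, k=3)
bf.add("order:42")
print(bf.might_contain("order:42"))  # True
```

An empty filter answers "not present" for every query, and an element that was added is always reported as present. Only the converse, a reported match, can be wrong.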

For example, suppose the hash functions in FIG. 2 yield positions "2" and "6". The second and sixth bits of each storage's Bloom filter 202 are then read. The DRAM 210 returns "1,1", while the NVM 220 and the DISK 230 each return "0,0", so the data is determined to be stored in the DRAM 210.

The index object storage 203 stores index objects containing storage information, including the number of storages currently in use and an index pointer for each storage.

FIG. 3 is a view showing the internal configuration of the data processing means 100 shown in FIG. 1, divided by function.

Referring to FIG. 3, the data processing means 100 includes a data processing unit 110, a tiering processing unit 120, an index processing unit 130, a storage search unit 140, and a search filter management unit 150.

The data processing unit 110 analyzes an input query, performs transaction processing such as insertion, deletion, modification, and search for the corresponding data in response to the query, and returns the transaction processing result as a query processing result.

For a data insertion request, the data processing unit 110 stores the data in the DRAM 210. It provides the data to the index processing unit 130 to request index creation, and stores the resulting index data together with the data in the DRAM 210. That is, the data processing unit 110 always stores newly inserted data in the DRAM 210 first.

In addition, the data processing unit 110 handles change transactions, which delete or modify specific data already stored in a storage 200. For such a transaction, it provides the hash value of that data to the search filter 202 of each storage 200. From the bit values of the search filters 202, it determines the storage 200 that holds the data. It then searches for and changes the target data in that storage 200.

In addition, the data processing unit 110 may perform a range search on all storage 200 for a range search query and perform transaction processing on the searched range.

The tiering processing unit 120 analyzes the most recent access time of the data stored in each storage 200. It classifies the data into tiers based on the difference between the current time and the access time, and moves the data to the storage 200 designated for its tier.
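The age-based classification can be sketched in Python as follows. The window lengths `HOT_WINDOW` and `WARM_WINDOW` and the function names are hypothetical, because the text only speaks of "a certain period".

```python
from datetime import datetime, timedelta

# Hypothetical tier boundaries: the text only speaks of "a certain period".
HOT_WINDOW = timedelta(days=1)
WARM_WINDOW = timedelta(days=30)


def classify_tier(last_access: datetime, now: datetime) -> str:
    """Map the age of a record's most recent access to its target storage."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "DRAM"  # HOT
    if age <= WARM_WINDOW:
        return "NVM"  # WARM
    return "DISK"  # COLD


def plan_moves(current_tier: dict, last_access: dict, now: datetime) -> list:
    """List (key, from_storage, to_storage) for records sitting in the wrong tier."""
    moves = []
    for key, tier in current_tier.items():
        target = classify_tier(last_access[key], now)
        if target != tier:
            moves.append((key, tier, target))
    return moves


now = datetime(2019, 1, 30)
print(plan_moves(
    {"a": "DRAM", "b": "DRAM", "c": "NVM"},
    {"a": now - timedelta(hours=2), "b": now - timedelta(days=90), "c": now - timedelta(days=10)},
    now,
))  # [('b', 'DRAM', 'DISK')]
```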

In addition, the amount of data stored in any storage may reach or exceed a preset threshold. In that case, the tiering processing unit 120 may check how much data the other storages hold and additionally move data to a storage that still has room below its preset threshold. The tiering processing unit 120 selects the data to be moved by access time, according to the tier of the destination storage. When the destination tier is higher than the tier in which the data currently resides, the most recently accessed data is moved first.
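One possible reading of this overflow rule is sketched below. The text fixes only the promotion order (most recent first). The demotion order (least recent first) and the number of records moved are assumptions, and `pick_overflow` and `TIER_RANK` are hypothetical names.

```python
TIER_RANK = {"DRAM": 0, "NVM": 1, "DISK": 2}  # 0 = highest (fastest) tier


def pick_overflow(source: str, target: str, last_access: dict,
                  source_threshold: int, target_headroom: int) -> list:
    """Choose records to move out of `source` once it has reached its threshold.

    last_access maps each key held in `source` to its last access time;
    target_headroom is the target's threshold minus its current size.
    """
    size = len(last_access)
    if size < source_threshold or target_headroom <= 0:
        return []  # no overflow, or no room left in the target
    # Assumption: move just enough to drop below the threshold, capped by headroom.
    amount = min(size - source_threshold + 1, target_headroom)
    promoting = TIER_RANK[target] < TIER_RANK[source]
    # Moving to a higher tier: most recently accessed first (as the text states).
    # Moving to a lower tier: least recently accessed first (assumed mirror rule).
    ordered = sorted(last_access, key=lambda k: last_access[k], reverse=promoting)
    return ordered[:amount]


last = {"a": 1.0, "b": 5.0, "c": 3.0}
print(pick_overflow("NVM", "DISK", last, source_threshold=3, target_headroom=10))  # ['a']
print(pick_overflow("NVM", "DRAM", last, source_threshold=2, target_headroom=10))  # ['b', 'c']
```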

The index processing unit 130 updates the index data to reflect inserted data, changed data, and data moved by the tiering process. It keeps the index data in the data structure of the storage 200 in which each piece of data is stored.

In response to a request from the data processing unit 110, the storage search unit 140 applies the data to the hash functions to calculate hash values. It reads the bits at the corresponding positions in the search filter 202 of each storage 200, and determines from those bit values the storage 200 in which the requested data is stored.

For example, suppose the bit positions calculated by applying the requested data to the hash functions are "2" and "6". The storage search unit 140 then reads the following values:

- The DRAM search value: the "2"nd and "6"th bits of the search filter 202 of the DRAM 210.
- The NVM search value: the same bits of the search filter 202 of the NVM 220.
- The DISK search value: the same bits of the search filter 202 of the DISK 230.

A storage whose search value is all "1" is set as a storage to be searched. When two or more storages qualify, the storage search unit 140 may set all storages as search targets.
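The probing step can be reproduced in Python with the example positions "2" and "6". The 8-bit filter contents are hypothetical, the positions are used directly as 0-based list indexes, and the function names are illustrative.

```python
def search_values(filters: dict, positions: list) -> dict:
    """Read the bits at the hashed positions from each storage's search filter."""
    return {name: [bits[p] for p in positions] for name, bits in filters.items()}


def storages_with_all_ones(filters: dict, positions: list) -> list:
    """Storages whose search value is all "1": the record may be stored there."""
    return [name for name, vals in search_values(filters, positions).items()
            if all(v == 1 for v in vals)]


# Hypothetical 8-bit filters; only the DRAM filter has bits 2 and 6 set.
filters = {
    "DRAM": [0, 0, 1, 0, 0, 0, 1, 0],
    "NVM": [0, 1, 0, 0, 0, 0, 0, 1],
    "DISK": [1, 0, 0, 1, 0, 0, 0, 0],
}
print(search_values(filters, [2, 6]))           # {'DRAM': [1, 1], 'NVM': [0, 0], 'DISK': [0, 0]}
print(storages_with_all_ones(filters, [2, 6]))  # ['DRAM']
```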

The search filter management unit 150 updates the bit string of the search filter 202 to reflect the data stored in the corresponding storage 200. For example, when data is inserted, it applies the data to the predetermined hash functions and sets the DRAM search filter bits at the positions corresponding to the hash values to the valid value.

In addition, the search filter management unit 150 regenerates the search filter bit string of each storage 200 from the data currently stored in it. It does so when a predetermined period elapses or when the number of deletions from the storage 200 exceeds a predetermined count.

The main limitation of a Bloom filter is that elements can be added to the set but not removed. Consider a system that uses a Bloom filter to look up data items in a database. When a data item is added and later deleted, the filter cannot reflect the deletion, so the item still appears to exist. As deletions accumulate, false positives increase rapidly. Periodically regenerating the search filter bit string from the currently stored data resolves this problem.
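The deletion-triggered regeneration can be sketched in Python as follows. `StorageFilter`, `rebuild_after`, and the salted SHA-256 hash scheme are assumptions made for illustration. The `keys` set stands in for the data the storage actually holds.

```python
import hashlib


def positions(key: str, k: int, m: int) -> list:
    # Assumption: k salted SHA-256 digests stand in for the k hash functions.
    return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % m
            for i in range(k)]


class StorageFilter:
    """One storage's search filter, rebuilt after `rebuild_after` deletions (hypothetical)."""

    def __init__(self, m: int = 1024, k: int = 3, rebuild_after: int = 100) -> None:
        self.m, self.k, self.rebuild_after = m, k, rebuild_after
        self.keys = set()  # stand-in for the data currently held in this storage
        self.bits = [0] * m
        self.deletes = 0

    def insert(self, key: str) -> None:
        self.keys.add(key)
        for p in positions(key, self.k, self.m):
            self.bits[p] = 1

    def delete(self, key: str) -> None:
        self.keys.discard(key)
        # The key's bits stay set: other keys may share them, so they cannot be cleared.
        self.deletes += 1
        if self.deletes >= self.rebuild_after:
            self.regenerate()

    def regenerate(self) -> None:
        # Rebuild the bit string from scratch using only the data still stored.
        self.bits = [0] * self.m
        for key in self.keys:
            for p in positions(key, self.k, self.m):
                self.bits[p] = 1
        self.deletes = 0

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p] == 1 for p in positions(key, self.k, self.m))


f = StorageFilter(rebuild_after=2)
f.insert("a")
f.insert("b")
f.delete("a")
print(f.might_contain("a"))  # True: the stale bits give a false positive
f.insert("c")
f.delete("c")                # second deletion triggers regeneration
print(f.deletes)             # 0
```

After regeneration, the bit string reflects only "b". A deleted key then tests negative unless its positions happen to collide with those of the remaining keys.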

Next, the operation of the hybrid indexing device in the heterogeneous storage-based database management system shown in FIG. 1 will be described with reference to FIG. 4.

First, storage information is registered in the index object storage 203 of the DRAM 210. This includes the number of storages 200 provided in the system and a pointer to each. In addition, each storage 200 is provided with a search filter 202 and an index data storage 201.

In this state, the data processing means 100 performs data tiering on the data already stored in the storages 200 (ST100). It also performs data management (ST200): it inserts, modifies, deletes, and searches data in response to external queries, and returns the result as the query result.

In addition, while performing the data tiering and data management processes described above, the data processing means 100 regenerates the search filter 202 of each storage 200 when a preset regeneration condition is satisfied (ST300). Two policies are possible:

- The data processing means 100 may regenerate the search filter bit string of a particular storage 200 when data has been deleted from that storage more than a preset number of times.
- Alternatively, it may regenerate the search filter bit strings of all storages 200 when the number of deletions from at least one storage 200 reaches the set number.
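The two regeneration policies can be expressed as a small selection function. The function name, the `regenerate_all` flag, and the use of a single `limit` for both policies are hypothetical simplifications.

```python
def storages_to_regenerate(delete_counts: dict, limit: int, regenerate_all: bool = False) -> list:
    """Pick which storages' search filters to rebuild (ST300).

    regenerate_all=False: rebuild only storages whose own deletion count reached `limit`.
    regenerate_all=True: once any storage reaches `limit`, rebuild every storage.
    """
    over = [name for name, count in delete_counts.items() if count >= limit]
    if regenerate_all and over:
        return list(delete_counts)
    return over


counts = {"DRAM": 5, "NVM": 0, "DISK": 2}
print(storages_to_regenerate(counts, limit=3))                       # ['DRAM']
print(storages_to_regenerate(counts, limit=3, regenerate_all=True))  # ['DRAM', 'NVM', 'DISK']
```

The per-storage policy rebuilds less. The all-storages policy keeps every filter equally fresh at the cost of extra rebuild work.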

Next, the data tiering operation (ST100) shown in FIG. 4 will be described in more detail with reference to FIG. 5.

Referring to FIG. 5, data is first stored in each storage 200, and index data and the search filter 202 are registered to reflect the stored data. A tiering criterion for data tiering is also set in advance.

In this state, the data processing means 100 determines whether any data stored in the storages 200 needs to be moved, that is, whether the current state satisfies the preset tiering condition (ST110). The tiering criterion may be set as the difference between the current date and the data access date.

When the condition is satisfied in step ST110, the data processing means 100 determines the destination storage 200 to which the target data is to be moved (ST120). The destination is chosen by the difference between the current date and the access date:

- HOT data, with the smallest difference, is stored in the DRAM 210.
- WARM data, with an intermediate difference, is stored in the NVM 220.
- COLD data, with the largest difference, is stored in the DISK 230.

Once the destination storage 200 is determined, the data processing means 100 allocates a new slot in the destination storage 200. It stores the data being moved in the index structure of that storage 200 (ST130). Data is stored in a T-Tree structure in the DRAM 210 and the NVM 220, and in a B-Tree structure in the DISK 230.

In addition, in response to the movement, the data processing means 100 updates the index data storages 201 and search filters 202 of the source storage 200 and the destination storage 200 (ST140). That is, the index data for the moved data is deleted from the index data storage 201 of the source storage 200. New index data for it is generated and added to the index data storage 201 of the destination storage 200. The search filter bit string of the destination storage 200 is also updated so that the moved data can be found.
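The metadata bookkeeping for one move (ST130 to ST140) can be sketched in Python as follows. Plain dicts stand in for the T-Tree/B-Tree index data storages. The filter size `M`, hash count `K`, hash scheme, and function names are hypothetical. The sketch leaves the source filter bits set, because a Bloom filter cannot clear bits safely.

```python
import hashlib

M, K = 64, 3  # hypothetical filter size and hash count


def bit_positions(key: str) -> list:
    # Assumption: k salted SHA-256 digests stand in for the k hash functions.
    return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % M
            for i in range(K)]


def make_storage() -> dict:
    # "index" stands in for the index data storage 201;
    # "filter" is the search filter bit string.
    return {"index": {}, "filter": [0] * M}


def move_record(key: str, source: dict, target: dict) -> None:
    """ST130-ST140: relocate one record and update both storages' metadata."""
    record = source["index"].pop(key)  # delete the index entry in the source storage
    target["index"][key] = record      # add an index entry in the target storage
    for p in bit_positions(key):
        target["filter"][p] = 1        # make the record findable through the target's filter
    # The source filter is left untouched: its bits may be shared with other keys,
    # so this key keeps matching there until that filter is regenerated.


dram, nvm = make_storage(), make_storage()
dram["index"]["k1"] = {"v": 1}
for p in bit_positions("k1"):
    dram["filter"][p] = 1
move_record("k1", dram, nvm)
print("k1" in dram["index"], nvm["index"]["k1"])  # False {'v': 1}
```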

In addition, the data management operation ST200 illustrated in FIG. 4 will be described in more detail with reference to FIG. 6.

When there is a query request from the outside, the data processing means 100 analyzes the query and determines the transaction type included in the query.

If the transaction in the query is an insert transaction (ST210), the data processing means 100 allocates a new slot in the DRAM 210 and stores the data (ST220). It then generates new index data for the stored data and stores it in the index data storage 201 of the DRAM 210 (ST230).

The data processing means 100 then updates the search filter bit string of the DRAM 210 to reflect the inserted data (ST240). Specifically, it applies the hash functions to the inserted data and sets the bits at the positions given by the calculated hash values to "1".
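The insert path (ST220 to ST240) can be modeled in Python as follows. `DramStorage` is a hypothetical model: a list stands in for DRAM slots, a dict stands in for the T-Tree index, and the record key, rather than the whole record, is what gets hashed. These are all assumptions.

```python
import hashlib


def bit_positions(key: str, k: int, m: int) -> list:
    # Assumption: k salted SHA-256 digests stand in for the k hash functions.
    return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % m
            for i in range(k)]


class DramStorage:
    """Hypothetical model of the DRAM 210 insert path (ST220-ST240)."""

    def __init__(self, m: int = 1024, k: int = 3) -> None:
        self.m, self.k = m, k
        self.slots = []    # data slots
        self.index = {}    # key -> slot number (stand-in for the T-Tree index)
        self.filter = [0] * m  # search filter bit string

    def insert(self, key: str, record: dict) -> None:
        self.slots.append(record)                     # ST220: allocate a new slot
        self.index[key] = len(self.slots) - 1         # ST230: create the index entry
        for p in bit_positions(key, self.k, self.m):  # ST240: set the hashed bits to "1"
            self.filter[p] = 1


dram = DramStorage()
dram.insert("id-7", {"name": "sensor-A"})
print(dram.index)  # {'id-7': 0}
```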

If the transaction in the query is a change transaction, that is, a modification or deletion of data (ST250), the data processing means 100 applies the data to the predetermined k hash functions to calculate hash values. It sends these hash values to each storage 200. From each storage's search filter bit string, it collects the bit values at the positions corresponding to those hash values (ST260).

The data processing means 100 determines the search target storage 200 from the search filter bit values of each storage 200 (ST270):

- A storage 200 whose bit values are all "1" becomes a search target.
- When two storages qualify, all three storages are made search targets.
- When no storage qualifies, the data processing means 100 returns a query result stating that the requested data does not exist in any storage.
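The ST270 decision rule can be written as a small Python function. The function name and the fixed three-storage list are hypothetical. The rule treats "two or more matches" as a signal to search everywhere, as the text describes for the storage search unit.

```python
ALL_STORAGES = ["DRAM", "NVM", "DISK"]


def resolve_search_targets(matches: list) -> list:
    """ST270: turn the storages whose filter bits were all "1" into search targets.

    An empty result means the query can answer "not found" without reading any storage.
    """
    if len(matches) >= 2:
        return list(ALL_STORAGES)  # ambiguous: search every storage
    return list(matches)  # exactly one storage, or none


print(resolve_search_targets([]))                # []
print(resolve_search_targets(["NVM"]))           # ['NVM']
print(resolve_search_targets(["DRAM", "DISK"]))  # ['DRAM', 'NVM', 'DISK']
```

The early "not found" answer is safe only if every insert and move sets the destination filter's bits, because a Bloom filter then never produces a false negative.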

The data processing means 100 then searches the search target storages 200 for the data to be changed. In the storage 200 where the data is found, it performs the change transaction, such as modifying or deleting the data (ST280).

Then, the data processing means 100 updates the index data storage 201 of that storage 200 to reflect the change made in step ST280 (ST290). For a deletion, the index data of the deleted data is removed from the index data storage 201. For a modification, the index data stored in the index data storage 201 is changed accordingly.
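Steps ST280 to ST290 can be sketched in Python as follows. Each storage's index data storage is modeled as a dict from key to record. Passing `None` as the new record stands for a deletion. The function name and data model are hypothetical. The loop skips any target that turns out to be a filter false positive.

```python
from typing import Optional


def apply_change(indexes: dict, targets: list, key: str,
                 new_record: Optional[dict] = None) -> Optional[str]:
    """ST280-ST290: find `key` in the target storages, then modify or delete it.

    indexes maps each storage name to a dict standing in for its index data storage.
    new_record=None requests a deletion; otherwise the record is replaced.
    Returns the storage that held the key, or None if every target was a false positive.
    """
    for name in targets:
        index = indexes[name]
        if key not in index:
            continue  # Bloom filter false positive: keep looking
        if new_record is None:
            del index[key]  # deletion: drop the index entry
        else:
            index[key] = new_record  # modification: update the indexed record
        return name
    return None


indexes = {"DRAM": {}, "NVM": {"k": {"v": 1}}, "DISK": {}}
print(apply_change(indexes, ["DRAM", "NVM", "DISK"], "k", {"v": 2}))  # NVM
print(indexes["NVM"])  # {'k': {'v': 2}}
```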

Meanwhile, in the present invention, the data processing means 100 performs the operations of ST260 to ST280 of FIG. 6 for the search transaction.

That is, for a search transaction, the data processing means 100 applies the data to the predetermined k hash functions to calculate hash values and sends them to each storage 200. It determines the search target storage 200 by collecting the bit values at the corresponding positions in each storage's search filter bit string. It then searches the search target storage 200 for the requested data and returns the data found as the query result.

In addition, for a range search, the data processing means 100 may perform the range search in all storages and return the result as the query result.
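A range predicate cannot be answered by a Bloom filter, which only tests individual values. This is why the range search visits every storage. The following Python sketch models each storage's ordered index (T-Tree or B-Tree) as a sorted list searched with the standard `bisect` module. That model is a simplification, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def range_search(storage_keys: dict, low: int, high: int) -> list:
    """Collect keys in [low, high] from every storage and merge them in order."""
    hits = []
    for keys in storage_keys.values():  # each list is kept sorted, like an ordered index
        start = bisect_left(keys, low)
        end = bisect_right(keys, high)
        hits.extend(keys[start:end])
    return sorted(hits)


print(range_search({"DRAM": [5, 9, 20], "NVM": [1, 7, 12], "DISK": [3, 8, 30]}, 6, 12))
# [7, 8, 9, 12]
```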

Claims

A hybrid indexing device in a heterogeneous storage-based database management system, comprising: a plurality of storages including a dynamic random access memory (DRAM), a non-volatile memory (NVM), and a DISK, each storage including an index data storage in which index data corresponding to the data stored in that storage is stored, and a search filter having a bit string in which the bit values at positions corresponding to the data stored in that storage are set to a valid value; and

data processing means configured to, for an external data insertion request, store the data in the DRAM, generate new index data corresponding to the stored data and store it in the index data storage of the DRAM, and apply the data to a predetermined hash function to set the search filter bit of the DRAM at the position corresponding to the hash value to the valid value,

and, for an external data change request including data modification or deletion, apply the data to be changed to the predetermined hash function to calculate a hash value, determine the storage in which the data to be changed is stored based on the bit value at the position corresponding to the hash value in each storage, retrieve the data to be changed from the determined storage, perform the data change, and change the index data stored in the index data storage to reflect the change.
The hybrid indexing device according to claim 1, wherein

the DRAM stores HOT data, including initially inserted data, whose most recent access falls within a certain period from the present, the NVM stores WARM data whose most recent access is older than that period, and the DISK stores COLD data whose most recent access exceeds a further period, and

the data processing means performs data tiering that moves data between storages based on the access time of the data stored in each storage, and updates the index data storage and the search filter bit string of the corresponding storages in response to the data movement.
The hybrid indexing device according to claim 2, wherein

when performing the data tiering, if the amount of data stored in any storage is greater than or equal to a preset threshold, the data processing means checks the amount of data stored in the other storages and additionally moves data to a storage that has room below its preset threshold, and determines the data to be moved based on access time according to the tier of the storage to which the data is to be moved.
The hybrid indexing device according to any one of claims 1 to 3, wherein

each storage has a different data structure chosen in consideration of data read/write characteristics and space utilization characteristics, and

the data processing means stores data in the data structure of the corresponding storage.
The hybrid indexing device according to claim 1, wherein

the data processing means regenerates the search filter bit string of each storage, based on the data stored in that storage, according to a preset condition.
The hybrid indexing device according to claim 5, wherein

the data processing means regenerates the search filter bit string of a storage when data has been deleted from that storage more than a preset number of times.