WO2013035287A1

WO2013035287A1 - Database management device, database management method, and program

Info

Publication number: WO2013035287A1
Application number: PCT/JP2012/005519
Authority: WO
Inventors: 盛朗佐々木
Original assignee: 日本電気株式会社
Priority date: 2011-09-08
Filing date: 2012-08-31
Publication date: 2013-03-14

Abstract

In the present invention, a node management unit (120) performs a transition of a level of a tree-index (10), and performs partitioning, insertion, and searching with respect to each node of the tree-index (10). For the tree-index (10), nodes belonging to a first level are set to be of a first size, and nodes belonging to a second level are set to be of a second size. For example, for the tree-index (10), leaf-nodes (described as zeroth nodes within the figure) and first nodes belonging to an upper level of the same are made to differ in size. For example, the size of the leaf-nodes is made to be larger than the size of the first nodes.

Description

Database management apparatus, database management method, and program

The present invention relates to a database management device, a database management method, and a program for managing a database having a tree index structure.

One of the database structures is a tree index structure (see, for example, Patent Document 1). Non-Patent Documents 1 to 21 disclose the following techniques.

The B-tree index described in Non-Patent Document 1 is designed with the aim of minimizing the number of disk accesses. Non-Patent Document 2 describes derived indexes, including the B+-tree.

Non-Patent Document 3 describes the T-tree, which aims to reduce memory usage and CPU cycles. It suppresses memory usage by referencing data through pointers. It also uses a multiway tree instead of a binary tree to reduce how often “rotation”, the process that balances the tree, occurs, thereby reducing CPU cycles.

Non-Patent Document 4 proposes the Cache-Sensitive Search Tree (CSS-tree). The purpose of the CSS-tree is to enable fast searching in OLAP (online analytical processing) workloads, where few incremental updates occur. In the CSS-tree, the node size equals the cache line size, so the number of cache misses per node is reduced to one. In addition, the data to be accessed next is determined from the position at which a key is recorded in the node. In a B-tree or similar structure, the next data to be accessed is determined from a key-pointer pair. Because the pointers can be omitted in a CSS-tree, more keys can be recorded in a node. As a result, the data on a single cache line narrows the search down to a smaller number of keys.

Non-Patent Document 5 proposed the Cache-Sensitive B+-tree (CSB+-tree), which is an updatable CSS-tree. In the CSB+-tree, a node group consisting of nodes arranged in a contiguous area is formed below each node. The upper node holds keys and a pointer to the head of the node group. The position at which a key is recorded determines which node in the group should be accessed next. Compared with a B-tree, which has one pointer for each key, the number of pointers is greatly reduced, so high-speed search can be realized in the same way as in the CSS-tree. The difference from the CSS-tree is that updates are taken into account. Insertion in the CSB+-tree is similar to insertion in the B+-tree and is done by splitting nodes. However, the node generated by a split must belong to the same node group as the original node. Therefore, to split a node, it is necessary either to allocate a large contiguous area to each node group in advance or to perform the update in units of node groups.

Non-Patent Document 6 proposed the Prefetching B+-tree (pB+-tree), which improves performance by using prefetching. In a search using a B+-tree, the waiting time caused by data cache misses accounts for 65% of the execution time when searching for one key and 84% when searching for multiple keys at once. The pB+-tree uses prefetching to reduce this latency. When searching for one key, all of the data in the node to be accessed next is prefetched. When searching for multiple keys, prefetching is also performed while data is acquired by traversing the leaf nodes. Leaf-node prefetching is made possible by creating an array of pointers to the leaf nodes or by creating horizontal links between the branch nodes one level above the leaves. Because search performance improves through prefetching, update speed improves as well.

Non-Patent Document 7 proposed the Fractal Prefetching B+-tree (fpB+-tree), which takes both the CPU cache and the disk into account. The node size of a B+-tree is 32 B to 128 B when optimized for the cache and 4 KB to 64 KB when optimized for the disk. If the node size is reduced to match the cache, the number of disk I/Os increases greatly. If the node size is matched to the disk, a binary search is performed over a large number of entries in each node, and cache misses increase. Therefore, the fpB+-tree builds a B+-tree whose nodes have a size optimized for the disk and creates, inside each of those nodes, (sub)nodes whose size is optimized for the cache.

In Non-Patent Document 8, an analytical performance model of the CSB+-tree was created, and its validity was shown through implementation and performance evaluation of the CSB+-tree. The outline of the performance model is expressed by the following equation.
t = I × cpi + M × miss_latency + B × pred_penalty + T × tlb_penalty
Here, t is the execution time, cpi is the number of cycles per instruction, miss_latency is the cost of a cache miss, pred_penalty is the cost of a branch miss, and tlb_penalty is the cost of a TLB miss. I is the number of instructions, M is the number of cache misses, B is the number of branch misses, and T is the number of TLB misses. In a main-memory database, cache misses are a major cause of reduced speed. Generally, matching the index node size to the cache line size is effective in reducing cache misses. However, this ignores other factors such as the number of instructions, branch misses, and TLB misses. Therefore, Non-Patent Document 8 creates the above model and shows its correctness through implementation and evaluation. As a result, it was shown that increasing the node size (e.g., to 512 B) improved the speed by 17% and the space efficiency by 57%.
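As a rough illustration of how the model above combines its four terms, the following Python sketch evaluates t for one set of counts. The function name `estimated_time` and all numeric values are hypothetical and were chosen only to make the arithmetic easy to follow; they are not measurements from Non-Patent Document 8.

```python
def estimated_time(I, cpi, M, miss_latency, B, pred_penalty, T, tlb_penalty):
    """Evaluate t = I*cpi + M*miss_latency + B*pred_penalty + T*tlb_penalty."""
    return I * cpi + M * miss_latency + B * pred_penalty + T * tlb_penalty


# Hypothetical counts and per-event costs (in cycles) for one index lookup.
t = estimated_time(I=100, cpi=1, M=2, miss_latency=150,
                   B=4, pred_penalty=15, T=1, tlb_penalty=30)
print(t)  # 100 + 300 + 60 + 30 = 490
```

With numbers of this shape, the cache-miss term dominates, but the other three terms are not negligible, which is the point the model is used to make.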

Non-Patent Document 9 describes a method for efficiently compressing the keys of a tree index. Key size is more of a problem in memory than on disk. However, when keys are handled through pointer references, cache misses occur frequently. In the Prefix B-tree (Non-Patent Document 10), keys have variable length, so space efficiency may decrease. In contrast, Non-Patent Document 9 uses a partial key consisting of a pointer to the key, the digit (bit) offset at which a difference occurs, and the first L bytes from that offset. Its conclusions were: 1) cache misses can be minimized regardless of the key size; 2) partial keys are faster than normal keys when the key size is large (and slower when the key size is small); 3) space efficiency is slightly worse than that of the T-tree but much better than that of uncompressed keys; and 4) using small fixed-length partial keys avoids most indirect references.

In Non-Patent Document 11, “FAST”, a binary tree optimized for the individual hardware, was proposed. FAST takes into account the SIMD register size, the cache line size, the page size, and so on. Typically these sizes are 8 B, 64 B, and 2 MB (or 4 KB), respectively. In Non-Patent Document 7, nodes sized for the cache are provided inside nodes sized for the disk. Non-Patent Document 11 goes further and also provides nodes sized for the SIMD register inside the nodes sized for the cache, so that nodes of three different sizes are used. Within each of these three types of nodes, entries are packed breadth-first. For example, an entry and the two entries corresponding to its children are packed together. Combined with other techniques such as key compression, this yields a five-fold performance improvement on a CPU.

Furthermore, tree indexes that assume flash memory, rather than disk or memory, have also been proposed. Broadly speaking, flash memory is a recording medium that is faster than disk and slower than memory. Non-Patent Document 12 is a survey of algorithms and data structures for flash memory that draws on patent documents, because these techniques are largely unpublished elsewhere. Before new data can be written to a page of flash memory, erasure is required in units of blocks, each of which combines several pages. In many cases, the number of erasures is limited to between 100,000 and 1 million. Therefore, software control such as a Flash Translation Layer (FTL) is included. In an FTL, data is usually appended to a circular log. That is, a page is not rewritten in place; the page containing the old data is left as it is, and the new data is written to an empty page. The problem here is that valid pages (holding the latest data) and invalid pages (holding old data that is no longer referenced) coexist in the same block. To use the storage area effectively, it is necessary to delete the invalid pages and gather the valid pages into the same block. This is called garbage collection (GC).

Non-Patent Document 13 evaluated the performance of solid state drives (SSDs), which is strongly influenced by the FTL, a black box. The findings were: 1) read performance depends on the access pattern; 2) the write performance of low-end SSDs depends strongly on the access pattern; 3) the disk cache is very effective for speedup; 4) reads and writes interfere with each other even when they are not processed simultaneously; 5) FTL background jobs degrade performance; and 6) fragmentation caused by random writes significantly degrades performance. The authors took the view that, because of these complexities, SSDs will not replace HDDs right away.

Non-Patent Document 14 proposed BFTL, a layer that efficiently handles the fine-grained data updates that occur in a B-tree. BFTL sits between the block device (FTL) and the application. Requests to the block device are reinterpreted as requests to BFTL. BFTL has a buffer and a translation table. Block changes are recorded in the buffer (in memory) and flushed together. The translation table, which is a chained hash table, is used to translate block addresses from logical to physical. As a result, write speed increases and read speed decreases slightly.

Non-Patent Document 15 proposes the Lazy-Adaptive Tree (LA-Tree), which achieves high performance by minimizing access to flash memory. The feature of the LA-Tree is that buffers (cascaded buffers) are assigned to subtrees of the tree. When a buffer overflows, it is flushed to the buffer of a lower subtree. The buffer size is determined adaptively (adaptive buffering): a large buffer is allocated where there are many updates, and a small buffer where there are few. This is because buffering many changes improves update performance but degrades search performance. The LA-Tree showed response times 2 to 12 times faster than existing methods such as BFTL.

In Non-Patent Documents 16 and 17, the FD-tree was proposed to address the problem that random writes to flash memory are slow. The FD-tree consists of a head tree, which is a small B+-tree whose node size equals the page size of the flash memory, and arrays of sorted blocks. Each array roughly corresponds to one level of the tree, and arrays at lower levels are larger. Searching a block of the head tree or of an array identifies the block to be searched at the next lower level. With the head tree, random writes are applied to data in memory and can therefore be processed at high speed. Writes are buffered in the head tree and are flushed to the lower-level arrays when the head tree becomes full.

The five-minute rule is a well-known guideline for deciding at which level of the storage hierarchy it is profitable to keep data. Non-Patent Document 18 presents the five-minute rule as the condition under which storing data in memory is profitable. Let R be the reference interval, M the unit price of memory, A the unit price of disk bandwidth, B the data size, and Bmax the disk block size. The break-even point for keeping data in memory is A / R - M * B = 0, which is reached when R is about 5 minutes. By a similar calculation, if one instruction per second can be saved, it is profitable to do so even at the cost of an extra 10 bytes of memory.

Non-Patent Document 19 shows that the five-minute rule still holds in the hardware environment ten years later. Although the absolute performance of hardware has changed greatly, the five-minute rule holds for random disk access. For sequential access (sort, cube, join, etc.), a one-minute rule holds. Furthermore, when the node size of a B-tree is optimized from the latency and bandwidth of the disk and the utility of the node (the base-2 logarithm of the number of entries in the node), it becomes 8 KB to 32 KB. On the other hand, the 10-byte rule holds for minicomputers, whereas a 1-byte rule holds for PCs and a 1-Kbyte rule for mainframes.

Non-Patent Document 20 examined the conditions under which the five-minute rule holds in the hardware environment 20 years later. The five-minute rule is valid for movement between large pages of flash memory and memory, and between small pages of flash memory and disk. With the recent expansion of disk bandwidth, a node size of 256 KB is nearly optimal for a B-tree targeting disk and memory. For flash and memory, a node size of 2 KB is preferred. Because the optimal node size differs between disk and flash, the authors estimated that it would be beneficial to transfer data in units of 256 KB between flash and disk and in units of 4 KB between memory and flash.

Patent Document 1: Japanese Patent Laid-Open No. 04-112240

Databases are constantly required to become faster. An object of the present invention is to provide a database management apparatus, a database management method, and a program capable of improving the speed of a database.

According to the present invention, there is provided a database management apparatus for managing a database having a tree index structure, in which nodes belonging to a first hierarchy are set to a first size and nodes belonging to a second hierarchy are set to a second size, the database management apparatus comprising node management means for performing division, insertion, and search on each of the nodes belonging to the first hierarchy and the nodes belonging to the second hierarchy.

According to the present invention, there is also provided a database management apparatus for managing a database having a tree index structure, the database management apparatus comprising:
first node management means for performing division, insertion, and search on nodes belonging to a first hierarchy; and
second node management means for performing division, insertion, and search on nodes belonging to a second hierarchy by a method different from that of the first node management means.

According to the present invention, there is provided a database management method for managing a database having a tree index structure, in which
a computer performs division, insertion, and search on nodes belonging to a first hierarchy, and
the computer performs division, insertion, and search on nodes belonging to a second hierarchy by a method different from that used for the nodes belonging to the first hierarchy.

According to the present invention, there is provided a program for causing a computer to function as a database management apparatus that manages a database having a tree index structure, the program causing the computer to realize:
a first node management function of performing division, insertion, and search on nodes belonging to a first hierarchy; and
a second node management function of performing division, insertion, and search on nodes belonging to a second hierarchy by a method different from that of the first node management function.

According to the present invention, the speed of the database can be improved.

The above-described object and other objects, features, and advantages will be further clarified by a preferred embodiment described below and the following drawings attached thereto.

FIG. 1 is a functional block diagram showing the configuration of the database management apparatus according to the first embodiment.
FIG. 2 is a functional block diagram showing the configuration of the database management apparatus according to the second embodiment.
FIG. 3 is a flowchart showing the search processing performed by the node management unit shown in FIG. 2.
FIG. 4 is a flowchart showing the insertion processing performed by the node management unit shown in FIG. 2.
FIG. 5 is a flowchart showing a continuation of FIG. 4.
FIG. 6 is a diagram showing the configuration of a flat node.
FIG. 7 is a flowchart showing the operation at the time of search of the node management unit according to the third embodiment.
FIG. 8 is a flowchart showing the operation at the time of node division of the node management unit according to the third embodiment.
FIG. 9 is a flowchart showing the operation when the node management unit according to the fourth embodiment searches a sorted node.
FIG. 10 is a diagram showing the configuration of the database management apparatus according to the sixth embodiment together with the structure of a tree index.
FIG. 11 is a table showing the amount of computation required for search, the amount of computation required for insertion, and the space efficiency for each of a flat node, a sorted node, and a tree node.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In all the drawings, the same reference numerals are given to the same components, and the description will be omitted as appropriate.

(First embodiment)
FIG. 1 is a functional block diagram showing the configuration of the database management apparatus 100 according to the first embodiment. The database management device 100 is a device that manages the tree index 10 and includes a node management unit 120. The node management unit 120 performs transition of the hierarchy of the tree index 10 and performs division, insertion, and search for each node of the tree index 10.

The node management unit 120 writes an entry into a node in the insertion process. An entry is a pair consisting of a key (or a pointer to a key) and a pointer. From the pointer, the location of another (lower) node, or of the data associated with the key, can be identified. The pointer can be omitted.

In the search process, the node management unit 120 searches for the entry corresponding to the designated key. In the division process, the node management unit 120 secures a new node at the same level as the node to be divided and moves some of the entries of the target node to the new node. The node management unit 120 can execute these processes across nodes in different hierarchies.

The tree index 10 is set so that nodes belonging to the first hierarchy have the first size, and nodes belonging to the second hierarchy are set to the second size. For example, the tree index 10 is different in size between a leaf node (denoted as the 0th node in the figure) and a first node belonging to the hierarchy above it. In the present embodiment, for example, the size of the leaf node is made larger than the size of the first node. Regarding the node size, the technique described in Non-Patent Document 8 may be referred to. Alternatively, the key compression method shown in Non-Patent Documents 9 and 10 may be used.

The characteristics required for the node differ depending on the node hierarchy. For example, a node belonging to a certain hierarchy may be required to have high speed for search, and a node belonging to another hierarchy may be required to be high speed for insertion. When the node is an unsorted flat node, if the size of the node is small, the search can be performed at high speed. If the node size is large, the number of node divisions required at the time of insertion is reduced, so that the insertion can be performed at high speed. Therefore, according to this embodiment, the speed of the database can be improved.

Each component of the database management apparatus 100 shown in FIG. 1 represents a block in units of functions, not a configuration in units of hardware. Each component of the database management apparatus 100 is realized by an arbitrary combination of hardware and software, centered on the CPU of an arbitrary computer, a memory, a program loaded into the memory that realizes the components shown in the figure, a storage medium such as a hard disk that stores the program, and a network connection interface. There are various modifications of the implementation method and apparatus.

(Second Embodiment)
FIG. 2 is a functional block diagram showing the configuration of the database management apparatus 100 according to the second embodiment. The database management apparatus 100 according to the present embodiment has the same configuration as the database management apparatus 100 according to the first embodiment, except that the node management unit 120 includes at least a first node management unit 122 and a second node management unit 124.

In the present embodiment, as in the first embodiment, the tree index 10 is set so that the nodes belonging to the first hierarchy have the first size and the nodes belonging to the second hierarchy have the second size. Alternatively, the nodes belonging to all the hierarchies may have the same size. Also in this embodiment, the size of the leaf node may be larger than the size of the first node.

Then, the first node management unit 122 performs division, insertion, and search for the nodes belonging to the first hierarchy. In addition, the second node management unit 124 performs division, insertion, and search on the nodes belonging to the second hierarchy by a method different from that of the first node management unit 122.

Here, each of the first hierarchy and the second hierarchy may be a single hierarchy or a plurality of hierarchies. For example, all layers below a certain layer may be the first layer, and all the remaining layers may be the second layer.

Further, the node management unit 120 may be provided with a different management unit for each hierarchy of the tree index 10. In this case, division, insertion, and search are performed for each hierarchy of the tree index 10 by different methods.

FIG. 3 is a flowchart showing the search processing performed by the node management unit 120 shown in FIG. 2. The node management unit 120 performs the following processing using the management unit (for example, one of the first node management unit 122 and the second node management unit 124) corresponding to the hierarchy to be searched.

First, when receiving the search key, the node management unit 120 sets the search target hierarchy to the highest hierarchy (step S100). Next, the node management unit 120 searches for a node belonging to the search target hierarchy using the search key (step S102). Since there is only one node (root node) at the highest level, when the search target is the highest hierarchy, the node to be searched is clear. When the search target node is not a leaf node (step S104: No), the search target hierarchy is lowered by one (step S105). Then, the node management unit 120 specifies one of the nodes belonging to the newly set search target hierarchy from the pointer included in the entry of the node of the hierarchy before update (step S106), and returns to step S102.

When the search target node is a leaf node (step S104: Yes), the node management unit 120 reads out the data related to the search key using the pointer included in the entry of the search target node (step S107).
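The descent in steps S100 to S107 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in this document: the names `Node`, `scan_lookup`, `exact_lookup`, `search`, and `lookup_by_level` are hypothetical, level 0 is assumed to be the leaf level, and the per-level lookup table stands in for choosing the management unit that corresponds to the hierarchy being searched. The minimum pointer is borrowed from the flat node of FIG. 6.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Node:
    level: int                                   # 0 means the leaf level (assumption)
    entries: List[Tuple[int, Any]] = field(default_factory=list)  # (key, pointer)
    min_pointer: Any = None                      # used when the key is below every entry key


def scan_lookup(node: Node, key: int) -> Any:
    """Upper-level lookup: pointer of the entry with the largest key <= key."""
    best = None
    for k, ptr in node.entries:
        if k <= key and (best is None or k > best[0]):
            best = (k, ptr)
    return node.min_pointer if best is None else best[1]


def exact_lookup(node: Node, key: int) -> Any:
    """Leaf-level lookup: pointer stored with an exactly matching key, else None."""
    for k, ptr in node.entries:
        if k == key:
            return ptr
    return None


def search(root: Node, key: int,
           lookup_by_level: Dict[int, Callable[[Node, int], Any]]) -> Any:
    node = root                                          # S100: start at the highest hierarchy
    while True:
        result = lookup_by_level[node.level](node, key)  # S102: search within the node
        if node.level == 0:                              # S104: Yes, this is a leaf
            return result                                # S107: pointer to the data
        node = result                                    # S105, S106: follow the pointer down


leaf_a = Node(0, [(1, "a"), (3, "c")])
leaf_b = Node(0, [(7, "g"), (5, "e")])
root = Node(1, [(5, leaf_b)], min_pointer=leaf_a)
lookups = {0: exact_lookup, 1: scan_lookup}
print(search(root, 3, lookups), search(root, 7, lookups))  # c g
```

Keeping the lookup function in a per-level table is one simple way to let each hierarchy use its own method while the descent loop stays the same.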

FIGS. 4 and 5 are flowcharts showing the insertion processing performed by the node management unit 120 shown in FIG. 2. The node management unit 120 performs the following processing using the management unit (for example, one of the first node management unit 122 and the second node management unit 124) corresponding to the hierarchy being processed.

First, when receiving an entry to be inserted, the node management unit 120 recognizes the key in the entry as a search key. Then, the node management unit 120 sets the search target hierarchy to the highest hierarchy (step S120 in FIG. 4). Next, the node management unit 120 searches a node belonging to the search target hierarchy using the search key (step S122 in FIG. 4). Since there is only one node (root node) at the highest level, when the search target is the highest hierarchy, the node to be searched is clear. Then, the node management unit 120 determines whether the node immediately below the currently searched hierarchy is a leaf node (step S124 in FIG. 4). When it is not a leaf node (step S124 in FIG. 4: No), the search target hierarchy is lowered by one (step S125 in FIG. 4). Then, the node management unit 120 identifies one of the nodes belonging to the newly set search target hierarchy from the pointer included in the entry of the node of the hierarchy before the update (step S126 in FIG. 4), and returns to step S122.

Further, when the node in the next lower hierarchy is a leaf node (step S124: Yes), the node management unit 120 determines a leaf node to be inserted (step S127 in FIG. 4).

Next, the node management unit 120 inserts an entry into the determined leaf node (step S128). If no overflow occurs (step S129: No), the process is terminated.

When an overflow occurs (step S129: Yes), the node management unit 120 performs a leaf node division process (step S130) and inserts the entry into one of the leaf nodes after the division process (step S131). For example, an entry having a value smaller than a certain value x is inserted into the division source node, and an entry having a value greater than or equal to x is inserted into the newly generated node.

Then, the node management unit 120 generates a new entry. This entry has x as its key and a pointer to the newly generated node (step S132). When there is a hierarchy one level higher than the current hierarchy (step S133: Yes), the node management unit 120 moves up one hierarchy (step S134), returns to step S128, and performs the insertion process for the entry generated in step S132.

When there is no hierarchy one level higher than the current hierarchy, that is, when the current node is the root node (step S133: No), the node management unit 120 creates a new root node above the current root node (step S135). Then, the node management unit 120 inserts, into the root node created in step S135, entries representing the node newly generated in step S130 and the division source node (step S136). The entry corresponding to the newly generated node is the one generated in step S132. The entry corresponding to the division source node includes a pointer pointing to the division source node, but a key corresponding to this pointer is not necessary.
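The insertion flow of FIGS. 4 and 5 can be sketched as below. This is a simplified sketch under several assumptions that are not taken from this document: the names `Node`, `Tree`, `capacity`, and `get` are hypothetical, every node keeps its entries sorted (so Python's `bisect` can be used), the overflow check runs after the entry has been placed rather than before, and the keyless pointer to the division source node is stored as `min_ptr`. The `capacity` callback returns the maximum number of entries for a given level, which is how differing node sizes per hierarchy are modeled here.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Node:
    level: int                                      # 0 means the leaf level (assumption)
    keys: List[int] = field(default_factory=list)   # kept sorted in this sketch
    ptrs: List[Any] = field(default_factory=list)   # ptrs[i] belongs to keys[i]
    min_ptr: Any = None                             # child for keys below keys[0]


class Tree:
    def __init__(self, capacity: Callable[[int], int]):
        self.capacity = capacity                    # max entries per node, by level
        self.root = Node(level=0)

    def _child(self, node: Node, key: int) -> Any:
        i = bisect.bisect_right(node.keys, key) - 1
        return node.min_ptr if i < 0 else node.ptrs[i]

    def insert(self, key: int, value: Any) -> None:
        path, node = [], self.root                  # S120
        while node.level > 0:                       # S122 to S126: descend to a leaf
            path.append(node)
            node = self._child(node, key)
        entry_key, entry_ptr = key, value           # S127: `node` is the target leaf
        while True:
            i = bisect.bisect_right(node.keys, entry_key)
            node.keys.insert(i, entry_key)          # S128
            node.ptrs.insert(i, entry_ptr)
            if len(node.keys) <= self.capacity(node.level):
                return                              # S129: No overflow
            mid = len(node.keys) // 2               # S130, S131: keys >= x move out
            x = node.keys[mid]
            new = Node(node.level, node.keys[mid:], node.ptrs[mid:])
            del node.keys[mid:], node.ptrs[mid:]
            entry_key, entry_ptr = x, new           # S132: entry (x, pointer to new node)
            if path:                                # S133: Yes
                node = path.pop()                   # S134
            else:                                   # S133: No
                node = Node(node.level + 1, min_ptr=node)  # S135: keyless pointer to source
                self.root = node                    # S136 happens on the next loop pass

    def get(self, key: int) -> Any:
        node = self.root
        while node.level > 0:
            node = self._child(node, key)
        i = bisect.bisect_left(node.keys, key)
        return node.ptrs[i] if i < len(node.keys) and node.keys[i] == key else None


# Hypothetical sizes: leaves hold up to 4 entries, upper nodes up to 2.
tree = Tree(capacity=lambda level: 4 if level == 0 else 2)
for k in [5, 1, 9, 3, 7, 2, 8, 0, 6, 4]:
    tree.insert(k, f"v{k}")
print(tree.get(7))  # v7
```

Because the separator entry (x, pointer) is simply re-inserted one level up by the same loop, a split at any level, including the creation of a new root, reuses the same code path.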

As described above, according to the present embodiment, division, insertion, and search are performed on a node according to the hierarchy to which the node belongs. Accordingly, the database division, insertion, and search speed can be increased.

(Third embodiment)
The database management apparatus 100 according to the third embodiment handles the tree index 10 in a flat node state in which entries are not sorted. Then, the size of the node belonging to the hierarchy positioned relatively lower is made larger than the size of the node belonging to the hierarchy positioned higher than that. For example, the lowermost node (leaf node) may be 4 KB, and the other nodes may be 128 B. Alternatively, the size of the lowermost node may be 8 KB, the size of the node one level higher than that may be 4 KB, and the size of the node one level higher than that may be 2 KB.

As described in the first embodiment, when the node is a flat node, if the size of the node is small, the search can be performed at high speed. If the node size is large, the number of node divisions required at the time of insertion is reduced, so that the insertion can be performed at high speed.

FIG. 6 is a diagram showing the configuration of a flat node. The flat node is provided with an area for storing entries, an area for storing a numerical value indicating the number of entries in the node, and an area for storing a minimum pointer. The minimum pointer is the pointer that is used when the given search key is smaller than the key of every entry in the flat node.

FIG. 7 is a flowchart showing the operation at the time of search of the node management unit 120 in this embodiment. The operation shown in this figure performs a search within a certain flat node.

First, the node management unit 120 determines the first entry as a target (step S140). Next, the node management unit 120 determines whether or not the key included in the target entry is equal to or less than the search key (step S141).

When the key is larger than the search key (step S141: No), the node management unit 120 sets the next entry as the target (step S142) and determines whether there is a target entry (step S143). If there is an entry (step S143: Yes), the node management unit 120 returns to step S141. When there is no entry (step S143: No), the node management unit 120 reads the minimum pointer in the flat node (step S144).

If the key included in the target entry is equal to or less than the search key in step S141 (Yes), the node management unit 120 sets that key as the maximum key (step S145). The node management unit 120 then sets the next entry as the target (step S146) and determines whether there is a target entry (step S147). When there is no entry (step S147: No), the node management unit 120 reads the entry whose key was set as the maximum key (step S148).

If there is an entry (step S147: Yes), the node management unit 120 determines whether the key included in the target entry is equal to or less than the search key (step S149). When the key is larger than the search key (step S149: No), the node management unit 120 returns to step S146.

If the key is equal to or less than the search key (step S149: Yes), the node management unit 120 determines whether the key in the entry is larger than the maximum key set in step S145 (step S150). The node management unit 120 returns to step S145 when the key in the entry is larger than the maximum key (step S150: Yes), and returns to step S146 when the key in the entry is equal to or less than the maximum key (step S150: No).
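Taken together, steps S140 to S150 amount to a single linear scan that remembers the largest key not exceeding the search key. A minimal sketch follows, assuming entries are (key, pointer) tuples held in a Python list; the function name `flat_search` and its parameters are hypothetical.

```python
from typing import Any, List, Optional, Tuple


def flat_search(entries: List[Tuple[int, Any]], count: int,
                min_pointer: Any, search_key: int) -> Any:
    """Scan an unsorted node for the entry with the largest key <= search_key."""
    max_key: Optional[int] = None
    result = min_pointer                 # S144: used when every key exceeds search_key
    for i in range(count):               # S140, S142/S146: visit each entry once
        key, pointer = entries[i]
        if key <= search_key and (max_key is None or key > max_key):  # S141/S149, S150
            max_key, result = key, pointer                            # S145
    return result                        # S148


entries = [(7, "g"), (2, "b"), (5, "e"), (9, "i")]
print(flat_search(entries, 4, "min", 6))  # e
print(flat_search(entries, 4, "min", 1))  # min
```

Every entry is examined once, so the cost grows linearly with the number of entries, which is why a small flat node can be searched quickly.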

FIG. 8 is a flowchart showing the operation of the node management unit 120 at the time of node division in this embodiment. First, the node management unit 120 sets a key (split key) that serves as the reference for the division (step S160). Next, the node management unit 120 sets the first entry as the target (step S161). Next, the node management unit 120 determines whether or not the key included in the target entry is equal to or less than the split key (step S162).

When the key is larger than the split key (step S162: No), the node management unit 120 sets the next entry as the target (step S163) and determines whether there is a target entry (step S164). If there is an entry (step S164: Yes), the node management unit 120 returns to step S162. When there is no entry (step S164: No), the node management unit 120 ends the process.

When the key included in the target entry is equal to or less than the split key in step S162 (Yes), the node management unit 120 inserts the current target entry into a new node (step S165). Then, the node management unit 120 moves the last entry to the position previously occupied by the entry that was inserted into the new node in step S165 (step S166), reduces the number of entries by one (step S167), and returns to step S162.
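The division loop of steps S160 to S167 can be sketched as follows. The function name `split_flat` is hypothetical, and entries are assumed to be (key, pointer) tuples in a Python list. One detail is an addition of this sketch: the loop condition `i < count` guards the case where the entry just moved out was itself the last one, which the flowchart description does not spell out.

```python
from typing import Any, List, Tuple


def split_flat(entries: List[Tuple[int, Any]], split_key: int) -> List[Tuple[int, Any]]:
    """Move every entry with key <= split_key into a new node, in place."""
    new_entries: List[Tuple[int, Any]] = []
    count = len(entries)
    i = 0                                      # S161: start with the first entry
    while i < count:                           # S164 (and the guard noted above)
        if entries[i][0] <= split_key:         # S162: Yes
            new_entries.append(entries[i])     # S165: insert into the new node
            entries[i] = entries[count - 1]    # S166: fill the hole with the last entry
            count -= 1                         # S167, then re-check the same position
        else:
            i += 1                             # S163: move on to the next entry
    del entries[count:]                        # drop the slots that are no longer used
    return new_entries


node = [(7, "g"), (2, "b"), (5, "e"), (9, "i"), (1, "a")]
moved = split_flat(node, 5)
print(sorted(k for k, _ in moved), sorted(k for k, _ in node))  # [1, 2, 5] [7, 9]
```

Filling the vacated slot with the last entry avoids shifting the remaining entries, which is possible only because a flat node does not need to stay sorted.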

Note that the node management unit 120 inserts an entry into a flat node as follows. If the number of entries currently in the flat node is c, the node management unit 120 writes the new entry at position c (counting from zero), that is, immediately after the existing entries. Then, the node management unit 120 sets the number of entries of the flat node to (c + 1). If there is no space to insert a new entry, the above node division processing is performed.
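The append-style insertion described above can be sketched as follows, assuming that c denotes the number of entries currently stored in the node and that positions are counted from zero. The class name `FlatNode` and its fields are hypothetical; returning `False` stands in for the point at which the node division processing would be invoked.

```python
from typing import Any, List, Optional, Tuple


class FlatNode:
    def __init__(self, capacity: int):
        self.slots: List[Optional[Tuple[int, Any]]] = [None] * capacity  # entry area
        self.count = 0                      # number of entries in the node
        self.min_pointer: Any = None        # minimum pointer area

    def insert(self, key: int, pointer: Any) -> bool:
        """Write the entry after the existing ones; False means the node is full."""
        if self.count == len(self.slots):
            return False                    # no space: the caller must divide the node
        self.slots[self.count] = (key, pointer)  # write at position c
        self.count += 1                     # the entry count becomes c + 1
        return True


node = FlatNode(capacity=2)
print(node.insert(7, "g"), node.insert(2, "b"), node.insert(5, "e"))  # True True False
```

No comparison or shifting is needed, so the cost of an insertion that does not overflow is independent of the node size.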

As described above, according to the present embodiment, the size of the node belonging to the hierarchy located relatively lower is made larger than the size of the node belonging to the hierarchy located higher than that. For this reason, the speed of database division, insertion, and search can be increased.

(Fourth embodiment)
The database management apparatus 100 according to the fourth embodiment handles the tree index 10 in a state in which both flat nodes and sorted nodes are included. The configuration of the sorted node is the same as that of the flat node shown in FIG. However, the nodes belonging to the same hierarchy are unified as either flat nodes or sorted nodes. In the tree index 10, for example, nodes below a certain hierarchy are flat nodes, and the other nodes are sorted nodes. Note that, in the hierarchies made up of flat nodes, the nodes may become larger as the layer becomes lower. Further, in the hierarchies made up of sorted nodes, the nodes may become larger as the layer becomes higher.

Then, the node management unit 120 performs division, insertion, and search on the flat node by the method described in the third embodiment. Further, the node management unit 120 performs division, insertion, and search on the sorted node by the following method.

FIG. 9 is a flowchart showing an operation when the node management unit 120 searches the sorted node. First, the node management unit 120 determines whether or not the key in the first entry (that is, the smallest key in the node) is equal to or less than the search key (step S180). If the target key is larger than the search key (step S180: No), the node management unit 120 reads the minimum pointer (step S181).

When the target key is equal to or less than the search key (step S180: Yes), the node management unit 120 sets the first search target entry t of the binary search to the entry located in the center (t = n / 2, where n is the number of entries included in the node), and sets the maximum key m to the key in the first entry (step S182). Next, the node management unit 120 calculates the initial values of the number le of unsearched entries having a key smaller than the entry to be searched and the number re of unsearched entries having a key larger than the entry to be searched (le = t − 1, re = (n − 1) / 2) (step S183).

Next, the node management unit 120 checks whether the key of the t-th entry is equal to or less than the search key (step S184). When the key of the t-th entry is larger than the search key (step S184: No), the node management unit 120 checks whether there is an unsearched key that is equal to or less than the key of the t-th entry, that is, whether le is greater than 0 (step S185). If there is no such key (step S185: No), there is no unsearched key equal to or less than the search key, so the node management unit 120 reads the entry currently set as m (step S186).

If there is an unsearched key, that is, if le is larger than 0 (step S185: Yes), the node management unit 120 sets the key of the entry located in the center among the unsearched entries as the next search target key (step S187). Then, the node management unit 120 sets the number re of unsearched entries having a key larger than the t-th entry to le / 2, and sets the number le of unsearched entries having a key smaller than the search target entry to (le − 1) / 2 (step S188). Then, the process returns to step S184.

If the key of the t-th entry is equal to or less than the search key (step S184: Yes), the node management unit 120 determines whether the key of the t-th entry is equal to or less than the maximum key m (step S189). When the key of the t-th entry is equal to or less than the maximum key m (step S189: Yes), the node management unit 120 sets the maximum key m to the key of the t-th entry (step S193).

If the key of the t-th entry is not less than or equal to the maximum key m (step S189: No), or after step S193, the node management unit 120 checks whether there is an unsearched key that is less than or equal to the key of the t-th entry, that is, whether le is greater than 0 (step S190). If there is no such key (step S190: No), there is no unsearched key equal to or less than the search key, and the node management unit 120 reads the entry currently set as m (step S186).

If there is an unsearched key, that is, if le is greater than 0 (step S190: Yes), the node management unit 120 sets the key of the entry located in the center among the unsearched entries as the next search target key (step S191). Then, the node management unit 120 sets the number re of unsearched entries having a key larger than the t-th entry to re / 2, and sets the number le of unsearched entries having a key smaller than the search target entry to (re − 1) / 2 (step S192). Then, the process returns to step S184.
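The goal of the search in FIG. 9 is to find the entry with the largest key that is equal to or less than the search key, falling back to the minimum pointer when every key is larger. The sketch below reaches that result with a conventional low/high binary search rather than the le/re counters of the flowchart; the function name, the (key, pointer) list layout, and the `min_pointer` argument are assumptions for illustration.

```python
def search_sorted_node(entries, min_pointer, search_key):
    """Return the pointer paired with the largest key <= search_key.

    `entries` is a list of (key, pointer) pairs sorted by key.
    """
    # S180/S181: if even the smallest key is too large, use the minimum pointer.
    if not entries or entries[0][0] > search_key:
        return min_pointer
    low, high = 0, len(entries) - 1
    best = 0                                  # index of the current maximum key m
    while low <= high:
        mid = (low + high) // 2               # entry located in the center
        if entries[mid][0] <= search_key:     # S184: Yes
            best = mid                        # remember it as the new maximum
            low = mid + 1                     # keep looking among larger keys
        else:                                 # S184: No
            high = mid - 1                    # keep looking among smaller keys
    return entries[best][1]                   # S186: read the entry set as m


node = [(10, "p10"), (20, "p20"), (30, "p30")]
pointer = search_sorted_node(node, "pmin", 25)
```

Each iteration halves the unsearched range, so the number of key comparisons grows with log n rather than n.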

Further, the node management unit 120 performs the insertion process for the sorted node as follows. First, upon receiving an entry to be inserted, the node management unit 120 performs the search process shown in FIG. 9 using the key of the entry as the search key. Since this gives the position of the largest key that is equal to or less than the search key, the node management unit 120 determines that the next position is the position where the entry is to be inserted. Usually, another key is already recorded at this position, so all the keys at and after this position are shifted by one position toward the end of the node, and the number of entries is increased by one. Then, the node management unit 120 inserts the new entry. If there is no space for inserting a new entry, the node management unit 120 performs the following division process.

The node management unit 120 performs division processing on the sorted node as follows. First, where c is the number of entries stored in the sorted node, the node management unit 120 inserts the entries from the (c / 2)-th entry to the (c − 1)-th entry into a new node. Then, the number of entries is reduced from c to c / 2.
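The insertion and division of a sorted node described above can be sketched as follows. The function names, the `capacity` parameter, and the list-of-pairs layout are assumptions; for brevity the insert position is found with a linear scan here, whereas the text uses the search of FIG. 9.

```python
def insert_sorted(entries, capacity, key, value):
    """Insert (key, value) keeping `entries` sorted; False if the node is full."""
    if len(entries) >= capacity:
        return False                      # caller performs the division first
    pos = 0
    while pos < len(entries) and entries[pos][0] <= key:
        pos += 1                          # position after the largest key <= key
    entries.insert(pos, (key, value))     # list.insert shifts the later entries
    return True


def split_sorted(entries):
    """Move entries c/2 .. c-1 into a new node; `entries` keeps the first c/2."""
    c = len(entries)
    new_node = entries[c // 2:]
    del entries[c // 2:]
    return new_node


node = [(10, "a"), (30, "c")]
insert_sorted(node, 4, 20, "b")
insert_sorted(node, 4, 40, "d")
right = split_sorted(node)
```

Unlike the flat node, the insertion has to move every entry after the insert position, while the division is a simple cut because the entries are already in key order.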

A sorted node can be searched at high speed, but insertion into it is slow. A flat node, on the other hand, allows fast insertion, but searching it is slow. In the tree index, the frequency of insertion increases as the node is located in a lower hierarchy. Therefore, as in this embodiment, if the nodes below a certain hierarchy are flat nodes and the other nodes are sorted nodes, the database can be made faster.

(Fifth embodiment)
The database management apparatus 100 according to the present embodiment handles the tree index 10 in which the nodes are sorted nodes. The size of a node belonging to a relatively lower hierarchy is made smaller than the size of a node belonging to a hierarchy positioned higher than that. For example, the lowest layer node (leaf node) may be 128 B and the branch node may be 4 kB. As described above, in a sorted node, the smaller the node size, the shorter the time required for insertion. This is because inserting into a sorted node frequently requires moving existing entries (for example, half of the entries).
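A rough estimate illustrates the effect of node size. The sketch below assumes a 16-byte entry, a nearly full node, an insert position that is equally likely anywhere in the node, and 4 kB taken as 4096 bytes; none of these figures comes from the embodiment.

```python
def average_entries_moved(node_bytes, entry_bytes):
    """Estimate how many entries one insertion into a nearly full sorted node moves.

    With the insert position uniformly distributed, about half of the
    entries lie after it and have to be shifted.
    """
    capacity = node_bytes // entry_bytes
    return capacity / 2


leaf_moves = average_entries_moved(128, 16)     # 128 B leaf node
branch_moves = average_entries_moved(4096, 16)  # 4 kB branch node
```

Under these assumptions a 128 B leaf moves about 4 entries per insertion, whereas a 4 kB node would move about 128.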

Also according to this embodiment, the speed of the database can be increased.

(Sixth embodiment)
FIG. 10 is a diagram illustrating the configuration of the database management apparatus 100 according to the sixth embodiment, together with the structure of the tree index 10. The database management apparatus 100 according to the present embodiment has the same configuration as the database management apparatus 100 according to any one of the second to fifth embodiments, except that the nodes of at least one hierarchy of the tree index 10 are tree nodes having a tree structure. In the example shown in FIG. 10, a node belonging to the first hierarchy (for example, a leaf node) is a flat node, a branch node positioned above it is a tree node, and the node positioned at the top level (root node) is a sorted node.

The table in FIG. 11 shows the calculation amount necessary for search, the calculation amount necessary for insertion, and space efficiency in each of the flat node, the sorted node, and the tree node.

Since a flat node requires a linear search, the computational complexity of the search is O(n). On the other hand, since insertion into a flat node is only an append, the computational complexity of the insertion is O(1). In a flat node, the space in the node is occupied by entries except for a very small amount of metadata (the number of entries and the minimum pointer). Therefore, the space efficiency of the flat node is almost 100%.

In the sorted node, the search is performed by binary search, so the computational complexity of the search is O(log n). When an entry is inserted into a sorted node, existing entries need to be moved in order to keep the sorted state, so the computational complexity of the insertion is O(n). The space efficiency is almost 100%, as with the flat node.

In the tree node, both search and insertion are O(log n), but the space efficiency is about 70%. This is because, assuming a B+-tree, which is a typical tree index, the usage rate of the internal nodes is about 70%.

In the example shown in FIG. 10, the nodes in the middle layer are tree nodes as described above. That is, a node having a hierarchy of x or less and y or more is set as a tree node. However, it is assumed that at least one hierarchy is not a tree node. In other words, when the tree index hierarchy is from 0 to (h − 1), (y − x) is smaller than (h − 1). In this way, when insertion also occurs in the intermediate layer nodes, both search and insertion can be performed at a sufficient speed. This tree node may be a B+-tree. Insertion and search in the B+-tree are as shown in Non-Patent Document 2.

Also, the upper node is a sorted node and the lower node is a flat node. For this reason, the calculation amount of insertion can be reduced where the write ratio is higher (that is, at the lower nodes), and the calculation amount of search can be reduced where the write ratio is lower (that is, at the upper nodes).
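The arrangement of FIG. 10 can be expressed as a simple mapping from hierarchy to node type. In the sketch below, hierarchy 0 is taken to be the leaf level, and the function name and the bounds `tree_low` and `tree_high` are hypothetical parameters introduced only for this illustration.

```python
def node_kind(level, tree_low, tree_high):
    """Choose a node type for a hierarchy level (0 = leaf, larger = higher)."""
    if level < tree_low:
        return "flat"    # write-heavy lower levels: O(1) insertion
    if level <= tree_high:
        return "tree"    # middle levels: O(log n) search and insertion
    return "sorted"      # read-heavy upper levels: O(log n) search, dense layout


# A four-level index: leaf, two intermediate levels, root.
layout = [node_kind(level, tree_low=1, tree_high=2) for level in range(4)]
```

Keeping the choice in one place like this makes it easy to try other arrangements, such as tree nodes at the top or at the bottom.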

Note that it is preferable to increase the space efficiency because the upper layer nodes are often recorded on a high-speed but small-capacity medium. On the other hand, considering the characteristics of the medium on which the nodes below the intermediate layer are recorded, it is preferable to reduce the calculation amount even at the expense of space efficiency. For the lower level, especially the leaf node to which an entry is added at every insertion, it is preferable to give the highest priority to the insertion speed.

In this embodiment, an upper node may be a tree node, and a lower node may be a tree node. If the capacity of the upper storage medium is large, the insertion speed is improved by using a tree node. If there are many search processes, the overall processing speed will improve if the lower nodes are made tree nodes to improve the search speed.

(Seventh embodiment)
In this embodiment, the tree node of the tree index 10 is a tree index that takes the cache line size of the CPU into account (for example, the CSS-tree described in Non-Patent Document 4, the CSB+-tree described in Non-Patent Document 5, the PB+-tree of Non-Patent Document 6, the fpB+-tree of Non-Patent Document 7, or FAST of Non-Patent Document 11). For example, when the CSS-tree is used, the size of the node is equal to the size of the cache line (for example, 64 B). Here, since the size of the cache line is often small, the entry pointer may be omitted in the node.

For the B+-tree and other tree indexes that do not take the cache into account, the node size is set according to the disk I/O size (e.g., 4 KB). On the other hand, when a CSS node is used, search is faster than with a sorted node and the space efficiency is equivalent. The search is fast because there are few cache misses and because many entries are packed into the node by omitting the pointers. The space efficiency is equivalent because the CSS-tree is logically a tree but physically an array. However, when a CSS node is used, insertion is slower than with a sorted node, because the entire CSS-tree must be rebuilt each time a new entry is inserted. The CSS-tree search method and construction method are as shown in Non-Patent Document 5.
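The benefit of omitting pointers can be seen with simple arithmetic. The sketch below assumes 4-byte keys and 4-byte pointers in a 64 B cache line, and uses the generic breadth-first array numbering of a complete (m + 1)-ary tree to locate a child without a stored pointer; the function names are illustrative, and the exact layout of the CSS-tree is given in the cited documents.

```python
def keys_per_node(line_bytes, key_bytes, pointer_bytes=0):
    """How many keys fit in one cache-line-sized node."""
    return line_bytes // (key_bytes + pointer_bytes)


def child_node_index(node_index, branch, keys_in_node):
    """Array position of a child in a complete (m + 1)-ary tree stored
    breadth-first, where m = keys_in_node and 0 <= branch <= m."""
    return node_index * (keys_in_node + 1) + 1 + branch


with_pointers = keys_per_node(64, 4, 4)     # key/pointer pairs per node
without_pointers = keys_per_node(64, 4)     # keys only
first_child_of_root = child_node_index(0, 0, without_pointers)
```

Because the child position is computed from the node number and the branch taken, the whole line can be spent on keys, which is why more keys can be narrowed down per cache miss.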

In this embodiment, for example, the root node is a node that takes the cache size into account, such as a CSS node, and the leaf node is one of a flat node, a sorted node, or a tree node. The node types of the other layers are selected arbitrarily. The CSS node is used for the root node, which has a low write ratio and tends to be recorded on the highest-level medium (typically the CPU cache), in order to provide high-speed search and high space efficiency.

(Eighth embodiment)
In the tree index 10 managed by the database management apparatus 100 according to the present embodiment, nodes lower than a certain hierarchy are provided in the flash memory. The nodes in the flash memory are tree nodes having a tree index designed for flash memory, such as the LA-tree shown in Non-Patent Document 15 or the FD-tree shown in Non-Patent Documents 16 and 17.

Flash memory provides faster random reads than disks, while its random writes are slower. The LA-tree and the FD-tree are tree indexes that take this asymmetry into account. Each of these tree nodes temporarily stores write requests for a plurality of nodes belonging to the same hierarchy in a buffer and then processes the plurality of write requests stored in the buffer together.
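The buffering idea can be sketched as follows. This is not the LA-tree or the FD-tree themselves, only a minimal illustration of collecting write requests for one hierarchy and applying them in a batch; the class name, the `batch_size` threshold, and the dictionary standing in for the nodes on flash are all assumptions.

```python
class BufferedLevelWriter:
    """Collects write requests for one hierarchy and applies them together."""

    def __init__(self, flash_nodes, batch_size):
        self.flash_nodes = flash_nodes  # node id -> list of entries (stand-in for flash)
        self.batch_size = batch_size    # assumed flush threshold
        self.buffer = []                # pending (node id, entry) write requests
        self.flushes = 0                # number of batched writes performed

    def write(self, node_id, entry):
        self.buffer.append((node_id, entry))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        for node_id, entry in self.buffer:
            self.flash_nodes.setdefault(node_id, []).append(entry)
        self.buffer.clear()
        self.flushes += 1


nodes = {}
writer = BufferedLevelWriter(nodes, batch_size=3)
writer.write("n1", (1, "a"))
writer.write("n2", (2, "b"))
writer.write("n1", (3, "c"))  # third request triggers one batched write
```

Three requests reach the stand-in for flash in a single pass, which is the kind of saving that matters when random writes are the expensive operation.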

According to the above-described embodiment, the following invention is disclosed.
(Appendix 1)
A database management device for managing a database having a tree index structure,
Nodes belonging to the first hierarchy are set to be the first size, nodes belonging to the second hierarchy are set to be the second size,
A database management apparatus comprising node management means for performing division, insertion, and search for each of nodes belonging to the first hierarchy and nodes belonging to the second hierarchy.
(Appendix 2)
In the database management device according to appendix 1,
A database management device in which the node management means includes:
First node management means for dividing, inserting, and searching for nodes belonging to the first hierarchy; and
Second node management means for dividing, inserting, and searching for nodes belonging to the second hierarchy by a method different from the first node management means.
(Appendix 3)
A database management device for managing a database having a tree index structure, comprising:
First node management means for dividing, inserting, and searching for nodes belonging to the first hierarchy; and
Second node management means for dividing, inserting, and searching for nodes belonging to the second hierarchy by a method different from the first node management means.
(Appendix 4)
In the database management device according to appendix 2 or 3,
The first hierarchy is located lower than the second hierarchy,
A database management apparatus in which a node belonging to the first hierarchy is larger in size than a node belonging to the second hierarchy.
(Appendix 5)
In the database management device described in appendix 4,
A database management device in which the node belonging to the first hierarchy and the node belonging to the second hierarchy are flat nodes whose entries are not sorted.
(Appendix 6)
In the database management device described in appendix 4,
The node belonging to the first hierarchy is a flat node in which entries are not sorted, and the node belonging to the second hierarchy is a sorted node in which entries are sorted.
(Appendix 7)
In the database management device according to appendix 2 or 3,
The first hierarchy is located lower than the second hierarchy,
A database management device in which a node belonging to the first hierarchy is smaller in size than a node belonging to the second hierarchy.
(Appendix 8)
In the database management device according to appendix 7,
A database management device in which the node belonging to the first hierarchy and the node belonging to the second hierarchy are sorted nodes in which entries are sorted.
(Appendix 9)
In the database management device according to any one of appendices 1 to 8,
A database management device in which at least one of the nodes belonging to the first hierarchy and the nodes belonging to the second hierarchy has the same size as the cache line size of the CPU of the database management device.
(Appendix 10)
In the database management device according to appendix 2 or 3,
At least the nodes belonging to the first hierarchy are provided in a flash memory,
A database management device in which the first node management means temporarily stores write requests for nodes belonging to the first hierarchy in a buffer, and collectively writes the plurality of write requests stored in the buffer to the nodes belonging to the first hierarchy.
(Appendix 11)
A database management method for managing a tree index structure database using a computer,
A database management method in which nodes belonging to a first hierarchy are set to have a first size, and nodes belonging to a second hierarchy are set to have a second size.
(Appendix 12)
In the database management method according to appendix 11,
The computer performs division, insertion, and search for nodes belonging to the first hierarchy,
A database management method in which the computer divides, inserts, and searches a node belonging to the second hierarchy by a method different from that of a node belonging to the first hierarchy.
(Appendix 13)
A database management method for managing a database with a tree index structure,
The computer performs division, insertion, and search for nodes belonging to the first hierarchy,
A database management method in which the computer divides, inserts, and searches a node belonging to a second hierarchy by a method different from that of a node belonging to the first hierarchy.
(Appendix 14)
In the database management method according to appendix 12 or 13,
The first hierarchy is located lower than the second hierarchy,
A database management method in which a node belonging to the first hierarchy is larger in size than a node belonging to the second hierarchy.
(Appendix 15)
In the database management method according to appendix 14,
The database management method, wherein the nodes belonging to the first hierarchy and the nodes belonging to the second hierarchy are flat nodes whose entries are not sorted.
(Appendix 16)
In the database management method according to appendix 14,
The database management method, wherein the node belonging to the first hierarchy is a flat node in which entries are not sorted, and the node belonging to the second hierarchy is a sorted node in which entries are sorted.
(Appendix 17)
In the database management method according to appendix 12 or 13,
The first hierarchy is located lower than the second hierarchy,
The database management method, wherein a node belonging to the first hierarchy is smaller in size than a node belonging to the second hierarchy.
(Appendix 18)
In the database management method according to appendix 17,
The database management method, wherein the nodes belonging to the first hierarchy and the nodes belonging to the second hierarchy are sorted nodes in which entries are sorted.
(Appendix 19)
In the database management method according to any one of appendices 11 to 18,
The database management method, wherein a size of at least one of the nodes belonging to the first hierarchy and the nodes belonging to the second hierarchy is the same as a cache line size of the CPU of the computer.
(Appendix 20)
In the database management method according to appendix 12 or 13,
At least the nodes belonging to the first hierarchy are provided in a flash memory,
A database management method in which the computer temporarily stores write requests for nodes belonging to the first hierarchy in a buffer, and collectively writes the plurality of write requests stored in the buffer to the nodes belonging to the first hierarchy.
(Appendix 21)
A program for causing a computer to function as a database management device for managing a database having a tree index structure,
The program causing the computer to realize:
A first node management function for dividing, inserting, and searching for nodes belonging to the first hierarchy; and
A second node management function for dividing, inserting, and searching for nodes belonging to the second hierarchy in a method different from the first node management function.
(Appendix 22)
In the program described in Appendix 21,
A program in which nodes belonging to the first hierarchy are set to have a first size, and nodes belonging to the second hierarchy are set to have a second size.
(Appendix 23)
In the program described in Appendix 22,
The first hierarchy is located lower than the second hierarchy,
A program in which nodes belonging to the first hierarchy are larger in size than nodes belonging to the second hierarchy.
(Appendix 24)
In the program described in Appendix 23,
The node belonging to the first hierarchy and the node belonging to the second hierarchy are flat nodes whose entries are not sorted.
(Appendix 25)
In the program described in Appendix 23,
The node belonging to the first hierarchy is a flat node in which entries are not sorted, and the node belonging to the second hierarchy is a sorted node in which entries are sorted.
(Appendix 26)
In the program described in Appendix 22,
The first hierarchy is located lower than the second hierarchy,
A program in which a node belonging to the first hierarchy is smaller in size than a node belonging to the second hierarchy.
(Appendix 27)
In the program described in Appendix 26,
A program in which the nodes belonging to the first hierarchy and the nodes belonging to the second hierarchy are sorted nodes in which entries are sorted.
(Appendix 28)
In the program according to any one of appendices 21 to 27,
A program in which at least one of the nodes belonging to the first hierarchy and the nodes belonging to the second hierarchy has the same size as the cache line size of the CPU of the computer.
(Appendix 29)
In the program described in Appendix 22,
At least the nodes belonging to the first hierarchy are provided in the flash memory,
A program in which the first node management function temporarily stores write requests for nodes belonging to the first hierarchy in a buffer, and collectively writes the plurality of write requests stored in the buffer to the nodes belonging to the first hierarchy.

As described above, the embodiments of the present invention have been described with reference to the drawings. However, these are exemplifications of the present invention, and various configurations other than the above can be adopted.

This application claims priority based on Japanese Patent Application No. 2011-195605 filed on September 8, 2011, the entire disclosure of which is incorporated herein.

Claims

A database management device for managing a database having a tree index structure,
Nodes belonging to the first hierarchy are set to be the first size, nodes belonging to the second hierarchy are set to be the second size,
A database management apparatus comprising node management means for performing division, insertion, and search for each of nodes belonging to the first hierarchy and nodes belonging to the second hierarchy.
The database management device according to claim 1, wherein
The node management means includes:
First node management means for dividing, inserting, and searching for nodes belonging to the first hierarchy; and
Second node management means for dividing, inserting, and searching for nodes belonging to the second hierarchy by a method different from the first node management means.
A database management device for managing a database having a tree index structure, comprising:
First node management means for dividing, inserting, and searching for nodes belonging to the first hierarchy; and
Second node management means for dividing, inserting, and searching for nodes belonging to the second hierarchy by a method different from the first node management means.
In the database management device according to claim 2 or 3,
The first hierarchy is located lower than the second hierarchy,
A database management apparatus in which a node belonging to the first hierarchy is larger in size than a node belonging to the second hierarchy.
The database management device according to claim 4, wherein
The node belonging to the first hierarchy and the node belonging to the second hierarchy are flat nodes whose entries are not sorted.
The database management device according to claim 4, wherein
The node belonging to the first hierarchy is a flat node in which entries are not sorted, and the node belonging to the second hierarchy is a sorted node in which entries are sorted.
In the database management device according to claim 2 or 3,
The first hierarchy is located lower than the second hierarchy,
A database management device in which a node belonging to the first hierarchy is smaller in size than a node belonging to the second hierarchy.
The database management device according to claim 7, wherein
The node belonging to the first hierarchy and the node belonging to the second hierarchy are sorted nodes in which entries are sorted.
A database management method for managing a database with a tree index structure,
The computer performs division, insertion, and search for nodes belonging to the first hierarchy,
A database management method in which the computer divides, inserts, and searches a node belonging to a second hierarchy by a method different from that of a node belonging to the first hierarchy.
A program for causing a computer to function as a database management device for managing a database having a tree index structure,
The program causing the computer to realize:
A first node management function for dividing, inserting, and searching for nodes belonging to the first hierarchy; and
A second node management function for dividing, inserting, and searching for nodes belonging to the second hierarchy in a method different from the first node management function.