WO2015129109A1

WO2015129109A1 - Index management device

Info

Publication number: WO2015129109A1
Application number: PCT/JP2014/080851
Authority: WO
Inventors: 盛朗佐々木
Original assignee: ウイングアーク１ｓｔ株式会社
Priority date: 2014-02-27
Filing date: 2014-11-21
Publication date: 2015-09-03
Also published as: JP2015162042A; JP6006740B2

Abstract

The present invention exploits the property of an index tree that nodes in higher levels have a higher access rate and a lower update rate, while nodes in lower levels have a lower access rate and a higher update rate. In the lower levels at or below an nth level (n is an arbitrary value satisfying "1≤n<number of all levels-1"), searching and updating of the index tree are managed using a first type of node that stores a set of a prescribed number of keys and either a prescribed number of pointers indicating the positions of child nodes or a prescribed number of values. In the higher levels above the nth level, searching and updating of the index tree are managed using a second type of node that stores a prescribed number of keys and one group pointer indicating the head position of a lower-level node group. This makes it possible to perform both data search processing and update processing quickly.

Description

Index management device

The present invention relates to an index management apparatus, and is particularly suitable for use in an index management apparatus that manages an index tree used for speeding up data retrieval.

Conventionally, a technique called an index tree is widely known as a technique for speeding up data retrieval. For example, when searching for data corresponding to a specific key, it takes a lot of time to examine all the records in the database one by one from the top. Thus, in order to speed up the search for a specific key, an index tree is often assigned (see, for example, Patent Documents 1 and 2).

A set of data to be recorded is called a record, and the data used for searching is called a key. The remaining data is called a value. To search for records by key, it is desirable that the records be sorted in key order. However, recording and sorting records in key order is a time-consuming process. Therefore, records are generally recorded in order of arrival, and pointers to the records corresponding to the keys are sorted and recorded separately in a tree structure. This is the index tree. The sorted state is maintained in a tree structure so that the key additions and deletions in the index tree that accompany record additions and deletions affect only part of the tree, which reduces processing time.

FIG. 8 is a diagram for explaining the concept of the index tree. The index tree has a tree-like structure, and the lowermost node is called a leaf node and the other nodes are called internal nodes. The top node is called a root node, and a node that is neither a root nor a leaf is called a branch node. In FIG. 8, the branch node has one hierarchy, but it is also possible to have a plurality of hierarchies. Each node stores a set of a predetermined number of keys 101 and pointers 102, but the leftmost key is omitted for internal nodes.

The key/pointer pairs that form the entries of each node are arranged in ascending or descending order of key value. Each entry corresponds one-to-one to a child of that node, and stores the value of the leftmost key of the child node (the omitted leftmost key if the child node is an internal node) and a pointer to the position of the child node. The entries of a leaf node, which is the final hierarchy, store the key value of each record and the position of the record.

In the example of FIG. 8, the root node stores a set of two keys “10” and “19” and three pointers. Among the three pointers, the first (leftmost) pointer is position information indicating the storage position of a child node having a key having a value of “1” or more and smaller than “10” as an entry. The second pointer is position information indicating a storage position of a child node having a key whose value is “10” or more and smaller than “19” as an entry. The third pointer is position information indicating the storage position of a child node having a key whose value is “19” or more as an entry.

The leftmost branch node stores a set of two keys “4” and “7” and three pointers. Of the three pointers, the first pointer is position information indicating the storage position of a child node having a key having a value of “1” or more and smaller than “4” as an entry. The second pointer is position information indicating a storage position of a child node having a key having a value of “4” or more and smaller than “7” as an entry. The third pointer is position information indicating a storage position of a child node having a key whose value is “7” or more as an entry. Similarly, the other branch nodes store pairs of two keys and three pointers.

Furthermore, the leftmost leaf node stores a set of three keys “1”, “2”, and “3” and three pointers. Of the three pointers, the first pointer is position information indicating the position of the record in which the data corresponding to the key “1” is stored. The second pointer is position information indicating the position of the record in which the data corresponding to the key “2” is stored. The third pointer is position information indicating the position of the record in which the data corresponding to the key “3” is stored. Similarly, the other leaf nodes store sets of three keys and three pointers.

For example, when searching for data corresponding to the key “11” using the index tree configured as shown in FIG. 8, the data can be found efficiently by following the second pointer in the root node, then the leftmost pointer in the second branch node, and then the fourth leaf node.
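The traversal described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited document: the names `Internal`, `Leaf`, `leaf`, and `search` are hypothetical, and every key value other than the root keys 10 and 19, the leftmost branch keys 4 and 7, and the leftmost leaf keys 1, 2, 3 is assumed so that the tree has the shape described for FIG. 8.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    records: List[str]  # stand-ins for record positions


@dataclass
class Internal:
    keys: List[int]  # the leftmost key is omitted, as in FIG. 8
    children: List[Union["Internal", Leaf]]  # len(children) == len(keys) + 1


def leaf(*keys: int) -> Leaf:
    """Build a leaf whose records are named after their keys (hypothetical)."""
    return Leaf(list(keys), [f"record-{k}" for k in keys])


def search(node: Union[Internal, Leaf], key: int) -> Optional[str]:
    """Descend from the root to a leaf and return the record for `key`."""
    while isinstance(node, Internal):
        # bisect_right counts the stored keys <= key, which is exactly the
        # index of the child pointer to follow.
        node = node.children[bisect_right(node.keys, key)]
    if key in node.keys:
        return node.records[node.keys.index(key)]
    return None


# Assumed tree: only some of these keys are given in the text.
root = Internal([10, 19], [
    Internal([4, 7], [leaf(1, 2, 3), leaf(4, 5, 6), leaf(7, 8, 9)]),
    Internal([13, 16], [leaf(10, 11, 12), leaf(13, 14, 15), leaf(16, 17, 18)]),
    Internal([22, 25], [leaf(19, 20, 21), leaf(22, 23, 24), leaf(25, 26, 27)]),
])

# Key 11: second root pointer, leftmost pointer of the second branch node,
# then the fourth leaf from the left.
print(search(root, 11))  # record-11
```

Using binary search inside each node is a design choice of this sketch; with only two keys per node a linear scan would behave identically.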

By the way, when an index is created for a field in a record, it is necessary to update not only the record itself but also the contents of the index when an update process such as addition or deletion of the record is performed. When adding a record, the leaf node to which the entry is added is searched in order from the root node in the same manner as in the previous search. If there is a vacancy in the node, the index addition is completed simply by adding entries in ascending or descending order.

On the other hand, if the node has no vacancy, a new node must be added to create a free entry. For example, when adding a record with the key “8” to a field whose index tree is configured as shown in FIG. 9A, the third leaf node, which is found by tracing sequentially from the root node as the node to which the entry should be added, already has three entries and has no space.

In this case, as shown in FIG. 9B, the third leaf node is divided at the position of the key “9”, which among the three keys currently in that node is equal to or higher than the split key “8”, and a new leaf node is generated. The entries before the split key “8” are moved to the first split node and the entries after the split key “8” are moved to the second split node, which creates an empty entry to which the record with the key “8” can be added.

However, when the tree is grown on the lower layer side as shown in FIG. 9B, the number of layers from the root node to the leaf node changes in only part of the tree, and the tree as a whole becomes unbalanced. In such an unbalanced index tree, search efficiency is lowered. Therefore, in the “B-tree” (see, for example, Non-Patent Document 1), which is one type of index tree, the tree is grown toward the upper layer side when a node is divided, as shown in FIG. 10. In this way, the number of hierarchies in the index tree is the same for every leaf node, and the whole tree is balanced.

When the database is stored on a hard disk, the number of disk I/Os determines the performance of the index tree. This is because the latency of the disk is about 10 ms, whereas the latency of memory is about 100 ns and the latency of the cache is about 1 ns. Therefore, to increase search efficiency, it is necessary to reduce the number of disk I/Os as much as possible.

On the other hand, due to the structure of the index tree, the access ratio is higher for nodes in upper hierarchies and lower for nodes in lower hierarchies (see, for example, Non-Patent Document 2). Therefore, if the highest-level root node is stored in the cache, the branch nodes are stored in memory, and the leaf nodes are stored on disk, the number of disk I/Os can be reduced. In particular, the B-tree is a method optimized for disk access.

That is, the node size of the B-tree is typically equal to the size of a disk block (the unit of data I/O on the disk, typically 4 Kbytes). When 4-byte keys and 4-byte pointers are stored in a node, the fanout (number of child nodes) is about 500 if the node size is set to 4 Kbytes. Therefore, in the case of a B-tree consisting of three layers as shown in FIG. 11, with only a small amount of memory (about 2 Mbytes), data can be extracted from a large database (about 1 Gbyte) with a single disk I/O.
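The figures quoted above can be checked with a short calculation. The sketch below assumes, as an interpretation of the text, that the root and branch levels are held in memory and that the "about 1 Gbyte" refers to the total size of the leaf level; the variable names are hypothetical.

```python
NODE_SIZE = 4 * 1024   # bytes, one disk block
KEY_SIZE = 4           # bytes
POINTER_SIZE = 4       # bytes

# Entries per node: 512, which the text rounds to "about 500".
fanout = NODE_SIZE // (KEY_SIZE + POINTER_SIZE)

# Three layers: one root, `fanout` branch nodes, fanout * fanout leaf nodes.
internal_nodes = 1 + fanout
leaf_nodes = fanout * fanout

memory_bytes = internal_nodes * NODE_SIZE   # root and branch nodes in memory
leaf_bytes = leaf_nodes * NODE_SIZE         # leaf level on disk

print(fanout)                 # 512
print(memory_bytes / 2**20)   # 2.00390625  (about 2 Mbytes)
print(leaf_bytes / 2**30)     # 1.0         (about 1 Gbyte)
```

With the upper two layers resident in memory, only the final step to a leaf node touches the disk, which is where the single disk I/O comes from.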

On the other hand, in an in-memory environment in which all data is stored in memory, disk I/O is eliminated, and the performance of the B-tree depends strongly on the number of cache lines accessed. A cache line is the unit in which the CPU transfers data from memory to the cache. In recent CPUs, a cache line is often 64 bytes.

A method called the “CSB+ tree” is known as an index tree optimized for the in-memory environment by reducing the number of cache lines accessed (for example, see Non-Patent Document 3). In the CSB+ tree, the storage required is reduced by deleting the pointers from the entries of a node, and the number of cache lines can be reduced by increasing the number of keys that can be stored in one node. The CSB+ tree also has the advantage that access to the cache line in which a pointer is recorded can be omitted.

As shown in FIG. 12, in the CSB+ tree, a plurality of nodes are grouped to form a node group. An entry of an internal node does not have an individual pointer for each key; the node has only one pointer indicating the head position of the node group in the lower hierarchy. The entries of the nodes in a group are stored in a contiguous area of memory, and the position of the corresponding key is specified based on the offset amount from the head position of the child node group.

For example, when searching for data corresponding to the key “13”, the search from the root node shows that the key “13” is in the second node group in the leaf node. Here, it is assumed that the top address of the second node group is “0xA000”. If the node size is 12 bytes, the address corresponding to the key “13” is calculated as “0xA00C” (= 0xA000 + 0x000C × 1).
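The address calculation in this example amounts to adding a multiple of the node size to the head address of the node group. A minimal sketch follows; the function name `node_address` and its parameter names are hypothetical.

```python
def node_address(group_head: int, node_size: int, index: int) -> int:
    """Address of the index-th node (0-based) in a contiguous node group."""
    return group_head + node_size * index


# Head address 0xA000, node size 12 bytes, one node past the head.
addr = node_address(0xA000, 12, 1)
print(hex(addr))  # 0xa00c
```

Because the address is computed rather than stored, no per-key pointer has to be read from memory during the search.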

If pointers are deleted as in the CSB+ tree, the search speed increases. However, when a new key is inserted into the index as a record is added, the insertion process is slower than in the B-tree. As shown in FIG. 13A, in the case of a B-tree there is a pointer corresponding to each key, so child nodes newly generated by node division can be placed freely. In the case of the CSB+ tree, on the other hand, as shown in FIG. 13B, the child nodes must be rearranged so that the key values are in ascending or descending order within the node group, which slows down the insertion process.

Patent documents: JP-A-5-334153; JP 2003-114816 A

As described above, if the CSB+ tree optimized for the in-memory environment is used to speed up the search process by reducing pointers, the key insertion process in the index slows down. As a result, there is a problem that overall processing performance does not improve with respect to the load.

The present invention has been made to solve such a problem, and its object is to speed up the data search process in an in-memory environment while suppressing the slowdown of the update process.

In order to solve the above-described problem, in the present invention, the hierarchy of leaf nodes, each storing a set of a predetermined number of keys and a predetermined number of values, is defined as the 0th hierarchy. In the lower hierarchy at or below the nth hierarchy (n is any value satisfying “1 ≦ n < total number of hierarchies − 1”), search and update of the index tree are managed by a first type node storing a set of a predetermined number of keys and either a predetermined number of pointers representing the positions of child nodes or a predetermined number of values. In the upper hierarchy above the nth hierarchy, search and update of the index tree are managed by a second type node storing a predetermined number of keys and one group pointer indicating the head position of a node group in the lower hierarchy.

In another aspect of the present invention, in the lower hierarchy at or above the first hierarchy and at or below the nth hierarchy, search and update of the index tree are managed by a third type node storing a predetermined number of keys, one group pointer representing the head position of a node group in the lower hierarchy, and reduced pointers each having a size sufficient to represent the position of a node within that node group. In the upper hierarchy above the nth hierarchy, search and update of the index tree are managed by the second type node.

According to the present invention configured as described above, use is made of the general property of an index tree that nodes in upper hierarchies have a higher access rate but a lower update rate, while nodes in lower hierarchies have a lower access rate but a higher update rate. In the upper hierarchy, search and update of the index tree are managed by the second type node, which allows the search process to be performed at high speed. In the lower hierarchy, search and update of the index tree are managed by the first type node, which allows the key insertion process to be performed at high speed. As a result, in the in-memory environment, it is possible to speed up the data search process while suppressing the slowdown of the update process.

According to another feature of the present invention, in the upper hierarchy, search and update of the index tree are managed by the second type node, which allows the search process to be performed at high speed. In the lower hierarchy, since the offset amount within the node group is obtained from the reduced pointer, each node can be placed freely within the node group, and even data update processing that requires node division or the like can be performed at high speed. Further, since the storage required for a reduced pointer is small, the number of keys that can be stored in one node can be increased, and the number of hierarchies in the index tree can be reduced to speed up the search process.

FIG. 1 is a block diagram showing a functional configuration example of the index management apparatus according to the first embodiment.
FIG. 2 is a diagram showing a specific example of the index tree in the first embodiment.
FIG. 3 is a diagram showing a configuration example of the index tree after a key is inserted in the first embodiment.
FIG. 4 is a block diagram showing a functional configuration example of the index management apparatus according to the second embodiment.
FIG. 5 is a diagram showing a specific example of the index tree in the second embodiment.
FIG. 6 is a diagram showing the characteristics of the reduced pointer used in the second embodiment.
FIG. 7 is a block diagram showing another functional configuration example of the index management apparatus according to the second embodiment.
FIG. 8 is a diagram for explaining the concept of an index tree.
FIG. 9 is a diagram for explaining an example in which the tree is grown on the lower layer side at the time of node division.
FIG. 10 is a diagram for explaining an example of a B-tree in which the tree is grown toward the upper layer side at the time of node division.
FIG. 11 is a diagram showing an example of the fanout of a B-tree consisting of three layers.
FIG. 12 is a diagram showing a configuration example of a CSB+ tree.
FIG. 13 is a diagram for explaining node division in the B-tree and the CSB+ tree.

DESCRIPTION OF EXEMPLARY EMBODIMENTS

(First embodiment)
Hereinafter, a first embodiment of the invention will be described with reference to the drawings. FIG. 1 is a block diagram illustrating a functional configuration example of an index management apparatus according to the first embodiment. The index management device according to the first embodiment manages an index tree including leaf nodes as the lowest hierarchy, a root node as the highest hierarchy, and one or more hierarchies of branch nodes between the leaf nodes and the root node. As its functional configuration, it includes a search processing unit 1, an insertion processing unit 2, a lower layer management unit 3, and an upper layer management unit 4.

The above functional blocks 1 to 4 can be configured by any of hardware, a DSP (Digital Signal Processor), and software. For example, when configured by software, each of the functional blocks 1 to 4 actually comprises the CPU, RAM, ROM, and the like of a computer, and is realized by the operation of a program stored in a recording medium such as the RAM, the ROM, a hard disk, or a semiconductor memory.

In the present embodiment, the hierarchy of leaf nodes, each storing a set of a predetermined number of keys and a predetermined number of values, is defined as the 0th hierarchy. The hierarchies at or below the nth hierarchy (n is an arbitrary value satisfying “1 ≦ n < total number of hierarchies − 1”) are the lower hierarchies, and the hierarchies above the nth hierarchy are the upper hierarchies. Hereinafter, a case where n = 1 will be described. That is, the 0th hierarchy and the 1st hierarchy above it are set as lower hierarchies, and the second and higher hierarchies above the first hierarchy are set as upper hierarchies. In the example of FIG. 1, the index tree is composed of four hierarchies from the 0th hierarchy to the 3rd hierarchy. Of these, the 0th and 1st hierarchies are lower hierarchies, and the 2nd and 3rd hierarchies are upper hierarchies. One leaf node is composed of a set of a maximum of three keys and the same number of values as keys.

The search processing unit 1 searches for desired data (value) from a database (memory) in an in-memory environment using an index tree. Specifically, the search processing unit 1 supplies a search key to the upper layer management unit 4 and searches for a value corresponding to the search key by the processing of the upper layer management unit 4 and the lower layer management unit 3. Then, the retrieved value is received from the lower hierarchy management unit 3.

The insertion processing unit 2 inserts desired data (a set of key and value) into the index tree. Specifically, the insertion processing unit 2 supplies the key and value to be inserted to the upper layer management unit 4. Through the processing of the upper layer management unit 4 and the lower layer management unit 3, the leaf node into which the pair should be inserted is determined from the value of the insertion key, and the key/value pair is added at the appropriate position in that leaf node. The insertion processing unit 2 then receives a notification of insertion completion from the lower hierarchy management unit 3 or the upper hierarchy management unit 4.

In the lower hierarchy of the index tree, the lower hierarchy management unit 3 manages search and update of the index tree using first type nodes, each storing a set of a predetermined number of keys and either a predetermined number of pointers indicating the positions of child nodes or a predetermined number of values. This first type of node is the same as the node used in the B-tree, for example.

In the upper hierarchy of the index tree, the upper layer management unit 4 manages search and update of the index tree using second type nodes, each storing a predetermined number of keys and one group pointer representing the head position of a node group in the lower hierarchy. This second type of node is the same as the node used in the CSB+ tree, for example.

FIG. 2 is a diagram showing a specific example of an index tree searched and updated by the lower hierarchy management unit 3 and the upper hierarchy management unit 4. As shown in FIG. 2, the first hierarchy, which is a lower hierarchy managed by the lower hierarchy management unit 3, is composed of first type nodes, each storing a set of a maximum of two keys and one more pointer than the number of keys. Each pointer represents the position of the left end of a leaf node in the 0th hierarchy one level below. The leaf nodes, which form the other lower hierarchy, are each composed of a set of a maximum of three keys and the same number of values as keys.

The second and third hierarchies managed by the upper hierarchy management unit 4 are composed of second type nodes, each storing a maximum of two keys and one group pointer indicating the head position of the node group in the next lower hierarchy. One node group Gr_2-1 is set in the second hierarchy, one level below the third hierarchy, and three node groups Gr_1-1, Gr_1-2, and Gr_1-3 are set in the first hierarchy, one level below the second hierarchy.

Here, the operation when the search processing by the search processing unit 1 is performed using the index tree configured as shown in FIG. 2 will be described. First, the search processing unit 1 passes the search key to the upper layer management unit 4. In the following description, it is assumed that the value of the search key is “15”.

The upper layer management unit 4 identifies the highest hierarchy, the third hierarchy, and the root node located there. Then, the upper layer management unit 4 searches the keys stored in the root node for the key having the largest value not exceeding the search key. Further, the upper layer management unit 4 moves to the lower hierarchy by following the position obtained by adding, to the head position of the node group indicated by the group pointer stored in the same node as the found key, an offset amount calculated from the node size.

In the example of FIG. 2, none of the keys explicitly stored in the root node has a value not exceeding the search key “15”, so the omitted leftmost key is the one found. When the omitted leftmost key is found in this way, the offset amount is zero. In this case, the upper hierarchy management unit 4 transitions to the node at the head position of the node group Gr_2-1 in the second hierarchy one level lower, according to the group pointer stored in the same node as the omitted key.

Since the second hierarchy reached by this transition is also an upper hierarchy, the upper layer management unit 4 performs the same processing as described above. That is, it searches the keys stored in the node at the head position of the node group Gr_2-1, reached from the root node, for the key having the largest value not exceeding the search key “15”. In this case, the key found is “10”. When the second key from the left end of the node, counting the omitted key, is found in this way, the offset amount is the node size × 1. The upper hierarchy management unit 4 therefore transitions to the second node from the head position of the node group Gr_1-1 in the first hierarchy one level lower, according to the group pointer stored in the same node as the found key “10” and the offset amount.

Since the first hierarchy reached by this transition is a lower hierarchy, the lower layer management unit 3 performs the processing. The lower hierarchy management unit 3 searches the keys stored in the identified node for the key having the largest value not exceeding the search key. It then moves to the lower hierarchy by following the position indicated by the pointer stored in the same node as a pair with the found key.

In the example of FIG. 2, among the keys stored in the second node from the head position of the node group Gr_1-1, the key having the largest value not exceeding the search key “15” is “13”. In this case, the lower hierarchy management unit 3 transitions directly to the fifth leaf node from the left end in the 0th hierarchy, according to the pointer stored as a pair with the found key “13”. Since the search key “15” is in this node, the lower hierarchy management unit 3 acquires the value corresponding to the position of the search key and passes it to the search processing unit 1. The search processing by the search processing unit 1 then ends.
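The search procedure of the first embodiment, with group pointers in the upper hierarchies and individual pointers in the lower hierarchy, can be sketched as follows. This is an illustrative model only: the class and function names are hypothetical, node groups are modeled as Python lists indexed by an offset instead of real memory addresses, and every key other than 10, 13, and 15 is an assumption chosen to reproduce the path described in the text.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    values: List[str]


@dataclass
class PointerNode:
    """First type (B-tree style): one explicit pointer per child."""
    keys: List[int]       # leftmost key omitted
    children: List[Leaf]  # len(children) == len(keys) + 1


@dataclass
class GroupNode:
    """Second type (CSB+ style): a single group pointer for all children."""
    keys: List[int]  # leftmost key omitted
    group: str       # stand-in for the head address of the child node group


def leaf(*keys: int) -> Leaf:
    return Leaf(list(keys), [f"v{k}" for k in keys])


# Assumed fragment of a four-level tree; each list is one contiguous node group.
GROUPS: Dict[str, List[Union[GroupNode, PointerNode]]] = {
    "Gr_2-1": [GroupNode([10], "Gr_1-1"), GroupNode([], "Gr_1-2")],
    "Gr_1-1": [
        PointerNode([4, 7], [leaf(1, 2, 3), leaf(4, 5, 6), leaf(7, 8)]),
        PointerNode([13, 16],
                    [leaf(10, 11, 12), leaf(13, 14, 15), leaf(16, 18, 19)]),
    ],
    "Gr_1-2": [PointerNode([22], [leaf(19, 20, 21), leaf(22, 23)])],
}
ROOT = GroupNode([19], "Gr_2-1")


def search(key: int) -> Optional[str]:
    node: Union[GroupNode, PointerNode] = ROOT
    while isinstance(node, GroupNode):
        # Upper hierarchy: the position of the matching key gives the node
        # offset inside the child group (0 when only the omitted key matches).
        offset = bisect_right(node.keys, key)
        node = GROUPS[node.group][offset]
    # Lower hierarchy: follow the pointer paired with the matching key.
    target = node.children[bisect_right(node.keys, key)]
    if key in target.keys:
        return target.values[target.keys.index(key)]
    return None


# Root: offset 0. Head node of Gr_2-1: key 10, offset 1. Second node of
# Gr_1-1: key 13, pointer to the fifth leaf, which holds key 15.
print(search(15))  # v15
```

In a real in-memory layout the list index would be replaced by the address arithmetic "group head + node size × offset".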

Next, the operation when the insertion processing by the insertion processing unit 2 is performed using the index tree described above will be explained. First, the insertion processing unit 2 passes the pair of the insertion key and the value to the upper layer management unit 4. In the following description, it is assumed that the value of the insertion key is “9”. The upper hierarchy management unit 4 and the lower hierarchy management unit 3 search for the leaf node into which the insertion key “9” is to be inserted, following the same procedure as the search process described above. As a result, the process reaches the third leaf node from the left end.

Here, the lower hierarchy management unit 3 determines whether or not there is an empty space in the searched leaf node. If there is an empty space, the set of the insertion key “9” and value is inserted into the leaf node. In the example of FIG. 2, since there is one free space in the third leaf node from the left end, it is possible to insert a pair of the insertion key “9” and value into that node.

On the other hand, when there is no empty space in the searched leaf node, the lower hierarchy management unit 3 divides the leaf node and inserts a pair of an insertion key and a pointer. For example, it is assumed that the value of the insertion key passed from the insertion processing unit 2 to the upper hierarchy management unit 4 is “17”. In this case, when the upper hierarchy management unit 4 and the lower hierarchy management unit 3 search for a leaf node into which the insertion key “17” is to be inserted, the transition is made to the sixth leaf node from the left end.

However, this sixth node already has three key / value pairs stored and has no free space. Therefore, the lower hierarchy management unit 3 divides this sixth leaf node to secure an empty space, and inserts a set of the insertion key “17” and value.

Specifically, the lower layer management unit 3 first acquires a new empty node. Next, the lower hierarchy management unit 3 moves a set of a key that is equal to or higher than a predetermined split key among the three keys included in the sixth leaf node and a value corresponding thereto to a new node. Here, the value of the split key is, for example, the median value of the three key values. After that, the lower layer management unit 3 inserts the pair of the insertion key “17” and the value into the new node if the insertion key “17” is equal to or higher than the split key, and otherwise to the original node.
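The leaf division just described can be sketched as follows. This is a minimal illustration that assumes a leaf is a sorted list of (key, value) pairs; the function name `split_and_insert` and the existing keys 16, 18, and 19 in the full leaf are hypothetical.

```python
from bisect import insort
from typing import List, Tuple

Entry = Tuple[int, str]


def split_and_insert(
    leaf: List[Entry], new: Entry
) -> Tuple[List[Entry], List[Entry], int]:
    """Split a full, sorted leaf at its median key and insert `new`.

    Returns (original_node, new_node, split_key).
    """
    split_key = leaf[len(leaf) // 2][0]  # median of the stored keys
    original = [e for e in leaf if e[0] < split_key]
    new_node = [e for e in leaf if e[0] >= split_key]  # moved to the new node
    # The inserted pair goes to the new node if its key is at or above the
    # split key, and to the original node otherwise.
    target = new_node if new[0] >= split_key else original
    insort(target, new)
    return original, new_node, split_key


full_leaf = [(16, "v16"), (18, "v18"), (19, "v19")]  # assumed contents
original, new_node, split_key = split_and_insert(full_leaf, (17, "v17"))
print(split_key)  # 18
print(original)   # [(16, 'v16'), (17, 'v17')]
print(new_node)   # [(18, 'v18'), (19, 'v19')]
```

The returned split key is what would then be added, together with a pointer to the new node, to the parent node in the first hierarchy.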

When node division is performed in this way, a pointer to the newly generated leaf node must be added to an entry of the upper-layer node. That is, the lower hierarchy management unit 3 adds the pair of the split key and the pointer to the new node to the node on the search path (the second node from the left end) in the first hierarchy, one level above the 0th hierarchy in which the leaf node exists. At this time, if that node has no space for a new key/pointer pair, a new node is secured in the node group.

In the example of FIG. 2, since there is no empty space in the second node from the left, the node at the third position in the node group is initialized, and the second entry is moved to this node. The set of the insertion key “17” and the value is inserted into the space created by the node division. In this example, no node copy occurs because it is the second node that is divided. If the first node were divided, the second node would be copied to the third position and the second position would be initialized (emptied) so that the first node could be divided.

When a node group is set as in the first hierarchy, an empty space is secured as follows. First, after dividing the node in the 0th hierarchy and inserting the insertion key “17”, the lower layer management unit 3 determines whether the node group Gr_1-1, to which the first-hierarchy node receiving the new key belongs, has space to add a node. If there is space, the nodes located after the node to be divided in the node group Gr_1-1 are each copied one position to the right. Then, the node immediately after the node to be divided is initialized to secure a new empty node. This enables node division.

On the other hand, when there is no empty node in the node group Gr_1-1, the lower hierarchy management unit 3 acquires a new node group. Then, the lower hierarchy management unit 3 moves some of the nodes (for example, the rear half of the nodes in the node group Gr_1-1) to the new node group. As a result, space for adding a node is created in the node group Gr_1-1, and space for adding an entry can then be created by node division.
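The step of securing an empty node inside a node group can be sketched as follows. The sketch assumes a node group is a fixed-length list whose used nodes are packed at the front and whose unused slots are `None`, and that a node is represented only by its key list; the function name `open_slot_after` and the key values are hypothetical.

```python
from typing import List, Optional

Node = List[int]  # a node is represented only by its keys in this sketch


def open_slot_after(group: List[Optional[Node]], index: int) -> bool:
    """Free the slot right after group[index] for a node division.

    Later nodes are copied one position to the right and the slot is
    initialized. Returns False if the group has no unused slot.
    """
    used = sum(node is not None for node in group)
    if used == len(group):
        return False  # a new node group would have to be acquired
    for i in range(used, index + 1, -1):
        group[i] = group[i - 1]  # copy each later node one position right
    group[index + 1] = []        # initialized, empty node
    return True


# Dividing the second node: nothing needs to be copied.
group = [[4, 7], [13, 16], None]
open_slot_after(group, 1)
print(group)  # [[4, 7], [13, 16], []]

# Dividing the first node: the second node is copied to the third position.
group = [[4, 7], [13, 16], None]
open_slot_after(group, 0)
print(group)  # [[4, 7], [], [13, 16]]
```

The copying in the second case is the cost that grouped nodes pay on insertion, which is why this layout is attractive only where updates are comparatively rare.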
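Moving the rear half of a full node group into a newly acquired group can be sketched as follows. As before, a group is assumed to be a fixed-length list with `None` marking unused slots; the function name `split_group` and the key values are hypothetical, and updating the keys and group pointers in the parent hierarchy is outside the scope of this sketch.

```python
from typing import List, Optional, Tuple

Node = List[int]
Group = List[Optional[Node]]


def split_group(group: Group) -> Tuple[Group, Group]:
    """Move the rear half of a full node group into a new group.

    Both returned groups keep the original capacity; None marks a free slot.
    """
    capacity = len(group)
    keep = (capacity + 1) // 2  # the front half stays in the old group
    old_group = group[:keep] + [None] * (capacity - keep)
    new_group = group[keep:] + [None] * keep
    return old_group, new_group


full = [[4, 7], [13, 16], [22, 25]]
old_group, new_group = split_group(full)
print(old_group)  # [[4, 7], [13, 16], None]
print(new_group)  # [[22, 25], None, None]
```

After the move, the old group again has a free slot, so a node can be added to it for the pending node division.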

When a node is moved or a new node group is acquired in the first hierarchy, the lower hierarchy management unit 3 requests the upper hierarchy management unit 4 to add the necessary keys to the node on the search path (the leftmost node) in the node group of the second hierarchy, one level above the first hierarchy. If this node has no space for a new key, insertion space is secured by moving nodes or acquiring a node group in the second hierarchy, in the same way as in the first hierarchy. In that case, a new key must be added to the root node; when the root node has no free space and is divided, an unused, lowest-order node is made the new root node.

FIG. 3 is a diagram illustrating a configuration example of the index tree after the pair of the insertion key “17” and the value has been inserted into the index tree described above. In the example shown in FIG. 3, node division is performed in the 0th hierarchy in order to insert the pair of the insertion key “17” and the value, and entries are accordingly added in the 1st hierarchy and the 2nd hierarchy as well.

As described above in detail, in the first embodiment, in the lower hierarchy consisting of the 0th hierarchy and the first hierarchy, search and update of the index tree are managed by the first type node (B-tree node), which stores a set of a predetermined number of keys and a predetermined number of pointers or a predetermined number of values. In the upper hierarchy at or above the second hierarchy, search and update of the index tree are managed by the second type node (CSB+ tree node), which stores a predetermined number of keys and one group pointer.

The index tree has the property that nodes in upper hierarchies have a higher access ratio but a lower update ratio, while nodes in lower hierarchies have a lower access ratio but a higher update ratio. In the first embodiment, by utilizing this property, search and update of the index tree are managed in the upper hierarchy by the second type node, which allows the search process to be performed at high speed. In the lower hierarchy, search and update of the index tree are managed by the first type node, which allows the key insertion process to be performed at high speed. As a result, in the in-memory environment, it is possible to speed up the data search process while suppressing a decrease in the speed of the update process.

(Second Embodiment)
Next, a second embodiment of the present invention will be described with reference to the drawings. FIG. 4 is a block diagram illustrating a functional configuration example of the index management apparatus according to the second embodiment. As shown in FIG. 4, the index management apparatus according to the second embodiment includes, as its functional configuration, a search processing unit 1, an insertion processing unit 2, a lower hierarchy management unit 3, an upper hierarchy management unit 4, and a middle hierarchy management unit 5. In FIG. 4, components given the same reference numerals as those shown in FIG. 1 have the same functions, and redundant description is omitted here.

In this embodiment, the hierarchy in which the leaf node exists is defined as the 0th hierarchy, and the 0th hierarchy and the 1st hierarchy immediately above the 0th hierarchy are defined as the lower hierarchy. In addition, a second hierarchy that is one level higher than the first hierarchy is a middle hierarchy, and a third hierarchy or higher is an upper hierarchy. In the example of FIG. 4, the index tree is composed of four layers from the 0th layer to the 3rd layer. Among these, the 0th hierarchy and the 1st hierarchy are the lower hierarchy, the 2nd hierarchy is the middle hierarchy, and the 3rd hierarchy is the upper hierarchy.

In the middle hierarchy of the index tree, the middle hierarchy management unit 5 manages search and update of the index tree by a third type node. The third type node stores a predetermined number of keys, one group pointer representing the head position of the node group one level lower, and reduced pointers, each of which is a pointer that represents the position of a node in that node group and is smaller in size than the pointer stored in the first type node.

FIG. 5 is a diagram illustrating a specific example of an index tree searched and updated by the lower hierarchy management unit 3, the upper hierarchy management unit 4, and the middle hierarchy management unit 5. The index tree of FIG. 5 is substantially the same as the index tree shown in FIG. 2, but differs from it in the nodes of the second hierarchy, which is defined as the middle hierarchy.

As shown in FIG. 5, in the second hierarchy, which is the middle hierarchy, a maximum of two keys, one group pointer representing the head position of the node group one level below, and the same number of reduced pointers as keys are stored in a third type node. For example, the size of the normal pointer in the first hierarchy is 4 bytes, whereas the size of the reduced pointer is 2 bytes. This is because a pointer to a node located somewhere in a 4 Gbyte address space must be 4 bytes or larger, whereas a node located somewhere within a 64 Kbyte node group can be specified by a 2-byte pointer. Thus, if a pointer to the head of a node group is used, a child node can be specified even if the per-key pointer is reduced.
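The pointer sizes quoted above can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the reduced pointer holds a byte offset from the head of the node group, which the text does not state explicitly.

```python
import struct

GROUP_BYTES = 64 * 1024        # 64 Kbyte node group, as in the example above
ADDRESS_SPACE = 4 * 1024 ** 3  # 4 Gbyte address space, as in the example above


def pointer_bytes(span):
    """Smallest whole number of bytes that can address `span` byte positions."""
    return max(1, ((span - 1).bit_length() + 7) // 8)


def resolve(group_ptr, reduced_ptr):
    """Child address = head of the node group + offset held in the reduced pointer.

    Assumption: the reduced pointer is a byte offset within the group.
    """
    return group_ptr + reduced_ptr


full_size = pointer_bytes(ADDRESS_SPACE)   # bytes needed for a normal pointer
reduced_size = pointer_bytes(GROUP_BYTES)  # bytes needed inside one node group

# The same widths as fixed-size unsigned integers in a packed layout.
assert struct.calcsize("<I") == full_size
assert struct.calcsize("<H") == reduced_size
```

Under these assumptions a normal pointer needs 4 bytes and a reduced pointer 2 bytes, matching the figures given in the text.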

Here, the operation when the search processing unit 1 performs search processing using the index tree configured as shown in FIG. 5 will be described. First, the search processing unit 1 passes the search key to the upper hierarchy management unit 4. In the following description, it is assumed that the value of the search key is “15”.

The processing of the upper hierarchy management unit 4 in the third hierarchy, which is the upper hierarchy, is the same as in the first embodiment. Since the second hierarchy, reached by the transition from the third hierarchy, is the middle hierarchy, the middle hierarchy management unit 5 performs the processing there. That is, the middle hierarchy management unit 5 searches the keys stored in the node at the head position of the node group Gr_2-1, reached by the transition from the root node, for the key having the maximum value at or below the search key “15”. In this case, the retrieved key is “10”.

Further, the middle hierarchy management unit 5 calculates the offset amount from the head position of the node group indicated by the group pointer, which is stored in the same node as the retrieved key “10”, based on the reduced pointer stored as a set with the key “10”. Then, by following the position specified by this group pointer and the offset amount, a transition is made to the second node from the head position of the node group Gr_1-1 in the first hierarchy, one level below.

Since the first hierarchy reached by this transition is a lower hierarchy, the lower hierarchy management unit 3 performs the same processing as in the first embodiment. As a result, in accordance with the pointer stored as a set with the key “13”, a transition is made directly to the fifth leaf node from the left end in the 0th hierarchy one level below, the search key “15” is searched for within that leaf node, and the value corresponding to it is acquired and passed to the search processing unit 1. The search processing by the search processing unit 1 thereby ends.
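The search walk-through above can be sketched as a small simulation. Everything in it is an assumption made for illustration: the simulated `memory` layout, the addresses, `NODE_SIZE`, the stored values, and the treatment of the reduced pointer as a byte offset are invented, and only the nodes on the search path plus one sibling are populated. The keys “10” and “13” and the search key “15” are taken from the text; the rest of FIG. 5 is not reproduced.

```python
from bisect import bisect_right

NODE_SIZE = 0x10  # hypothetical node size in bytes


def floor_index(keys, key):
    """Index of the largest stored key that does not exceed key."""
    i = bisect_right(keys, key) - 1
    if i < 0:
        raise KeyError(key)
    return i


# Simulated memory: address -> node (contents invented for the example).
memory = {
    # 3rd hierarchy (upper): second type node, keys + one group pointer
    0x000: {"type": 2, "keys": [1, 30], "group_ptr": 0x100},
    # 2nd hierarchy (middle): third type node, keys + group pointer + reduced pointers
    0x100: {"type": 3, "keys": [1, 10], "group_ptr": 0x200, "reduced": [0x00, 0x10]},
    # 1st hierarchy (lower): first type nodes, keys + full pointers
    0x200: {"type": 1, "keys": [1, 5], "pointers": [0x300, 0x310]},
    0x210: {"type": 1, "keys": [10, 13], "pointers": [0x320, 0x330]},
    # 0th hierarchy: leaf node holding key/value sets
    0x330: {"type": 0, "keys": [13, 15, 17], "values": ["v13", "v15", "v17"]},
}


def search(root_addr, key):
    node = memory[root_addr]
    while node["type"] != 0:
        i = floor_index(node["keys"], key)
        if node["type"] == 2:    # offset derived from the node size
            addr = node["group_ptr"] + i * NODE_SIZE
        elif node["type"] == 3:  # offset derived from the reduced pointer
            addr = node["group_ptr"] + node["reduced"][i]
        else:                    # first type node: follow the full pointer
            addr = node["pointers"][i]
        node = memory[addr]
    i = floor_index(node["keys"], key)
    if node["keys"][i] != key:
        raise KeyError(key)
    return node["values"][i]
```

With this layout, `search(0x000, 15)` passes through the root, selects key 10 in the middle hierarchy and key 13 in the first hierarchy, and reads the value stored with key 15 in the leaf.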

In the insertion processing by the insertion processing unit 2, first, the search for the leaf node into which the insertion key is to be inserted is performed according to the same procedure as the search processing described above. Thereafter, the processing for inserting the combination of the insertion key and the value into the retrieved leaf node is the same as that in the first embodiment described above, and thus the description thereof is omitted here.

As described above in detail, in the second embodiment, the offset amount within the node group is obtained based on the reduced pointer. It is therefore not essential that the nodes in the node group be stored in a contiguous area of the memory, and each node can be freely arranged within the node group as shown in FIG. Therefore, even data update processing that requires node division or the like can be performed at high speed.
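Why free placement helps node division can be sketched as follows. This is a simplified model under stated assumptions, not the update procedure of the embodiment: the node group is modeled as a list of slots, the reduced pointer is modeled as a slot number, and both function names and the dictionary layout are hypothetical. The second function models the contrasting case in which child nodes must stay contiguous in key order.

```python
from bisect import bisect_right


def insert_child_third_type(parent, group_slots, new_key, new_node):
    """Place new_node in any free slot; existing child nodes do not move."""
    slot = group_slots.index(None)     # any free slot will do
    group_slots[slot] = new_node
    i = bisect_right(parent["keys"], new_key)
    parent["keys"].insert(i, new_key)
    parent["reduced"].insert(i, slot)  # reduced pointer modeled as a slot number
    return slot


def insert_child_contiguous(parent, group_slots, new_key, new_node):
    """Children stay contiguous in key order, so later siblings must move."""
    assert group_slots[-1] is None, "sketch assumes a free slot at the end"
    i = bisect_right(parent["keys"], new_key)
    parent["keys"].insert(i, new_key)
    group_slots.insert(i, new_node)    # shifts every later sibling by one slot
    group_slots.pop()                  # drop the trailing free slot
    return sum(1 for n in group_slots[i + 1:] if n is not None)  # nodes moved
```

In the first function only the small key and reduced-pointer arrays of the parent change, whereas in the second every child node to the right of the insertion point is relocated.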

Thus, although a reduced pointer is provided for each key in order to speed up the data update process, the storage capacity required for each reduced pointer is small, so the number of keys that can be stored in one node can be increased. Thereby, the number of hierarchies (levels) of the index tree can be reduced, and the search process can be sped up.
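The effect of the smaller pointer on the number of keys per node, and in turn on the number of hierarchies, can be estimated with a rough calculation. The figures below are assumptions chosen for illustration and do not come from the embodiment: a 64-byte node, 4-byte keys, a 4-byte group pointer, and one million leaf nodes. The sketch also idealizes the tree by using the same node layout at every internal hierarchy.

```python
def keys_per_node(node_bytes, key_bytes, ptr_bytes, header_bytes=0):
    """Number of (key, pointer) sets that fit after any fixed header."""
    return (node_bytes - header_bytes) // (key_bytes + ptr_bytes)


def internal_levels(leaf_count, fanout):
    """Smallest number of internal hierarchies whose fanout reaches all leaves."""
    levels, reach = 0, 1
    while reach < leaf_count:
        reach *= fanout
        levels += 1
    return levels


# Normal 4-byte pointer per key, no group pointer.
full = keys_per_node(64, key_bytes=4, ptr_bytes=4)

# Reduced 2-byte pointer per key, plus one 4-byte group pointer as a header.
reduced = keys_per_node(64, key_bytes=4, ptr_bytes=2, header_bytes=4)

levels_full = internal_levels(1_000_000, full)
levels_reduced = internal_levels(1_000_000, reduced)
```

Under these assumptions the node holds 8 keys with normal pointers and 10 keys with reduced pointers, and the idealized tree needs 7 internal hierarchies in the first case and 6 in the second.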

In the second embodiment, the example in which the index tree is divided into the upper hierarchy, the middle hierarchy, and the lower hierarchy and the reduced pointer is used in the middle hierarchy has been described. However, the present invention is not limited to this. For example, the index tree may be managed by dividing it into an upper hierarchy and a lower hierarchy, and a reduction pointer may be used in the lower hierarchy.

That is, as shown in FIG. 7, the index management apparatus is configured to include a search processing unit 1, an insertion processing unit 2, a lower hierarchy management unit 3′, and an upper hierarchy management unit 4. The lower hierarchy management unit 3′ defines the hierarchy containing the leaf nodes as the 0th hierarchy and, in the lower hierarchy from the 1st hierarchy up to the nth hierarchy (n is an arbitrary value satisfying “1 ≦ n < total number of hierarchies − 1”), manages search and update of the index tree by a third type node that stores a predetermined number of keys, one group pointer representing the head position of the node group one level lower, and reduced pointers representing the position of each node in that node group. For the 0th hierarchy, search and update of the index tree are managed by the first type node.

In the above embodiments, an example in which the leftmost key is omitted in the internal nodes has been described. However, the present invention can also be applied to an index tree including internal nodes in which the leftmost key is not omitted.

The index management apparatus according to the first and second embodiments described above can be widely used in systems that may search data while the data is being updated, such as relational database indexes, map processing incorporated in many programs, file systems, key-value stores, and OLAP (online analytical processing) systems.

In addition, each of the first and second embodiments described above is merely a specific example for carrying out the present invention, and the technical scope of the present invention should not be interpreted in a limited manner because of them. That is, the present invention can be implemented in various forms without departing from its gist or main features.

DESCRIPTION OF SYMBOLS
1 Search processing unit
2 Insertion processing unit
3, 3′ Lower hierarchy management unit
4 Upper hierarchy management unit
5 Middle hierarchy management unit

Claims

An index management device that manages an index tree composed of leaf nodes, which form the lowest hierarchy, and internal nodes other than the leaf nodes, the index management device comprising:
a lower hierarchy management unit that, with the hierarchy containing the leaf nodes, each storing a set of a predetermined number of keys and a predetermined number of values, defined as the 0th hierarchy, manages search and update of the index tree in a lower hierarchy at or below the nth hierarchy (n is an arbitrary value satisfying “1 ≦ n < total number of hierarchies − 1”) by a first type node storing a set of a predetermined number of keys and either a predetermined number of pointers representing positions of child nodes or a predetermined number of values; and
an upper hierarchy management unit that manages search and update of the index tree in an upper hierarchy above the nth hierarchy by a second type node storing a predetermined number of keys and one group pointer representing the head position of a node group in the hierarchy one level lower.
The index management device according to claim 1, wherein
the lower hierarchy management unit searches the keys stored in the first type node for the key having the maximum value at or below the search key, and follows the position indicated by the pointer stored in the same node together with the retrieved key, and
the upper hierarchy management unit searches the keys stored in the second type node for the key having the maximum value at or below the search key, and follows the position specified by calculating, based on the node size, an offset amount from the head position of the node group indicated by the group pointer stored in the same node together with the retrieved key.
The index management device according to claim 1, wherein
the lower hierarchy management unit searches the keys stored in the first type node for the key having the maximum value at or below the insertion key, follows the position indicated by the pointer stored in the same node together with the retrieved key, thereby searching for the leaf node into which the insertion key is to be inserted, inserts the set of the insertion key and a value into the retrieved leaf node when that leaf node has free space, and, when the retrieved leaf node has no free space, divides the leaf node and then inserts the set of the insertion key and the value, and
the upper hierarchy management unit searches the keys stored in the second type node for the key having the maximum value at or below the insertion key, and follows the position specified by calculating, based on the node size, an offset amount from the head position of the node group indicated by the group pointer stored in the same node together with the retrieved key.
The index management device according to claim 1, wherein
the second hierarchy, which is one level above the first hierarchy, is defined as a middle hierarchy, and the third hierarchy and above, which lie above the second hierarchy, are defined as the upper hierarchy, and
the index management device further comprises a middle hierarchy management unit that manages search and update of the index tree in the middle hierarchy by a third type node storing a predetermined number of keys, one group pointer representing the head position of a node group in the hierarchy one level lower, and reduced pointers, each being a pointer that represents the position of a node in the node group and is smaller in size than the pointer stored in the first type node.
The index management device according to claim 4, wherein the middle hierarchy management unit searches the keys stored in the third type node for the key having the maximum value at or below the search key, and follows the position specified by calculating, based on the reduced pointer, an offset amount from the head position of the node group indicated by the group pointer stored in the same node together with the retrieved key.
An index management device that manages an index tree composed of leaf nodes, which form the lowest hierarchy, and internal nodes other than the leaf nodes, the index management device comprising:
a lower hierarchy management unit that, with the hierarchy containing the leaf nodes, each storing a set of a predetermined number of keys and a predetermined number of values, defined as the 0th hierarchy, manages search and update of the index tree in a lower hierarchy from the 1st hierarchy to the nth hierarchy (n is an arbitrary value satisfying “1 ≦ n < total number of hierarchies − 1”) by a third type node storing a predetermined number of keys, one group pointer representing the head position of a node group in the hierarchy one level lower, and reduced pointers of a size sufficient to represent the position of each node in the node group; and
an upper hierarchy management unit that manages search and update of the index tree in an upper hierarchy above the nth hierarchy by a second type node storing a predetermined number of keys and one group pointer representing the head position of a node group in the hierarchy one level lower.
The index management device according to claim 6, wherein
the lower hierarchy management unit searches the keys stored in the third type node for the key having the maximum value at or below the search key, and follows the position specified by calculating, based on the reduced pointer, an offset amount from the head position of the node group indicated by the group pointer stored in the same node together with the retrieved key, and
the upper hierarchy management unit searches the keys stored in the second type node for the key having the maximum value at or below the search key, and follows the position specified by calculating, based on the node size, an offset amount from the head position of the node group indicated by the group pointer stored in the same node together with the retrieved key.
The index management device according to claim 6, wherein
the lower hierarchy management unit searches the keys stored in the third type node for the key having the maximum value at or below the insertion key, follows the position specified by calculating, based on the reduced pointer, an offset amount from the head position of the node group indicated by the group pointer stored in the same node together with the retrieved key, thereby searching for the leaf node into which the insertion key is to be inserted, inserts the set of the insertion key and a value into the retrieved leaf node when that leaf node has free space, and, when the retrieved leaf node has no free space, divides the leaf node and then inserts the set of the insertion key and the value, and
the upper hierarchy management unit searches the keys stored in the second type node for the key having the maximum value at or below the insertion key, and follows the position specified by calculating, based on the node size, an offset amount from the head position of the node group indicated by the group pointer stored in the same node together with the retrieved key.