CN112579575A - Method for quickly constructing database index structure - Google Patents

Publication number
CN112579575A
Authority
CN
China
Prior art keywords
node
nodes
leaf
internal
index data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011580768.2A
Other languages
Chinese (zh)
Inventor
王培培
陈乃阔
吴之光
牛晓威
张明瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chaoyue Technology Co Ltd
Original Assignee
Chaoyue Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chaoyue Technology Co Ltd filed Critical Chaoyue Technology Co Ltd
Priority to CN202011580768.2A priority Critical patent/CN112579575A/en
Publication of CN112579575A publication Critical patent/CN112579575A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/211Schema design and management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for quickly constructing a database index structure comprises the following steps: sorting the index data to be inserted by using a sorting algorithm; creating leaf nodes, and parallelly inserting the sorted index data into the leaf nodes in a hash storage mode; constructing a tree structure based on leaf nodes, and determining a layer structure of the tree structure and internal nodes or root nodes contained in each layer; according to the mapping relation among the leaf nodes, the internal nodes and the root nodes determined by the tree structure, computing key values of the internal nodes and key values of the root nodes in parallel; and parallelly inserting the obtained key values into corresponding internal nodes or root nodes. The invention provides a database index structure combining key value indexes and hash indexes, and provides a rapid construction method based on the index structure.

Description

Method for quickly constructing database index structure
Technical Field
The invention relates to the technical field of databases, in particular to a method for quickly constructing a database index structure.
Background
With the rapid development of the information industry, the volume of data is growing exponentially. Data security has become a key point of competition among countries, and database products built on domestic platforms have become a research focus for database vendors.
Common database acceleration technologies, such as GPU and FPGA acceleration, are implemented using the OpenCL high-level programming architecture. OpenCL-based high-level synthesis for general-purpose computing is gradually maturing, which makes FPGA algorithm design easier and more efficient and lets engineers design at a higher level of abstraction without attending to low-level hardware details. However, the OpenCL architecture is designed mainly for the x86 platform. Because dynamic link libraries supporting domestic processors are lacking, OpenCL cannot deliver its practical value for database acceleration on autonomous domestic platforms such as Feiteng (Phytium), Loongson, and Shenwei (Sunway). The traditional FPGA development method depends little on the processor platform: as long as the driver meets the protocol requirements, it is compatible with the interfaces of any platform, and it is therefore widely used in heterogeneous acceleration designs for autonomous platforms.
Therefore, a database structure suited to domestic platforms, and a method for quickly constructing a database index structure that makes full use of the characteristics of those platforms, are needed.
Disclosure of Invention
In order to solve the technical problems in the background art, in one aspect of the present invention, a method for quickly constructing a database index structure is provided, where the method includes: sorting the index data to be inserted by using a sorting algorithm; creating leaf nodes, and parallelly inserting the sorted index data into the leaf nodes in a hash storage mode; constructing a tree structure based on the leaf nodes, and determining a layer structure of the tree structure and internal nodes or root nodes contained in each layer; according to the mapping relation among the leaf nodes, the internal nodes and the root nodes determined by the tree structure, computing key values of the internal nodes and key values of the root nodes in parallel; and parallelly inserting the obtained key values into corresponding internal nodes or root nodes.
In one or more embodiments, the creating a leaf node and inserting the sorted index data into the leaf node in parallel in a hash storage manner includes: configuring preset parameters; determining the number of required leaf nodes according to the preset parameters and the number of the index data to be inserted; creating a corresponding number of leaf nodes, and allocating a corresponding hash storage space for each leaf node, wherein the hash storage space of each leaf node comprises a hash bucket for storing index data and an overflow bucket for resolving data collision; and parallelly inserting the sorted index data into the hash bucket of each leaf node.
In one or more embodiments, the preset parameters include: the number of hash buckets each leaf node has on average, the capacity of each hash bucket, and the fill factor of each leaf node.
In one or more embodiments, the allocating a respective hash storage space for each of the leaf nodes includes: one or more hash buckets are allocated for each leaf node and are grouped into hash bucket chains, and an overflow bucket is allocated for each hash bucket chain.
In one or more embodiments, the computing the key values of the internal nodes and the key values of the root nodes in parallel according to the mapping relationships among the leaf nodes, the internal nodes and the root nodes determined by the tree structure includes: numbering internal nodes from bottom to top and from left to right based on the tree structure; determining the layer number of each internal node in the tree, the node number at the layer and the key number in the node; calculating leaf offset between internal nodes of each layer and leaf offset between index key values; calculating a target leaf node corresponding to each key value; and extracting the boundary value of the index data in the target leaf node as a key value of an internal node or a key value of a root node.
In one or more embodiments, the value boundaries of the index data are determined by the sorting rules of the sorting algorithm on the index data.
In one or more embodiments, the method further comprises: the fill factor β for each internal node is set to determine the number of key values in each internal node.
In one or more embodiments, the method comprises: in response to receiving a data query request, searching a target leaf node based on key values of the root node and the internal node; and carrying out hash calculation in the target leaf node, and obtaining target index data according to the hash result.
In one or more embodiments, the method comprises:
in response to receiving a request for deleting index data of a target leaf node, deleting the index data, performing defragmentation on the index data, and judging whether the deleted index data is boundary data of the target leaf node;
in response to the deleted index data being boundary data of the target leaf node, traversing all hash buckets within the target leaf node to obtain a minimum value of index data within the target leaf node; and taking the minimum value as a key value of the corresponding internal node or the root node, and upwards adjusting the structure of the internal node of the tree.
In one or more embodiments, the data deletion method further comprises determining whether the target leaf node underflows; in response to the target leaf node underflowing, transferring, by an adjacent leaf node, a portion of the index data to the target node; or merging the target leaf node and the adjacent leaf node, and adjusting the structure of the internal node of the tree upwards.
The beneficial effects of the invention include: the invention provides a database index structure that combines a key-value index with a hash index, and provides a method for quickly constructing the index structure based on its structural characteristics. Among other benefits, the key values of all nodes in the tree-structured part can be inserted in parallel in a single pass.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
FIG. 1 is a flowchart illustrating a method for rapidly building a data index structure according to the present invention;
FIG. 2 is a schematic diagram of a tree structure according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a database index structure according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used to distinguish two entities or two parameters that share the same name but are not identical. "First" and "second" are used only for convenience of description, should not be construed as limiting the embodiments of the present invention, and are not explained again in the following embodiments.
In order to fully exert the characteristics of an FPGA (field programmable gate array) in a domestic platform, in one aspect of the invention, the invention provides a method for quickly constructing a database index structure, wherein the database index combines the characteristics of a tree index structure and a hash index structure, and not only has accurate range search performance, but also has certain random search performance; in the process of constructing the database index structure, the invention realizes the rapid construction of the database index structure by calculating the key values of the root nodes and the internal nodes of all the tree structures in parallel and completing the key value insertion operation of the root nodes and all the internal nodes at one time. The more detailed process of the invention is as follows:
FIG. 1 is a flowchart illustrating a method for quickly constructing a data index structure according to the present invention. Referring to FIG. 1, the workflow of the method includes: step S1, sorting the index data to be inserted by using a sorting algorithm; step S2, creating leaf nodes, and inserting the sorted index data into the leaf nodes in parallel in a hash storage mode; step S3, constructing a tree structure based on the leaf nodes, and determining the layer structure of the tree structure and the internal nodes or root node contained in each layer; step S4, calculating the key values of the internal nodes and the key values of the root node in parallel according to the mapping relation among the leaf nodes, the internal nodes and the root node determined by the tree structure; and step S5, inserting the obtained key values into the corresponding internal nodes or root node in parallel.
For step S1, the index data to be inserted is sorted by using a sorting algorithm. The purpose of sorting the index data is to allow the key values of the root node and the internal nodes of the tree structure to be calculated. The index data itself is stored in the leaf nodes, and the leaf nodes use hash storage. Therefore, once the index data belonging to each leaf node has been determined, it does not have to be inserted in order, and the FPGA can execute the insertions in parallel. In an alternative embodiment, the index data may be sorted in descending or ascending order of the values of the data to be inserted.
For step S2, creating a leaf node, and inserting sorted index data into the leaf node in parallel in a hash storage manner; the leaf node of the invention adopts a hash storage mode, and the hash storage space of the leaf node is composed of a hash bucket for storing index data and an overflow bucket for solving data collision.
Specifically, the process of creating the leaf node includes step S2.1, configuring preset parameters; s2.2, determining the number of required leaf nodes according to the preset parameters and the number of the index data to be inserted; and S2.3, creating a corresponding number of leaf nodes, and distributing corresponding hash storage space for each leaf node. The preset parameters comprise the number of hash buckets which each leaf node has on average, the capacity of each hash bucket and the filling factor of each leaf node.
More specifically, let the average number of hash buckets per leaf node be buckets, the capacity of each hash bucket be b, and the fill factor of each leaf node be α. The number of hash buckets actually used to store index data in each leaf node is then buckets × α, and the amount of index data that can actually be inserted into each leaf node is buckets × b × α. For N index data to be inserted, L_0 leaf nodes are needed, where
$L_0 = \left\lceil \frac{N}{buckets \times b \times \alpha} \right\rceil$ (1)
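As a minimal sketch of this leaf-count calculation, the following assumes that formula (1) rounds N / (buckets × b × α) up to the next integer, which agrees with the worked example later in the description (N = 42 gives 14 leaf nodes). The function name `leaf_count` is hypothetical and does not come from the patent.

```python
import math


def leaf_count(n_records: int, buckets: int, b: int, alpha: float) -> int:
    """Number of leaf nodes L_0 needed for n_records index data (assumed form of formula (1))."""
    # Index data actually inserted per leaf node: buckets * b * alpha.
    per_leaf = buckets * b * alpha
    # Round up so that a partially filled last leaf is still allocated (assumption).
    return math.ceil(n_records / per_leaf)


print(leaf_count(42, buckets=4, b=1, alpha=0.75))  # 14
```

Floating-point fill factors may need care in a real implementation; this sketch makes no attempt to guard against rounding effects.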
After the number of leaf nodes is determined, hash storage space needs to be allocated to each leaf node. Specifically, one or more hash buckets are allocated to each leaf node and linked into a hash bucket chain, and an overflow bucket is allocated to each hash bucket chain. Since the capacity b of a hash bucket is already determined, the size of the hash storage space to allocate to each leaf node can be obtained from the number of hash buckets in the leaf node and the size of the overflow bucket. When the storage space of the hash bucket chain is insufficient, the overflow bucket temporarily stores the incoming data and a split of the leaf node is triggered; after the new leaf node is generated, the index data held in the overflow bucket is transferred to the hash buckets of the new leaf node, which completes the insertion of the new index data. Both the creation and the splitting of leaf nodes can be completed in parallel by the FPGA. Because the leaf nodes use hash storage, once the index data belonging to each leaf node has been determined, it does not have to be inserted in order, and the insertions can be executed in parallel by the FPGA.
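The leaf layout described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the patent does not specify the hash function or how a record moves along the hash bucket chain, so the modulo hash and the wrap-around probing below are illustrative choices, and the class name `LeafNode` is hypothetical.

```python
class LeafNode:
    """Leaf with a chain of fixed-capacity hash buckets plus one overflow bucket."""

    def __init__(self, buckets: int, capacity: int):
        self.chain = [[] for _ in range(buckets)]  # hash bucket chain
        self.capacity = capacity                   # b: records per hash bucket
        self.overflow = []                         # holds records until the leaf is split

    def insert(self, key: int) -> bool:
        """Store key; return True when the chain is full and a split should be triggered."""
        start = hash(key) % len(self.chain)        # assumed hash function
        for step in range(len(self.chain)):        # assumed: probe along the chain
            bucket = self.chain[(start + step) % len(self.chain)]
            if len(bucket) < self.capacity:
                bucket.append(key)
                return False
        self.overflow.append(key)                  # no room left in the chain
        return True

    def contains(self, key: int) -> bool:
        return key in self.overflow or any(key in bucket for bucket in self.chain)


leaf = LeafNode(buckets=4, capacity=1)
needs_split = [leaf.insert(k) for k in (34, 35, 38, 39, 40)]
print(needs_split)    # [False, False, False, False, True]
print(leaf.overflow)  # [40]
```

The fifth insertion lands in the overflow bucket and signals that a split is due; moving the overflow contents into the new leaf is left out of the sketch.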
For step S3, a tree structure is constructed based on the leaf nodes, and the layer structure of the tree and the internal nodes or root node contained in each layer are determined. The specific process is as follows:
Preferably, the number of layers of the tree structure and the number of internal nodes contained in each layer are determined from the number of leaf nodes, using the following formulas.
The number of layers of the tree structure is calculated as:
$h = \left\lceil \log_{m \times \beta + 1} L_0 \right\rceil$ (2)
where h is the height (i.e., the number of layers) of the tree structure, m is the record capacity of an internal node, β is the fill factor of an internal node, and L_0 is the number of leaf nodes.
The number of internal nodes contained in each layer of the tree structure is calculated as:
$L_i = \left\lceil \frac{L_{i-1}}{m \times \beta + 1} \right\rceil$ (3)
where L_i is the number of internal nodes in the i-th layer. Using formula (2) and formula (3), the number of internal nodes in each layer is determined layer by layer upward from the leaf nodes until the root node is reached, which completes the construction of the tree structure.
The total number of index nodes (internal nodes and the root node) of the tree structure is calculated as:
$\sum_{i=1}^{h} L_i$ (4)
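The layer-by-layer sizing can be sketched as follows. The sketch assumes that each index node has m × β + 1 children and that each layer holds the previous layer's node count divided by that fan-out, rounded up; this reproduces the worked example in the description (14 leaf nodes, 3 layers, index nodes numbered 0 to 7). The function name `layer_sizes` and the truncation of m × β to an integer are assumptions.

```python
def layer_sizes(leaf_nodes: int, m: int, beta: float) -> list[int]:
    """Node count of each index layer, from layer 1 (just above the leaves) up to the root."""
    fanout = int(m * beta) + 1  # children per index node (assumed integer after truncation)
    sizes = []
    count = leaf_nodes
    while count > 1:
        count = -(-count // fanout)  # ceiling division
        sizes.append(count)
    return sizes


sizes = layer_sizes(14, m=2, beta=1)
print(sizes)       # [5, 2, 1]
print(len(sizes))  # 3 layers
print(sum(sizes))  # 8 index nodes in total
```

The loop stops once a layer contains a single node, which is taken to be the root.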
the tree structure constructed at this time is shown in fig. 2. Fig. 2 is a schematic diagram of a tree structure according to an embodiment of the invention. The tree structure constructed at this time only forms the mapping relation among the leaf nodes, the internal nodes and the root nodes. On the basis of the mapping relation, the invention utilizes the parallel processing capacity of the FPGA to calculate the index key values of each internal node and the root node in parallel and completes the insertion operation of the index key values at one time, thereby generating the database index structure of the invention rapidly.
For step S4, concurrently calculating the key values of the internal nodes and the key value of the root node according to the mapping relationship between the leaf nodes, the internal nodes and the root node determined by the tree structure; after the tree structure is constructed, the process of calculating the key values of each internal node and the root node comprises the following steps: numbering internal nodes from bottom to top and from left to right based on the tree structure; determining the layer number of each internal node in the tree, the node number at the layer and the key number in the node; calculating leaf offset between internal nodes of each layer and leaf offset between index key values; calculating a target leaf node corresponding to each key value; and extracting the boundary value of the index data in the target leaf node as the key value of the internal node or the key value of the root node. Wherein the boundary value of the index data is determined by the sorting rule of the index data by the sorting algorithm.
Before the key values of the internal nodes and the root node are calculated, the capacity (i.e., the number of key values that can be stored) and the fill factor of the internal nodes and the root node need to be set, which determines the number of key values in each internal node. The leaf offset between the internal nodes of each layer is calculated as:
$nodeOffset = (m \times \beta + 1)^{n}$ (5)
The leaf offset between index key values is calculated as:
$keyOffset = (m \times \beta + 1)^{n-1}$ (6)
where m is the capacity of an internal node, β is the fill factor of an internal node, and n is the layer in which the internal node is located.
The target leaf node corresponding to each key value is calculated as:
$target = NodeNo \times nodeOffset + (KeyNo + 1) \times keyOffset$ (7)
where target is the number of the target leaf node, NodeNo is the number of the internal node within its layer (the internal nodes of each layer are numbered from left to right starting at 0), and KeyNo is the number of the key value (the key values in each internal node are numbered from left to right starting at 0).
And finally, extracting the boundary value of the index data in the target leaf node as a key value of an internal node or a key value of a root node.
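The mapping from a key slot to its target leaf node depends only on the layer, the node number and the key number, so every slot can be evaluated independently. A minimal sketch follows, reading the offsets as powers of the fan-out (nodeOffset = (m × β + 1)^n and keyOffset = (m × β + 1)^(n-1)), as the worked example in the description implies. The function name `key_targets` is hypothetical, and skipping slots whose target lies beyond the last leaf is an assumption about partially filled nodes.

```python
def key_targets(layer: int, node_no: int, keys_per_node: int, leaf_nodes: int) -> list[int]:
    """Target leaf numbers for the key slots of one index node, per formulas (5) to (7)."""
    fanout = keys_per_node + 1               # m * beta + 1
    node_offset = fanout ** layer            # formula (5)
    key_offset = fanout ** (layer - 1)       # formula (6)
    targets = []
    for key_no in range(keys_per_node):
        target = node_no * node_offset + (key_no + 1) * key_offset  # formula (7)
        if target < leaf_nodes:              # assumed: no key for a leaf that does not exist
            targets.append(target)
    return targets


print(key_targets(layer=1, node_no=1, keys_per_node=2, leaf_nodes=14))  # [4, 5]
print(key_targets(layer=2, node_no=0, keys_per_node=2, leaf_nodes=14))  # [3, 6]
print(key_targets(layer=3, node_no=0, keys_per_node=2, leaf_nodes=14))  # [9]
```

Because no call depends on the result of another, the slots can be computed in any order or all at once, which is the property the parallel construction relies on.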
For step S5, the obtained key values are inserted in parallel into the corresponding internal nodes or root node, using storage space allocated in advance. A fully parallel insertion would be the most efficient, but the same memory cannot serve multiple read and write operations at the same time, so key values destined for the same memory are in practice stored serially. The method adopted by the invention is as follows:
1) key values that are not stored in the same memory are processed fully in parallel;
2) key values that must be stored in the same memory are calculated fully in parallel, the calculated key values are temporarily held in an extra storage space, and the cached key values are then written to the final storage space in a single operation. This saves the time that calculating and then storing each key value serially would take, and improves the efficiency of the algorithm significantly at the cost of extra space.
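The second case, computing in parallel, staging the results, and then writing once, can be mimicked in software. The sketch below is only an analogy: Python threads stand in for the parallel units of the FPGA, a plain list stands in for the extra storage space, and the names `stage_and_store`, `leaf_minima` and `node_keys` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def stage_and_store(leaf_minima: list[int], targets: list[int], node_keys: list[int]) -> list[int]:
    """Compute key values in parallel into a staging buffer, then store them in one write."""
    with ThreadPoolExecutor() as pool:
        # Each task reads the boundary value of one target leaf; map() keeps slot order.
        staged = list(pool.map(leaf_minima.__getitem__, targets))
    node_keys[:] = staged  # single bulk write into the node's final storage
    return node_keys


minima = [1, 4, 7, 10, 13, 16]
print(stage_and_store(minima, [4, 5], []))  # [13, 16]
```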
To more clearly and completely illustrate the calculation process of the key values, taking the construction of the tree structure shown in fig. 2 as an example:
Referring to FIG. 2, suppose there are N = 42 index data to be inserted in total; the average number of hash buckets per leaf node is buckets = 4; the capacity of each hash bucket is b = 1 (i.e., each hash bucket can store 1 piece of index data); the fill factor of each leaf node is α = 0.75; the key value capacity of an internal node is m = 2; and the fill factor of an internal node is β = 1.
The number of required leaf nodes L_0 is obtained from formula (1):
$L_0 = \left\lceil \frac{42}{4 \times 1 \times 0.75} \right\rceil = 14$
That is, 14 leaf nodes need to be created. Next, the number of layers of the tree structure is calculated from the number of leaf nodes by formula (2):
$h = \left\lceil \log_{2 \times 1 + 1} 14 \right\rceil = 3$
That is, the tree structure constructed on the leaf nodes has 3 layers. Next, the number of internal nodes in each layer, starting with layer 1 (the internal nodes that directly index the leaf nodes form layer 1), is calculated by formula (3):
$L_1 = \left\lceil 14 / 3 \right\rceil = 5$
$L_2 = \left\lceil 5 / 3 \right\rceil = 2$
$L_3 = \left\lceil 2 / 3 \right\rceil = 1$
constructing a tree structure from the determined layer structure and the internal nodes or root nodes contained in each layer; numbering the tree structure from left to right and from bottom to top; wherein the leaf nodes are numbered from 0 to 15 in sequence; the serial numbers of the internal nodes are 0-6 in sequence, and the serial number of the root node is 7.
Next, based on the tree structure constructed above, the process of calculating the key values of each internal node and the root node is as follows:
Take the calculation of the index key values of the internal node numbered 1 as an example. From the previously set values m = 2 and β = 1, each internal node holds 2 key values, numbered 0 and 1. Internal node 1 is in layer 1, i.e., n = 1. Since internal node 1 is the 2nd node in its layer and numbering starts from 0, its NodeNo = 1. For the 1st key value, KeyNo = 0. Formula (5) and formula (6) then give, respectively:
$nodeOffset = (2 \times 1 + 1)^{1} = 3$;
$keyOffset = (2 \times 1 + 1)^{1-1} = 1$;
The target for the 1st key value of internal node 1 is then:
$target = 1 \times 3 + (0 + 1) \times 1 = 4$;
The minimum value 34 in the leaf node numbered 4 (i.e., the 5th from the left) is used as the 1st index key value of internal node 1.
Similarly, for the 2nd key value of internal node 1, KeyNo = 1 and the other parameters are unchanged, so the target of the 2nd key value is:
$target = 1 \times 3 + (1 + 1) \times 1 = 5$
The minimum value 41 in the leaf node numbered 5 (the 6th from the left) is used as the 2nd index key value of internal node 1.
Take the calculation of the index key values of the internal node numbered 5 as a second example. Internal node 5 is in layer 2, i.e., n = 2. For its 1st key value, NodeNo = 0 and KeyNo = 0:
$nodeOffset = (2 \times 1 + 1)^{2} = 9$;
$keyOffset = (2 \times 1 + 1)^{2-1} = 3$
The target for the 1st key value of internal node 5 is:
$target = 0 \times 9 + (0 + 1) \times 3 = 3$
That is, the minimum value 23 in the leaf node numbered 3 (the 4th from the left) is used as the key value.
Similarly, for the 2nd key value of internal node 5, NodeNo = 0, KeyNo = 1, nodeOffset = 9, and keyOffset = 3, so
$target = 0 \times 9 + (1 + 1) \times 3 = 6$
The minimum value 51 in the leaf node numbered 6 (the 7th from the left) is used as the key value.
The key values of the internal nodes and the key value of the root node are calculated sequentially through the steps, and finally, the index structure of the database is constructed, and the index structure is shown in fig. 3. Fig. 3 is a schematic diagram of a database index structure according to an embodiment of the present invention.
It can be seen from the above process of calculating key values that the present invention can calculate key values of each internal node and root node at the same time, therefore, the present invention can calculate each key value in parallel by using parallel processing capability of the FPGA, and insert it into the corresponding internal node or root node at one time, thereby completing construction of the database index structure of the present invention quickly.
By comparison, an existing tree creation algorithm builds an index tree whose internal nodes have a key value capacity of m and m + 1 children; constructing an index tree with the same index records as in FIG. 2 (N = 42) in that way yields a tree 5 levels high. The bottom-up construction of the invention, from the leaf nodes to the root node, generates the initial database index structure with the minimum storage space, and when new index data later needs to be inserted, the insertion can be handled by the leaf node splitting process.
Further, in one or more embodiments, the method comprises: in response to receiving the data query request, searching a target leaf node based on key values of the root node and each internal node; and carrying out hash calculation in the target leaf node, and obtaining target index data according to the hash result.
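The two-stage query, tree descent followed by a hash lookup inside the leaf, can be sketched as below. The sketch makes several assumptions that are not in the patent text: the data is sorted in ascending order, the boundary value of a leaf is its minimum, the separators of each index node are recomputed from the leaf minima instead of being read from stored nodes, and a Python `set` stands in for the hash buckets of a leaf. The name `find_leaf` is hypothetical.

```python
from bisect import bisect_right


def find_leaf(key: int, leaf_minima: list[int], keys_per_node: int, height: int) -> int:
    """Descend from the root to the leaf whose range covers key; return the leaf number."""
    fanout = keys_per_node + 1
    node_no = 0  # the root is node 0 of the top layer
    for layer in range(height, 0, -1):
        base = node_no * fanout ** layer
        step = fanout ** (layer - 1)
        targets = [base + (k + 1) * step for k in range(keys_per_node)]
        separators = [leaf_minima[t] for t in targets if t < len(leaf_minima)]
        child = bisect_right(separators, key)  # which child range holds key
        node_no = node_no * fanout + child     # node number in the layer below
    return node_no


# 14 leaves holding 1..42, three records each (illustrative data).
leaves = [set(range(3 * i + 1, 3 * i + 4)) for i in range(14)]
minima = [min(leaf) for leaf in leaves]

leaf_no = find_leaf(14, minima, keys_per_node=2, height=3)
print(leaf_no, 14 in leaves[leaf_no])  # 4 True
```

The descent costs one comparison pass per layer, and the final membership test is a hash lookup, which matches the combination of range search and random search that the description attributes to the structure.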
Further, in one or more embodiments, the method comprises: in response to receiving a request for deleting index data of a target leaf node, deleting the index data, performing defragmentation on the index data, and judging whether the deleted index data is boundary data (index data corresponding to a boundary value) of the target leaf node; traversing all hash buckets in the target leaf node to obtain the minimum value of the index data in the target leaf node in response to the deleted index data being the boundary data of the target leaf node; taking the minimum value as a key value corresponding to the internal node or the root node, and upwards adjusting the structure of the internal node of the tree; judging whether the target leaf node has underflow; in response to the target leaf node underflow, transferring part of the index data to the target node by the adjacent leaf node; or merging the target leaf node and the adjacent leaf nodes and adjusting the structure of the internal nodes of the tree upwards. Because each execution process of the algorithm can fully utilize the parallel throughput of the FPGA, the parallelism degree is extremely high, and therefore, the method can realize simultaneous deletion of a plurality of records, and has higher operation efficiency in practical application compared with the traditional mode of deleting the records one by one.
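The leaf-side part of the deletion procedure can be sketched as follows. The sketch assumes the boundary value is the leaf minimum, as in the worked example, and introduces a hypothetical threshold `min_records` for underflow, which the patent does not quantify. Redistribution, merging and the upward adjustment of internal nodes are left out; the function only reports what the caller would need to do.

```python
def delete_key(buckets: list[list[int]], key: int, min_records: int):
    """Delete key from a leaf; return (new boundary value or None, underflow flag)."""
    old_min = min((v for bucket in buckets for v in bucket), default=None)
    for bucket in buckets:
        if key in bucket:
            bucket.remove(key)
            break
    else:
        raise KeyError(key)
    # Traverse all hash buckets to collect what is left in the leaf.
    remaining = [v for bucket in buckets for v in bucket]
    # Only a deleted boundary value forces the parent key to be replaced.
    new_boundary = min(remaining) if key == old_min and remaining else None
    underflow = len(remaining) < min_records
    return new_boundary, underflow


leaf_buckets = [[36], [], [34], [35]]
print(delete_key(leaf_buckets, 34, min_records=2))  # (35, False)
print(delete_key(leaf_buckets, 36, min_records=2))  # (None, True)
```

In the first call the deleted value was the boundary, so the new minimum 35 is returned for the parent node; in the second call the boundary is unchanged but the leaf has fallen below the assumed threshold.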
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular form "a" is intended to include the plural form as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of an embodiment of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A method for quickly constructing a database index structure is characterized by comprising the following steps:
sorting the index data to be inserted by using a sorting algorithm;
creating leaf nodes, and inserting the sorted index data into the leaf nodes in parallel in a hash storage mode;
constructing a tree structure based on the leaf nodes, and determining a layer structure of the tree structure and internal nodes or root nodes contained in each layer;
according to the mapping relation among the leaf nodes, the internal nodes and the root nodes determined by the tree structure, computing key values of the internal nodes and key values of the root nodes in parallel;
and inserting the obtained key values into the corresponding internal nodes or root nodes in parallel.
2. The method for rapidly building the database index structure according to claim 1, wherein the creating leaf nodes and the inserting the sorted index data into the leaf nodes in parallel in a hash storage manner comprises:
configuring preset parameters;
determining the number of required leaf nodes according to the preset parameters and the number of the index data to be inserted;
creating a corresponding number of leaf nodes, and allocating a corresponding hash storage space for each leaf node, wherein the hash storage space of each leaf node comprises a hash bucket for storing index data and an overflow bucket for resolving data collision;
and inserting the sorted index data into the hash buckets of each leaf node in parallel.
3. The method for rapidly building the database index structure according to claim 2, wherein the preset parameters comprise:
the number of hash buckets each leaf node has on average, the capacity of each hash bucket, and the fill factor of each leaf node.
4. The method for rapidly building the database index structure according to claim 2, wherein the allocating a corresponding hash storage space for each of the leaf nodes comprises:
allocating one or more hash buckets for each leaf node and grouping them into hash bucket chains, and allocating an overflow bucket for each hash bucket chain.
5. The method for rapidly building the database index structure according to claim 1, wherein the computing the key values of the internal nodes and the key values of the root nodes in parallel according to the mapping relationships among the leaf nodes, the internal nodes and the root nodes determined by the tree structure comprises:
numbering internal nodes from bottom to top and from left to right based on the tree structure;
determining the layer number of each internal node in the tree, the node number at the layer and the key number in the node;
calculating leaf offset between internal nodes of each layer and leaf offset between index key values;
calculating a target leaf node corresponding to each key value;
and extracting the boundary value of the index data in the target leaf node as a key value of an internal node or a key value of a root node.
6. The method for rapidly building the database index structure according to claim 5, wherein the boundary value of the index data is determined by the sorting rule of the sorting algorithm on the index data.
7. The method for rapid building of a database index structure of claim 5, wherein the method further comprises:
setting a fill factor β for each internal node to determine the number of key values in each internal node.
8. The method for rapidly building the database index structure according to any one of claims 1 to 7, wherein the method further comprises:
in response to receiving a data query request, searching a target leaf node based on key values of the root node and the internal node;
and carrying out hash calculation in the target leaf node, and obtaining target index data according to the hash result.
9. The method for rapidly building the database index structure according to any one of claims 1 to 7, wherein the method further comprises:
in response to receiving a request for deleting index data of a target leaf node, deleting the index data, performing defragmentation on the index data, and judging whether the deleted index data is boundary data of the target leaf node;
in response to the deleted index data being boundary data of the target leaf node, traversing all hash buckets within the target leaf node to obtain a minimum value of index data within the target leaf node;
and taking the minimum value as a key value of the corresponding internal node or the root node, and upwards adjusting the structure of the internal node of the tree.
10. The method for rapidly building the database index structure according to claim 9, wherein the method further comprises:
judging whether the target leaf node underflows;
in response to the target leaf node underflowing, transferring, by an adjacent leaf node, a portion of its index data to the target leaf node; or
merging the target leaf node and the adjacent leaf node, and upwards adjusting the structure of the internal nodes of the tree.
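
The construction and query steps of claims 1 to 8 can be sketched in Python as follows. This is a sequential illustration under stated assumptions rather than the claimed parallel implementation: the names `make_leaf`, `build`, `search` and the four constants are hypothetical, the boundary value of a leaf is taken to be its minimum key, and each internal level is stored as a flat list in which every group of `FANOUT` consecutive entries forms one node. Each leaf and each key value is computed from the sorted array alone, so the loops have no dependencies between iterations.

```python
import math
from bisect import bisect_right

BUCKETS_PER_LEAF = 4   # assumed preset parameter
BUCKET_CAPACITY = 4    # assumed preset parameter
LEAF_FILL = 0.75       # assumed leaf fill factor
FANOUT = 3             # assumed number of children per internal node


def make_leaf(keys):
    """Hash one leaf's keys into fixed-capacity buckets plus an overflow bucket."""
    leaf = {"buckets": [[] for _ in range(BUCKETS_PER_LEAF)], "overflow": []}
    for k in keys:
        b = leaf["buckets"][hash(k) % BUCKETS_PER_LEAF]
        (b if len(b) < BUCKET_CAPACITY else leaf["overflow"]).append(k)
    return leaf


def build(data):
    keys = sorted(data)                                    # sort the index data
    per_leaf = max(1, int(BUCKETS_PER_LEAF * BUCKET_CAPACITY * LEAF_FILL))
    n_leaves = max(1, math.ceil(len(keys) / per_leaf))     # required leaf count
    chunks = [keys[i * per_leaf:(i + 1) * per_leaf] for i in range(n_leaves)]
    leaves = [make_leaf(c) for c in chunks]                # independent per leaf
    mins = [c[0] for c in chunks if c]                     # leaf boundary values
    levels = []                                            # bottom-up key levels
    while len(mins) > 1:
        levels.append(mins)
        mins = mins[::FANOUT]       # boundary of every FANOUT-th child
    return leaves, levels


def search(leaves, levels, key):
    """Descend from the root by key values, then hash inside the target leaf."""
    idx = 0
    for level in reversed(levels):
        entries = level[idx * FANOUT:(idx + 1) * FANOUT]   # keys of one node
        idx = idx * FANOUT + max(bisect_right(entries, key) - 1, 0)
    leaf = leaves[idx]
    bucket = leaf["buckets"][hash(key) % BUCKETS_PER_LEAF]
    return key in bucket or key in leaf["overflow"]
```

For example, `leaves, levels = build(range(0, 100, 2))` followed by `search(leaves, levels, 50)` walks the root and one internal level before probing a single hash bucket of the target leaf.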

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011580768.2A CN112579575A (en) 2020-12-28 2020-12-28 Method for quickly constructing database index structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011580768.2A CN112579575A (en) 2020-12-28 2020-12-28 Method for quickly constructing database index structure

Publications (1)

Publication Number Publication Date
CN112579575A (en) 2021-03-30

Family ID: 75140418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011580768.2A Pending CN112579575A (en) 2020-12-28 2020-12-28 Method for quickly constructing database index structure

Country Status (1)

Country Link
CN (1) CN112579575A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101286160A (en) * 2008-05-30 2008-10-15 同济大学 Data base indexing process
US20110246479A1 (en) * 2010-03-30 2011-10-06 International Business Machines Corporation Creating Indexes for Databases
CN102375852A (en) * 2010-08-24 2012-03-14 中国移动通信集团公司 Method for building data index as well as method and system using data index for inquiring data
CN104331497A (en) * 2014-11-19 2015-02-04 中国科学院自动化研究所 Method and device using vector instruction to process file index in parallel mode
CN105320775A (en) * 2015-11-11 2016-02-10 中科曙光信息技术无锡有限公司 Data access method and apparatus
CN105608224A (en) * 2016-01-13 2016-05-25 广西师范大学 Orthogonal multilateral Hash mapping indexing method for improving massive data inquiring performance
CN108228799A (en) * 2017-12-29 2018-06-29 北京奇虎科技有限公司 The storage method and device of object indexing information
CN110083601A (en) * 2019-04-04 2019-08-02 中国科学院计算技术研究所 Index tree constructing method and system towards key assignments storage system
CN111475511A (en) * 2020-04-03 2020-07-31 弦子科技(北京)有限公司 Data storage method, data access method, data storage device, data access device and data access equipment based on tree structure

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
丁华等: "基于海量属性数据的索引构建方法研究", 《装备指挥技术学院学报》 *
刘勇: "基于GPU的内存数据库索引技术研究", 《中国博士学位论文全文数据库(信息科技辑)》 *
王英强等: "B+树在数据库索引中的应用", 《长江大学学报(自然科学版)理工卷》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210330)