CN112000847A - GPU parallel-based adaptive radix tree dynamic indexing method - Google Patents


Info

Publication number
CN112000847A
Authority
CN
China
Prior art keywords
node
data
layer
branch
type
Prior art date
Legal status
Granted
Application number
CN202010836011.9A
Other languages
Chinese (zh)
Other versions
CN112000847B (en)
Inventor
谷峪
宛长义
李传文
宋振
于戈
Current Assignee
Northeastern University China
Original Assignee
Northeastern University China
Priority date
Filing date
Publication date
Application filed by Northeastern University China
Priority to CN202010836011.9A
Publication of CN112000847A
Application granted
Publication of CN112000847B
Legal status: Active
Anticipated expiration


Classifications

    • G06F16/9027 Trees (Indexing; Data structures therefor; Storage structures; under G06F16/00 Information retrieval; Database structures therefor)
    • G06F16/24569 Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs (under G06F16/245 Query processing)

Abstract

The invention provides a GPU parallel-based adaptive radix tree dynamic indexing method. First, an adaptive radix tree data structure is constructed: the first two layers use Node256-type tree nodes, while the third and fourth layers, built with a high-order-first radix sorting method, create tree nodes of a size matching the number of branches, realizing the creation of a dynamic data structure. Because the radix sort is stable, the latest update in the original batch of data still follows the older update after sorting; a deduplication operation then removes the redundant old updates and retains the latest ones. After deduplication, each segment of duplicate-free data is inserted into its corresponding node, completing the creation of the whole adaptive radix tree. Based on the parallel computing capability of the GPU, insertion, query and deletion operations on the data can then be performed in parallel.

Description

GPU parallel-based adaptive radix tree dynamic indexing method
Technical Field
The invention relates to the technical field of parallel indexing for computer databases, in particular to a GPU parallel-based adaptive radix tree dynamic indexing method.
Background
Indexing technology is one of the key technologies of modern information retrieval, search applications and data mining; in order to locate data accurately within a large amount of data, various types of data indexes have been proposed for different types of data and different query requirements. In recent years, the explosive growth of data scale has expanded research to large-scale, high-dimensional and sparse data sets, and efficient index construction and data query on such data sets have become a main research direction. In addition, as the demand on processors increases with the expansion of data size, the parallel capability of the graphics processing unit (GPU) provides an opportunity for solving these problems; thanks to its excellent computing performance, the GPU has been widely applied in fields such as bioinformatics and financial transactions, and has become an important component of big-data processing systems. Dynamic data structures can adjust their architecture to accommodate dynamic updates as data are inserted and deleted; however, designing an efficient dynamic data structure on a GPU faces many problems, such as leveraging the hundreds or thousands of available GPU cores, avoiding thread branching, and reducing global memory accesses by using the shared memory and caches located on the chip. To fully utilize the powerful computational resources the GPU provides, the design must take the GPU's particular hardware architecture into account. Update operations are more challenging than query operations and become a bottleneck, since they require inter-thread synchronization rather than running independently.
At present, there are four designs that implement a dynamic index structure on a GPU. The GPU SA (Sorted Array, SA) maintains one ordered sequence: when a batch of data arrives, it is first sorted and then merged with the existing sequence into a new ordered sequence, and search uses binary search; because of the random access pattern, spatial locality is poor and the cache cannot be well utilized. The GPU LSM (Log-Structured Merge tree, LSM) stores data in multiple layers, the size of each layer being twice that of the previous one; when a batch of data is inserted, the sorted sequence is first placed into the first layer, and whenever a layer is full it is merged and stored into the next layer, until the batch of data is completely stored in the tree. Since in the GPU LSM a new value sits in a layer above the location of the old value, a lookup proceeds from the first layer downwards, performing a binary search at each layer until the key is found or the last layer has been searched without success. The SA and the LSM both rely on sort-and-merge operations to maintain one or more ordered sequences for binary search; the insertion speed of both structures depends on the batch size, and the larger the data volume, the better the characteristics of the GPU can be exploited, so sorting and merging are fast. Owing to its multi-layer structure, the LSM does not need to merge with all the original data at insertion time as the SA does, so its average update speed is better than the SA's. However, as the total amount of data in the tree structure grows, the merging cost at insertion becomes higher; and because of the multi-layer query, the search speed of the LSM is slower than that of the SA.
The GPU B-Tree locks nodes during updating to prevent update conflicts, and conflicts are rare when the inserted sequence is uniformly distributed. If the sequence is ordered, however, the insertion positions are dense, which increases locking conflicts and reduces insertion speed, so the batch should not be sorted first. Concurrency is poor because one thread group (warp, a group of 32 threads) can only process one datum at a time, and the speed stays essentially unchanged as the batch size grows. The advantages of the B-Tree are that the node size can be fitted to the cache-line size, spatial locality is good, nodes need not be locked during search, and the search speed is higher than that of the LSM and SA. The Slab Hash provides a dynamically extensible hash structure: it optimizes the linked-list structure and allocates memory with the very fast dynamic memory allocator SlabAlloc instead of the CUDA (Compute Unified Device Architecture) toolkit allocator, so the structure has very high insertion and search speeds; but, like the traditional hash method, it does not support operations such as range queries.
In a radix tree (Radix-Tree), the structure is determined by the distribution of the keys, not by the insertion order, and locating a key does not require as many comparisons as in a B-Tree; similar to a hash, the position is determined by the successive parts of the key value itself. Because nodes are independent when data are inserted, no lock or atomic operation is needed, the locality of the search space inside a node is good, and the cache can be fully utilized. The optimized Adaptive Radix Tree (ART) can dynamically change the type of a node according to its utilization rate to improve space utilization. However, the search, insertion and deletion operations on the four node types of the adaptive radix tree index are all designed in single-threaded serial mode and cannot run in parallel; the existing GPU method uses a serial tree structure in which node types cannot change dynamically, warps and the cache cannot be fully utilized during query, and the performance is poor.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a GPU parallel-based adaptive radix tree dynamic indexing method, which comprises the following steps:
step 1: constructing an adaptive radix tree data structure with four layers, comprising:
step 1.1: creating tree nodes of the type of a first layer Node256 and a second layer Node256, and connecting the second layer Node and the first layer Node;
step 1.2: performing high-order-first radix sorting on the S integer data to be stored, creating the third-layer and fourth-layer data structures of the adaptive radix tree, connecting the second layer with the third layer, and connecting the third layer with the fourth layer;
step 1.3: carrying out the deduplication operation with each thread in a warp processing one 8-bit key: if the x-th datum processed by the x-th thread in the warp equals the (x+1)-th datum, processing of the x-th datum ends, where x = 0,1,…,X and X denotes the total number of data processed by one warp;
step 1.4: inserting the duplicate-free data sequence of each branch after deduplication into the corresponding node of the adaptive radix tree;
step 2: performing query operation on data by using warp aiming at the constructed adaptive radix tree data structure;
step 3: performing insertion operations on data with warps for the constructed adaptive radix tree data structure;
step 4: performing deletion operations on data with warps for the constructed adaptive radix tree data structure.
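The four layers above each consume one 8-bit chunk of the 32-bit key, most significant byte first. A minimal sketch of that decomposition (the helper name `key_bytes` is ours, not from the patent):

```python
def key_bytes(key: int) -> tuple[int, int, int, int]:
    """Split a 32-bit key into the four 8-bit chunks addressing
    layers 1-4 of the adaptive radix tree (bits 0-7 in the patent's
    numbering are the most significant byte)."""
    assert 0 <= key < 2**32
    return ((key >> 24) & 0xFF, (key >> 16) & 0xFF,
            (key >> 8) & 0xFF, key & 0xFF)

# Keys sharing the top 24 bits land in the same fourth-layer node
# and differ only in their final 8-bit chunk.
print(key_bytes(0x12345678))  # (18, 52, 86, 120)
```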
Step 1.1 comprises:
step 1.1.1: creating a node N_{1,1} of the Node256 type as the first layer of the adaptive radix tree, i.e. the first-layer branch of the adaptive radix tree; node N_{1,1} is represented as N_{1,1} = {V_{1,1}^i | i = 0,1,…,255}, where V_{1,1}^i denotes the i-th value of the first-layer node;
step 1.1.2: taking V_{1,1}^i as the parent value, creating a node N_{2,i} of the Node256 type as the i-th branch of the second layer of the adaptive radix tree; node N_{2,i} is represented as N_{2,i} = {V_{2,i}^j | j = 0,1,…,255}, where V_{2,i}^j denotes the j-th value of the i-th node of the second layer;
step 1.1.3: letting i = 0,1,…,255, creating a Node256-type node with a warp of the GPU for each value of the first layer, resulting in the 256 branches of the second layer of the adaptive radix tree;
step 1.1.4: storing the pointer of the i-th node of the second layer in the i-th value of the first layer.
Step 1.2 comprises:
step 1.2.1: dividing the data whose bits 0-7 are identical among the S data into the branches of the first layer;
step 1.2.2: within each branch of the first layer, dividing the data whose bits 8-15 are identical into the same branch of the second layer, so that the data stored in each second-layer branch share the same prefix;
step 1.2.3: using the warps of the GPU to perform high-order-first radix sorting in parallel on bits 16-23 of the data in each branch of the second layer, and dividing the data whose bits 16-23 are identical into one group; the data with the same prefix in the i-th branch of the second layer are divided into H_i groups, the number of data in the h_i-th group of the i-th second-layer branch is S_{h_i}, and h_i = 0,1,…,H_i;
step 1.2.4: for each value in the i-th branch of the second layer, creating in turn a node N_{3,i×h_i} whose capacity is greater than or equal to S_{h_i} as the (i×h_i)-th branch of the third layer of the adaptive radix tree; node N_{3,i×h_i} is represented as N_{3,i×h_i} = {V_{3,i×h_i}^u | u = 0,1,…,U-1}, U ∈ {4,16,48,256}, where V_{3,i×h_i}^u denotes the u-th value of the (i×h_i)-th node of the third layer, the node types comprising nodes of the Node4, Node16, Node48 and Node256 types;
step 1.2.5: letting h_i = 0,1,…,H_i and i = 0,1,…,255, creating with the warps of the GPU and each value of the second layer a node type whose capacity is greater than or equal to the amount of data to be stored, obtaining all branches of the third layer of the adaptive radix tree;
step 1.2.6: storing the pointer of the (i×h_i)-th third-layer node in the h_i-th value of the i-th branch of the second layer;
step 1.2.7: using the warps of the GPU to perform high-order-first radix sorting in parallel on bits 24-31 of the data in each branch of the third layer, and dividing the data whose bits 24-31 are identical into one group; the data in the k-th branch of the third layer are divided into H_k groups, the number of data in the h_k-th group of the k-th third-layer branch is S_{h_k}, and h_k = 0,1,…,H_k;
step 1.2.8: for each value in the k-th branch of the third layer, creating in turn a node N_{4,k×h_k} whose capacity is greater than or equal to S_{h_k} as the (k×h_k)-th branch of the fourth layer of the adaptive radix tree; node N_{4,k×h_k} is represented as N_{4,k×h_k} = {V_{4,k×h_k}^v | v = 0,1,…,V-1}, V ∈ {4,16,48,256}, where V_{4,k×h_k}^v denotes the v-th value of the (k×h_k)-th node of the fourth layer;
step 1.2.9: letting h_k = 0,1,…,H_k for every branch k of the third layer, creating with the warps of the GPU a node type whose capacity is greater than or equal to the amount of data to be stored as each node of the fourth layer, obtaining all branches of the fourth layer of the adaptive radix tree;
step 1.2.10: storing the pointer of the (k×h_k)-th fourth-layer node in the h_k-th value of the k-th branch of the third layer.
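Steps 1.2.4 and 1.2.8 pick the smallest node type whose capacity covers a group of branch values. A hedged serial sketch of that capacity rule (the function name is ours):

```python
def node_type_for(count: int) -> tuple[str, int]:
    """Pick the smallest adaptive-radix-tree node type whose capacity
    is greater than or equal to the number of values to store."""
    for name, cap in (("Node4", 4), ("Node16", 16),
                      ("Node48", 48), ("Node256", 256)):
        if count <= cap:
            return name, cap
    raise ValueError("an 8-bit chunk yields at most 256 branches")

print(node_type_for(5))   # ('Node16', 16)
print(node_type_for(49))  # ('Node256', 256)
```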
The step 2 comprises the following steps:
step 2.1: allocating a warp process to each branch with the same prefix of 0bit to 23bit for query operation;
step 2.2: for nodes of the Node4 and Node16 types, according to the quantity F of data stored in the node, every F threads in each warp form a group and use thread voting to judge whether the data to be queried exist in the node; for nodes of the Node48 and Node256 types, whether the data to be queried exist in the node is judged from the 8-bit key; if they exist, the value in the node is returned, otherwise a null value is returned to indicate that the data were not found.
The step 3 comprises the following steps:
step 3.1: allocating a warp treatment to each branch with the same prefix of 0bit to 23bit for insertion operation;
step 3.2: judging whether a node to be inserted exists or not, if not, executing node creation and new data insertion;
step 3.3: if the node exists, each thread queries in parallel whether the new data to be inserted already exist in the node; if they exist, the value in the node to be inserted is updated; if not, it is first judged whether the sum β of the number of new data to be inserted and the number of data already in the node exceeds the capacity β′ of the type of the node to be inserted; if β does not exceed β′, the value and the 24-31 bit key of each new datum are inserted into the positions from F to β-1 within the node, where F is the quantity of data already in the node; if β exceeds β′, the node is first converted to a new node type whose capacity is greater than or equal to β, and the new data are then inserted into the node of the new type.
Step 4 comprises the following steps:
step 4.1: allocating a warp treatment to each branch with the same prefix of 0bit to 23bit for deleting operation;
step 4.2: judging whether the data to be deleted exist in the node; if they exist, then for nodes of the Node4, Node16 and Node48 types, the last value stored in the node is first found, its key and value are used to overwrite the position to be deleted, and the number of data in the node is updated; for nodes of the Node256 type, the storage position of the data to be deleted is found, the value is deleted, and the number in the node is then updated.
The invention has the beneficial effects that:
the invention provides a GPU (graphics processing unit) -parallel-based adaptive radix tree dynamic indexing method, which is based on a GPU multi-thread processing mode and provides a data structure of an adaptive radix tree.
Drawings
FIG. 1 is a diagram illustrating an adaptive radix tree data structure according to the present invention.
FIG. 2 is a diagram illustrating four node types of the adaptive radix tree according to the present invention.
FIG. 3 is a schematic diagram of parallel query of data in the present invention, in which (a) shows a schematic diagram of a Node16 type Node querying data in a thread voting manner, in which (b) shows a schematic diagram of a Node48 type Node querying a value array by accessing a value in the key array using a key as an array index, and in which (c) shows a schematic diagram of a Node256 type Node directly locating a value position by using a key.
FIG. 4 is a schematic diagram of data deletion of four Node types in the present invention, in which (a) shows a schematic diagram of data deletion of Node4 and Node16 type nodes, and (b) shows a schematic diagram of data deletion of Node48 type nodes.
Detailed Description
The invention is further described with reference to the following figures and specific examples.
A GPU parallel-based adaptive radix tree dynamic indexing method is a GPU parallel-based dynamic index design that makes full use of the parallel computing capability of the GPU and, combined with the adaptive radix tree structure, parallelizes batch construction, insertion, deletion and query operations of the index. First, according to the data distribution characteristics in a radix tree, high-order-first radix sorting is performed on the batch of data to be processed, and grouping is performed several times once the sequence is ordered. The first pass counts on the highest 8 bits of each 32-bit key, dividing the data sequence into at most 256 segments that are then processed in parallel; each segment is then subdivided by the second 8 bits, and again by the third 8 bits in the same way. The result of this division is the statistics of each segment of the current third layer, according to which the insertion into the third layer is performed. At this point the whole batch of data has been divided in parallel into many small segments, each internally ordered and sharing the same 24-bit prefix, but still containing redundant updates. Because radix sorting is a stable sorting method, the latest update in the original batch of data is guaranteed to remain behind the old update after sorting, so deduplication can proceed from front to back, removing the redundant old updates and keeping the latest ones. After deduplication, each duplicate-free data sequence is inserted into, or queried in, the node corresponding to its segment using the redesigned parallel operations. The method comprises the following steps:
step 1: an adaptive radix tree data structure with four layers is constructed, as shown in fig. 1; a Node256-type node is created for the first layer of the tree, and the node structures are shown in fig. 2. Node4 is the smallest type: it consists of an array Key[4] of length 4 storing keys and a pointer array Value[4] of the same length, with each key and pointer at corresponding positions. Node16 stores 5 to 16 values and is similar to Node4 but with total length 16: a key array Key[16] and a pointer array Value[16] of the same length. Node48 stores 17 to 48 values and consists of a key array Key[256] of length 256 and a pointer array Value[48] of length 48. Node256 is the largest node type, storing 49 to 256 values; it consists only of a pointer array Value[256] of length 256, and a value is located directly using the key as subscript, so it can be found with a single access. The second layer is set as 256 Node256-type nodes, the pointer of each node being stored at the corresponding position in the value array of the first-layer node; the value arrays in the leaf nodes of the fourth layer store values, while those of the upper layers store pointers to the corresponding nodes of the next layer. The method specifically comprises the following steps:
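The node layouts described above can be sketched, under our own naming, as plain Python classes; this is a serial stand-in for the GPU-side key/value arrays, not the patent's CUDA implementation:

```python
class Node4:
    """Smallest node type: up to 4 entries, stored as parallel
    key/pointer arrays with matching positions (Key[4] / Value[4])."""
    CAPACITY = 4
    def __init__(self):
        self.keys = [None] * self.CAPACITY
        self.values = [None] * self.CAPACITY  # child pointers or leaf values
        self.count = 0                        # occupied prefix of the arrays

class Node16(Node4):
    """5 to 16 entries; same layout as Node4 with length 16."""
    CAPACITY = 16

class Node48:
    """17 to 48 entries: Key[256] maps an 8-bit chunk to a slot index
    in the 48-entry value array."""
    def __init__(self):
        self.key_index = [None] * 256
        self.values = [None] * 48
        self.count = 0

class Node256:
    """49 to 256 entries: the 8-bit chunk indexes Value[256] directly,
    so a value is located in a single access."""
    def __init__(self):
        self.values = [None] * 256
        self.count = 0
```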
step 1.1: creating tree nodes of the type of a first layer Node256 and a second layer Node256, and connecting the second layer Node and the first layer Node, wherein the steps comprise:
step 1.1.1: creating a node N_{1,1} of the Node256 type as the first layer of the adaptive radix tree, i.e. the first-layer branch of the adaptive radix tree; node N_{1,1} is represented as N_{1,1} = {V_{1,1}^i | i = 0,1,…,255}, where V_{1,1}^i denotes the i-th value of the first-layer node;
step 1.1.2: taking V_{1,1}^i as the parent value, creating a node N_{2,i} of the Node256 type as the i-th branch of the second layer of the adaptive radix tree; node N_{2,i} is represented as N_{2,i} = {V_{2,i}^j | j = 0,1,…,255}, where V_{2,i}^j denotes the j-th value of the i-th node of the second layer;
step 1.1.3: letting i = 0,1,…,255, creating a Node256-type node with a warp of the GPU for each value of the first layer, resulting in the 256 branches of the second layer of the adaptive radix tree;
step 1.1.4: storing the pointer of the i-th node of the second layer in the i-th value of the first layer.
A batch of data undergoes high-order-first radix sorting: every 8 bits divide the data into at most 256 segments, and data whose first 16 bits are identical fall into the same branch. The position information of each branch (namely the prefix sums) is obtained during the high-order-first radix sorting, so the starting position of each branch can be found in parallel and processing can continue downwards. Bits 16 to 23 of each branch are then sorted in parallel; tree nodes able to contain the corresponding sizes are created as the third layer according to the number of branches, and pointers to these nodes are stored into the corresponding positions of the second-layer tree nodes. The same sorting is applied to bits 24 to 31, and after nodes of the corresponding types are created in the fourth layer, their pointers are stored into the corresponding positions of the third layer. The specific steps are as follows:
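The grouping-with-prefix-sums pass described here can be modeled serially as one stable counting-sort pass per 8-bit chunk. The function below is our illustrative sketch, not the patent's CUDA kernel; on the GPU the count and scatter phases would run across warps:

```python
def partition_by_byte(keys, byte_pos):
    """Stable counting-sort pass on one 8-bit chunk (byte_pos 0 = most
    significant byte). Returns the keys grouped bucket by bucket plus
    the exclusive prefix sums giving each bucket's start offset, i.e.
    the 'position information' used to process branches in parallel."""
    shift = (3 - byte_pos) * 8
    counts = [0] * 256
    for k in keys:
        counts[(k >> shift) & 0xFF] += 1
    starts = [0] * 256                 # exclusive prefix sums
    for b in range(1, 256):
        starts[b] = starts[b - 1] + counts[b - 1]
    out = [0] * len(keys)
    offset = starts[:]                 # next free slot per bucket
    for k in keys:                     # stable: preserves input order
        b = (k >> shift) & 0xFF
        out[offset[b]] = k
        offset[b] += 1
    return out, starts

keys = [0x01020304, 0x01020305, 0x02000000, 0x01FF0000]
grouped, starts = partition_by_byte(keys, 0)
# bucket 0x01 keeps its three keys in input order, then bucket 0x02
```

Stability is what later lets the deduplication step assume the newest update of a key sits last in its run.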
step 1.2: performing high-order-first radix sorting on the S integer data to be stored, creating the third-layer and fourth-layer data structures of the adaptive radix tree, connecting the second layer with the third layer, and the third layer with the fourth layer, comprising the following steps:
step 1.2.1: dividing the data whose bits 0-7 are identical among the S data into the branches of the first layer;
step 1.2.2: within each branch of the first layer, dividing the data whose bits 8-15 are identical into the same branch of the second layer, so that the data stored in each second-layer branch share the same prefix;
step 1.2.3: using the warps of the GPU to perform high-order-first radix sorting in parallel on bits 16-23 of the data in each branch of the second layer, and dividing the data whose bits 16-23 are identical into one group; the data with the same prefix in the i-th branch of the second layer are divided into H_i groups, the number of data in the h_i-th group of the i-th second-layer branch is S_{h_i}, and h_i = 0,1,…,H_i;
step 1.2.4: for each value in the i-th branch of the second layer, creating in turn a node N_{3,i×h_i} whose capacity is greater than or equal to S_{h_i} as the (i×h_i)-th branch of the third layer of the adaptive radix tree; node N_{3,i×h_i} is represented as N_{3,i×h_i} = {V_{3,i×h_i}^u | u = 0,1,…,U-1}, U ∈ {4,16,48,256}, where V_{3,i×h_i}^u denotes the u-th value of the (i×h_i)-th node of the third layer, the node types comprising nodes of the Node4, Node16, Node48 and Node256 types;
step 1.2.5: letting h_i = 0,1,…,H_i and i = 0,1,…,255, creating with the warps of the GPU and each value of the second layer a node type whose capacity is greater than or equal to the amount of data to be stored, obtaining all branches of the third layer of the adaptive radix tree;
step 1.2.6: storing the pointer of the (i×h_i)-th third-layer node in the h_i-th value of the i-th branch of the second layer;
step 1.2.7: using the warps of the GPU to perform high-order-first radix sorting in parallel on bits 24-31 of the data in each branch of the third layer, and dividing the data whose bits 24-31 are identical into one group; the data in the k-th branch of the third layer are divided into H_k groups, the number of data in the h_k-th group of the k-th third-layer branch is S_{h_k}, and h_k = 0,1,…,H_k;
step 1.2.8: for each value in the k-th branch of the third layer, creating in turn a node N_{4,k×h_k} whose capacity is greater than or equal to S_{h_k} as the (k×h_k)-th branch of the fourth layer of the adaptive radix tree; node N_{4,k×h_k} is represented as N_{4,k×h_k} = {V_{4,k×h_k}^v | v = 0,1,…,V-1}, V ∈ {4,16,48,256}, where V_{4,k×h_k}^v denotes the v-th value of the (k×h_k)-th node of the fourth layer;
step 1.2.9: letting h_k = 0,1,…,H_k for every branch k of the third layer, creating with the warps of the GPU a node type whose capacity is greater than or equal to the amount of data to be stored as each node of the fourth layer, obtaining all branches of the fourth layer of the adaptive radix tree;
step 1.2.10: storing the pointer of the (k×h_k)-th fourth-layer node in the h_k-th value of the k-th branch of the third layer.
Step 1.3: each thread in a warp processes one 8-bit key to perform the deduplication operation: if the x-th datum processed by the x-th thread in the warp equals the adjacent (x+1)-th datum, processing of the x-th datum ends, removing the redundant old update and keeping the latest one, where x = 0,1,…,X and X denotes the total number of data processed by one warp;
step 1.4: inserting the duplicate-free data sequence of each branch after deduplication into the corresponding node of the adaptive radix tree.
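A serial model of this warp deduplication (function name ours): after the stable radix sort, duplicates of a key are adjacent and the newest update is last, so thread x simply drops its datum when it equals datum x+1.

```python
def dedup_keep_latest(chunks):
    """Serial model of the warp deduplication: keep an element only if
    it differs from its right neighbour, i.e. keep the last (newest)
    occurrence of every run of equal keys."""
    return [c for x, c in enumerate(chunks)
            if x + 1 == len(chunks) or c != chunks[x + 1]]

# Two stale updates of key 7 are dropped; the final one survives.
print(dedup_keep_latest([3, 7, 7, 7, 9]))  # [3, 7, 9]
```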
and aiming at the constructed adaptive radix tree data structure, allocating a warp process to each branch with the same front 24bit prefix, wherein the processing modes of each batch of data comprise query, insertion and deletion operations.
Step 2: aiming at the constructed adaptive radix tree data structure, performing query operation on data by using warp, wherein the query operation comprises the following steps:
step 2.1: allocating a warp process to each branch with the same prefix of 0bit to 23bit for query operation;
step 2.2: for nodes of the Node4 and Node16 types, according to the quantity F of data stored in the node, every F threads in each warp form a group and use thread voting to judge whether the data to be queried exist in the node; for nodes of the Node48 and Node256 types, whether the data to be queried exist in the node is judged directly from the 8-bit key; if they exist, the value in the node is returned, otherwise a null value is returned to indicate that the data were not found.
As shown in FIG. 3, diagram (a) shows a Node16-type node querying data by thread voting: 7 data already exist in the node, Keys denotes the 8-bit data to be queried in the node, i.e. the random numbers 1, 0, 255 and 3, and the thread group warp comprises 32 threads T_0 to T_31, where T_0, T_7, T_14 and T_21 process the 0th datum in the node, T_1, T_8, T_15 and T_22 process the 1st datum, T_2, T_9, T_16 and T_23 process the 2nd datum, …, and T_6, T_13, T_20 and T_27 process the 6th datum. Diagram (b) shows a Node48-type node querying data: Keys denotes the 8-bit data to be queried, i.e. the random numbers 1, 15, 46, …, 0, 16; each warp can query at most 32 data simultaneously in parallel, and 0, 1, …, 15, 16, …, 46, 47 are the subscripts of the values in the value array. Diagram (c) shows a Node256-type node locating the value directly by the key, with 0, 1, …, 47, 48, …, 254, 255 the subscripts of the values in the value array.
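A serial model of the thread-voting lookup of diagram (a); names and the ballot emulation are ours, and on a real GPU the per-lane comparisons and the vote would use a warp primitive such as `__ballot_sync`:

```python
def warp_query_node16(keys, count, query):
    """Serial model of the thread-vote lookup in a Node4/Node16 node:
    one 'thread' per occupied slot compares its key against the query,
    the per-lane matches are collected into a ballot bitmask, and the
    first matching slot (if any) is returned."""
    ballot = 0
    for lane in range(count):            # one GPU thread per slot
        if keys[lane] == query:
            ballot |= 1 << lane          # lane casts its vote
    if ballot == 0:
        return None                      # not found: null value
    return (ballot & -ballot).bit_length() - 1   # lowest set lane

node_keys = [1, 0, 255, 3, 9, 42, 7]     # 7 occupied slots, as in FIG. 3(a)
print(warp_query_node16(node_keys, 7, 255))  # 2 (found at slot 2)
print(warp_query_node16(node_keys, 7, 8))    # None (not stored)
```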
Step 3: performing insertion operations on data with warps for the constructed adaptive radix tree data structure, wherein the insertion operation comprises the following steps:
step 3.1: allocating a warp treatment to each branch with the same prefix of 0bit to 23bit for insertion operation;
step 3.2: judging whether a node to be inserted exists or not, if not, executing node creation and new data insertion;
step 3.3: if the node exists, each thread queries in parallel whether the new datum to be inserted is already stored in the node. If it is, the value in the node is updated. If it is not, first judge whether the sum β of the number of new data to be inserted and the number of data already in the node exceeds the capacity β′ of the node's type. If β does not exceed β′, the values of the new data and their 24-31 bit keys are inserted into the positions from the current data count of the node up to position β−1; if β exceeds β′, the node is converted to a new node type of larger capacity, and the new data are then inserted into the node of the new type.
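The capacity check and node-type growth of step 3.3 can be sketched as follows. The `Node` class, `CAPACITY` table and growth order are hypothetical illustrations (only the four node types and their capacities come from the patent); on the GPU one warp performs this per branch.

```python
# Sketch of step 3.2-3.3 (hypothetical structure; the patent defines no API).
# beta = existing + new; if beta exceeds the current type's capacity beta',
# the node grows to the next type before the new pairs are appended at
# positions F .. beta-1, keeping the value array contiguous.

CAPACITY = {"Node4": 4, "Node16": 16, "Node48": 48, "Node256": 256}
GROW_ORDER = ["Node4", "Node16", "Node48", "Node256"]

class Node:
    def __init__(self, ntype="Node4"):
        self.ntype = ntype
        self.keys = []        # 8-bit keys (bits 24-31 of each datum)
        self.values = []      # values stored contiguously (no holes)

def insert(node, pairs):
    fresh = []
    for k, v in pairs:
        if k in node.keys:                 # queried in parallel on the GPU
            node.values[node.keys.index(k)] = v   # key exists: update value
        else:
            fresh.append((k, v))
    beta = len(node.keys) + len(fresh)
    while beta > CAPACITY[node.ntype]:     # grow the node type if too small
        node.ntype = GROW_ORDER[GROW_ORDER.index(node.ntype) + 1]
    for k, v in fresh:                     # append at positions F .. beta-1
        node.keys.append(k)
        node.values.append(v)
    return node
```

Inserting five pairs into a fresh Node4 grows it to Node16 before the append; re-inserting an existing key only overwrites its value.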
step 4: deleting data using warps on the constructed adaptive radix tree data structure, wherein the deletion operation comprises the following steps:
step 4.1: allocating one warp to process each branch whose 0-bit to 23-bit prefix is identical, to perform the deletion operation;
step 4.2: judging whether the datum to be deleted exists in the node. If it does, then for nodes of the Node4, Node16 and Node48 types, first find the last value stored in the node, overwrite the position to be deleted with that key and value respectively, and update the node's data count; for a Node256-type node, find the storage position of the datum to be deleted, delete the value, and then update the count in the node.
In the traditional adaptive radix tree (ART), every node type except Node256 allows deletion at an arbitrary position, which breaks the contiguity of the non-null entries in the value array; searching for an empty slot during insertion then increases query cost and reduces parallelism. The insert operation here writes to the value array in append form, which requires the value array to stay contiguous so that the empty position is easy to find. To delete without breaking contiguity, when a Node4, Node16 or Node48 node processes a deletion, for each key processed the position of the value corresponding to the last key in the node is found, and that key and value overwrite the ones to be deleted, so contiguity is preserved. As shown in FIG. 4(a), in Node4 and Node16 the position of the last non-null value is obtained from the node's data count, while for Node48 the key position of the last value is found by the 32 threads of a warp jointly searching the key array of length 256. As shown in FIG. 4(b), each warp contains 32 threads; dashed box 1 marks key 0 to be deleted and its value stored at position 1 of the value array; the last value in the node is stored at position 46, so the 46th value overwrites the 1st value, the key at position 2 of the key array is updated to the value's new position 1, and the 1 stored at the slot of key 0 is then set to null. A Node256-type node is located directly; the value at the corresponding position is deleted and the data count in the node is then updated.
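The continuity-preserving deletion described above can be sketched as follows. This is an assumed layout: Node48's separate 256-entry key-to-index array is collapsed into a parallel key list for brevity, and the warp-parallel search for the last key is a simple index lookup here.

```python
# Sketch of the "overwrite with the last pair" delete: instead of leaving a
# hole, the last stored (key, value) pair moves into the freed slot, so the
# value array stays contiguous and inserts can keep appending at the end.

def delete(keys, values, key):
    if key not in keys:
        return False                 # datum to delete does not exist
    pos = keys.index(key)            # slot being freed
    last = len(keys) - 1             # position of the last stored pair
    keys[pos] = keys[last]           # last key overwrites the deleted key
    values[pos] = values[last]       # last value overwrites the deleted value
    keys.pop()                       # shrink: data count is decremented
    values.pop()
    return True
```

After deleting key 5 from `[0, 5, 9, 46]`, the pair at the tail fills slot 1 and the arrays remain hole-free, matching the FIG. 4(b) example where the 46th value overwrites the deleted one.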

Claims (6)

1. A GPU parallel-based adaptive radix tree dynamic indexing method is characterized by comprising the following steps:
step 1: constructing an adaptive radix tree data structure with four layers, comprising:
step 1.1: creating tree nodes of the type of a first layer Node256 and a second layer Node256, and connecting the second layer Node and the first layer Node;
step 1.2: performing a high-order-first radix sort on the S integer data to be stored, creating the third-layer and fourth-layer data structures of the adaptive radix tree, connecting the second layer with the third layer, and connecting the third layer with the fourth layer;
step 1.3: processing one 8-bit key with each thread in the warp to perform deduplication; if the x-th datum processed by the x-th thread in the warp equals the (x+1)-th datum, processing of the x-th datum ends, where x = 0, 1, ..., X, and X denotes the total number of data processed in the warp;
step 1.4: inserting the deduplicated, non-repeated data sequence of each branch into the nodes of the adaptive radix tree;
step 2: performing query operations on the data using warps on the constructed adaptive radix tree data structure;
step 3: performing insertion operations on the data using warps on the constructed adaptive radix tree data structure;
step 4: performing deletion operations on the data using warps on the constructed adaptive radix tree data structure.
2. The GPU-parallel-based adaptive radix tree dynamic indexing method according to claim 1, wherein step 1.1 comprises:
step 1.1.1: creating a node N_{1,1} of Node256 type as the first layer of the adaptive radix tree, i.e. the first-layer branch of the adaptive radix tree, node N_{1,1} being expressed as N_{1,1} = {v_i^(1)}, i = 0, 1, ..., 255, where v_i^(1) denotes the i-th value of the first-layer node;
step 1.1.2: taking v_i^(1) as parent, creating a node N_{2,i} of Node256 type as the i-th branch of the second layer of the adaptive radix tree, node N_{2,i} being expressed as N_{2,i} = {v_j^(2,i)}, j = 0, 1, ..., 255, where v_j^(2,i) denotes the j-th value of the i-th node of the second layer;
step 1.1.3: letting i = 0, 1, ..., 255, creating a Node256-type node with a warp of the GPU for each value of the first layer, yielding the 256 branches of the second layer of the adaptive radix tree;
step 1.1.4: storing the pointer to the i-th node of the second layer in the i-th value of the first layer.
3. The GPU-parallel-based adaptive radix tree dynamic indexing method according to claim 1, wherein step 1.2 comprises:
step 1.2.1: dividing the data whose bits 0-7 are identical among the S data into the same first-layer branch;
step 1.2.2: dividing the data whose bits 8-15 are identical within each first-layer branch into the same second-layer branch, so that the data stored in each second-layer branch share a common prefix;
step 1.2.3: performing a high-order-first radix sort on bits 16-23 of the data in each second-layer branch in parallel with warps of the GPU, dividing data whose bits 16-23 share the same prefix into one group; the data with the same prefix in the i-th second-layer branch are divided into H_i groups, and the number of data in the h_i-th group of the i-th second-layer branch is denoted S_{h_i}^(i), h_i = 0, 1, ..., H_i;
step 1.2.4: for each value in the i-th second-layer branch, sequentially creating a node whose capacity is greater than or equal to S_{h_i}^(i), of node type N_{3,i×h_i}, as the (i×h_i)-th branch of the third layer of the adaptive radix tree; N_{3,i×h_i} is expressed as N_{3,i×h_i} = {v_u^(3,i×h_i)}, u = 1, 2, ..., U, U ∈ {4, 16, 48, 256}, where v_u^(3,i×h_i) denotes the u-th value of the (i×h_i)-th node of the third layer; the node types comprise nodes of the Node4, Node16, Node48 and Node256 types;
step 1.2.5: letting h_i = 0, 1, ..., H_i and i = 0, 1, ..., 255, creating with warps of the GPU, for each value of the second layer, a node type whose capacity is greater than or equal to the amount of data to be stored, yielding all branches of the third layer of the adaptive radix tree;
step 1.2.6: storing the pointer to the (i×h_i)-th node of the third layer in the h_i-th value of the i-th branch of the second layer;
step 1.2.7: performing a high-order-first radix sort on bits 24-31 of the data in each third-layer branch in parallel with warps of the GPU, dividing data whose bits 24-31 are identical into one group; the data in the k-th third-layer branch are divided into H_k groups, and the number of data in the h_k-th group of the k-th third-layer branch is denoted S_{h_k}^(k), h_k = 0, 1, ..., H_k;
step 1.2.8: for each value in the k-th third-layer branch, sequentially creating a node whose capacity is greater than or equal to S_{h_k}^(k), of node type N_{4,k×h_k}, as the (k×h_k)-th branch of the fourth layer of the adaptive radix tree; N_{4,k×h_k} is expressed as N_{4,k×h_k} = {v_v^(4,k×h_k)}, v = 1, 2, ..., V, V ∈ {4, 16, 48, 256}, where v_v^(4,k×h_k) denotes the v-th value of the (k×h_k)-th node of the fourth layer;
step 1.2.9: letting h_k = 0, 1, ..., H_k, creating with warps of the GPU, for each node of the fourth layer, a node type whose capacity is greater than or equal to the amount of data to be stored, yielding all branches of the fourth layer of the adaptive radix tree;
step 1.2.10: storing the pointer to the (k×h_k)-th node of the fourth layer in the h_k-th value of the k-th branch of the third layer.
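The grouping of steps 1.2.3/1.2.7 and the capacity-based choice of node type can be sketched as follows. Both helpers are hypothetical illustrations (the patent names no functions); only the four node types and their capacities come from the claims, and the byte position is passed in generically since the patent numbers bits from the high-order end.

```python
# Sketch of one most-significant-digit radix pass: group a branch's 32-bit
# keys by one 8-bit digit; each resulting group becomes one child node whose
# type is the smallest capacity >= the group size.

def radix_group(data, shift):
    """Group sorted keys by the byte obtained as (key >> shift) & 0xFF."""
    groups = {}
    for d in sorted(data):               # high-order-first radix ordering
        byte = (d >> shift) & 0xFF
        groups.setdefault(byte, []).append(d)
    return groups

def node_type_for(n):
    # smallest node type with capacity >= n (n <= 256, one 8-bit digit)
    for cap, t in ((4, "Node4"), (16, "Node16"), (48, "Node48"), (256, "Node256")):
        if n <= cap:
            return t
```

For example, three keys sharing a byte at the chosen position fall into one group and get a Node4, while a group of 100 distinct child digits requires a Node256.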
4. The GPU-parallel-based adaptive radix tree dynamic indexing method according to claim 1, wherein the step 2 comprises:
step 2.1: allocating one warp to process each branch whose 0-bit to 23-bit prefix is identical, to perform the query operation;
step 2.2: for nodes of the Node4 and Node16 types, according to the quantity F of data stored in the node, every F threads in each warp form a group that uses thread voting to judge whether the queried datum exists in the node; for nodes of the Node48 and Node256 types, whether the queried datum exists in the node is judged directly from its 8-bit key; if the datum exists, the value in the node is returned, and if not, a null value is returned to indicate that the datum was not found.
5. The GPU-parallel-based adaptive radix tree dynamic indexing method according to claim 1, wherein step 3 comprises:
step 3.1: allocating one warp to process each branch whose 0-bit to 23-bit prefix is identical, to perform the insertion operation;
step 3.2: judging whether the node to be inserted into exists; if not, creating the node and inserting the new data;
step 3.3: if the node exists, each thread queries in parallel whether the new datum to be inserted already exists in the node. If it does, the value in the node is updated. If it does not, judge whether the sum β of the number of new data to be inserted and the number of data already in the node exceeds the capacity β′ of the node's type. If β does not exceed β′, the values of the new data and their 24-31 bit keys are inserted into the positions from the current data count of the node up to position β−1; if β exceeds β′, the node is converted to a new node type of larger capacity, and the new data are then inserted into the node of the new type.
6. The GPU-parallel-based adaptive radix tree dynamic indexing method according to claim 1, wherein step 4 comprises:
step 4.1: allocating one warp to process each branch whose 0-bit to 23-bit prefix is identical, to perform the deletion operation;
step 4.2: judging whether the datum to be deleted exists in the node; if it does, then for nodes of the Node4, Node16 and Node48 types, first find the last value stored in the node, overwrite the position to be deleted with that key and value respectively, and update the node's data count; for a Node256-type node, find the storage position of the datum to be deleted, delete the value, and then update the count in the node.
CN202010836011.9A 2020-08-19 2020-08-19 GPU parallel-based adaptive radix tree dynamic indexing method Active CN112000847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010836011.9A CN112000847B (en) 2020-08-19 2020-08-19 GPU parallel-based adaptive radix tree dynamic indexing method


Publications (2)

Publication Number Publication Date
CN112000847A true CN112000847A (en) 2020-11-27
CN112000847B CN112000847B (en) 2021-07-20

Family

ID=73473037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010836011.9A Active CN112000847B (en) 2020-08-19 2020-08-19 GPU parallel-based adaptive radix tree dynamic indexing method

Country Status (1)

Country Link
CN (1) CN112000847B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112631631A (en) * 2020-12-29 2021-04-09 中国科学院计算机网络信息中心 Update sequence maintenance method for GPU accelerated multi-step prefix tree
CN112784117A (en) * 2021-01-06 2021-05-11 北京信息科技大学 High-level radix tree construction method and construction system for mass data
CN113626432A (en) * 2021-08-03 2021-11-09 浪潮云信息技术股份公司 Improvement method of self-adaptive radix tree supporting any Key value
CN115438027A (en) * 2022-11-07 2022-12-06 中水淮河规划设计研究有限公司 Model library management system of C/S, B/S mixed architecture

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170262464A1 (en) * 2013-12-13 2017-09-14 Oracle International Corporation System and method for supporting elastic data metadata compression in a distributed data grid
CN107590160A (en) * 2016-07-08 2018-01-16 阿里巴巴集团控股有限公司 A kind of method and device for monitoring radix tree internal structure
CN110363294A (en) * 2018-03-26 2019-10-22 辉达公司 Neural network is indicated using the path in network to improve the performance of neural network


Non-Patent Citations (1)

Title
Xiao Renzhi et al.: "A Survey of Data Consistency Research for Non-Volatile Memory", Journal of Computer Research and Development (《计算机研究与发展》) *


Also Published As

Publication number Publication date
CN112000847B (en) 2021-07-20

Similar Documents

Publication Publication Date Title
CN112000847B (en) GPU parallel-based adaptive radix tree dynamic indexing method
CN110083601B (en) Key value storage system-oriented index tree construction method and system
US4823310A (en) Device for enabling concurrent access of indexed sequential data files
US7805427B1 (en) Integrated search engine devices that support multi-way search trees having multi-column nodes
CN112000846B (en) Method for grouping LSM tree indexes based on GPU
US20140337375A1 (en) Data search and storage with hash table-based data structures
CN110888886B (en) Index structure, construction method, key value storage system and request processing method
US8086641B1 (en) Integrated search engine devices that utilize SPM-linked bit maps to reduce handle memory duplication and methods of operating same
US20040019737A1 (en) Multiple-RAM CAM device and method therefor
Ramamohanarao et al. Recursive linear hashing
Challa et al. DD-Rtree: A dynamic distributed data structure for efficient data distribution among cluster nodes for spatial data mining algorithms
Li et al. A gpu accelerated update efficient index for knn queries in road networks
CN100479436C (en) Management and maintenance method for static multi-interface range matching table
US7987205B1 (en) Integrated search engine devices having pipelined node maintenance sub-engines therein that support database flush operations
CN113779154B (en) Construction method and application of distributed learning index model
CN112000845B (en) Hyperspatial hash indexing method based on GPU acceleration
Fu et al. GPR-Tree: a global parallel index structure for multiattribute declustering on cluster of workstations
Ghanem et al. Bulk operations for space-partitioning trees
US7953721B1 (en) Integrated search engine devices that support database key dumping and methods of operating same
Liu et al. Pea hash: a performant extendible adaptive hashing index
CN111274456B (en) Data indexing method and data processing system based on NVM (non-volatile memory) main memory
CA2439243C (en) Organising data in a database
CN117131012B (en) Sustainable and extensible lightweight multi-version ordered key value storage system
CN111949439B (en) Database-based data file updating method and device
Huang et al. A Primer on Database Indexing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant