CN111966678A - Optimization method for effectively improving B+ tree retrieval efficiency on GPU - Google Patents


Info

Publication number
CN111966678A
CN111966678A (application CN202010640423.5A)
Authority
CN
China
Prior art keywords
tree
gpu
node
query
array
Prior art date
Legal status
Pending
Application number
CN202010640423.5A
Other languages
Chinese (zh)
Inventor
张为华
蒋金虎
宋昶衡
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University
Priority to CN202010640423.5A
Publication of CN111966678A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24532Query optimisation of parallel queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24549Run-time optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/24569Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of heterogeneous computing, and particularly relates to an optimization method for effectively improving B+ tree retrieval efficiency on a GPU. The invention comprises the following steps: designing a new B+ tree data structure, and designing a search method that improves the query efficiency of this data structure. The new B+ tree data structure splits the traditional B+ tree into two parts, a key region and a child node region, and replaces the bulky child node pointer information of the B+ tree with a compact prefix-sum array. The optimized search method comprises a sort-based search method and a search method based on reducing the thread group size. The method effectively resolves the mismatch between B+ tree programs and the GPU memory hierarchy, reduces memory-access and execution divergence on the GPU, and improves the resource utilization of B+ tree retrieval programs on the GPU. Experimental results show that query throughput in the HyperSpace system reaches about 3.5 billion queries per second, nearly 3.4 times higher than recent published results.

Description

Optimization method for effectively improving B+ tree retrieval efficiency on GPU
Technical Field
The invention belongs to the technical field of heterogeneous computing, and particularly relates to an optimization method for effectively improving B+ tree retrieval efficiency on a GPU (Graphics Processing Unit), which can be used for index structures in big data systems.
Background
In recent years, with the rapid development of internet technology and revolutionary progress in information technology, the world has entered a big data era. The amount of data generated annually worldwide has grown explosively: the amount of data produced globally in 2020 is expected to reach about 40 ZB (roughly 8,000 times that of 2003). Against this background, applications that analyze and predict with big data have become increasingly popular and play an ever more important role in people's lives; for example, many applications use big data processing technology to provide personalized recommendation services, making daily activities such as shopping and reading more convenient. However, while these applications bring many conveniences, they continuously face the challenge of exploding data volumes; for example, Netflix statistics from 2018 show that on an average day its stream processing system needs to upload about 12 PB of data to AWS data centers for processing.
In the face of exponential growth in data size, how to efficiently retrieve and process big data has become an important issue of general interest in industry. The main current solution is to organize data efficiently with index structures to improve retrieval and processing efficiency, such as indexing database tables with hash structures in SQL Server, or clustering and indexing data with B+ trees in the MySQL database. Among index structures, the B+ tree is the most important and most widely applied: it not only provides data access operations such as Search and Update in logarithmic time complexity, but also organizes the underlying data in order. As early as the 1970s, B+ trees were applied to various file systems to reduce the overhead of disk access. Today, as more application scenarios emerge, B+ tree structures are increasingly used in a variety of data systems, such as key-value storage systems, databases, and Online Analytical Processing (OLAP) systems. Therefore, as data volumes increase dramatically, the performance of the B+ tree index structure directly affects the performance of these systems.
With the advent of the big data era, the classic B+ tree index structure also faces many difficulties and challenges. First, the growth in data volume dramatically expands the size of the data structures associated with data storage, which places higher demands on data structure design; for example, Google's search engine must index hundreds of millions of web pages worldwide, i.e., web page data exceeding one billion GB in size. Second, in the big data era the number of data access requests grows, and with it the system's demand for concurrent request processing; for example, during Alibaba's 2017 "Double Eleven" Tmall shopping festival, the Alibaba Cloud system had to process millions of transaction operations per second. Therefore, in an era when both the amount of stored data and the number of concurrent requests grow exponentially, improving the retrieval efficiency of the B+ tree data structure has become one of the key problems in the systems field.
Meanwhile, with the rapid development of hardware technology, more and more new types of hardware have appeared and are widely used in various fields. Among them, graphics processing units (GPUs) were applied to graphics and image rendering as early as the 1990s thanks to their rich computing resources. With the rapid development of GPU architectures and programming platforms over the last decade, GPUs have gradually become one of the most commonly used many-core processors besides CPUs: they can not only handle graphics and image workloads, but also provide a general solution for highly parallel computing problems in other fields. Since the GPU has rich compute and memory resources, and concurrent B+ tree lookups exhibit both high parallelism and a large amount of memory access, accelerating the traditional B+ tree structure on the GPU is a potentially feasible solution. In recent years much research has focused on this direction, and results have been published in succession. However, even though the GPU provides far richer hardware resources than the CPU, existing research has not achieved the ideal processing effect.
The original B+ tree struggles to obtain an ideal speedup on the GPU mainly because the traditional B+ tree structure was designed and optimized for the CPU architecture. Facing a GPU architecture and programming model quite different from the CPU's, the traditional B+ tree structure adapts poorly, which creates several problems. First, for a conventional B+ tree, a query request must traverse layer by layer from the Root Node to a Leaf Node, and such traversal results in a large number of inefficient indirect address accesses. These indirect accesses usually require frequent reads of Global Memory, the memory with the highest access latency on the GPU, which greatly affects memory access and execution efficiency. Second, the target values of concurrently executed query requests are usually random, so the search paths of concurrent requests over the B+ tree differ greatly, and adjacent queries rarely share the same tree traversal path. When such requests are processed simultaneously by one thread bundle (Warp) on the GPU, this disparity leads to two kinds of divergence: Warp Divergence and Memory Divergence. Warp divergence hurts the computational efficiency of the GPU program, while memory divergence makes it difficult for the GPU to coalesce memory transactions, reducing the data transfer throughput of global memory on the GPU.
Based on this analysis of the mismatches between the current B+ tree structure and the GPU, the invention proposes a novel B+ tree data structure and an optimized search scheme, which together effectively improve the efficiency of B+ tree retrieval on the GPU.
Disclosure of Invention
The invention aims to provide an optimization method that effectively improves B+ tree retrieval efficiency on a GPU, so as to solve the mismatch between the current B+ tree structure and the GPU memory hierarchy.
The optimization method comprises the following steps: first, designing a new B+ tree data structure; then, on this basis, designing an optimized search method that improves the query efficiency of this data structure.
The invention first designs a new B+ tree data structure, called the HyperSpace tree structure. This structure divides the traditional B+ tree into two parts, a Key Region and a Child Region, and replaces the child node pointer information (child references) of the B+ tree with a much smaller prefix-sum array. Such a structure makes it possible to cache the tree in the small, low-latency memories on the GPU, while the positions of child nodes can be obtained through simple calculation.
A specific HyperSpace tree structure is shown in FIG. 1(b). The key region is organized as a one-dimensional array that stores the key information of all tree nodes of a conventional B+ tree. The node key information is stored in the key region array contiguously from left to right in breadth-first traversal order. As can be seen from FIG. 1(a) and FIG. 1(b), the first element (index 0) of the key region stores the key values of the Root Node of the original tree, the second element stores the key values of the first node of the second level, and so on. Each element in the array has a fixed size, equal to the size of the key-value portion of a conventional B+ tree node (i.e., for a fan-out of $m$, the $m-1$ keys it holds). The length of the array equals the number of tree nodes in the conventional B+ tree structure.
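As a concrete illustration of this layout, the following sketch flattens a conventional pointer-based B+ tree into a key-region array by breadth-first traversal. It is a minimal sketch, not the patent's actual code; the Node and KeyEntry types, the field names, and the fan-out constant are assumptions.

```cuda
// Hypothetical host-side flattening of a conventional B+ tree into the
// HyperSpace key region, in breadth-first (level) order.
#include <cstdint>
#include <queue>
#include <vector>

constexpr int FANOUT = 64;                 // example fan-out used later in the text
constexpr int KEYS_PER_NODE = FANOUT - 1;  // 63 keys per node

struct Node {                              // conventional pointer-based node
    int64_t keys[KEYS_PER_NODE];
    Node*   children[FANOUT];
    int     num_children;                  // 0 for leaf nodes
};

struct KeyEntry { int64_t keys[KEYS_PER_NODE]; };  // one key-region element

// key_region[i] holds the keys of the i-th node in BFS order.
std::vector<KeyEntry> flatten_key_region(Node* root) {
    std::vector<KeyEntry> key_region;
    std::queue<Node*> bfs;
    bfs.push(root);
    while (!bfs.empty()) {
        Node* n = bfs.front(); bfs.pop();
        KeyEntry e;
        for (int i = 0; i < KEYS_PER_NODE; ++i) e.keys[i] = n->keys[i];
        key_region.push_back(e);
        for (int c = 0; c < n->num_children; ++c) bfs.push(n->children[c]);
    }
    return key_region;   // length equals the total number of tree nodes
}
```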
In the HyperSpace tree structure, the concept of prefix sums (Prefix Sums) is introduced in the design of the child node region. The prefix sum is a widely applied data-processing pattern in computing, used for example in radix sort and in evaluating high-order recurrences. Each element of a prefix-sum array represents the result of accumulating all preceding elements, including the element itself. Constructing a prefix-sum array takes a binary associative operator $\oplus$ and a one-dimensional input array $[a_0, a_1, \ldots, a_{n-1}]$, and outputs the one-dimensional array $[a_0,\; a_0 \oplus a_1,\; \ldots,\; a_0 \oplus a_1 \oplus \cdots \oplus a_{n-1}]$. If the binary operator is set to addition, the result is the most common Prefix-Sum Array.
In the child node region, each element of the prefix-sum array corresponds one-to-one with a key-value element of the key region; that is, the child information of the B+ tree is also stored contiguously in the same breadth-first traversal order as the key region. Each element of the array represents the cumulative sum of the numbers of children of all tree nodes preceding the current node in the level-order traversal (consistent with the prefix-sum concept). Each element also has a concrete physical meaning: the index position, in the key region, of the first child of the corresponding tree node. As can be seen from FIG. 1(b), the prefix-sum array generated from the conventional B+ tree of FIG. 1(a) is [1, 4, 6, 7, 9, …]. It means that the first child of Node 0 (the Root Node) is at index 1 of the key region array, the first child of Node 1 (the first node in the second level) is at index 4 (i.e., 1 + 3), the first child of Node 2 (the second node in the second level) is at index 6 (i.e., 1 + 3 + 2), and so on. From the prefix-sum array, the positions of all children of a node can be obtained efficiently by calculation, and the number of children of each node is obtained by subtracting the node's own element value from that of the following element.
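The child-region arithmetic just described can be sketched as follows; the function and variable names are assumptions, but the numbers reproduce the FIG. 1 example.

```cuda
// Hypothetical construction of the child region from per-node child counts.
// prefix[i] = key-region index of the first child of BFS node i.
#include <cstdint>
#include <vector>

std::vector<int32_t> build_child_region(const std::vector<int32_t>& num_children) {
    std::vector<int32_t> prefix(num_children.size());
    int32_t running = 1;                   // node 0 (the root) occupies index 0
    for (size_t i = 0; i < num_children.size(); ++i) {
        prefix[i] = running;               // first child of node i
        running += num_children[i];
    }
    return prefix;
}

// For FIG. 1(a), child counts [3, 2, 1, 2, ...] yield [1, 4, 6, 7, 9, ...]:
// the j-th child of node i is at key-region index prefix[i] + j, and
// node i has prefix[i + 1] - prefix[i] children.
```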
Since the child node information is organized as a prefix-sum array, the whole B+ tree structure is about 50% smaller than the traditional B+ tree. This allows the HyperSpace tree structure to make full use of the limited on-chip memory of the GPU for caching, replacing high-latency global memory accesses with low-latency cache accesses. In addition, because this design compresses a tree node's child information into a single integer value, query requests that visit the same tree node all access the same memory location, so B+ tree queries obtain better locality.
Based on the HyperSpace tree structure, the invention further designs an optimized search method that improves the query efficiency of the B+ tree data structure, comprising a sort-based search method and a search method based on reducing the thread group size. Batch update operations on the HyperSpace tree structure are executed on the CPU side, while the two optimized search methods and the concurrent query operations of the HyperSpace tree are executed on the GPU side; the tree structure is synchronized between the CPU and the GPU by a PCIe memory transfer after each CPU-side batch update completes. Specifically:
The sort-based search method sorts all query requests by target key before they start to execute. With high probability, sorted adjacent queries follow similar search paths, which reduces unnecessary memory divergence and query divergence during tree traversal. The resulting memory access pattern is shown in FIG. 2: for example, given the 4 queries 1, 20, 2, 35, the queries are first sorted, and adjacent queries form two groups, one being 1, 2 and the other 20, 35. A host-side sketch of this preprocessing step follows.
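This sketch presorts a batch of query keys on the GPU with Thrust (which radix-sorts integer keys) before the search kernel runs; the exact sorting routine the patent uses is not specified, so this is one plausible realization.

```cuda
// Presort queries so adjacent threads receive nearby target keys.
#include <thrust/device_vector.h>
#include <thrust/sort.h>

void presort_queries(thrust::device_vector<int64_t>& query_keys) {
    // After sorting, queries 1, 20, 2, 35 become 1, 2, 20, 35, so the
    // pairs (1, 2) and (20, 35) land in adjacent threads and are likely
    // to share most of their root-to-leaf traversal paths.
    thrust::sort(query_keys.begin(), query_keys.end());
}
```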
The searching method based on the thread group size reduces the number of threads required by one query request, can effectively reduce unnecessary comparison and improve the utilization rate of computing resources. As shown in fig. 3, for example, given two searches 2 and 6, the original thread group size is 8, then the comparison needs to be made across 8 threads. The searching method can reduce the size of the thread group to 4, and can effectively reduce the comparison times among threads. The search optimization based on the reduced thread group size uses a smaller number of threads to service a query request than the traditional GPU optimized B + tree scheme, so that invalid comparison times can be avoided, and meanwhile, the utilization rate of computing resources on the GPU is improved. When HyperSpace uses less number of threads to service a request, more query requests can be processed by a thread bundle at the same time, thus improving the query parallelism.
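The following is a minimal sketch of a single node-search step under a reduced thread group size; it is not the patent's kernel. GROUP_SIZE, the key layout, and the convention of descending at the first key greater than the target are assumptions.

```cuda
// A warp of 32 threads is split into groups of GROUP_SIZE threads; each
// group serves one query, and the group's threads compare GROUP_SIZE keys
// of a node in parallel using a warp vote.
constexpr int GROUP_SIZE = 4;   // reduced from 8, as in the FIG. 3 example

__device__ int group_find_child(const long long* node_keys, int num_keys,
                                long long target) {
    int lane      = threadIdx.x % 32;          // lane within the warp
    int group     = lane / GROUP_SIZE;         // which group this lane is in
    int lane_in_g = lane % GROUP_SIZE;         // position within the group
    unsigned gmask = ((1u << GROUP_SIZE) - 1u) << (group * GROUP_SIZE);

    int branch = num_keys;                     // default: rightmost child
    for (int base = 0; base < num_keys; base += GROUP_SIZE) {
        int k = base + lane_in_g;
        bool gt = (k < num_keys) && (node_keys[k] > target);
        // Each group compares GROUP_SIZE keys with one ballot instruction.
        unsigned vote = __ballot_sync(gmask, gt) >> (group * GROUP_SIZE);
        if (vote != 0) { branch = base + __ffs(vote) - 1; break; }
    }
    return branch;   // index of the first key > target, i.e. the child to take
}
```

With GROUP_SIZE reduced from 8 to 4, a node of 8 keys takes at most two ballot steps per group, but a warp now serves 8 queries instead of 4, which is the parallelism gain the text describes.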
The technical effects are as follows:
the invention mainly takes the prior heterogeneous B + Tree accelerated research result HB + Tree as a comparison object of technical effect and carries out detailed test on HyperSpace based on NVIDIA TITAN V experimental platform.
In the actual comparison of technical effects, the HyperSpace can reach the query throughput rate of 35 hundred million times per second, which is improved by nearly 3.4 times compared with the existing research result (HB + Tree). The HyperSpace B + Tree structure improves the query performance by nearly 1.4 times compared with the HB + Tree, which is mainly attributed to that the HyperSpace B + Tree brings better query execution locality and saves 50% of volume, and the structure provides possibility for better utilizing the multi-level on-chip high-efficiency cache on the GPU. Secondly, the query performance of the HyperSpace B + Tree structure plus the preprocessing operation optimization (HyperSpace Tree + PSA) based on sorting is improved by 2 times compared with that of the HB + Tree, and the performance improvement is mainly attributed to the fact that query sorting is carried out in advance, so that query requests processed in the same thread bundle can access similar Tree traversal paths, and the thread bundle divergence problem and the memory access divergence problem which affect the execution efficiency of the GPU are effectively reduced. Finally, by adding the preprocessing operation (PSA) based on the ordering and the search optimization operation (NTG) based on the thread group to the HyperSpace B + Tree structure, the query efficiency is improved by about 3.4 times compared with that of the HB + Tree, and the improvement of the step is mainly attributed to that the unnecessary comparison operation is reduced by the search optimization operation based on the thread group, the calculation resource of the existing GPU is fully utilized, and the concurrency of B + Tree query on the GPU is maximized.
In addition, based on the HyperSpace query performance results under different configuration conditions, the HyperSpace design has good expansibility, and an ideal query effect can be obtained under different configurations. In a word, the HyperSpace design can effectively improve the retrieval efficiency of the system.
Drawings
FIG. 1 shows a conventional B+ tree structure and the HyperSpace structure: (a) is a traditional B+ tree node structure, and (b) is the HyperSpace tree structure.
FIG. 2 is an access pattern in a memory access for a partially ordered query request.
FIG. 3 is a search optimization scheme based on thread group size.
FIG. 4 is an overall overview of the method.
FIG. 5 is a HyperSpace architecture implementation.
Detailed Description
In a heterogeneous system, the CPU and the GPU usually execute in stages: in general, the CPU executes the parts with complex control logic, while data is uploaded to the GPU in stages and batches of highly parallel operations are then executed there.
The specific workflow of the method in a heterogeneous system is shown in FIG. 4. Batch update operations on the HyperSpace tree structure are executed on the CPU side, while the two optimized search methods and the concurrent query operations of the HyperSpace tree are executed on the GPU side. The tree structure is synchronized between the CPU and the GPU by a PCIe memory transfer after each CPU-side batch update completes, as sketched below.
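The following is a hedged sketch of the staged execution in FIG. 4; the buffer names are assumptions, and it reuses the hypothetical KeyEntry layout and int32_t child region from the earlier sketches.

```cuda
// After each CPU-side batch update, the two flat arrays are pushed to the
// GPU over PCIe; queries then run against the synchronized structure.
#include <cstdint>
#include <cuda_runtime.h>
#include <vector>

void sync_tree_to_gpu(const std::vector<KeyEntry>& key_region,
                      const std::vector<int32_t>&  child_region,
                      KeyEntry* d_keys, int32_t* d_children) {
    // One PCIe transfer per array per batch update.
    cudaMemcpy(d_keys, key_region.data(),
               key_region.size() * sizeof(KeyEntry), cudaMemcpyHostToDevice);
    cudaMemcpy(d_children, child_region.data(),
               child_region.size() * sizeof(int32_t), cudaMemcpyHostToDevice);
    // ... then launch the GPU-side presort and concurrent search kernels.
}
```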
Tree structure
The HyperSpace tree structure divides the traditional B+ tree structure into two parts, a key region and a child node region, and replaces the conventional child-reference area with a prefix-sum array. As shown in FIG. 5, the HyperSpace system uses two-level B+ tree nodes whose index and key-value parts are stored together as the array entries of the HyperSpace B+ tree Key Region, while the child region still stores the child information of a node as a single integer value. In this way, both key region accesses and child region accesses achieve better memory performance on the GPU. The two-level B+ tree nodes of the HyperSpace key region are implemented as two-level indexed B+ tree nodes aligned to the cache line size (64 bytes). A key region element is shown in FIG. 5; it is based on a B+ tree with a fan-out of 64, so a tree node has 63 key values (keys) and 64 child references (child). To preserve cache line alignment, a null key is appended at the end of the key value area, denoted $key_{63}$ in the figure (a padding value larger than any real key). In addition, the implementation superimposes an index layer on the traditional B+ tree node structure, generated from the key values of the lower layer: each index key represents the maximum value of the corresponding key-value region (8 consecutive key values form one region), i.e., $index_i = \max(key_{8i}, \ldots, key_{8i+7})$; an empty index entry is likewise appended to the index area for cache line alignment. When a node is queried, the index area is traversed first to determine which part of the key value area the target key lies in, and the target key is then searched for within that region only. Although this implementation introduces key redundancy and increases the key-related volume of the B+ tree, it effectively secures cache utilization and reduces unnecessary cache line replacement.
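The two-level in-node lookup can be sketched as follows. It is a minimal sketch under stated assumptions: the HyperNode field names are hypothetical, and the pad key is modeled as LLONG_MAX so the region scan always terminates.

```cuda
// 63 real keys plus one pad give 64 keys = 8 regions of 8 keys; an 8-entry
// index layer stores each region's maximum, so an in-node lookup touches at
// most one index line and one 8-key region.
#include <climits>

struct HyperNode {
    long long index[8];   // index[i] = max of keys[8*i .. 8*i+7]; last entry padded
    long long keys[64];   // 63 keys; keys[63] = LLONG_MAX (the null key)
};

__device__ int node_lookup(const HyperNode& n, long long target) {
    // Step 1: the index layer narrows the search to one 8-key region.
    int region = 0;
    while (region < 7 && n.index[region] < target) ++region;
    // Step 2: scan that region; its maximum >= target guarantees the scan stops.
    int pos = region * 8;
    while (n.keys[pos] < target) ++pos;
    return pos;   // position of the first key >= target within the node
}
```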
Sort-based search optimization
In the sort-based search strategy, query requests are fully sorted before actually executing on the GPU, to increase the probability that requests share traversal paths. However, a full sort carries a non-negligible time overhead; to keep the query performance gain above the sorting cost, a partial sorting strategy is introduced. The partial sorting strategy uses a partial bit-based sorting algorithm (unlike exact comparison sorting, a completely precise order is not required here: the sorting only serves the search, so a rough order suffices to improve search efficiency); that is, sorting only the highest $N$ bits of each query key achieves a query effect close to a full sort while effectively reducing the sorting time overhead. To determine the number of sorted bits in the partial sorting algorithm, a model is used to assist the selection of this parameter; the appropriate number of sort bits is then computed from the model and the configuration of the actual HyperSpace B+ tree.

For the HyperSpace system, each integer key value is represented by 64 bits ($B = 64$), and each GPU cache line is 128 bytes and can cache 16 keys ($K = 16$); thus, for a B+ tree of size $2^{23}$ ($T = 2^{23}$), the model below yields that sorting only 19 bits ($N = 19$) achieves the partial ordering of query requests without excessive sorting overhead:

$$N = \min\!\left(B,\ \left\lceil \log_2 \frac{T}{K} \right\rceil\right)$$

(with $T = 2^{23}$ and $K = 16$ this gives $N = 23 - 4 = 19$).
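The sketch below shows one way to realize partial bit sorting with CUB's radix sort, whose begin_bit/end_bit parameters restrict the sort to the top $N$ bits. The model function mirrors the formula above, which is itself a reconstruction matched to the worked numbers in the text, so both are assumptions rather than the patent's exact implementation.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cmath>

// Model-assisted choice of the number of sort bits.
int sort_bits(double T, double K, int B) {
    int n = static_cast<int>(std::ceil(std::log2(T / K)));
    return n < B ? n : B;                 // N cannot exceed the key width
}

// Radix-sort only bits [64 - N, 64), i.e. the highest N bits of each key.
void partial_sort_queries(const unsigned long long* d_in,
                          unsigned long long* d_out, int n, int N) {
    void*  d_temp = nullptr;
    size_t temp_bytes = 0;
    // First call computes the scratch size; second call performs the sort.
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n, 64 - N, 64);
    cudaMalloc(&d_temp, temp_bytes);
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n, 64 - N, 64);
    cudaFree(d_temp);
}
```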
Search optimization based on reducing thread group size
In the implementation of the search strategy based on reducing the thread group size, the optimization reduces the number of threads serving one query request. Such an implementation helps reduce unnecessary comparisons and increases the usage of GPU compute resources.
In this implementation, the thread group size for each query is set to 1, and each thread bundle processes at most N queries simultaneously (N being the maximum parallelism of thread bundle processing on the GPU). To choose this configuration, the number of execution steps S of 1000 queries under different thread group sizes was collected on the CPU side, and the most suitable thread group size turned out to be 1 thread. Although the fan-out of the current B+ tree is 64, because of the index layer the effective fan-out can be regarded as 8; this is consistent with the query execution efficiency of a fan-out-8 B+ tree under different thread group sizes verified by our experiments. Therefore, the HyperSpace implementation uses 1 thread to serve each query, i.e., each thread bundle processes 32 queries simultaneously, which also achieves the maximum parallelism of thread bundle processing on the GPU. A kernel sketch of this configuration follows.
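The following sketch shows the group-size-1 configuration, reusing the hypothetical HyperNode layout and node_lookup helper from the earlier sketch; the descent convention (take the child at the position of the first key >= target) and the result encoding are assumptions.

```cuda
// One thread per query: a warp of 32 threads serves 32 queries at once.
__global__ void hyperspace_search(const HyperNode* key_region,
                                  const int*       child_region,  // prefix sums
                                  const long long* queries,
                                  int*             results,
                                  int num_queries, int num_levels) {
    int q = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per query
    if (q >= num_queries) return;
    long long target = queries[q];
    int node = 0;                                    // BFS index of the root
    for (int level = 0; level + 1 < num_levels; ++level) {
        int pos = node_lookup(key_region[node], target);
        node = child_region[node] + pos;             // descend via the prefix sum
    }
    // Encode the hit as (leaf node index, slot within the leaf).
    results[q] = node * 64 + node_lookup(key_region[node], target);
}
```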

Claims (4)

1. An optimization method for effectively improving B+ tree retrieval efficiency on a GPU, characterized by comprising: first, designing a new B+ tree data structure; then, designing an optimized search method for improving the query efficiency of the B+ tree data structure;
the new B+ tree data structure divides the traditional B+ tree into two parts, a key region and a child node region, and replaces the bulky child node pointer information of the B+ tree with a compact prefix-sum array; this structure makes it possible to cache the tree in the small low-latency memories on the GPU, and the positions of child nodes can be obtained through simple calculation; the new B+ tree data structure is called the HyperSpace tree structure;
the key region is organized as a one-dimensional array storing the key information of all tree nodes of a traditional B+ tree; the node key information is stored in the key region array contiguously from left to right in breadth-first traversal order; each element corresponds to a node of the traditional B+ tree: the first element of the key region (index 0) stores the key values of the root node of the original tree, the second element stores the key values of the first node of the second level, and so on; each element in the array has a fixed size, namely the size of the key-value portion of a traditional B+ tree node; the length of the array equals the number of tree nodes in the traditional B+ tree structure;
the child node region introduces the concept of the prefix sum, in which each element of the array represents the result of accumulating all preceding elements, including the element itself; constructing the prefix-sum array takes a binary associative operator $\oplus$ and a one-dimensional input array $[a_0, a_1, \ldots, a_{n-1}]$, and outputs the one-dimensional array $[a_0,\; a_0 \oplus a_1,\; \ldots,\; a_0 \oplus a_1 \oplus \cdots \oplus a_{n-1}]$;
in the child node region, each element of the prefix-sum array corresponds one-to-one with a key-value element of the key region, i.e., the child information of the B+ tree is stored contiguously in the same breadth-first traversal order as the key region; each element of the array represents the cumulative sum of the numbers of children of all tree nodes preceding the current node in the level-order traversal; each element also has a concrete physical meaning, namely the index position, in the key region, of the first child of the corresponding tree node;
according to the prefix-sum array, the positions of all children of a node can be obtained efficiently by calculation, and the number of children of each node is obtained by subtracting the node's own element value from that of the following element.
2. The optimization method according to claim 1, wherein the optimized search method for improving the query efficiency of the B+ tree data structure comprises a sort-based search method and a search method based on reducing the thread group size; batch update operations on the HyperSpace tree structure are executed on the CPU side, while the two optimized search methods and the concurrent query operations of the HyperSpace tree are executed on the GPU side; the tree structure is synchronized between the CPU and the GPU by a PCIe memory transfer after each CPU-side batch update completes; wherein:
the sort-based search method sorts all query requests by target key before the query requests start to execute; with high probability, sorted adjacent queries follow similar search paths, reducing unnecessary memory divergence and query divergence during tree traversal;
the search method based on the thread group size reduces the number of threads required by one query request, so as to effectively reduce unnecessary comparisons and improve the utilization of compute resources.
3. The optimization method according to claim 2, wherein the sort-based search method adopts a partial sorting strategy, i.e., a bit-based sorting algorithm in which only the highest $N$ bits of each query request are sorted, obtaining a query effect equivalent to that of a full sort; to determine the number of sorted bits in the partial sorting algorithm, a model is used to assist the selection of this parameter, with the specific formula:
$$N = \min\!\left(B,\ \left\lceil \log_2 \frac{T}{K} \right\rceil\right),$$
where $B$ is the bit width of each integer key value, $K$ is the number of keys held by one GPU cache line, and $T$ is the size of the B+ tree.
4. The optimization method according to claim 2, wherein in the search method based on the thread group size, the thread group size of each query is set to 1, and each thread bundle processes at most N queries simultaneously, where N is the maximum parallelism of thread bundle processing on the GPU.
CN202010640423.5A 2020-07-06 2020-07-06 Optimization method for effectively improving B + tree retrieval efficiency on GPU Pending CN111966678A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010640423.5A CN111966678A (en) 2020-07-06 2020-07-06 Optimization method for effectively improving B + tree retrieval efficiency on GPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010640423.5A CN111966678A (en) 2020-07-06 2020-07-06 Optimization method for effectively improving B + tree retrieval efficiency on GPU

Publications (1)

Publication Number Publication Date
CN111966678A true CN111966678A (en) 2020-11-20

Family

ID=73361586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010640423.5A Pending CN111966678A (en) 2020-07-06 2020-07-06 Optimization method for effectively improving B + tree retrieval efficiency on GPU

Country Status (1)

Country Link
CN (1) CN111966678A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446293A (en) * 2018-11-13 2019-03-08 嘉兴学院 A kind of parallel higher-dimension nearest Neighbor
CN110888886A (en) * 2019-11-29 2020-03-17 华中科技大学 Index structure, construction method, key value storage system and request processing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Weihua Zhang et al., "A High Throughput B+tree for SIMD Architectures", IEEE Transactions on Parallel and Distributed Systems *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112631631A (en) * 2020-12-29 2021-04-09 中国科学院计算机网络信息中心 Update sequence maintenance method for GPU accelerated multi-step prefix tree
CN112631631B (en) * 2020-12-29 2021-11-16 中国科学院计算机网络信息中心 Update sequence maintenance method for GPU accelerated multi-step prefix tree
CN112905598A (en) * 2021-03-15 2021-06-04 上海交通大学 Interface-based graph task intermediate result storage method and system for realizing separation
CN112905598B (en) * 2021-03-15 2022-06-28 上海交通大学 Interface-based graph task intermediate result storage method and system for realizing separation
CN113204559A (en) * 2021-05-25 2021-08-03 东北大学 Multi-dimensional KD tree optimization method on GPU
CN113204559B (en) * 2021-05-25 2023-07-28 东北大学 Multidimensional KD tree optimization method on GPU
WO2023038687A1 (en) * 2021-09-08 2023-03-16 Intel Corporation In-network parallel prefix scan


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201120