CN118113777A - Radix tree query method applied to Tianyi Cloud distributed block storage - Google Patents

Radix tree query method applied to Tianyi Cloud distributed block storage

Info

Publication number
CN118113777A
Authority
CN
China
Prior art keywords
data structure
node data
node
nodes
radix tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311702322.6A
Other languages
Chinese (zh)
Inventor
李家伟
李贵斌
薛强
刘远航
慕文洋
任俊霖
刘秉润
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202311702322.6A
Publication of CN118113777A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of distributed storage and data query, and in particular to a radix tree query method applied to Tianyi Cloud distributed block storage. The method comprises the following steps: taking an x-bit substring of an index key of the radix tree as the sub-index for retrieving child nodes, where x is the sub-index length; presetting radix tree node data structure types of different capacities, and selecting a target node data structure according to the number of non-empty child nodes; and querying the target node data structure using the sub-index of length x. By redesigning the node data structures of the radix tree, the invention reduces memory space occupation and supports range queries; by selecting an appropriate sub-index length, it improves space utilization and maintains query efficiency while using memory space reasonably.

Description

Radix tree query method applied to Tianyi Cloud distributed block storage
Technical Field
The invention relates to the technical field of distributed storage and data query, and in particular to a radix tree query method applied to Tianyi Cloud distributed block storage.
Background
With the rise of cloud computing, distributed storage systems have been widely adopted for their high performance, large scale, and strong scalability. As Tianyi Cloud block storage develops rapidly, its scale keeps growing and its metadata grows exponentially. To improve block storage performance, Tianyi Cloud block storage separates metadata from data, which improves the efficiency of metadata retrieval.
Tianyi Cloud distributed block storage manages its metadata with a B+ Tree. Although a B+ Tree provides high retrieval efficiency when the amount of metadata is small, managing metadata with a B+ Tree runs into several problems as the metadata grows and because the metadata must stay resident in memory.
The retrieval efficiency of a B+ Tree depends on the amount of metadata. Because the height of a B+ Tree depends on the number of elements, a large amount of metadata produces a tall tree, which slows retrieval and makes the B+ Tree occupy a large amount of memory. The metadata of Tianyi Cloud block storage is resident in memory, so as the metadata grows it occupies a large amount of memory space, and the B+ Tree cannot adapt well to the storage requirements of Tianyi Cloud block storage. In addition, when there are many metadata items, the B+ Tree often needs to reorder metadata during a search, which reduces search efficiency.
Therefore, how to query Tianyi Cloud distributed block storage metadata quickly without rearranging the metadata has become a new direction of technical development.
Disclosure of Invention
In view of this, the invention provides a radix tree query method applied to Tianyi Cloud distributed block storage, which mainly aims to solve the problem of how to query Tianyi Cloud distributed block storage metadata quickly without rearranging the metadata.
The invention provides a radix tree query method applied to Tianyi Cloud distributed block storage, which comprises the following steps:
taking an x-bit substring of an index key of the radix tree as the sub-index for retrieving child nodes, where x is the sub-index length;
presetting radix tree node data structure types of different capacities, and selecting a target node data structure according to the number of non-empty child nodes;
and querying the target node data structure using the sub-index of length x.
In some embodiments of the present application, the preset radix tree node data structures of different capacities include:
internal nodes and leaf nodes;
each internal node includes a key portion and a pointer portion;
and the leaf nodes are connected to one another through pointers.
In some embodiments of the present application, the preset radix tree node data structure types of different capacities include:
a Node0 Node data structure, a Node1 Node data structure, a Node2 Node data structure, and a Node3 Node data structure.
In some embodiments of the present application, the Node0 Node data structure comprises:
Four sequentially arranged index keys and four sequentially arranged child node pointers, wherein the index keys and the child node pointers are in one-to-one correspondence.
In some embodiments of the present application, the Node1 Node data structure comprises:
sixteen sequentially arranged index keys and sixteen sequentially arranged child node pointers, wherein the index keys and the child node pointers are in one-to-one correspondence.
In some embodiments of the present application, the Node2 Node data structure comprises:
256 sequentially arranged index keys and 48 sequentially arranged child node pointers.
In some embodiments of the present application, the Node3 Node data structure is:
an array of 256 sequentially arranged child node pointers.
In some embodiments of the present application, selecting the target node data structure according to the number of non-empty child nodes comprises:
acquiring a data update operation type,
wherein the data update operation type includes a key-value insertion operation and a key-value deletion operation;
acquiring the number of non-empty child nodes in each node data structure;
and determining the target node data structure according to the data update operation type and the number of non-empty child nodes in each node data structure.
In some embodiments of the present application, determining the target node data structure according to the data update operation type and the number of non-empty child nodes in each node data structure comprises:
when the data update operation type is the key-value insertion operation and the number of non-empty child nodes after the insertion exceeds the capacity of the current node data structure, taking as the target node data structure the node data structure type with the smallest capacity that can hold the number of non-empty child nodes after the insertion.
In some embodiments of the present application, determining the target node data structure according to the data update operation type and the number of non-empty child nodes in each node data structure further comprises:
when the data update operation type is the key-value deletion operation and the number of non-empty child nodes after the deletion no longer fits the capacity range of the current node data structure, taking as the target node data structure the node data structure type with the smallest capacity that can hold the number of non-empty child nodes after the deletion.
Compared with the prior art, the invention has the following beneficial effects: by redesigning the node data structures of the radix tree, the invention reduces memory space occupation and supports range queries; by selecting an appropriate sub-index length, it improves space utilization and maintains query efficiency while using memory space reasonably.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. In the drawings:
FIG. 1 is a flowchart of a radix tree query method applied to Tianyi Cloud distributed block storage, provided by an embodiment of the invention;
FIG. 2 is a schematic view of a leaf node structure according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a Node0 Node data structure according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a Node1 Node data structure according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a Node2 Node data structure according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a Node3 Node data structure according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
Referring to FIG. 1, the present embodiment provides a radix tree query method applied to Tianyi Cloud distributed block storage, where the method includes:
S101: taking an x-bit substring of an index key of the radix tree as the sub-index for retrieving child nodes, where x is the sub-index length;
S102: presetting radix tree node data structure types of different capacities, and selecting a target node data structure according to the number of non-empty child nodes;
S103: querying the target node data structure using the sub-index of length x.
Specifically, a radix tree is a data structure typically used to store and retrieve large numbers of key-value pairs efficiently, where the keys are typically character strings. Starting from the root node, each node of the radix tree represents one or more characters of the key. In the radix tree, an index key is split into substrings of equal length, each x bits long, and each x-bit substring serves as the sub-index for retrieving a child node; that is, the substrings are used to construct the nodes of the radix tree. By presetting radix tree node data structure types of different capacities, different types of data structure can be chosen to store node information, since nodes of different capacities call for different data structures to improve performance. The target node data structure is selected according to the number of non-empty child nodes: different nodes can have different numbers of child nodes, so the data structure used to store a node depends on how many non-empty child nodes that node has. Within the selected data structure, the sub-index of length x is then used to store and retrieve information at each node of the radix tree.
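As a minimal sketch of step S101, the following Python function splits an index key into consecutive x-bit sub-indexes. The function name `split_key`, the big-endian ordering, and the requirement that x divide the key length in bits are assumptions made for illustration; the source does not specify them.

```python
def split_key(key: bytes, x: int = 8) -> list[int]:
    """Split an index key into consecutive x-bit sub-indexes.

    Assumption: the most significant bits come first, so sub-indexes are
    consumed from the root of the radix tree downward.
    """
    total_bits = len(key) * 8
    if x <= 0 or total_bits % x != 0:
        raise ValueError("x must be positive and divide the key length in bits")
    value = int.from_bytes(key, "big")
    mask = (1 << x) - 1
    # Walk from the highest x-bit group down to the lowest.
    return [(value >> shift) & mask for shift in range(total_bits - x, -1, -x)]


if __name__ == "__main__":
    print(split_key(b"\x12\x34"))     # one sub-index per byte when x = 8
    print(split_key(b"\x12\x34", 4))  # one sub-index per nibble when x = 4
```

With x = 8, each sub-index is one byte of the key, which matches the 1-byte span used later in the description.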
Referring to FIG. 2, in some embodiments of the present application, the preset radix tree node data structures of different capacities include:
internal nodes and leaf nodes;
each internal node includes a key portion and a pointer portion;
and the leaf nodes are connected to one another through pointers.
Specifically, in this embodiment the leaf nodes are directly connected by pointers, so the leaf nodes can be traversed in sequence, which supports range queries: a range query is implemented by traversing leaf nodes. Because the radix tree naturally preserves lexicographic order, when the leaf nodes are linked in the manner of a B+ Tree, that is, each leaf node stores a pointer to its successor leaf node, all the leaf nodes form a chain arranged in order. The leaf node to be searched can then be located quickly by binary search, or, given the boundary nodes of a range, the nodes within that range can be retrieved quickly.
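The leaf chain can be sketched as follows. This is an illustrative model only: the names `Leaf`, `link_leaves`, `first_leaf_at_or_after`, and `range_query` are hypothetical, and the starting leaf is found here by a simple scan, whereas a real implementation would descend the radix tree to reach it.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Leaf:
    key: bytes
    value: object
    next_leaf: Optional["Leaf"] = None  # pointer to the successor leaf


def link_leaves(items: dict) -> Optional[Leaf]:
    """Build a chain of leaves in lexicographic key order and return its head."""
    head = prev = None
    for key in sorted(items):
        leaf = Leaf(key, items[key])
        if prev is None:
            head = leaf
        else:
            prev.next_leaf = leaf
        prev = leaf
    return head


def first_leaf_at_or_after(head: Optional[Leaf], low: bytes) -> Optional[Leaf]:
    """Placeholder for the tree descent: scan to the first leaf with key >= low."""
    leaf = head
    while leaf is not None and leaf.key < low:
        leaf = leaf.next_leaf
    return leaf


def range_query(start: Optional[Leaf], high: bytes) -> Iterator[tuple]:
    """Follow successor pointers from start until the key exceeds high."""
    leaf = start
    while leaf is not None and leaf.key <= high:
        yield leaf.key, leaf.value
        leaf = leaf.next_leaf


if __name__ == "__main__":
    head = link_leaves({b"a": 1, b"c": 3, b"b": 2, b"d": 4})
    start = first_leaf_at_or_after(head, b"b")
    print(list(range_query(start, b"c")))
```

Once the first leaf of the range is reached, the query never returns to the internal nodes; it only follows successor pointers.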
Referring to FIGS. 3 to 6, in some embodiments of the present application, the preset radix tree node data structure types of different capacities include:
a Node0 Node data structure, a Node1 Node data structure, a Node2 Node data structure, and a Node3 Node data structure.
In some embodiments of the present application, the Node0 Node data structure comprises:
Four sequentially arranged index keys and four sequentially arranged child node pointers, wherein the index keys and the child node pointers are in one-to-one correspondence.
Specifically, the Node0 Node data structure in this embodiment may store 0 to 4 child Node pointers.
In some embodiments of the present application, the Node1 Node data structure comprises:
sixteen sequentially arranged index keys and sixteen sequentially arranged child node pointers, wherein the index keys and the child node pointers are in one-to-one correspondence.
Specifically, the Node1 Node data structure in this embodiment may store 5 to 16 child Node pointers.
Specifically, in the Node0 Node data structure and the Node1 Node data structure, the keys are stored in a key array. During a lookup, the key array is checked to see whether the key exists; if it does, the corresponding child node pointer is found in the pointer array at the same position offset as the key in the key array. Binary search can be used for this lookup.
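The lookup described for Node0 and Node1 can be sketched with two parallel arrays and Python's standard `bisect` module. The class name `SmallNode` and its methods are hypothetical; the capacity argument would be 4 for Node0 and 16 for Node1.

```python
import bisect


class SmallNode:
    """Sketch of a Node0/Node1-style node: sorted keys plus parallel child pointers."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # 4 for Node0, 16 for Node1
        self.keys = []            # sorted sub-index values (one byte each)
        self.children = []        # children[i] belongs to keys[i]

    def find_child(self, byte: int):
        """Binary-search the key array, then read the pointer at the same offset."""
        i = bisect.bisect_left(self.keys, byte)
        if i < len(self.keys) and self.keys[i] == byte:
            return self.children[i]
        return None

    def add_child(self, byte: int, child) -> None:
        """Insert while keeping both arrays sorted and aligned."""
        i = bisect.bisect_left(self.keys, byte)
        if i < len(self.keys) and self.keys[i] == byte:
            self.children[i] = child
            return
        if len(self.keys) >= self.capacity:
            raise OverflowError("node is full; a larger node type is needed")
        self.keys.insert(i, byte)
        self.children.insert(i, child)


if __name__ == "__main__":
    node = SmallNode(capacity=4)
    for byte, child in [(0x30, "c"), (0x10, "a"), (0x20, "b")]:
        node.add_child(byte, child)
    print(node.keys, node.find_child(0x20), node.find_child(0x21))
```

Keeping the key array sorted is what makes binary search possible and also preserves the ordering that range queries rely on.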
In some embodiments of the present application, the Node2 Node data structure comprises:
256 sequentially arranged index keys and 48 sequentially arranged child node pointers.
Specifically, the Node2 Node data structure in this embodiment includes an element array of 256 entries and a child node pointer array of 48 child node pointers. The position in the element array is the value of the index key, and the entry stored there is the offset into the child node pointer array. Since the child node pointer array has a length of 48, each offset can be represented with 8 bits, which saves space.
Specifically, when a node has 17 to 48 child node pointers, the key array of the Node2 Node data structure stores the offset of the corresponding pointer in the child node pointer array. Compared with storing 256 child node pointers of 8 bytes each, this indirection saves space, because each index entry needs only 8 bits.
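The indirection used by Node2 can be sketched as a 256-entry byte array whose entries are offsets into a 48-slot pointer array. The class name `Node2`, the `EMPTY` sentinel value 0xFF, and the append-only slot allocation are assumptions for illustration; the source does not say how an unused entry is marked.

```python
EMPTY = 0xFF  # assumed sentinel meaning "no child for this key byte"


class Node2:
    """Sketch of a Node2-style node: 256 one-byte offsets plus up to 48 pointers."""

    CAPACITY = 48

    def __init__(self):
        self.index = bytearray([EMPTY] * 256)  # position = key byte, value = offset
        self.children = []                     # at most 48 child pointers

    def find_child(self, byte: int):
        slot = self.index[byte]
        return None if slot == EMPTY else self.children[slot]

    def add_child(self, byte: int, child) -> None:
        slot = self.index[byte]
        if slot != EMPTY:
            self.children[slot] = child  # replace the existing child
            return
        if len(self.children) >= self.CAPACITY:
            raise OverflowError("node is full; a larger node type is needed")
        self.index[byte] = len(self.children)
        self.children.append(child)


if __name__ == "__main__":
    node = Node2()
    node.add_child(0xA0, "first")
    node.add_child(0x05, "second")
    print(node.index[0xA0], node.index[0x05], node.find_child(0x05))
```

A lookup costs two array reads (offset, then pointer), in exchange for storing 48 pointers instead of 256.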
In some embodiments of the present application, the Node3 Node data structure is:
an array of 256 sequentially arranged child node pointers.
Specifically, the Node3 Node data structure in this embodiment includes an array of 256 child node pointers and can store nodes with 49 to 256 child nodes. The Node3 Node data structure can be indexed directly with one byte of the index key, that is, the value of that byte is the index offset into the pointer array.
Specifically, with the Node3 Node data structure, the next node can be found very efficiently by a single lookup of one key byte in the pointer array, with no additional indirection. If most of the pointers are non-empty, the Node3 Node data structure is also very space efficient, because it only needs to store the pointers.
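Direct indexing in Node3 can be sketched in a few lines. The class name `Node3` and the helper `non_empty_children` are hypothetical names used only for this example.

```python
class Node3:
    """Sketch of a Node3-style node: one pointer slot per possible key byte."""

    def __init__(self):
        self.children = [None] * 256  # index = value of the key byte

    def find_child(self, byte: int):
        # A single array read, with no key comparison and no indirection.
        return self.children[byte]

    def add_child(self, byte: int, child) -> None:
        self.children[byte] = child

    def non_empty_children(self) -> int:
        return sum(child is not None for child in self.children)


if __name__ == "__main__":
    node = Node3()
    node.add_child(0x7F, "child")
    print(node.find_child(0x7F), node.find_child(0x80), node.non_empty_children())
```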
Specifically, referring to Table 1 and Table 2, the node data structure types of the radix tree query algorithm provided in this embodiment reduce space occupation by 55% without reducing query efficiency.
Table 1: Node space occupation of the original radix tree
Number of child nodes | Space occupation
4 | 36
16 | 144
48 | 2048
256 | 4096
Table 2: Space occupation of the preset radix tree node data structure types
Node type | Space occupation
Node0 | 36
Node1 | 144
Node2 | 640
Node3 | 2048
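The figures in Table 2 can be reproduced with a short calculation, assuming 1-byte index keys and 8-byte child node pointers (the source does not state the unit, but these sizes match the table if it is bytes). Comparing the column totals of the two tables is likewise an assumed reading of where the 55% figure comes from.

```python
KEY_BYTES = 1      # assumption: one byte per index key entry
POINTER_BYTES = 8  # assumption: 64-bit child node pointers

# Layouts as described in the embodiment.
node_sizes = {
    "Node0": 4 * KEY_BYTES + 4 * POINTER_BYTES,
    "Node1": 16 * KEY_BYTES + 16 * POINTER_BYTES,
    "Node2": 256 * KEY_BYTES + 48 * POINTER_BYTES,
    "Node3": 256 * POINTER_BYTES,
}

# Values copied from Table 1 (original radix tree), keyed by child count.
original_sizes = {4: 36, 16: 144, 48: 2048, 256: 4096}

# Assumed reading of the claimed saving: compare the column totals.
reduction = 1 - sum(node_sizes.values()) / sum(original_sizes.values())

if __name__ == "__main__":
    print(node_sizes)
    print(f"{reduction:.1%}")
```

Under these assumptions the totals are 2868 against 6324, a reduction of roughly 55%, with the saving coming from the Node2 and Node3 rows.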
For Tianyi Cloud block storage, which holds a large amount of metadata, most internal nodes are of the Node2 and Node3 Node data structure types, and only a minority are of the Node0 and Node1 Node data structure types, so the query efficiency of the optimized radix tree query algorithm can be approximated by the following formula:
$f = O(m), \quad n > 2^{m/x}, \quad \text{span} = 8$
In the above formula, f is the search efficiency, m is the length of the index key, n is the number of elements, and x (the span) is the length of the substring; here x is 8 bits, that is, 1 byte.
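A small worked example may help with the formula. It reads the condition as n > 2^(m/x) and interprets it as the point beyond which a radix tree with m/x levels is shallower than a balanced binary search tree over n elements; that interpretation and the function names are assumptions, since the source does not explain the condition.

```python
import math


def radix_levels(m_bits: int, x_bits: int = 8) -> int:
    """Number of levels visited for an m-bit key with x-bit sub-indexes."""
    return math.ceil(m_bits / x_bits)


def balanced_binary_levels(n: int) -> int:
    """Approximate height of a balanced binary search tree holding n elements."""
    return math.ceil(math.log2(n)) if n > 1 else 1


if __name__ == "__main__":
    m, x = 64, 8
    levels = radix_levels(m, x)
    threshold = 2 ** levels
    print(levels, threshold)
    print(balanced_binary_levels(threshold), balanced_binary_levels(threshold + 1))
```

For a 64-bit key and x = 8 the radix tree has 8 levels regardless of n, while the comparison tree needs more than 8 levels once n exceeds 256.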
In some embodiments of the present application, selecting the target node data structure according to the number of non-empty child nodes comprises:
acquiring a data update operation type,
wherein the data update operation type includes a key-value insertion operation and a key-value deletion operation;
acquiring the number of non-empty child nodes in each node data structure;
and determining the target node data structure according to the data update operation type and the number of non-empty child nodes in each node data structure.
In some embodiments of the present application, determining the target node data structure according to the data update operation type and the number of non-empty child nodes in each node data structure comprises:
when the data update operation type is the key-value insertion operation and the number of non-empty child nodes after the insertion exceeds the capacity of the current node data structure, taking as the target node data structure the node data structure type with the smallest capacity that can hold the number of non-empty child nodes after the insertion.
In some embodiments of the present application, determining the target node data structure according to the data update operation type and the number of non-empty child nodes in each node data structure further comprises:
when the data update operation type is the key-value deletion operation and the number of non-empty child nodes after the deletion no longer fits the capacity range of the current node data structure, taking as the target node data structure the node data structure type with the smallest capacity that can hold the number of non-empty child nodes after the deletion.
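The selection rule can be sketched as picking the smallest node type whose capacity covers the child count after the update, using the ranges given in the embodiment (0 to 4, 5 to 16, 17 to 48, 49 to 256). The function names are hypothetical, and shrinking immediately when the count drops into a smaller range is an assumption; the source does not say whether any slack is kept before shrinking.

```python
# (type name, capacity), ordered from smallest to largest.
CAPACITIES = [("Node0", 4), ("Node1", 16), ("Node2", 48), ("Node3", 256)]


def select_node_type(non_empty_children: int) -> str:
    """Return the smallest node type that can hold the given number of children."""
    if not 0 <= non_empty_children <= 256:
        raise ValueError("child count must be between 0 and 256")
    for name, capacity in CAPACITIES:
        if non_empty_children <= capacity:
            return name
    raise AssertionError("unreachable: Node3 holds up to 256 children")


def target_after_update(non_empty_children: int, operation: str) -> tuple:
    """Apply an insertion or deletion of one distinct child and pick the target type."""
    if operation == "insert":
        new_count = non_empty_children + 1
    elif operation == "delete":
        new_count = non_empty_children - 1
    else:
        raise ValueError("operation must be 'insert' or 'delete'")
    return select_node_type(new_count), new_count


if __name__ == "__main__":
    print(target_after_update(4, "insert"))   # a full Node0 grows
    print(target_after_update(17, "delete"))  # a Node2 at its lower bound shrinks
```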
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (10)

1. A radix tree query method applied to Tianyi Cloud distributed block storage, characterized by comprising the following steps:
taking an x-bit substring of an index key of the radix tree as the sub-index for retrieving child nodes, where x is the sub-index length;
presetting radix tree node data structure types of different capacities, and selecting a target node data structure according to the number of non-empty child nodes;
and querying the target node data structure using the sub-index of length x.
2. The radix tree query method applied to Tianyi Cloud distributed block storage according to claim 1, wherein the preset radix tree node data structures of different capacities comprise:
internal nodes and leaf nodes;
each internal node includes a key portion and a pointer portion;
and the leaf nodes are connected to one another through pointers.
3. The radix tree query method applied to Tianyi Cloud distributed block storage according to claim 2, wherein the preset radix tree node data structure types of different capacities comprise:
a Node0 Node data structure, a Node1 Node data structure, a Node2 Node data structure, and a Node3 Node data structure.
4. The radix tree query method applied to Tianyi Cloud distributed block storage according to claim 3, wherein the Node0 Node data structure comprises:
four sequentially arranged index keys and four sequentially arranged child node pointers, wherein the index keys and the child node pointers are in one-to-one correspondence.
5. The radix tree query method applied to Tianyi Cloud distributed block storage according to claim 4, wherein the Node1 Node data structure comprises:
sixteen sequentially arranged index keys and sixteen sequentially arranged child node pointers, wherein the index keys and the child node pointers are in one-to-one correspondence.
6. The radix tree query method applied to Tianyi Cloud distributed block storage according to claim 5, wherein the Node2 Node data structure comprises:
256 sequentially arranged index keys and 48 sequentially arranged child node pointers.
7. The radix tree query method applied to Tianyi Cloud distributed block storage according to claim 6, wherein the Node3 Node data structure is:
an array of 256 sequentially arranged child node pointers.
8. The radix tree query method applied to Tianyi Cloud distributed block storage according to claim 7, wherein selecting the target node data structure according to the number of non-empty child nodes comprises:
acquiring a data update operation type,
wherein the data update operation type includes a key-value insertion operation and a key-value deletion operation;
acquiring the number of non-empty child nodes in each node data structure;
and determining the target node data structure according to the data update operation type and the number of non-empty child nodes in each node data structure.
9. The radix tree query method applied to Tianyi Cloud distributed block storage according to claim 8, wherein determining the target node data structure according to the data update operation type and the number of non-empty child nodes in each node data structure comprises:
when the data update operation type is the key-value insertion operation and the number of non-empty child nodes after the insertion exceeds the capacity of the current node data structure, taking as the target node data structure the node data structure type with the smallest capacity that can hold the number of non-empty child nodes after the insertion.
10. The radix tree query method applied to Tianyi Cloud distributed block storage according to claim 9, wherein determining the target node data structure according to the data update operation type and the number of non-empty child nodes in each node data structure further comprises:
when the data update operation type is the key-value deletion operation and the number of non-empty child nodes after the deletion no longer fits the capacity range of the current node data structure, taking as the target node data structure the node data structure type with the smallest capacity that can hold the number of non-empty child nodes after the deletion.
CN202311702322.6A 2023-12-12 2023-12-12 Radix tree query method applied to Tianyi Cloud distributed block storage Pending CN118113777A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311702322.6A CN118113777A (en) 2023-12-12 2023-12-12 Radix tree query method applied to Tianyi Cloud distributed block storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311702322.6A CN118113777A (en) 2023-12-12 2023-12-12 Radix tree query method applied to Tianyi Cloud distributed block storage

Publications (1)

Publication Number Publication Date
CN118113777A (en) 2024-05-31

Family

ID=91217985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311702322.6A Pending CN118113777A (en) 2023-12-12 2023-12-12 Radix tree query method applied to Tianyi Cloud distributed block storage

Country Status (1)

Country Link
CN (1) CN118113777A (en)

Similar Documents

Publication Publication Date Title
CN110413611B (en) Data storage and query method and device
CN100418092C (en) Grid and T-tree index method for rapid positioning in main memory database
US5848416A (en) Method and apparatus for storing and retrieving data and a memory arrangement
US5263160A (en) Augmented doubly-linked list search and management method for a system having data stored in a list of data elements in memory
US4611272A (en) Key-accessed file organization
US5664184A (en) Method and apparatus for implementing Q-trees
CN100444167C (en) Method for managing and searching dictionary with perfect even numbers group TRIE Tree
JP3992495B2 (en) Functional memory based on tree structure
CN108197313B (en) Dictionary indexing method for realizing space optimization through 16-bit Trie tree
CN103051543A (en) Route prefix processing, lookup, adding and deleting method
KR20170065374A (en) Method for Hash collision detection that is based on the sorting unit of the bucket
Litwin et al. A new method for fast data searches with keys
CN115718819A (en) Index construction method, data reading method and index construction device
GB2196764A (en) Hierarchical file system
CN111339381A (en) Method and device for batch query of character strings of dictionary sequence partition double arrays
CN113468080B (en) Caching method, system and related device for full-flash metadata
CN111382323A (en) Data retrieval optimization method and device and computer equipment
CN117131012B (en) Sustainable and extensible lightweight multi-version ordered key value storage system
EP1107126A2 (en) A fast, efficient, adaptive, hybrid tree
CN110413724B (en) Data retrieval method and device
CN118113777A (en) Radix tree query method applied to Tianyi Cloud distributed block storage
CN116701440A (en) Cuckoo filter and data insertion, query and deletion method
KR100859710B1 (en) The method for searching, saving, deleting data using of data structures, SKIP CLOUDS
CN114416741A (en) KV data writing and reading method and device based on multi-level index and storage medium
CN114398373A (en) File data storage and reading method and device applied to database storage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination