CN116501760A - Efficient distributed metadata management method combining memory and prefix tree - Google Patents
- Publication number
- CN116501760A
- Authority
- CN
- China
- Prior art keywords
- metadata
- key
- prefix tree
- mds
- bloom filter
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses an efficient distributed metadata management method combining memory and a prefix tree. The method concerns how metadata is managed in a distributed system: metadata is mapped to servers with an improved consistent hashing algorithm and stored in a prefix tree to reduce memory usage. Because lookups and insertions are performed in memory using binary search, their performance is greatly improved. Balance factors on the MDS nodes keep metadata distributed across the MDS nodes as evenly as possible, which improves metadata management performance. In addition, the invention stores the actual metadata index in an efficient prefix tree and reads the metadata directly from disk through that index, trading a small amount of memory for efficiency.
Description
Technical Field
The invention relates to a method for managing metadata in a distributed system, in which metadata is mapped with an improved consistent hashing algorithm and stored in a prefix tree to reduce memory usage.
Background
Metadata is data that describes other data. In a file system, for example, metadata describes file attributes, including directory contents, file sizes and file pointers, as well as the mapping from file names to the locations of the recorded data.
With the growth of the internet and the digital transformation of many sectors, a large number of applications generate massive amounts of data, such as images and system logs, which require substantial storage resources to store and manage. To manage such data, images and logs are compressed and stored in a file system in which the operating system organizes the files, and the resource metadata is stored in a structured database that maps files to resources. Managing massive metadata therefore becomes a major difficulty for systems that use a file system as the storage medium.
Common approaches to distributed metadata management include the following. Static subtree partitioning suits workloads with frequent metadata lookups, but when metadata grows dynamically the load across the distributed metadata servers (MDSs) becomes unbalanced. Dynamic subtree partitioning rebalances subtrees dynamically, which greatly increases communication among MDSs and degrades system performance to some extent. Hash mapping suffers from data skew when keys have similar features, which leads to uneven load across MDSs.
To relieve the access pressure that massive data places on a single database, a multi-database cluster can be used. This reduces the load on each database but introduces data redundancy and does not fundamentally address the performance impact of massive data. A fundamental solution requires reducing the amount of data held by each database, which leads to sharded storage: massive data is partitioned across a database cluster so that each database holds less data. Sharding metadata in this way relieves database pressure and further improves metadata query efficiency, but it greatly increases system complexity and introduces new problems, including distributed transactions across shards, the choice of sharding strategy, and complex data location after sharding.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides an efficient distributed metadata management method that combines memory and a prefix tree, offering an efficient storage and query scheme for the massive metadata generated by applications.
The invention comprises the following two stages:
The first stage: locate the distributed metadata server (MDS) holding the metadata through improved consistent hashing. The improved consistent hashing consists of an MDS object set, a hash ring, and a Key Bucket, i.e. the set of identifiers of the metadata assigned to each MDS.
The second stage: obtain the metadata from the located MDS. A prefix tree lets all metadata keys share their common prefixes, and the address of each piece of metadata is stored on the corresponding prefix-tree node.
Compared with the prior art, the invention has the advantages that:
1. Because the method works in memory, it is far more efficient than traditional I/O-based approaches, and lookups and insertions use binary search, which greatly improves their performance.
2. Balance factors on the MDS nodes keep metadata distributed across the MDS nodes as evenly as possible, avoiding the skew caused by overloading any single MDS node and improving metadata management performance.
3. The actual metadata index is stored in an efficient prefix tree, and the metadata is read directly from disk through that index, trading a small amount of memory for efficiency.
Drawings
FIG. 1 is a schematic diagram of a compressed Bloom filter;
FIG. 2 is a schematic diagram of the improved consistent hashing;
FIG. 3 is a schematic diagram of prefix-tree storage.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific implementation steps:
The invention applies the idea of two-level caching and divides metadata management into two stages:
The first stage: locate the MDS holding the metadata through improved consistent hashing. The original consistent hashing uses a single-ring structure; when a large amount of data has similar features, the hash function maps it onto the ring unevenly and the distribution becomes skewed. The improved consistent hashing offers good data uniformity and efficient querying. It consists of an MDS object set, a hash ring (32-bit, value range $0$ to $2^{31}-1$), and a Key Bucket holding the identifiers of the metadata assigned to each MDS. Each MDS object has the following attributes: the IP address of the target MDS, the MDS serial number, its position on the hash ring, and its balance (equalization) factor, as shown in FIG. 2.
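A minimal Python sketch of the MDS object set and hash ring described above. The field names (`ip`, `serial`, `ring_pos`, `balance`), the class names and the clockwise-successor lookup rule are assumptions for illustration; the balance factor is carried as an attribute but is not used for routing here. Nodes are spread evenly over the ring, as the initialization step describes.

```python
import bisect
from dataclasses import dataclass

RING_SIZE = 2 ** 31  # ring positions 0 .. 2^31 - 1, per the description


@dataclass
class MDS:
    ip: str               # IP address of the target MDS
    serial: int           # MDS serial number
    ring_pos: int = 0     # position on the hash ring
    balance: float = 0.0  # balance factor (hypothetical; unused in this sketch)


class HashRing:
    def __init__(self, nodes):
        # Initialization: spread the initial MDS nodes evenly over the ring
        # and write each ring position back into its MDS object.
        step = RING_SIZE // len(nodes)
        for i, node in enumerate(nodes):
            node.ring_pos = i * step
        self.nodes = sorted(nodes, key=lambda n: n.ring_pos)  # the MDS set
        self.positions = [n.ring_pos for n in self.nodes]

    def locate(self, ring_pos):
        # Walk clockwise: first MDS at or after ring_pos, wrapping to the start.
        i = bisect.bisect_left(self.positions, ring_pos % RING_SIZE)
        return self.nodes[i % len(self.nodes)]
```

For example, with four nodes the positions are 0, 2^29, 2^30 and 3·2^29, and any position past the last node wraps around to the node at 0.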
The second stage: obtain the metadata from the located MDS. To improve efficiency and reduce memory usage, a prefix tree lets all metadata keys (identifiers of the requested metadata) share their common prefixes, and the address of each piece of metadata is stored on the corresponding prefix-tree node.
Based on this concept, the invention uses the following technical means:
During initialization, the initial MDSs are distributed evenly on the hash ring, the ring position of each MDS is written into its MDS object, and all MDS objects are loaded into one collection to form the MDS set. The Key Bucket is an array; as shown in FIG. 2, each entry contains the 8-byte hash code of a metadata key, a 1-byte check code (used to decide whether two keys are the same when their hashes collide), and a 1-byte MDS serial number (recording which MDS the metadata identifier belongs to). The Key Bucket locates data by binary search.
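A sketch of the Key Bucket as a sorted array of 10-byte entries searched by binary search, assuming little-endian packing with Python's `struct` module. `KeyBucket`, `ENTRY` and `footprint_mib` are hypothetical names; a production version would likely use one contiguous array rather than Python lists, whose `insert` is O(n).

```python
import bisect
import struct

# One Key Bucket entry: 8-byte hash code, 1-byte check code, 1-byte MDS serial.
ENTRY = struct.Struct("<QBB")  # "<" = standard sizes, no padding -> 10 bytes


class KeyBucket:
    def __init__(self):
        self.hashes = []   # sorted hash codes, the binary-search keys
        self.entries = []  # packed 10-byte records, parallel to self.hashes

    def insert(self, hash_code, check, mds_serial):
        i = bisect.bisect_left(self.hashes, hash_code)
        self.hashes.insert(i, hash_code)
        self.entries.insert(i, ENTRY.pack(hash_code, check, mds_serial))

    def find(self, hash_code, check):
        # Binary search on the hash code; the check code disambiguates collisions.
        i = bisect.bisect_left(self.hashes, hash_code)
        while i < len(self.hashes) and self.hashes[i] == hash_code:
            _, c, serial = ENTRY.unpack(self.entries[i])
            if c == check:
                return serial
            i += 1
        return None


def footprint_mib(n_entries):
    """Raw entry storage in MiB for n_entries Key Bucket records."""
    return n_entries * ENTRY.size / 2 ** 20
```

With 10-byte entries, 10 million keys need about 95.37 MiB, which matches the estimate given later in the embodiment.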
As shown in FIG. 1, the compressed Bloom filter groups two Bloom filters, BFa and BFb: the original data is hashed and inserted into BFa and BFb, and the compressed result is denoted BFc. Insertion into BFa and BFb works as in a conventional Bloom filter. Construction proceeds as follows: if the required compressed Bloom filter (CBF) length is m, first create BFa of length m/2 and fill it; when the amount of inserted data exceeds a threshold (set according to the required false-positive rate), create BFb of length m/2 and fill it. When BFa and BFb have each reached the threshold for their false-positive rates, they are merged into BFc according to a table rule, which compresses the memory.
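The two-phase fill of BFa and BFb can be sketched as follows. The double-hashing scheme over SHA-256, the class names, and the behavior once BFb also reaches its threshold (it simply keeps accepting keys) are assumptions. The "table rule" that merges BFa and BFb into BFc is not specified in the text and is therefore not implemented; a lookup checks both halves, so a key inserted into either half is always reported as possibly present.

```python
import hashlib


class BloomFilter:
    def __init__(self, m_bits, k):
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)
        self.count = 0  # number of keys inserted so far

    def _positions(self, key):
        # k bit positions by double hashing over a SHA-256 digest (assumed scheme).
        d = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)
        self.count += 1

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class SplitBloomFilter:
    """BFa and BFb of m/2 bits each; BFb is created once BFa passes its threshold."""

    def __init__(self, m_bits, k, threshold):
        self.half, self.k, self.threshold = m_bits // 2, k, threshold
        self.bfa = BloomFilter(self.half, k)
        self.bfb = None

    def add(self, key):
        if self.bfa.count < self.threshold:
            self.bfa.add(key)
            return
        if self.bfb is None:
            self.bfb = BloomFilter(self.half, self.k)
        self.bfb.add(key)

    def __contains__(self, key):
        return key in self.bfa or (self.bfb is not None and key in self.bfb)
```

Creating BFb lazily means a lightly loaded server only pays for m/2 bits.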
Embodiment:
The embodiment comprises the following steps:
Step (1): create the consistent-hashing server with I/O multiplexing to receive external requests. Using the NIO Selector, the operating system monitors file descriptors; events are dispatched according to the Reactor model, in which a receiving thread accepts requests and forwards them to worker threads, which process them uniformly.
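Python's `selectors` module plays roughly the role of Java NIO's `Selector`, so the Reactor arrangement of step (1) can be sketched with it. This is a minimal, single-round illustration under assumptions: the `Reactor` class, its `handler` callback and the use of a `ThreadPoolExecutor` as the worker threads are hypothetical, and the sketch does not guard against two workers writing to the same connection concurrently.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


class Reactor:
    """Event loop that dispatches readable connections to a worker pool."""

    def __init__(self, handler, workers=4):
        self.sel = selectors.DefaultSelector()
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.handler = handler  # hypothetical callback: request bytes -> reply bytes

    def listen(self, server_sock):
        # Tag the listening socket so accept events can be told apart.
        server_sock.setblocking(False)
        self.sel.register(server_sock, selectors.EVENT_READ, data="accept")

    def register(self, conn):
        # Connections stay blocking: the selector only reports readiness,
        # so workers can use sendall() without partial-write handling.
        conn.setblocking(True)
        self.sel.register(conn, selectors.EVENT_READ, data="conn")

    def poll_once(self, timeout=1.0):
        """Run one round of the event loop; return futures for dispatched work."""
        futures = []
        for key, _ in self.sel.select(timeout):
            sock = key.fileobj
            if key.data == "accept":
                conn, _addr = sock.accept()
                self.register(conn)
                continue
            data = sock.recv(4096)
            if not data:  # peer closed the connection
                self.sel.unregister(sock)
                sock.close()
                continue
            futures.append(self.pool.submit(self._work, sock, data))
        return futures

    def _work(self, conn, data):
        conn.sendall(self.handler(data))
```

In a server, `poll_once` would run in a loop on the receiving thread while the pool threads play the worker role.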
When a read request arrives, the metadata Key (identifier of the requested metadata) is first screened by the compressed Bloom filter (Compress Bloom Filter) of the consistent-hashing layer. For a BFc of length m with k hash functions and n stored items, the false-positive rate of the Compress Bloom Filter is as follows:
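Assuming BFc behaves like an ordinary Bloom filter with m bits, k hash functions and n stored keys, the textbook approximation of its false-positive rate is $p \approx (1 - e^{-kn/m})^k$, which is minimized when $k \approx (m/n)\ln 2$. The sketch below evaluates both expressions; it is the standard estimate, not necessarily the exact formula used for the compressed filter.

```python
import math


def false_positive_rate(m, k, n):
    """Standard Bloom filter approximation p = (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m, n):
    """Number of hash functions that minimizes p for given m and n: (m/n) ln 2."""
    return max(1, round(m / n * math.log(2)))
```

For example, at 10 bits per key (m = 10n) the optimum is k = 7, giving a false-positive rate of roughly 0.8%.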
For keys determined to be absent, a not-found response is returned directly to the requester and the connection is closed.
A preliminary estimate for 10 million (1000W) metadata entries is about $10^7 \times 10\,\mathrm{B} \approx 95.37$ MiB, which is considerable. The Key Bucket is searched by binary search, so a lookup on a single MDS takes $O(\log n)$ time for n entries, and the overall time complexity is $O(K \log n)$.
If the Compress Bloom Filter determines that the metadata Key does not exist, a not-found result is returned immediately. If it determines that the Key exists, the metadata for that Key is provisionally considered present, and processing proceeds to step (2).
In this embodiment, the Compress Bloom Filter efficiently screens out requests for nonexistent metadata and provides a uniform hashing basis while occupying very little memory, which relieves pressure on the server.
Step (2): when the Compress Bloom Filter determines that the metadata Key may exist, the request locates the MDS holding the Key through the improved consistent hashing. Because plain consistent hashing distributes keys with similar feature values unevenly, the improved scheme applies two hashes. First, a first hash function maps the metadata Key to one of N hash functions; the first hash function is:
$$H_i = a_i \cdot W + H_{i-1}$$
where $H_i$ denotes the hash value of the metadata Key, $a_i$ the ASCII value of a single character contained in the metadata Key, and $W$ a perturbation parameter.
This hash maps the metadata Key to one of the N hash functions in the Compress Bloom Filter, and the selected hash function then maps the Key to its position on the hash ring.
This two-pass hashing effectively resolves the uneven consistent-hash distribution caused by metadata keys with similar feature values; the metadata Key is then looked up by binary search.
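The two-pass hashing can be sketched as follows. `first_hash` implements the recurrence $H_i = a_i \cdot W + H_{i-1}$ as written, with an assumed $H_0 = 0$ and $W = 31$; the family of N second-level hash functions (SHA-256 with a per-function seed) and N = 8 are hypothetical stand-ins, since the text does not specify them.

```python
import hashlib

RING_SIZE = 2 ** 31  # hash ring positions 0 .. 2^31 - 1


def first_hash(key, w=31):
    # H_i = a_i * W + H_{i-1} with H_0 = 0 (W = 31 is an assumed value).
    h = 0
    for ch in key:
        h = ord(ch) * w + h
    return h


def second_hash(seed, key):
    # Hypothetical second-level function #seed: seeded SHA-256 reduced to the ring.
    digest = hashlib.sha256(seed.to_bytes(4, "little") + key.encode()).digest()
    return int.from_bytes(digest[:4], "little") % RING_SIZE


def ring_position(key, n_funcs=8, w=31):
    idx = first_hash(key, w) % n_funcs  # pass 1: choose one of N hash functions
    return idx, second_hash(idx, key)   # pass 2: map the key onto the ring
```

Taken literally, the recurrence unrolls to $H_n = W \sum_i a_i$, so keys that are permutations of one another (for example "ab" and "ba") select the same second-level function; a positional variant such as $H_i = H_{i-1} \cdot W + a_i$ would distinguish them.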
For a write request, the requested metadata Key is added directly to BFc and recorded. The corresponding MDS is then found through the improved consistent hashing, the Key is located by binary search over the MDS set, and the existing entry is overwritten. On a hash collision, the collision linked list is traversed and check codes are compared to determine whether the metadata exists. If it does not exist and the load factor of the MDS node is too high, an MDS with relatively low load is found in the MDS set, the metadata Key is stored in that MDS, and the check value and the adjusted MDS serial number are written into the entry.
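A sketch of the write path with the load-factor redirect. `MdsNode`, `write_key`, the `capacity` field and the 0.75 threshold are hypothetical; the collision chain is modeled as a list per hash code, and the Key Bucket entry on the ring-located MDS records the serial number of the MDS that actually holds the metadata.

```python
from dataclasses import dataclass, field


@dataclass
class MdsNode:
    serial: int
    capacity: int  # hypothetical: records this MDS can hold at load factor 1.0
    bucket: dict = field(default_factory=dict)  # hash code -> [(check, owner serial)]
    stored: int = 0  # metadata records actually held by this MDS

    @property
    def load_factor(self):
        return self.stored / self.capacity


def write_key(nodes, target, hash_code, check, max_load=0.75):
    """Write a key located at `target`; return the serial of the owning MDS."""
    chain = target.bucket.setdefault(hash_code, [])  # collision chain
    for c, owner_serial in chain:
        if c == check:  # same key already present: overwrite in place
            return owner_serial
    owner = target
    if target.load_factor >= max_load:
        # Target overloaded: pick the least-loaded MDS from the MDS set.
        owner = min(nodes, key=lambda n: n.load_factor)
    owner.stored += 1
    # Record the check value and the (possibly adjusted) MDS serial number.
    chain.append((check, owner.serial))
    return owner.serial
```

Keeping the redirect in the ring-located bucket means reads still hash to the same MDS and then follow the stored serial number.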
Step (3): once the improved consistent hashing has located the target MDS for the metadata Key, the request is forwarded to the actual metadata storage node, which manages and stores the metadata. Each storage node uses a prefix tree as the index of metadata storage locations. As shown in FIG. 3, each prefix-tree node holds an index structure consisting of a 1 B index number, a 4 B offset and a 2 B file serial number. The prefix tree has four levels and each node has 26 slots, one per lowercase letter, so the whole tree occupies about $26^4 \times 8\,\mathrm{B} \approx 3.48$ MiB of memory, where 8 B is the size of a metadata index address on a 64-bit machine. 10 million (1000W) index records occupy about $10^7 \times 7\,\mathrm{B} \approx 66.76$ MiB, so 10 million entries need about 70.24 MiB of memory in total, while the addressable disk capacity is $2^{32} \times 2^{16} / 2^{40} = 256$ TiB. Data is also shared cyclically, which saves storage capacity.
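The byte sizes and memory estimates above can be checked with a short script, assuming the 7-byte index record is packed little-endian without padding via Python's `struct` module (`INDEX` and `pack_index` are hypothetical names). The computed trie size is about 3.486 MiB, which the text truncates to 3.48.

```python
import struct

# Prefix-tree index record: 1-byte index number, 4-byte offset, 2-byte file serial.
INDEX = struct.Struct("<BIH")  # "<" = standard sizes, no padding -> 7 bytes

MiB, TiB = 2 ** 20, 2 ** 40

trie_nodes_mib = 26 ** 4 * 8 / MiB          # 26^4 slots x 8-byte address
index_mib = 10 ** 7 * INDEX.size / MiB      # 10 million (1000W) index records
addressable_tib = 2 ** 32 * 2 ** 16 / TiB   # 4-byte offset x 2-byte file serial


def pack_index(index_no, offset, file_serial):
    """Pack one index record as stored on a prefix-tree node."""
    return INDEX.pack(index_no, offset, file_serial)
```

The 4-byte offset caps each file at 4 GiB and the 2-byte serial allows 65,536 files, which is where the 256 TiB ceiling comes from.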
When a read or write request for a metadata Key reaches the MDS storage node, the node first obtains the address of the prefix tree from memory, then starts at the root node, finds the address of the next node, and advances the index (subscript) into the Key. When the Key has been fully traversed, the metadata index stored at the final tree node is returned; otherwise the descent from the root node continues recursively until the metadata index represented by the Key is found. The index gives the file serial number, and once the file is found, the metadata is read at the stored offset.
Matters not described in detail in this invention belong to the prior art.
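A sketch of the prefix-tree lookup and the index-to-file resolution. The 26-slot nodes follow the description, but the four-level limit is not enforced; `read_metadata`, the in-memory `files` mapping and the fixed record `length` are hypothetical stand-ins for on-disk files and record sizes that the text does not specify.

```python
class TrieNode:
    __slots__ = ("children", "index")

    def __init__(self):
        self.children = [None] * 26  # one slot per lowercase letter a-z
        self.index = None            # (index_no, offset, file_serial) where a key ends


def _slot(ch):
    if not "a" <= ch <= "z":
        raise ValueError(f"sketch only supports lowercase a-z, got {ch!r}")
    return ord(ch) - ord("a")


class PrefixTreeIndex:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, index):
        node = self.root
        for ch in key:
            slot = _slot(ch)
            if node.children[slot] is None:
                node.children[slot] = TrieNode()
            node = node.children[slot]
        node.index = index

    def lookup(self, key):
        # Start at the root and advance one character (subscript) per level.
        node = self.root
        for ch in key:
            node = node.children[_slot(ch)]
            if node is None:
                return None
        return node.index


def read_metadata(trie, files, key, length=16):
    """Resolve key -> index -> file serial -> bytes at offset (files is a stand-in)."""
    idx = trie.lookup(key)
    if idx is None:
        return None
    _index_no, offset, file_serial = idx
    return files[file_serial][offset:offset + length]
```

Keys sharing a prefix such as "abc" and "abd" reuse the "a" and "b" nodes, which is the prefix sharing the method relies on to save memory.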
The above embodiments are provided to illustrate the technical concept and features of the present invention and are intended to enable those skilled in the art to understand the content of the present invention and implement the same, and are not intended to limit the scope of the present invention. All equivalent changes or modifications made in accordance with the spirit of the present invention should be construed to be included in the scope of the present invention.
Claims (9)
1. An efficient distributed metadata management method combining memory and a prefix tree, characterized in that:
the first stage: the distributed metadata server (MDS) where the metadata is located is found through improved consistent hashing, the improved consistent hashing comprising an MDS object set, a hash ring, and a Key Bucket holding the identifiers of the metadata assigned to each MDS;
the second stage: the metadata is obtained from the located MDS, a prefix tree lets all metadata keys share their common parts, and the address of each piece of metadata is stored on the corresponding prefix-tree node.
2. The efficient distributed metadata management method combining memory and a prefix tree according to claim 1, wherein each entry of the Key Bucket comprises the 8-byte hash code of a metadata Key, a 1-byte check code and a 1-byte MDS serial number.
3. The efficient distributed metadata management method combining memory and a prefix tree according to claim 2, wherein the Key Bucket locates data by binary search.
4. The efficient distributed metadata management method combining memory and a prefix tree according to claim 1, further comprising, after a read request arrives, preliminarily screening the metadata Key with a compressed Bloom filter, wherein:
when the compressed Bloom filter determines that the metadata Key does not exist, a result indicating that the metadata is absent is returned directly;
when the compressed Bloom filter determines that the metadata Key exists, the metadata corresponding to the metadata Key is provisionally considered present.
5. The efficient distributed metadata management method combining memory and a prefix tree according to claim 4, wherein the compressed Bloom filter is implemented as follows:
the required compressed Bloom filter length is set to m; a first Bloom filter BFa of length m/2 is created and filled with data; when the amount of inserted data exceeds a threshold, a second Bloom filter BFb of length m/2 is created and filled with data;
when the first Bloom filter BFa and the second Bloom filter BFb have each reached the threshold for their false-positive rates, they are merged into the compressed Bloom filter according to a table rule, thereby compressing memory.
6. The efficient distributed metadata management method combining memory and a prefix tree according to claim 4, wherein, after the metadata corresponding to the metadata Key is provisionally considered present, the MDS holding the metadata Key is found through the improved consistent hashing.
7. The efficient distributed metadata management method combining memory and a prefix tree according to claim 1, wherein:
each prefix-tree node stores a specific index structure comprising a 1 B index number, a 4 B offset and a 2 B file serial number; the prefix tree has four levels, and each node has 26 slots, one per lowercase letter.
8. The efficient distributed metadata management method combining memory and a prefix tree according to claim 4, wherein:
for a write request, the requested metadata Key is added to the compressed Bloom filter and recorded, the corresponding MDS is found through the improved consistent hashing, the metadata Key is located by binary search over the MDS set, and the existing entry is then overwritten.
9. The efficient distributed metadata management method combining memory and a prefix tree according to claim 7, wherein:
when a read or write request for a metadata key reaches the MDS storage node, the node first obtains the address of the prefix tree from memory, then starts at the root node, finds the address of the next node, and advances the index into the metadata key; when the metadata Key has been fully traversed, the metadata index stored at the final tree node is returned, otherwise the descent from the root node continues recursively until the metadata index represented by the metadata Key is found;
the file serial number is found through the metadata index, and once the file is found, the metadata is read at the stored offset.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310349675.6A CN116501760A (en) | 2023-04-04 | 2023-04-04 | Efficient distributed metadata management method combining memory and prefix tree |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310349675.6A CN116501760A (en) | 2023-04-04 | 2023-04-04 | Efficient distributed metadata management method combining memory and prefix tree |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116501760A true CN116501760A (en) | 2023-07-28 |
Family
ID=87322275
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310349675.6A Pending CN116501760A (en) | 2023-04-04 | 2023-04-04 | Efficient distributed metadata management method combining memory and prefix tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116501760A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117573703B (en) * | 2024-01-16 | 2024-04-09 | 科来网络技术股份有限公司 | Universal retrieval method, system, equipment and storage medium for time sequence data |
-
2023
- 2023-04-04 CN CN202310349675.6A patent/CN116501760A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US4611272A (en) | Key-accessed file organization | |
US5752243A (en) | Computer method and storage structure for storing and accessing multidimensional data | |
CN109299113B (en) | Range query method with storage-aware mixed index | |
Hutflesz et al. | Globally order preserving multidimensional linear hashing | |
US20090125478A1 (en) | Database heap management system with variable page size and fixed instruction set address resolution | |
CN113377868B (en) | Offline storage system based on distributed KV database | |
CN111400306B (en) | RDMA (remote direct memory Access) -and non-volatile memory-based radix tree access system | |
CN114153848B (en) | Block chain data storage method and device and electronic equipment | |
CN112148680B (en) | File system metadata management method based on distributed graph database | |
CN116501760A (en) | Efficient distributed metadata management method combining memory and prefix tree | |
Otoo et al. | A mapping function for the directory of a multidimensional extendible hashing | |
CN114116612B (en) | Access method for index archive file based on B+ tree | |
Litwin et al. | A new method for fast data searches with keys | |
CN115203211A (en) | Unique hash sequence number generation method and system | |
CN113157692B (en) | Relational memory database system | |
CN112380004B (en) | Memory management method, memory management device, computer readable storage medium and electronic equipment | |
CN117573676A (en) | Address processing method and device based on storage system, storage system and medium | |
CN117827787A (en) | Metadata management method and system applied to distributed file system | |
CN116975006A (en) | Data deduplication method, system and medium based on disk cache and B-tree index | |
CN106168883A (en) | A kind of efficient data tissue and access method | |
CN111274259A (en) | Data updating method for storage nodes in distributed storage system | |
CN113326262B (en) | Data processing method, device, equipment and medium based on key value database | |
CN114416741A (en) | KV data writing and reading method and device based on multi-level index and storage medium | |
CN111309261A (en) | Physical data position mapping method on single node in distributed storage system | |
EP0117906B1 (en) | Key-accessed file organization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||